# Rounding issues in ODE-solver

Dear Stan-community,

I am modelling a simple SEIR model in stan.

\frac{dS}{dt} = - \beta S I \\ \frac{dE}{dt} = \beta S I - \tau E \\ \frac{dI}{dt} = \tau E - \gamma I \\ \frac{dR}{dt} = \gamma I

Due to limitations in the accuracy of the solver ( I am using ode_rk45_tol, but had similar issues with other solvers ) I sometimes get slightly negative values for the number of individuals in the I comparmtent. This leads to errors in my program, because I use these values as the rate in a poisson process. I though this could be solved by log transforming the model. Then, the number of infected individuals would be the exponential of the output of the model and always be positive. I was hoping this would make my model more efficient.

Log transformation of the SEIR model:

Step 1, divide by variables:
\frac{1}{S} \frac{dS}{dt} = - \beta S I \frac{1}{S} \\ \frac{1}{E} \frac{dE}{dt} = (\beta S I - \tau E) \frac{1}{E} \\ \frac{1}{I} \frac{dI}{dt} = (\tau E - \gamma I) \frac{1}{I} \\ \frac{1}{R} \frac{dR}{dt} = \gamma I \frac{1}{R} \\

Step 2, simplify:
\frac{d logS}{dt} = - \beta I \\ \frac{d logE}{dt} = \frac{\beta S I}{E} - \tau \\ \frac{d logI}{dt} = \frac{\tau E}{I} - \gamma \\ \frac{d logR}{dt} = \frac{\gamma I}{R}

When I analyse this model and the original SEIR model in r, I find that both are equivalent. I also included a model in the code that is scaled by the population size and models the whole population istead of the fraction of infected individuals. The initial conditions cannot be zero, because I cannot logtransform zero. Thus, I choose very small values, this does not really influence the output of the models.


library(deSolve)

# Set parameters and time sequence
p <- c(beta = 0.08, gamma = 2/7, tau = 5/7, contact = 77, popsize = 10000)
times      <- seq(0, 45, by = 0.01)

# Set initial coditions
s_norm     <- c(S = (1-1/10000), E=0, I = 1/10000, R = 0)
s_log     <- c(logS = log(1-1/10000), logE=-50, logI = log(1/10000), logR = -50)
s_pop     <- c(logS = log(10000-1), logE=log(0.00001), logI = log(1), logR = log(0.00001))

# if initial values for E and R are too high, the result is not the same
s_pop_stan <- c(logS = 9.21024,logE=-1,logI=0,logR=-1)

# Traditional way to code SEIR model
gfull <- function(time, y, params) {
g <- with(as.list(c(y,params)),
c( -beta*contact*S*I,
beta*contact*S*I - tau*E,
tau*E - gamma*I,
gamma *I )
)
return(list(g))
}

# Log transformed SEIR model
gfull_log <- function(time, y, params) {
g <- with(as.list(c(y,params)),
c(  - beta*contact*(exp(logI)),
beta*contact*(exp(logI)*(exp(logS)))/(exp(logE)) - tau,
tau*(exp(logE))/(exp(logI)) - gamma,
gamma*(exp(logI))/(exp(logR)) )
)
return(list(g))
}

# Log transformed and scaled with population size
gpop <- function(time, y, params) {
g <- with(as.list(c(y,params)),
c( - beta*contact*(exp(logI))/popsize,
(beta*contact*(exp(logI)*(exp(logS)))/(exp(logE)))/popsize - tau,
tau*(exp(logE))/(exp(logI)) - gamma,
gamma*(exp(logI))/(exp(logR)) )
)
return(list(g))
}

# Run all versions of the model
result_log <- ode(y = s_log, times = times, func = gfull_log, parms = p)
result_norm <- ode(y = s_norm, times = times, func = gfull, parms = p)
result_pop <- ode(y = s_pop, times = times, func = gpop, parms = p)

# compare results for I compartment
par(mfrow=c(1,1))
plot(times, exp(result_log[,3]), col="blue", xlab="time in weeks", ylab="Number of infected individuals", type="l", main="R deSolve model")
lines(times, result_norm[,3], col="red",type="l")
lines(times, exp(result_pop[,3])/10000,col="violet")
legend("topright", legend=c("Traditional model", "Logtransformed model", "Logtransformed population scaled"), col=c("red", "blue","violet"),lty=1)



However, when I run the exact same models in Stan, I get nan values for the ODE output if the initial values are smaller than -11 for the logtransformed model and smaller than log(0.01) for the version that is scaled by the population size.

functions {

// ODE for basic SEIR model
vector gfull(real t, vector y, data real tau, data real gamma, real beta,
data real contact, data int num_comp) {

// define compartments
vector[num_comp] dydt;

// define ODEs
dydt[1] = - beta * contact * y[1] * y[3];
dydt[2] =   beta * contact * y[1] * y[3] - tau * y[2];
dydt[3] =  tau * y[2] - gamma * y[3];
dydt[4] =  gamma * y[3];

return dydt;
}

// ODE for logtransformed SEIR
vector gfull_logtrans(real t, vector y, real I0, data real tau, data real gamma, real beta,
data real contact, data int num_comp) {

// define compartments
vector[num_comp] dydt;

// define ODEs
dydt[1] = - beta*contact*(exp(y[3]));
dydt[2] =  (beta*contact*exp(y[3])*exp(y[1]))/exp(y[2]) - tau;
dydt[3] =  tau*exp(y[2])/exp(y[3]) - gamma;
dydt[4] =  gamma*exp(y[3])/exp(y[4]);

return dydt;
}

// ODE for basic SEIR model logtransformed and scaled with populationsize
vector gfull_pop(real t, vector y, real I0, data real tau, data real gamma, real beta,
data real contact, data int num_comp, data int popsize) {

// define compartments
vector[num_comp] dydt;

dydt[1] = -beta*contact*(exp(y[3]))/popsize;
dydt[2] =  (beta*contact*exp(y[3])*exp(y[1]))/(exp(y[2])*popsize) - tau;
dydt[3] =  tau*(exp(y[2]))/(exp(y[3])) - gamma;
dydt[4] =  gamma*(exp(y[3]))/(exp(y[4]));

return dydt;
}

}

data {
// data objects that are generic independent of the model specifications
int num_t;
int popsize;
int contact;
array[num_t] real ts;

// true values for simulated data
real beta_sim;
real I0_raw_sim;

// fixed quantities
real incubation_period;
real infectious_period;

// control parameters
real rtol;
real atol;
int max_num_steps;

}

transformed data {

// generic transformed data block
int num_comp = 4; // total compartments
int num_prev = 3; // index of the I compartment
real t0 = 0.;

// fixed parameters
real tau = 1./incubation_period;
real gamma = 1./infectious_period;

// scale initial seed
real I0_sim = I0_raw_sim/popsize;

print(to_vector(append_array( {(1-I0_sim), 0, I0_sim}, rep_array(0, num_comp-3))));

// get reference simulation
gfull,
to_vector(append_array( {(1-I0_sim), 0, I0_sim}, rep_array(0, num_comp-3))), // initial values work well
t0,                               // initial time = 0
ts,                               // evaluation times
rtol, atol, max_num_steps,        // tolerances
tau, gamma, beta_sim, contact,              // data
);

print("initial values log transformed:");
print(to_vector(append_array( {log(1-I0_sim), -11, log(I0_sim)}, rep_array(-11, num_comp-3))));

array[num_t] vector[num_comp] y_sim_log = ode_rk45_tol(
gfull_logtrans,
to_vector(append_array( {log(1-I0_sim), -11, log(I0_sim)}, rep_array(-11, num_comp-3))), // values more negative then -10 lead to Nan result for y
t0,                               // initial time = 0
ts,                               // evaluation times
rtol, atol, max_num_steps,        // tolerances
I0_sim, tau, gamma, beta_sim, contact,    // data
);

print("initial values log transformed population scaled:");
print(to_vector(append_array( {log(popsize-I0_sim*popsize), log(0.1), log(I0_sim*popsize)}, rep_array(log(0.1), num_comp-3))));

array[num_t] vector[num_comp] y_sim_poplog = ode_rk45_tol(
gfull_pop,
to_vector(append_array( {log(popsize-I0_sim*popsize), log(0.1), log(I0_sim*popsize)}, rep_array(log(0.1), num_comp-3))), // values smaller than log(1) lead to Nan result for y
t0,                               // initial time = 0
ts,                               // evaluation times
rtol, atol, max_num_steps,        // tolerances
I0_sim, tau, gamma, beta_sim, contact,    // data
);

}

parameters {

}

transformed parameters {

}

generated quantities {

// posterior predictive check
array[num_t] vector[num_comp] y_log = y_sim_log;
array[num_t] vector[num_comp] y_poplog = y_sim_poplog;

}



With these initial conditions, the output is not exactly the same for the three models:

Could anyone explain me why there is an issue with the logtransformed model in Stan but not in R? Is there something about how rounding is done in Stan that causes this issue? When I print the ODE output for every step, I see that the E and R compartment reach very high values in only a few time steps and result in inf values and nan in a next step. Is this due to the division by exp(E) and exp(R)?

I would be very keen on getting a better understanding of Stan and how the ODE solver in Stan is different from the one in R. Any pointers would be greatly appreciated.

Best regards,
Judith

This is the code I used to run and analyse the stan models:

# Load stan file that contains these functions with same parameters as used above.
# define parameters for the spline
fixed_pars = list(incubation_period = (5/7),
infectious_period=(2/7),
contact=7*11)

simul_pars = list(I0_raw_sim=1,
beta_sim=0.08,
ts=1:45,
num_t=45,
popsize=10000)

data_list = c(fixed_pars,simul_pars)

data_list = within(data_list,{
rtol = 1e-5;
atol = 1e-5;
max_num_steps = 1000})

# compile
mod = cmdstan_model(stan_file = "/various_tests/MRE_logtransform.stan")

samples_prior = mod$sample( data = data_list, seed = 123, chains = 1, parallel_chains = 1, iter_warmup = 1, iter_sampling = 1, fixed_param=TRUE) y_traditional <- samples_prior$summary(c("y_traditional"))
y_log <- samples_prior$summary(c("y_log")) y_poplog <- samples_prior$summary(c("y_poplog"))
y_anthony <- samples_prior$summary(c("y_anthony")) y_traditional_forPlot <- y_traditional[91:135,] y_traditional_forPlot$t <- 1:45

y_log_forPlot <- y_log[91:135,]
y_log_forPlot$t <- 1:45 y_poplog_forPlot <- y_poplog[91:135,] y_poplog_forPlot$t <- 1:45

y_anthony_forPlot <- y_anthony[91:135,]
y_anthony_forPlot$t <- 1:45 # compare results of the three models ggplot() + geom_line(aes(x=t, y=mean, colour="Traditional model"), data = y_traditional_forPlot) + geom_line(aes(x=t, y= exp(mean), colour="Logtransformed model" ), data = y_log_forPlot) + geom_line(aes(x=t, y= exp(mean)/10000, colour="Logtransformed population scaled"), data = y_poplog_forPlot) + geom_line(aes(x=t, y= exp(mean), colour="Anthony special"), data = y_anthony_forPlot) + xlab("time (weeks)")+ylab("Number of infected individuals") + labs(title="Results stan model") + scale_colour_manual(values=c("Traditional model"="red", "Logtransformed model"="blue", "Logtransformed population scaled"="violet", "Anthony special"="green")) + theme(legend.position = "bottom")  1 Like UPDATE: I just had a chat with a computer scientist that said it is likely that the ODE solver in Stan has 32-bit precision and the one in R 64-bit, causing the observed discrepancy. So, I guess this is something that is not easily changed and I will have to find a work-around instead. Iâ€™m quite certain that this is not the case. AFAIK, some python ML frameworks use 32 bit numbers, but AFAIK for Stan everything is double, i.e. 64-bit. I think the reason for the discrepancy must be, that the ODE adaptation methods in Stan and R differ, and one of them (probably Stanâ€™s) doesnâ€™t adapt the step size â€ścorrectlyâ€ť for the log-transformed ODE. You should try passing a very low atol/reltol to Stan and see what happens. That being said, Iâ€™m surprised that it should lead to such a â€śbigâ€ť difference. Also, FYI, I also tried log-transforming ODEs once (for the same reason as you), and I think it was generally slightly less efficient and sometimes things went a bit horribly wrong. But if that ensures positivity of the states, that might be worth the trade-off in the context of HMC/NUTS. I didnâ€™t run any experiments where I actually sampled from anything, so Iâ€™d be interested to hear how this works out for you. 1 Like Thanks so much for your reply @Niko! Decreasing the the atol and reltol from 1e-5 to 1e-8 unfortunately did not change the output. With the â€śwrongâ€ť, relatively high initial values for E and R, the logtranformed ODE system is sligthly faster than the traditional model (88 minutes against 91), but the difference is slow. I am a bit in doubt if the log transmission is worth the inaccuracy introduced by the initial value issue. I might opt out for artificially bouning the output of the ODE to zero instead. I will keep you updated if I find a more elegant solution. Huh. How strange! Rather than exponentiating each y, would you have more stability by working on the log-scale? For example:  // ODE for logtransformed SEIR vector gfull_logtrans(real t, vector y, real I0, data real tau, data real gamma, real beta, data real contact, data int num_comp) { // define compartments vector[num_comp] dydt; real log_beta = log(beta); real log_contact = log(contact); real log_tau = log(tau); real log_gamma = log(gamma); dydt[1] = -exp(log_beta + log_contact + y[3]); dydt[2] = exp((log_beta + log_contact + y[3] + y[1]) - y[2]) - tau; dydt[3] = exp(log_tau + y[2] - y[3]) - gamma; dydt[4] = exp(log_gamma + y[3] - y[4]); return dydt; }  2 Likes Thank you @andrjohns for your great suggestion. I was very hopeful about your suggestion, because it removes the slightly awkward division by a very small number. However, the issue persisted and I cannot start with initial values lower than -11 for your rewritten version of the model neither. The standard â€śtrickâ€ť I used in the past is to run the ODE integrator with an absolute error of 1E-6 and add to the output of the integrator 1E-4 (to all outputs). This assumes that 1E-4 is basically like â€ś0â€ť in your problem. With this approach the solution will always stay positive and things work practically just fineâ€¦ itâ€™s a hack, but it gets things done usuallyâ€¦ obviously you need to adjust the above constants to your problemâ€¦ 1 Like Your modelâ€™s initial condition are not really matched. For example for the first model, you have to_vector(append_array( {(1-I0_sim), 0, I0_sim}, rep_array(0, num_comp-3))),  as init condition. For the second model I used log(to_vector(append_array( {(1-I0_sim), 1e-20, I0_sim}, rep_array(1e-20, num_comp-3))))  Despite that you mentioned only e^-11 work model can clearly run without issue with data_list = within(data_list,{ rtol = 1e-12; atol = 1e-12; max_num_steps = 100000})  and Iâ€™m getting matched results: > y_traditional_forPlot$mean - exp(y_log_forPlot\$mean)
[1]  1.754682e-10  7.586444e-11  6.712295e-10  1.998728e-09  4.190562e-10
[6]  1.187747e-09 -9.890192e-09  7.376365e-09 -4.386663e-09  6.027541e-08
[11]  9.395195e-08  8.826598e-08 -3.192914e-09  3.769290e-08 -2.005211e-08
[16]  2.060244e-08 -3.902472e-09 -1.983877e-08 -3.847148e-09 -4.374427e-09
[21] -3.197394e-09  1.566560e-10  5.822743e-10 -2.498680e-10 -2.001047e-10
[26]  1.701293e-10  7.118731e-10 -5.896439e-11 -1.113873e-10  1.864346e-10
[31] -1.541373e-10 -7.941334e-11  4.548871e-11  2.159835e-11  8.275981e-12
[36]  2.056254e-12  1.310957e-13 -8.303139e-13 -8.653749e-13  1.145616e-12
[41]  6.858054e-13  5.120895e-13  3.425366e-13  2.185677e-13  1.372995e-13


I suspect the 3rd model has similar inconsistency.

Thank you @wds15!

Thank you @yizhang for taking the time to run my code.

I wish that this was indeed the issue. However, the issue still persists. The model runs without problem, but only produces the correct values for the â€śtraditionalâ€ť model that was not log transformed. In your graph only the red line shows, the others produces nan values.

I forgot to mention that you need to switch from rk45 to bdf solver (ode_bdf_tol). Sorry about that.

1 Like

@yizhang you are right! Thanks so much, this is very helpful!