Simple SIR model has converging issues for selection of initial values

Dear Stan-community,

I have converging issues for part of the initial value space of my simple SIR model. I fit a model using data simulated with the same model, to test model performance before working with real infectious disease data.

The stan model is defined below.

functions {
  vector seir_0d(real t, vector y, real beta, real I0, data real tau, data real gamma, data real contact, data array[] int DIM) {
    // array with all necessary dimensions
    int num_comp = DIM[1];
    // define compartments
    vector[num_comp] dydt;
    // define initial values
    vector[num_comp] init = to_vector(append_array({1-I0,0.0, I0}, rep_array(0.0, num_comp-3))); 
    // define time-varying force of infection (for now not time-varying)
    real finf_t = beta * contact * (y[3]+init[3]);
    // define ODEs
    dydt[1] = -finf_t * (y[1]+init[1]);
    dydt[2] =  finf_t * (y[1]+init[1]) - tau * (y[2]+init[2]);
    dydt[3] =  tau * (y[2]+init[2]) - gamma * (y[3]+init[3]);
    dydt[4] =  gamma * (y[3]+init[3]);
    return dydt;
  array[] real get_prevalence(array[] vector y, array[] int DIM, real popsize, real atol, real I0) {
    // dimensions
    int num_comp = DIM[1];
    int num_t = DIM[2];
    int num_prev = DIM[3];
    // extract, rescale and format prevalence
    array[num_t] real prev = to_array_1d( ( to_vector(y[,num_prev]) + I0 + 2*atol ) * popsize );
    return prev;
data {
  int num_t;
  array[num_t] real ts;
  real popsize;
  // true values for simulated data
  real beta_sim;
  real I0_raw_sim;
  // priors
  array[2] real p_I0; // expected initial seed (mean, sd)
  array[2] real p_beta; // expected beta (alpha, beta)
  // fixed quantities
  real incubation_period;
  real infectious_period;
  real contact;
  // control parameters
  real rtol;
  real atol;
  int max_num_steps;
  int inference;
  //data to fit
  array[num_t] int I_t_sim;
transformed data {
  // structure
  int num_comp = 4; // total compartments
  int num_prev = 3; // index of the I compartment
  array[3] int DIM = {num_comp,num_t,num_prev};
  real t0 = 0.;
  // fixed parameters
  real tau = 1./incubation_period;
  real gamma = 1./infectious_period;
  // scale initial seed
  real I0_sim = I0_raw_sim/popsize;
parameters {
  real<lower=0,upper=1> beta;           // transmission probability per contact
  real<lower=0> I0_raw;                 // initial seed (in number of individuals)
transformed parameters {
  real I0 = I0_raw/popsize;
  array[num_t] vector[num_comp] y = 
      rep_vector(0.0,num_comp),         // initial values = 0 (handled within the ODE)
      t0,                               // initial time = 0
      ts,                               // evaluation times
      rtol, atol, max_num_steps,        // tolerances 
      beta, I0,                         // parameters
      tau, gamma, contact,              // data
      DIM                               // metadata
  array[num_t] real<lower=0> prevalence = get_prevalence(y, DIM, popsize, atol, I0);
model {
  // priors
  beta ~ beta(p_beta[1],p_beta[2]);
  I0_raw ~ gamma(p_I0[1]^2/p_I0[2]^2,p_I0[1]/p_I0[2]^2); // reparameterization of gamma from mean/sd to alpha/beta
  // likelihood
    target += poisson_lpmf(I_t_sim | prevalence);
generated quantities {
  // log-likelihood for LOO
  vector[num_t] log_lik;
  for (t in 1:num_t)
    log_lik[t] = poisson_lpmf(I_t_sim[t] | prevalence[t]);
  // posterior predictive check
  array[num_t] int I_t_simulated = I_t_sim;
  array[num_t] int I_t_predicted = poisson_rng(prevalence);

The following code in R reproduces the issue. This code sometimes works, when the initial values by change are in the right range, but often ends with diverencence and a large rhat value.

# Reproducible code for stan forum


# simulated data
sim_data_mod1 <- c(5,1,1,3,1,4,4,3,2,6,8,17,8,16,13,24,47,63,55,62,60,97,82,107,87,

# set parameters
fixed = list(incubation_period=5/7,
prior = list(p_I0 = c(1,1),
                  p_beta = c(1,10))
simul = list(I0_raw_sim=1,

data_list = c(fixed,prior,simul)
data_list$ts = 1:simul$num_t
data_list$inference = 1
data_list = within(data_list,{
  rtol = 1e-6;
  atol = 1e-6;
  max_num_steps = 1000;
  I_t_sim = sim_data_mod1})

# load stan model 
mod = cmdstan_model(stan_file = paste0("stan/mod1_rk45.stan"))

# fit simulated parameters from simulated data
samples_posterior = mod$sample(
  data = data_list,
  chains = 4, parallel_chains = 4,
  iter_warmup = 200, iter_sampling = 1000)

To identify the range of initial values that causes issues, I ran a set of combinations of initial values. The figure below shows if the rhat is below (pink) or above(blue) 1.05 for the combination of the initial value of I0 (initial number of infected individuals) and beta (transmission rate).

The issue only appears for some combination of the parameters used to simulate the data. The convergences is better when I for instance only simulate the first half of the epidemic. However, I would feel more comfortable applying this model to real data, when I know it can infer any set of parameters.

I do not understand why my model cannot infer the simulated parameters, independent of the initial values. For this simple model, the likelihood should be relatively smooth. Could any of you give me a hint of how to handle this issue?

I would greatly appreciate any help.


What I have tried so far:

  • A stronger prior on small values of beta or a uniform prior. (Even though the initial value does not depend on the prior).
  • A traditional way of including the initial state of the ODE model.
  • Increasing adapt_delta.

Additional information:
I am running cmdstan version 2.30.1 and stan version 2.21.0. The initial conditions of the ODE model are included in the definition of the ODE itself to reduce the number of parameters.

Hi, @JudithBouman2412. I’m not an ODE model specialist, but you might have better luck swapping out the RK45 integrator with our stiff integrator (bdf). The problem with the non-stiff integrator is that it can get stuck during initialization if the system is stiff out in the tails. I didn’t know how to interpret your plot as R-hat values are usually >= 1.

You can sometimes get by sticking to the RK45 integrator if you provide better initializations than our random initializations. That can help you avoid stiffness in the ODE.

See: 13.5 Stiff ODEs | Stan User’s Guide

Also, I don’t know if you found the Stan case study on SIR models—that’ll probably also have some useful information:

It uses Covid data from Switzerland as a final worked example.


ODE models are rarely simple despite having few parameters, so estimating just two of the parameters is probably a very sane way of doing a simulation study/validation. Still, it is not surprising to me that you cannot get good convergence for any initial value, there are at least two problems I can think of from the get go:

  1. The likelihood surface of nonlinear models can be extremely complex and there may be regions where even advanced samplers get stuck;
  2. ODE solves may work well for some regions of parameters but produce nonsensical outputs for others due to the algorithm tolerances for state-space values.

As @Bob_Carpenter mentioned, switching integrators may improve things and help with issue #2, but I’d say only to some extent, since there are very likely regions where it will still fail to generate a meaningful output.

One test you can do is look at plots of the output for the initial values where convergence is poor, this may reveal failures of the ODE solver and a wrong mapping of the parameter space to the posterior values – which means that sampled solutions are not valid solutions. A sort of posterior predictive check.

Also, I didn’t look into it very closely, but it’s not clear to me why the dydt vector has a varying size, but its components are hard coded as 1-4, but as long as the hard coded values hold it shouldn’t pose a problem.

In sum, I think it makes more sense to check whether convergence happens under starting values where the output represent actual solutions to the ODE system, and verifying that these chains converge to the same region where “better” chains arrive.

1 Like

I suspect the cause is not ODE solver or tolerance selection. Take a look at the 1st & 2nd components of the ODE. The right hand side are essentially product of functions of the model parameters beta and I0_raw, e.g.


which could lead to multiplicative non-identifiability. This can be confirmed if the divergent runs show two modes, with one pair of (beta, I0_raw) close to the true value and the other pair somewhat flipped over.

If that’s the case, in addition to strong prior, another solution is to add init to the stan run:

fit <- mod$sample(data = data_list, parallel_chains = 4, init=function(){list(beta=0.2,I0_raw=0.8)})

For limited and very noisy data it may still be difficult to identify the combinations in practice, but not in principle. Since this product is not observed directly, it won’t be a strict case of product identifiability problem: different combinations may have the same product, but they will result in different trajectories of the I compartment.

Thank you all very much for taking the time to look at my issue and giving helpful suggestions.

Thank you for this suggestion. This is indeed true, depending on the parameters the solvers perform different. And the reference to the Stan case study was also helpful to better understand the potential issues with ODEs in stan.

This is also a very helpful suggestion.

This indeed helps to work around the issue, given that I check that the result is indeed sensible.

I now visualized the likelihood of the model and it is indeed more complex than I expected. I think that I will just have to keep this in mind when extending my model and applying it to real data.

Why are you adding initial values to the differential equations? I think you shouldn’t do that (if this is the most basic SEIR model). Also, since your model is normalised to population values, those additions have significant effects on the model dynamics.

1 Like

Thank you @jandraor for your suggestion.

As I mentioned in the opening post, I do this because it reduces the number of parameters I have to estimate, which will improve the efficiency of the model especially if I later on expand it. I have tested if it influences my issue by running a model without this trick including initial values in the clasical way and it does not affect the problem I experience.

To be complete about this issue, I just want to share my conclusions on analysing the log-likelihood of the model. I analysed the log-likelihood with a version of the same model implemented in R, to be able to sample the full parameter space.

The log-likelihood of the model looks like this:

And zoomed in on the high log-likelihood band:

The R0 for the model is the following for these combinations of parameters:

The parameter ranges that caused issues correspond with unreasonably high values of the reproduction number.

It is still not completely clear to me why the MCMC can get stuck and diverge for high initial values of beta. However, I concluded that if I start the chains with initial values in a reasonable range of the reproduction number (below 5, when the true value is 1.5), my model will converge. Thus, keeping this in mind will help me analysing more complex models and data.

1 Like