SIRD model with terrible chain mixing for starters

Hello all,

Really struggling with what may just be a non-identified model.

This is an SIRD model applied to Brazilian death counts in 2020. The data is included directly in the Stan model below.

The shinystan output shows poorly mixed chains:

[trace plots: gamma, beta, deathRate]

The summary has bad Rhats, as one might expect:

# A tibble: 5 × 10
  variable              mean     median         sd        mad         q5        q95  rhat ess_bulk ess_tail
  <chr>                <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl> <dbl>    <dbl>    <dbl>
1 beta               0.795      0.900     0.173      0.135       0.546      0.987    2.29     5.04     34.4
2 gamma              0.803      0.908     0.173      0.135       0.554      0.995    2.30     5.04     34.9
3 deathRate          0.00863    0.00870   0.000672   0.000797    0.00742    0.00940  1.27    11.8      53.9
4 sigma_deaths_sd    0.0857     0.0856    0.00301    0.00391     0.0821     0.0912   1.23    13.5     129. 
5 iDay1_est       2993.      2683.      797.       770.       2124.      4189.       1.76     6.08     32.4

Code to run and graph results:

library(cmdstanr)
library(ggplot2)
library(rstan)
library(tidyr)
library(shinystan)

set_cmdstan_path("/home/breck/.cmdstanr/cmdstan-2.27.0")
model <- cmdstan_model(here::here("stan", "postSIRD.stan"))
# The data is hard-coded in the model's transformed data block, so pass an empty list.
fit <- model$sample(data=list(),
                    parallel_chains = 4,
                    iter_warmup = 1000,
                    iter_sampling = 1000,
                    chains = 4,
                    seed = 4857)



fit$summary(variables = c('beta', 'gamma', 'deathRate', 
                                  'sigma_deaths_sd', 'iDay1_est'))

fit$cmdstan_diagnose()

rstanfit <- rstan::read_stan_csv(fit$output_files())

launch_shinystan(rstanfit)

And finally, the relevant Stan code; the data is included inline:

/*
Modified from https://mc-stan.org/users/documentation/case-studies/boarding_school_case_study.html
*/

// include /home/breck/git/codatmo/dataGeneratingProcess1/stan/ode_solvers.stan

functions {

  vector sird(real t, vector y, real beta, real gamma, real deathRate, real N) {

      real S = y[1];
      real I = y[2];
      real R = y[3];
      real D = y[4];
      real dS_dt = -beta * I * S / N;
      real dI_dt =  beta * I * S / N - gamma * I ;
      real dR_dt =  gamma * I - deathRate * R;
      real dD_dt =  deathRate * R; 
      return to_vector({dS_dt, dI_dt, dR_dt, dD_dt});
  }
}

data {
  
}

transformed data {
  int<lower=1> n_days = 291;
  vector[n_days] deaths = 
  to_vector({0,1,3,7,11,18,25,34,47,59,77,93,115,139,166,202,244,305,365,445,496,
  569,691,826,959,1073,1141,1237,1350,1557,1760,1962,2173,2375,2491,2598,2772,
  2940,3349,3722,4074,4301,4606,5092,5534,5980,6438,6810,7073,7381,7972,8597,
  9273,10027,10695,11172,11659,12503,13281,14070,14983,15702,16201,16925,18073,
  19058,20109,21148,22196,22863,23590,24642,25705,26895,28032,28895,29367,30105,
  31473,32667,34152,35253,36054,36530,37393,38586,39824,41092,41935,42792,43426,
  44190,45522,46707,47946,49118,50100,50709,51444,52851,53955,55135,56128,57159,
  57742,58473,59791,60877,62136,63349,64410,64965,65651,66952,68126,69347,70646,
  71578,72195,73030,74324,75602,76902,78026,78870,79578,80346,81663,82959,84272,
  85437,86536,87117,87802,89037,90259,91461,92727,93668,94193,94794,96152,97519,
  98744,99830,100648,101226,102009,103209,104361,105635,106642,107364,107951,
  108747,110099,111263,112499,113551,114378,114834,115551,116760,117839,118824,
  119673,120570,120971,121618,122768,123972,124839,125688,126292,126736,127070,
  127584,128752,129726,130574,131341,131746,132204,133286,134248,135117,135945,
  136626,136977,137443,138237,139169,139964,140786,141508,141845,142238,143096,
  143964,144851,145504,146093,146451,146844,147654,148379,149114,149768,150302,
  150580,150774,151152,151884,152698,153341,153756,153991,154317,154965,155535,
  156041,156604,156991,157192,157526,158052,158556,159107,159680,159972,160175,
  160339,160628,161246,161849,162120,162348,162538,162724,162922,163496,164429,
  165005,165739,165879,166135,166814,167568,168218,168731,169088,169266,169621,
  170248,170870,171581,172140,172684,172917,173268,173953,174630,175393,176070,
  176718,177057,177482,178280,179132,179902,180562,181241,181536,182049,182983,
  183924,184985,185802,186461,186879,187441,188410,189375,190135,190617,190913,
  191250,191788,192839,194056,195072});
  int Npop = 214110287;
  int<lower = 0, upper = 1> compute_likelihood = 1;
  int<lower = 0, upper = 1> scale = 1;
  
  int n_compartments = 4;
  int sCompartment = 1;
  int iCompartment = 2;
  int rCompartment = 3;
  int dCompartment = 4;
  real ts[n_days];
  real meanDeaths = 0;
  real sdDeaths = 1;

  if (compute_likelihood == 1){
    if (scale == 1) {
      sdDeaths = sd(deaths);
      if (sdDeaths == 0) {
       reject("Standard deviation of zero for deaths");
      }
    }
  }
  ts[1] = 1.0;
  for (i in 2:n_days) {
      ts[i] = ts[i - 1] + 1;
  }
  if (compute_likelihood == 0) {
      print("Not running likelihood");
  }
}

parameters {
  real<lower = 0, upper = 1> gamma;
  real<lower=0, upper = 1> beta;
  real<lower=0, upper = 1> deathRate;
  real<lower=0> sigma_deaths_sd;
  real<lower=0> iDay1_est;
}

transformed parameters{
  vector[4] y0;
  y0[sCompartment] = Npop - iDay1_est;
  y0[iCompartment] = iDay1_est;
  y0[rCompartment] = 0;
  y0[dCompartment] = 0;  
  real t0 = 0.0;
  vector[4] y[n_days] = ode_rk45(sird, y0 , t0, ts, beta, gamma, deathRate, Npop);
}

model {
  beta ~ normal(0, 1);
  gamma ~ normal(0, 1);
  deathRate ~ normal(0, 1);
  sigma_deaths_sd ~ normal(0,1);
  iDay1_est ~ uniform(0,10000);
  if (compute_likelihood == 1) { 
    for (i in 1:n_days) {
      deaths[i]/sdDeaths ~ normal(y[i, dCompartment]/sdDeaths, sigma_deaths_sd);
    }
  }
}

generated quantities {
  real pred_deaths[n_days];
  vector[n_days] actual_deaths = deaths;
  real D[n_days] = y[, dCompartment];
  for (i in 1:n_days) {
    pred_deaths[i] = 
      normal_rng(y[i, dCompartment] / sdDeaths, sigma_deaths_sd) * sdDeaths;
  }
}
1 Like

Isn’t this supposed to be only:

real dR_dt =  gamma * I;

I think the problem is in the ODE system: the model is struggling to keep the population N constant because the equations allow N to vary. It is not a "stable" population system.

I think that a better system might be:

real dS_dt = -beta * I * S / N;
real dI_dt =  (beta * I * S / N) - (gamma * I) - (deathRate * R);
real dR_dt =  gamma * I;
real dD_dt =  deathRate * I; 

Source: Compartmental models in epidemiology - Wikipedia

EDIT: And this system makes more sense than the original one.

The R and D compartments vary according to the I compartment. It is a "fork" in the DAG:

[diagram: I → R and I → D]

1 Like

The “correct” system would read

real dS_dt = -beta * I * S / N;
real dI_dt =  (beta * I * S / N) - (gamma * I) - (deathRate * I);
real dR_dt =  gamma * I;
real dD_dt =  deathRate * I; 

The R(ecovered) don't transition to D(ead).
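A quick consistency check (writing $\mu$ for deathRate): the derivatives of this corrected system sum to zero, so the total population is conserved:

$$
\frac{dS}{dt} + \frac{dI}{dt} + \frac{dR}{dt} + \frac{dD}{dt} = -\frac{\beta I S}{N} + \left(\frac{\beta I S}{N} - \gamma I - \mu I\right) + \gamma I + \mu I = 0,
$$

so $S + I + R + D = N$ for all $t$.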

1 Like

Yes, you are correct. It was a typo. I will edit. Thanks!

The chains are mixing better:

[trace plots]

Rhat is better:

  variable             mean    median         sd        mad        q5       q95  rhat ess_bulk ess_tail
  <chr>               <dbl>     <dbl>      <dbl>      <dbl>     <dbl>     <dbl> <dbl>    <dbl>    <dbl>
1 beta               0.980     0.981    0.0124     0.0132      0.957     0.997   1.11     31.4     178.
2 gamma              0.940     0.941    0.0119     0.0127      0.918     0.957   1.11     31.5     140.
3 deathRate          0.0199    0.0199   0.000779   0.000751    0.0185    0.0211  1.04    120.      326.
4 sigma_deaths_sd    0.0692    0.0688   0.00373    0.00290     0.0643    0.0744  1.02    325.      165.
5 iDay1_est       9505.     9494.     245.       270.       9118.     9926.      1.09     37.6     154.

I took the estimation of iDay1_est out and just set it to 9,500; that firmed things up, and Rhat is good:


 variable          mean median       sd      mad     q5    q95  rhat ess_bulk ess_tail
  <chr>            <dbl>  <dbl>    <dbl>    <dbl>  <dbl>  <dbl> <dbl>    <dbl>    <dbl>
1 beta            0.981  0.981  0.00689  0.00703  0.969  0.992   1.00     913.     803.
2 gamma           0.941  0.941  0.00664  0.00677  0.930  0.952   1.00     917.     808.
3 deathRate       0.0200 0.0199 0.000669 0.000659 0.0188 0.0211  1.01    1080.    1454.
4 sigma_deaths_sd 0.0687 0.0685 0.00295  0.00294  0.0641 0.0736  1.00    1386.    1672.

Mixing is better too:

[trace plots]

The only issue I am seeing now, other than 4 divergent transitions, is max treedepth messages:

1785 of 4000 (45.0%) transitions hit the maximum treedepth limit of 10 or 2^10-1 leapfrog steps.
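One way to act on that message, as a sketch (it only treats the symptom; a better parameterization is usually the real fix), is to raise the limit in cmdstanr:

# Allow deeper trees, at the cost of longer trajectories per iteration.
fit <- model$sample(data = list(),
                    chains = 4,
                    parallel_chains = 4,
                    iter_warmup = 1000,
                    iter_sampling = 1000,
                    max_treedepth = 12,
                    seed = 4857)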

Model is now:

/*
Modified from https://mc-stan.org/users/documentation/case-studies/boarding_school_case_study.html
*/

// include /home/breck/git/codatmo/dataGeneratingProcess1/stan/ode_solvers.stan

functions {

  vector sird_old(real t, vector y, real beta, real gamma, real deathRate, real N) {

      real S = y[1];
      real I = y[2];
      real R = y[3];
      real D = y[4];
      real dS_dt = -beta * I * S / N;
      real dI_dt =  beta * I * S / N - gamma * I ;
      real dR_dt =  gamma * I - deathRate * R;
      real dD_dt =  deathRate * R; 
      return to_vector({dS_dt, dI_dt, dR_dt, dD_dt});
  }
  
    vector sird(real t, vector y, real beta, real gamma, real deathRate, real N) {

      real S = y[1];
      real I = y[2];
      real R = y[3];
      real D = y[4];
      real dS_dt = -beta * I * S / N;
      real dI_dt =  (beta * I * S / N) - (gamma * I) - (deathRate * I);
      real dR_dt =  gamma * I;
      real dD_dt =  deathRate * I;  
      return to_vector({dS_dt, dI_dt, dR_dt, dD_dt});
  }
}

data {
  
}

transformed data {
  int<lower=1> n_days = 291;
  vector[n_days] deaths = 
  to_vector({0,1,3,7,11,18,25,34,47,59,77,93,115,139,166,202,244,305,365,445,496,
  569,691,826,959,1073,1141,1237,1350,1557,1760,1962,2173,2375,2491,2598,2772,
  2940,3349,3722,4074,4301,4606,5092,5534,5980,6438,6810,7073,7381,7972,8597,
  9273,10027,10695,11172,11659,12503,13281,14070,14983,15702,16201,16925,18073,
  19058,20109,21148,22196,22863,23590,24642,25705,26895,28032,28895,29367,30105,
  31473,32667,34152,35253,36054,36530,37393,38586,39824,41092,41935,42792,43426,
  44190,45522,46707,47946,49118,50100,50709,51444,52851,53955,55135,56128,57159,
  57742,58473,59791,60877,62136,63349,64410,64965,65651,66952,68126,69347,70646,
  71578,72195,73030,74324,75602,76902,78026,78870,79578,80346,81663,82959,84272,
  85437,86536,87117,87802,89037,90259,91461,92727,93668,94193,94794,96152,97519,
  98744,99830,100648,101226,102009,103209,104361,105635,106642,107364,107951,
  108747,110099,111263,112499,113551,114378,114834,115551,116760,117839,118824,
  119673,120570,120971,121618,122768,123972,124839,125688,126292,126736,127070,
  127584,128752,129726,130574,131341,131746,132204,133286,134248,135117,135945,
  136626,136977,137443,138237,139169,139964,140786,141508,141845,142238,143096,
  143964,144851,145504,146093,146451,146844,147654,148379,149114,149768,150302,
  150580,150774,151152,151884,152698,153341,153756,153991,154317,154965,155535,
  156041,156604,156991,157192,157526,158052,158556,159107,159680,159972,160175,
  160339,160628,161246,161849,162120,162348,162538,162724,162922,163496,164429,
  165005,165739,165879,166135,166814,167568,168218,168731,169088,169266,169621,
  170248,170870,171581,172140,172684,172917,173268,173953,174630,175393,176070,
  176718,177057,177482,178280,179132,179902,180562,181241,181536,182049,182983,
  183924,184985,185802,186461,186879,187441,188410,189375,190135,190617,190913,
  191250,191788,192839,194056,195072});
  int iDay1_est = 9500;
  int Npop = 214110287;
  int<lower = 0, upper = 1> compute_likelihood = 1;
  int<lower = 0, upper = 1> scale = 1;
  
  int n_compartments = 4;
  int sCompartment = 1;
  int iCompartment = 2;
  int rCompartment = 3;
  int dCompartment = 4;
  real ts[n_days];
  real meanDeaths = 0;
  real sdDeaths = 1;

  if (compute_likelihood == 1){
    if (scale == 1) {
      sdDeaths = sd(deaths);
      if (sdDeaths == 0) {
       reject("Standard deviation of zero for deaths");
      }
    }
  }
  ts[1] = 1.0;
  for (i in 2:n_days) {
      ts[i] = ts[i - 1] + 1;
  }
  if (compute_likelihood == 0) {
      print("Not running likelihood");
  }
}

parameters {
  real<lower = 0, upper = 1> gamma;
  real<lower=0, upper = 1> beta;
  real<lower=0, upper = 1> deathRate;
  real<lower=0> sigma_deaths_sd;
  //real<lower=0> iDay1_est;
}

transformed parameters{
  vector[4] y0;
  y0[sCompartment] = Npop - iDay1_est;
  y0[iCompartment] = iDay1_est;
  y0[rCompartment] = 0;
  y0[dCompartment] = 0;  
  real t0 = 0.0;
  vector[4] y[n_days] = ode_rk45(sird, y0 , t0, ts, beta, gamma, deathRate, Npop);
}

model {
  beta ~ normal(0, 1);
  gamma ~ normal(0, 1);
  deathRate ~ normal(0, 1);
  sigma_deaths_sd ~ normal(0,1);
  //iDay1_est ~ uniform(0,10000);
  if (compute_likelihood == 1) { 
    for (i in 1:n_days) {
      deaths[i]/sdDeaths ~ normal(y[i, dCompartment]/sdDeaths, sigma_deaths_sd);
    }
  }
}

generated quantities {
  real pred_deaths[n_days];
  vector[n_days] actual_deaths = deaths;
  real D[n_days] = y[, dCompartment];
  for (i in 1:n_days) {
    pred_deaths[i] = 
      normal_rng(y[i, dCompartment] / sdDeaths, sigma_deaths_sd) * sdDeaths;
  }
}

Breck, I think the ODE system is still not ideal. It should be:

real dS_dt = -beta * I * S / N;
real dI_dt =  (beta * I * S / N) - (gamma * I) - (deathRate * I);
real dR_dt =  gamma * I;
real dD_dt =  deathRate * I;

That is the ODE system being run; I changed my original function's name to sird_old, and your code is in the function named sird. Sorry, I should have deleted the old code to be clear.

2 Likes

In the interest of posting a complete solution, I took @storopoli's and @Niko's suggested changes to the ODE, shown in sird(), kept my initial approach in sird_old(), set iDay1_est = 0, and got good mixing and Rhats. Previously I was also trying to estimate iDay1_est.

The complete Stan code is below; the R code that calls it is earlier in the thread:

Thanks for the help.

/*
Modified from https://mc-stan.org/users/documentation/case-studies/boarding_school_case_study.html
*/

functions {

  vector sird_old(real t, vector y, real beta, real gamma, real deathRate, real N) {

      real S = y[1];
      real I = y[2];
      real R = y[3];
      real D = y[4];
      real dS_dt = -beta * I * S / N;
      real dI_dt =  beta * I * S / N - gamma * I ;
      real dR_dt =  gamma * I - deathRate * R;
      real dD_dt =  deathRate * R; 
      return to_vector({dS_dt, dI_dt, dR_dt, dD_dt});
  }
  
    vector sird(real t, vector y, real beta, real gamma, real deathRate, real N) {

      real S = y[1];
      real I = y[2];
      real R = y[3];
      real D = y[4];
      real dS_dt = -beta * I * S / N;
      real dI_dt =  (beta * I * S / N) - (gamma * I) - (deathRate * I);
      real dR_dt =  gamma * I;
      real dD_dt =  deathRate * I;  
      return to_vector({dS_dt, dI_dt, dR_dt, dD_dt});
  }
}

data {
  
}

transformed data {
  int<lower=1> n_days = 291;
  vector[n_days] deaths = 
  to_vector({0,1,3,7,11,18,25,34,47,59,77,93,115,139,166,202,244,305,365,445,496,
  569,691,826,959,1073,1141,1237,1350,1557,1760,1962,2173,2375,2491,2598,2772,
  2940,3349,3722,4074,4301,4606,5092,5534,5980,6438,6810,7073,7381,7972,8597,
  9273,10027,10695,11172,11659,12503,13281,14070,14983,15702,16201,16925,18073,
  19058,20109,21148,22196,22863,23590,24642,25705,26895,28032,28895,29367,30105,
  31473,32667,34152,35253,36054,36530,37393,38586,39824,41092,41935,42792,43426,
  44190,45522,46707,47946,49118,50100,50709,51444,52851,53955,55135,56128,57159,
  57742,58473,59791,60877,62136,63349,64410,64965,65651,66952,68126,69347,70646,
  71578,72195,73030,74324,75602,76902,78026,78870,79578,80346,81663,82959,84272,
  85437,86536,87117,87802,89037,90259,91461,92727,93668,94193,94794,96152,97519,
  98744,99830,100648,101226,102009,103209,104361,105635,106642,107364,107951,
  108747,110099,111263,112499,113551,114378,114834,115551,116760,117839,118824,
  119673,120570,120971,121618,122768,123972,124839,125688,126292,126736,127070,
  127584,128752,129726,130574,131341,131746,132204,133286,134248,135117,135945,
  136626,136977,137443,138237,139169,139964,140786,141508,141845,142238,143096,
  143964,144851,145504,146093,146451,146844,147654,148379,149114,149768,150302,
  150580,150774,151152,151884,152698,153341,153756,153991,154317,154965,155535,
  156041,156604,156991,157192,157526,158052,158556,159107,159680,159972,160175,
  160339,160628,161246,161849,162120,162348,162538,162724,162922,163496,164429,
  165005,165739,165879,166135,166814,167568,168218,168731,169088,169266,169621,
  170248,170870,171581,172140,172684,172917,173268,173953,174630,175393,176070,
  176718,177057,177482,178280,179132,179902,180562,181241,181536,182049,182983,
  183924,184985,185802,186461,186879,187441,188410,189375,190135,190617,190913,
  191250,191788,192839,194056,195072});
  int iDay1_est = 0;
  int Npop = 214110287;
  int<lower = 0, upper = 1> compute_likelihood = 1;
  int<lower = 0, upper = 1> scale = 1;
  
  int n_compartments = 4;
  int sCompartment = 1;
  int iCompartment = 2;
  int rCompartment = 3;
  int dCompartment = 4;
  real ts[n_days];
  real meanDeaths = 0;
  real sdDeaths = 1;

  if (compute_likelihood == 1){
    if (scale == 1) {
      sdDeaths = sd(deaths);
      if (sdDeaths == 0) {
       reject("Standard deviation of zero for deaths");
      }
    }
  }
  ts[1] = 1.0;
  for (i in 2:n_days) {
      ts[i] = ts[i - 1] + 1;
  }
  if (compute_likelihood == 0) {
      print("Not running likelihood");
  }
}

parameters {
  real<lower = 0, upper = 1> gamma;
  real<lower=0, upper = 1> beta;
  real<lower=0, upper = 1> deathRate;
  real<lower=0> sigma_deaths_sd;
  //real<lower=0> iDay1_est;
}

transformed parameters{
  vector[4] y0;
  y0[sCompartment] = Npop - iDay1_est;
  y0[iCompartment] = iDay1_est;
  y0[rCompartment] = 0;
  y0[dCompartment] = 0;  
  real t0 = 0.0;
  vector[4] y[n_days] = ode_rk45(sird, y0 , t0, ts, beta, gamma, deathRate, Npop);
}

model {
  beta ~ normal(0, 1);
  gamma ~ normal(0, 1);
  deathRate ~ normal(0, 1);
  sigma_deaths_sd ~ normal(0,1);
  //iDay1_est ~ uniform(0,10000);
  if (compute_likelihood == 1) { 
    for (i in 1:n_days) {
      deaths[i]/sdDeaths ~ normal(y[i, dCompartment]/sdDeaths, sigma_deaths_sd);
    }
  }
}

generated quantities {
  real pred_deaths[n_days];
  vector[n_days] actual_deaths = deaths;
  real D[n_days] = y[, dCompartment];
  for (i in 1:n_days) {
    pred_deaths[i] = 
      normal_rng(y[i, dCompartment] / sdDeaths, sigma_deaths_sd) * sdDeaths;
  }
}
2 Likes

In the system

      real dS_dt = -beta * I * S / N;
      real dI_dt =  (beta * I * S / N) - (gamma * I) - (deathRate * I);
      real dR_dt =  gamma * I;
      real dD_dt =  deathRate * I;  

none of the other compartments depend on R, and your likelihood doesn’t seem to depend on the number of recovered either, so you could remove the R dimension from the ODE system. This would reduce ODE dimension from 4 → 3 and remove some unnecessary computation.
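A minimal sketch of what that reduced right-hand side could look like (the function name sid is just illustrative; y0, the compartment indices, and dCompartment would need matching updates):

vector sid(real t, vector y, real beta, real gamma, real deathRate, real N) {
  real S = y[1];
  real I = y[2];
  // y[3] is D; R is implied as N - S - I - D and never enters the derivatives.
  real dS_dt = -beta * I * S / N;
  real dI_dt = (beta * I * S / N) - (gamma * I) - (deathRate * I);
  real dD_dt = deathRate * I;
  return to_vector({dS_dt, dI_dt, dD_dt});
}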

If the initial state of the system is a parameter, the default Stan initialization can easily start from bad and unreasonable starting points, from which it is really difficult to get to the good region. But for this kind of model and data you can often fix this by setting the init for the initial state equal to the measured state at the first time point, or close to it. So you are not fixing the parameter value, just the sampling initialization.
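For example, in cmdstanr (a sketch; the values are only illustrative starting points, not recommendations):

# Initialize each chain near plausible values instead of Stan's default
# random inits; iDay1_est remains a free parameter.
fit <- model$sample(data = list(),
                    chains = 4,
                    init = function() list(beta = 0.5,
                                           gamma = 0.5,
                                           deathRate = 0.01,
                                           sigma_deaths_sd = 0.1,
                                           iDay1_est = 9500),
                    seed = 4857)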

2 Likes

While this

real<lower=0> iDay1_est;
iDay1_est ~ uniform(0,10000);

is mathematically equivalent to

real<lower=0, upper=10000> iDay1_est;

I’m pretty sure the latter is computationally a lot better. I don’t think there is any case where you should use uniform.

2 Likes

If you set iDay1_est = 0, then

  y0[sCompartment] = Npop - iDay1_est = Npop
  y0[iCompartment] = iDay1_est = 0
  y0[rCompartment] = 0
  y0[dCompartment] = 0

and the derivatives are initially

dS_dt = -beta * I * S / N = 0
dI_dt = (beta * I * S / N) - (gamma * I) - (deathRate * I) = 0
dR_dt = gamma * I = 0
dD_dt = deathRate * I = 0

So the system will always stay in the initial state no matter what the parameter values are.

1 Like

Well, that is embarrassing; this is what I get for focusing on the mixing and not the posterior predictive check as well.

I really need a unit testing framework for experiments. The diagnostics all come back looking great! The model is just not doing any actual work…
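For the record, a quick posterior predictive check sketch using the generated quantities already in the model (pred_deaths and actual_deaths), assuming the fit object from the R code above:

# Overlay the posterior median of pred_deaths on the observed series.
pred <- fit$summary(variables = "pred_deaths")
obs <- fit$summary(variables = "actual_deaths")
ppc <- data.frame(day = seq_along(pred$median),
                  observed = obs$mean,    # actual_deaths is constant across draws
                  predicted = pred$median)
ggplot(ppc, aes(x = day)) +
  geom_line(aes(y = observed)) +
  geom_line(aes(y = predicted), linetype = "dashed") +
  labs(y = "cumulative deaths")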

Thanks for pointing it out. Back to getting things to work.

It works fine with iDay1_est = 9500; as per above.

But not with iDay1_est = 1;

There is no particular reason that 9500 is any worse an initial value than 1, other than convention. The 9500 came from another model that estimated the value, but that model had other diagnostic issues.

I’ll think on this some and try some of your other suggestions @jtimonen. Thanks for taking the time.

I prefer to keep priors explicit in my models, so I tend to call out the uniform even though it is always there. I'll have to look into the efficiency someday, but I don't think it matters since such models tend to be toys anyway. Really, the compiler ought to deal with it if it doesn't already.

Dropping the R compartment seems to be a bit of a freebie in the ODE (you still have to compute gamma * I), but I ran it that way and it looks fine.

The 2020 Brazil I(nfected) numbers are very suspect, so I am not adding that info. I came up with 9500 from another model that estimated the value but had other issues. I'd still like to estimate it; I have tried a bunch of approaches, but the fits suffer. The latest effort was `iDay1_est ~ exponential(.0001)` (a prior with mean 1/0.0001 = 10,000),

but that had diagnostic issues as well.

Even with the explicit prior in there, it is still a good idea to also specify the upper bound in the parameter declaration. Otherwise, the boundary of the uniform prior will represent a sharp discontinuity in the target (the target is some finite value on one side and is negative infinity on the other), and if you ever run into this boundary while sampling you’ll get a divergence.

1 Like

I’ll give it a go with the upper bound to see if that helps.

Is it silly to be trying to estimate the (I)nfected for day0?

I was also thinking of experimenting with varying numbers of zero-death days before the first death, so that day 1 does not show a jump very different from the change between any other pair of adjacent days. So start with I = 1, but the deaths would be some version of [0, 0, 0, …, 1, …]. I'd have to vary the number of padding days outside of Stan, I guess; a sketch is below.
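A sketch of that padding idea (deaths here is a hypothetical R copy of the death series that is currently hard-coded in the Stan program):

# Prepend n_pad zero-death days so day 1 is not an outsized jump;
# n_pad would be varied across runs outside of Stan.
n_pad <- 14  # illustrative value
deaths_padded <- c(rep(0, n_pad), deaths)
n_days_padded <- length(deaths_padded)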

As @jsocolar said, the explicit uniform(a,b) is not a problem, but if you have it then you should always bound the parameter with <lower=a, upper=b>. You can see it quickly with

code1 <- "parameters{ real<lower=0,upper=100> theta;}"
code2 <- "parameters{real<lower=0> theta;} \nmodel{theta ~ uniform(0,100);}"
mod1 <- cmdstanr::cmdstan_model(cmdstanr::write_stan_file(code1))
mod2 <- cmdstanr::cmdstan_model(cmdstanr::write_stan_file(code2))

mod1$sample()  # works well
mod2$sample()  # a lot of divergences

It is because the bounds determine the transformation from the constrained to the unconstrained space (where the sampling actually happens). In mod2, the unconstrained space is in fact still constrained, and that produces a whole lot of problems.
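Concretely, with only a lower bound Stan samples on $\eta = \log \theta$, which maps $(0, \infty)$ to the whole real line but leaves the prior's edge at $\theta = 100$ as a hard cliff in the target; with both bounds it samples on $\eta = \operatorname{logit}((\theta - a)/(b - a))$, so the support's boundaries are mapped to $\pm\infty$ and the target is smooth everywhere.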

EDIT: Those "divergences" in mod2 are not divergences in the canonical sense of the word; they just get marked as divergences, as discussed here:

2 Likes

If you really have no idea what the number of infected is at the initial time point, and it could be anything between 0 and the population size, then it sounds like the data is just very difficult to fit to begin with. It could help if I saw the data. In my opinion it is in general not silly to try to estimate the initial state, and it works if the data is well behaved with respect to the model, but that might not be the case here.

1 Like

FWIW, these are divergences in the sense that the symplectic integrator steps across the boundary in a single leapfrog step and immediately accumulates infinite numerical error in the trajectory. Because the discontinuity is abrupt, there is no step size small enough to avoid the problem. Formally, this (and not the fact that the gradient is undefined right on the boundary) is why Stan cannot handle discontinuous likelihoods in general. And it's also why, in practice, Stan can handle discontinuous likelihoods when the discontinuity is sufficiently small (i.e., small enough not to cause too much error in the numerical integration). But this discontinuity is huge (infinite).

3 Likes

Good explanation. In my mind I have made a distinction between two cases:

  1. error in Hamiltonian is finite but too large → divergence
  2. error in Hamiltonian is infinite or cannot be evaluated → marked as divergence

But I guess only the “cannot be evaluated” option is conceptually different.

1 Like