Investigating divergent chains

Hi, I am working on fitting time series data to the outputs of an ODE system. The model is as follows:

functions {
  real[] ode(real t, real[] x, real[] theta, real[] x_star, int[] x_i) {
    real dx_dt[3];
    dx_dt[1] = theta[1]*x[2] + theta[7]*x[3] - theta[2]*x[1]; // immune response
    dx_dt[2] = -theta[3]*x[1] + theta[4]*x[2];                // pathogen in lung
    dx_dt[3] = -theta[5]*x[1] + theta[6]*x[3];                // pathogen in the nose
    return dx_dt;
  }
}

data {
  int nShedding; // num shedding time points
  int nNeutro;   // num neutrophils/Igs time points
  int nCFU;      // num CFU time points

  // time pointers
  int dpiShedding[nShedding];
  int dpiNeutro[nNeutro];
  int dpiCFU[nCFU];
  real CFUEndPoints[2];
  int isFittingCFUEndPoints;

  // data vectors
  real neutroCounts[nNeutro];
  real shedding[nShedding];
  real CFU[nCFU, 2];

  // ODE setup
  real t0;
  int odeEndpoint;        // end point when the rabbit was sacrificed
  real tf[odeEndpoint-1]; // integration times, tf[1] = 0.0
}

parameters {
  real<lower = 0> theta[7];
  real<lower = 0> x0[3];     // ODE ICs
  real<lower = 0> sigmaX;    // error
  real<lower = 0> sigmaY[2]; // error CFU lung and nose
}

transformed parameters {
  real x[odeEndpoint, 3];
  real log_x[odeEndpoint, 3];
  x[1] = x0;
  x[2:odeEndpoint] = integrate_ode_rk45(ode, x0, t0, tf, theta,
                                        rep_array(0.0, 0), rep_array(0, 0),
                                        1e-4, 1e-5, 1e3);
  // make sure that ODE solutions are positive
  for (k in 1:3) {
    for (i in 1:odeEndpoint) {
      if (x[i, k] <= 1e-15) {
        log_x[i, k] = 0.0;
      } else {
        log_x[i, k] = log(x[i, k] + 1.0);
      }
    }
  }
}

model {
  // priors
  theta[1] ~ normal(1, 0.5);     // a_lung
  theta[2] ~ normal(1.4, 0.5);   // b
  theta[3] ~ normal(0.5, 0.5);   // c_Lung
  theta[4] ~ normal(0.5, 0.5);   // r_Lung
  theta[5] ~ normal(1, 0.5);     // c_Nose
  theta[6] ~ normal(0.15, 0.15); // r_Nose
  theta[7] ~ normal(0.15, 0.3);  // a_Nose

  sigmaX ~ lognormal(0, 1);
  sigmaY ~ lognormal(0, 1);

  x0[1] ~ lognormal(log(neutroCounts[1]), 0.5);
  x0[2] ~ lognormal(CFU[1, 1] + 0.2, 0.5);
  x0[3] ~ lognormal(CFU[1, 2] + 0.1, 0.5);

  // likelihood
  for (k in 1:nNeutro) {
    neutroCounts[k] ~ lognormal(log_x[dpiNeutro[k], 1], sigmaX);
  }

  if (isFittingCFUEndPoints == 1) {
    // only fit end points
    CFUEndPoints[1] ~ normal(log_x[dpiCFU[nCFU], 2], sigmaY[1]);
    CFUEndPoints[2] ~ normal(log_x[dpiCFU[nCFU], 3], sigmaY[2]);
  } else {
    // fit both CFU nose and lung
    for (j in 1:2) {
      for (i in 1:nCFU) {
        CFU[i, j] ~ normal(log_x[dpiCFU[i], j+1], sigmaY[j]);
      }
    }
  }
}

generated quantities {
  real x_rep[nNeutro, 3];
  for (m in 1:nNeutro) {
    x_rep[m, 1] = lognormal_rng(log_x[dpiNeutro[m], 1], sigmaX);
    x_rep[m, 2] = normal_rng(log_x[dpiNeutro[m], 2], sigmaY[1]);
    x_rep[m, 3] = normal_rng(log_x[dpiNeutro[m], 3], sigmaY[2]);
  }
}

The time series are not very long; each has about 10-18 points. I set iter = 15000 and warmup = 10000. My question is: why is there such a big difference in computation time between chains? Some chains took forever to finish. I did some post-sampling investigation and found that some chains had 20 divergences while others had 0. Can you suggest a way to investigate what happened in the chains that produced so many divergences? What is the best way to overcome this? Thank you :)

SAMPLING FOR MODEL ‘SingleRabbitSheddingNeutrophilsVer2B’ NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 0.000642 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 6.42 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 20000 [ 0%] (Warmup)

SAMPLING FOR MODEL ‘SingleRabbitSheddingNeutrophilsVer2B’ NOW (CHAIN 2).
Chain 2:
Chain 2: Gradient evaluation took 0.000898 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 8.98 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2:
Chain 2:
Chain 2: Iteration: 1 / 20000 [ 0%] (Warmup)

SAMPLING FOR MODEL ‘SingleRabbitSheddingNeutrophilsVer2B’ NOW (CHAIN 3).
Chain 3:
Chain 3: Gradient evaluation took 0.00249 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 24.9 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3:
Chain 3:
Chain 3: Iteration: 1 / 20000 [ 0%] (Warmup)

SAMPLING FOR MODEL ‘SingleRabbitSheddingNeutrophilsVer2B’ NOW (CHAIN 4).
Chain 4:
Chain 4: Gradient evaluation took 0.001521 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 15.21 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4:

Unfortunately, there are many possibilities. There are some things that look suspicious in your code, for example:

// make sure that ODE solutions are positive
for (k in 1:3) {
  for (i in 1:odeEndpoint) {
    if (x[i, k] <= 1e-15) {
      log_x[i, k] = 0.0;
    } else {
      log_x[i, k] = log(x[i, k] + 1.0);
    }
  }
}

This is problematic, as assigning zero to log_x erases any gradient information and makes the posterior non-differentiable (as noted e.g. at https://mc-stan.org/docs/2_20/functions-reference/step-functions.html).

If you want to make sure the values are always positive, a good way might be to work on the log scale directly: solve the ODE for z = log(x) and transform back with exp, which is strictly positive by construction.
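A rough, untested sketch of that idea for your model (ode_log is just a name I made up here, and this assumes the true trajectories really do stay strictly positive; if they can hit zero, the log-scale ODE will become stiff there):

functions {
  // Same right-hand side as in your model, but written for z = log(x),
  // using dz/dt = (dx/dt) / x with x = exp(z)
  real[] ode_log(real t, real[] z, real[] theta, real[] x_star, int[] x_i) {
    real x[3] = exp(z);
    real dz_dt[3];
    dz_dt[1] = (theta[1]*x[2] + theta[7]*x[3] - theta[2]*x[1]) / x[1];
    dz_dt[2] = (-theta[3]*x[1] + theta[4]*x[2]) / x[2];
    dz_dt[3] = (-theta[5]*x[1] + theta[6]*x[3]) / x[3];
    return dz_dt;
  }
}
// (data and parameters blocks unchanged)
transformed parameters {
  // the solution is already on the log scale, so no clamping is needed
  real log_x[odeEndpoint, 3];
  log_x[1] = log(x0);
  log_x[2:odeEndpoint] = integrate_ode_rk45(ode_log, log(x0), t0, tf, theta,
                                            rep_array(0.0, 0), rep_array(0, 0),
                                            1e-4, 1e-5, 1e3);
}

If you specifically need log(x + 1) as in your current likelihood rather than log(x), log1p_exp(z) computes log(1 + exp(z)) stably from the log-scale state.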

There are also some general guidelines on understanding and debugging divergences; I wrote about those on my blog here: https://www.martinmodrak.cz/2018/02/19/taming-divergences-in-stan-models/
and here: https://www.martinmodrak.cz/2018/05/14/identifying-non-identifiability/
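For the chain-level investigation you asked about, something along these lines in R with rstan and bayesplot is a reasonable first look (assuming fit is your stanfit object; the parameters I picked for the plots are just an illustrative subset):

library(rstan)
library(bayesplot)

# count divergent transitions per chain (post-warmup)
sampler_params <- get_sampler_params(fit, inc_warmup = FALSE)
sapply(sampler_params, function(chain) sum(chain[, "divergent__"]))

# the adapted step size often differs a lot for a chain stuck in a difficult region
sapply(sampler_params, function(chain) mean(chain[, "stepsize__"]))

# where do the divergences sit in parameter space?
np <- nuts_params(fit)
mcmc_pairs(as.array(fit), np = np,
           pars = c("theta[1]", "theta[4]", "sigmaX", "x0[2]"))

# parallel coordinates plot highlighting divergent iterations
mcmc_parcoord(as.array(fit), np = np,
              pars = c("theta[1]", "theta[2]", "theta[3]", "theta[4]",
                       "theta[5]", "theta[6]", "theta[7]"))

get_sampler_params also reports treedepth__ and n_leapfrog__, which usually explain why some chains take much longer than others.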

Also note that you can use triple backticks (```) for start and end of code blocks to make your post more readable.

Hope that helps!
