Model fitting and sampling issue: Only 1 chain sampling properly

MadelineJC · April 17, 2020, 9:42pm

Hi all,

I’ve been here a lot lately, so I’m very sorry about that!

Thanks to the community, I was able to get my model running, but noticed something strange in the output… So, I was running 4 chains with 2000 iterations. I noticed that chains 2, 3, and 4 were running in synchrony, and chain 1 was lagging behind. When the model finished running, I was told the following:

Warning message:
In writable_sample_file(diagnostic_file) :Warning message:
Warning message:
1: There were 8 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 

  "/Users/mjarviscross/Desktop/GitHub/FcnalRespProj/data/Type II/FcnalRespProj" is not writable; use "/var/folders/l_/xb7lfbls12nb0dwxdrww8gn40000gn/T//RtmpDSeoCR" instead
In writable_sample_file(diagnostic_file) :In writable_sample_file(diagnostic_file) :Warning message:

 In writable_sample_file(diagnostic_file) :2: There were 3963 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded 

  "/Users/mjarviscross/Desktop/GitHub/FcnalRespProj/data/Type II/FcnalRespProj" is not writable; use "/var/folders/l_/xb7lfbls12nb0dwxdrww8gn40000gn/T//RtmpIJCP4C" instead
 "/Users/mjarviscross/Desktop/GitHub/FcnalRespProj/data/Type II/FcnalRespProj" is not writable; use "/var/folders/l_/xb7lfbls12nb0dwxdrww8gn40000gn/T//Rtmp7jEFf6" instead
3: There were 1 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low 
4: Examine the pairs() plot to diagnose sampling problems
 

  "/Users/mjarviscross/Desktop/GitHub/FcnalRespProj/data/Type II/FcnalRespProj" is not writable; use "/var/folders/l_/xb7lfbls12nb0dwxdrww8gn40000gn/T//RtmphaWosk" instead
5: The largest R-hat is 5.43, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat 
6: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess 
7: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess

And when I looked at the trace plots, I noticed that only the first chain seemed to be sampling properly:

This seems like a buggy problem rather than a conceptual problem, so I was wondering if anyone’s seen something like this before, and if anyone knows how to amend the situation.

Here’s my model (in rstan):

library(rstan)
library(gdata)
library(bayesplot)
library(tidyverse)
library(brms)

write("// Stan Model;
functions{
  real[] dZ_dt(
    real t, // Time
    real[] Z, // System state {Parasite, Host}
    real[] theta, // Parms
    real[] x_r, // Real data; below is integer data
    int[] x_i){
    real P = Z[1]; // System state coded as an array, such that Z = (P,H)
    real H = Z[2];

    real r = theta[1];  // Parameters of the system, in order they appear
    real O = theta[2];
    real h = theta[3];
    real b = theta[4];
    real c = theta[5];
    real u = theta[6];

    real dP_dt = P*r - H*(O*P/1 + O*P*h); // Mechanistic model
    real dH_dt = b + H*(c*(O*P/1 + O*P*h)-u);
    return({dP_dt,dH_dt}); // Return the system state
  }
}

data{
  int<lower=0>N; // Define N as non-negative integer
  real ts[N]; // Assigns time points to N
  int y_init[2]; // Initial conditions for ODE
  int<lower=0>y[N,2]; // Define y as real and non-negative
}

parameters{
  real<lower=0>r;
  real<lower=0,upper=1>O;
  real<lower=0>h;
  real<lower=0>b;
  real<lower=0>c;
  real<lower=0,upper=1>u;
  real<lower=0>Z_init[2]; // Initial population size non-neg
}

transformed parameters{
  // Stiff solver for ODE
  real Z[N,2]
  = integrate_ode_bdf(dZ_dt,Z_init,0,ts,{r, O, h, b, c, u},rep_array(0.0,2),rep_array(0,2),1e-10,1e-10,2e4);
  for (i in 1:N) {
    Z[i,1] = Z[i,1] + 1e-6;
    Z[i,2] = Z[i,2] + 1e-6;
  }
}

model{ // Ignore the means/variances for now...
  r~normal(2.5,1); // r
  O~beta(2,2); // O; bounded between 0 and 1
  h~normal(0.06,1); // h
  b~normal(35,1); // b
  c~normal(0.2,1); // c
  u~beta(2,2); // u; bounded between 0 and 1
  Z_init~uniform(0,400);
  for (k in 1:2){
    y_init[k]~poisson(Z_init[k]);
    y[ ,k]~poisson(Z[ ,k]);
  }
}",
"Stan_Model_TypeII.stan")

stanc("Stan_Model_TypeII.stan") # To check that we wrote a file (I think we did?)

Stan_Model_TypeII <- stan_model("Stan_Model_TypeII.stan")

# Squeezing the data into a form that Stan gets
Stoch_Data_TypeII = read.csv('/Users/mjarviscross/Desktop/GitHub/FcnalRespProj/data/Type II/FcnalRespProj_Data/Stoch_Data_TypeII.csv')
N <- length(Stoch_Data_TypeII$t)-1 # df is 2923; N is 2922
ts <- 1:N
y_init <- c(10,350) # Initial states, P = 10; H = 350
y <- as.matrix(Stoch_Data_TypeII[2:(N+1),3:4])
y <- cbind(y[ ,2],y[ ,1]); # This worked, sick; where y[,1] is H, and y[,2] is P
Stan_StochData_TypeII <- list(N=N,ts=ts,y_init=y_init,y=y)

# // Make non-neg real<lower=0>alpha[{1,3,4,5}];
# // Proportions bounded between 0 and 1 real<lower=0,upper=1>alpha[{2,6}]

# Fitting the data to the model
Stan_Model_TypeII <- stan_model("Stan_Model_TypeII.stan")
fit2 <- sampling(Stan_Model_TypeII,
                data = Stan_StochData_TypeII,
                warmup = 1000,
                iter = 2000,
                chains = 4,
                cores = 4,
                thin = 1,
                check_data = TRUE,
                diagnostic_file = "/Users/mjarviscross/Desktop/GitHub/FcnalRespProj/data/Type II/FcnalRespProj/TypeII_Fitting_Output2.R",
                seed = 124,
                show_messages = TRUE,
                verbose = TRUE)

#### Tracking progress ####

# 1st run (1 chain, 500 iter)
#### 806.348 seconds (Warm-up)--13.5 min
#### 2514.75 seconds (Sampling)--42 min
#### Rhat 1.71
#### 233 transitions after warmup that exceeded the maximum treedepth; increase max_treedepth above 10
#### Bulk Effective Samples Size (ESS) is too low
OutputSum_1 <- fit
fit1_plot <- plot(fit)
fit1_trace <- traceplot(fit)

# 2nd run (4 chains, 2000 iter)
#### Chain 1, 173 min warm/357 min samp; Chain 2, 92 min warm/230 min samp; Chain 3, 88 min warm/242 min samp; Chain 4, 87 min warm/253 min samp
#### Rhat 5.43
#### 8 divergent transitions after warmup; ncreasing adapt_delta above 0.8 may help
#### 3963 transitions after warmup that exceeded the maximum treedepth; increase max_treedepth above 10
#### Bulk Effective Samples Size (ESS) is too low; tail Effective Samples Size (ESS) is too low--more iterations!
#### 1 chains where the estimated Bayesian Fraction of Missing Information was low
fit2_plot <- plot(fit2)
fit2_trace <- traceplot(fit2)

I would appreciate any advice people have! Thank you all so much for putting up with me! :)

stijn · April 18, 2020, 8:42am

I’ve been here a lot lately, so I’m very sorry about that!

You shouldn’t feel sorry about that. I know next to nothing about ODEs. So, I’ll stick to very generic comments.

The traceplots suggest to me that the model is not well identified and the sampler gets stuck. In other words, I think this might still be a conceptual issue.
A quick look at the main part of the ODE shows that there are a lot of multiplications, partly between parameters. These type of multiplications are often hard to identify.

    real dP_dt = P*r - H*(O*P/1 + O*P*h); // Mechanistic model
    real dH_dt = b + H*(c*(O*P/1 + O*P*h)-u);

The only healthy looking chain is chain 1 for Z[1,1]. So, my take-away would be that there are too many parameters that determine the outcome of Z[1,1], so that Z[1,1] is potentially identified by the data but the parameters are not.

Your first step could be to simplify the underlying model even more by either fixing some parameters to one value or by decreasing the number parameters.

Good luck with the model and hopefully my response bumps your post up so that knowledgeable people can have a look at your model.

MadelineJC · April 19, 2020, 1:14am

Hi @stijn, thanks so much for your comment!

Unfortunately, I can’t simplify the model without losing its representation of the system…

It seems like this model may be sensitive to initial conditions, so I was thinking giving the model the initial state of the system, as opposed to having it estimate Z[1,1] and Z[1,2], could help, which kind of speaks to your third point.

Unfortunately, I’ve tried to implement that, and now, running two chains, the second chain runs all the way through, pretty quickly, and the first chain stagnates at 0%…

 functions{
  real[] dZ_dt(
    real t, // Time
    real[] Z, // System state {Parasite, Host}
    real[] theta, // Parms
    real[] x_r, // Real data; below is integer data
    int[] x_i){
    real P = Z[1]; // System state coded as an array, such that Z = (P,H)
    real H = Z[2];

    real r = theta[1];  // Parameters of the system, in order they appear
    real O = theta[2];
    real h = theta[3];
    real b = theta[4];
    real c = theta[5];
    real u = theta[6];

    real dP_dt = P*r - H*(O*P/1 + O*P*h); // Mechanistic model
    real dH_dt = b + H*(c*(O*P/1 + O*P*h)-u);
    return({dP_dt,dH_dt}); // Return the system state
  }
}

data{
  int<lower=0>N; // Define N as non-negative integer
  real ts[N]; // Assigns time points to N
  real y_init[2]; // Initial conditions for ODE
  int<lower=0>y[N,2]; // Define y as real and non-negative
}

parameters{
  real<lower=0>r;
  real<lower=0,upper=1>O;
  real<lower=0>h;
  real<lower=0>b;
  real<lower=0>c;
  real<lower=0,upper=1>u;
}

transformed parameters{
  // Stiff solver for ODE
  real Z[N,2]
  = integrate_ode_bdf(dZ_dt,y_init,0,ts,{r, O, h, b, c, u},rep_array(0.0,2),rep_array(0,2),1e-10,1e-10,2e4);
  for (i in 1:N) {
    Z[i,1] = Z[i,1] + 1e-6;
    Z[i,2] = Z[i,2] + 1e-6;
  }
}

model{ // Ignore the means/variances for now...
  r~normal(2.5,1); // r
  O~beta(2,2); // O; bounded between 0 and 1
  h~normal(0.06,1); // h
  b~normal(35,1); // b
  c~normal(0.2,1); // c
  u~beta(2,2); // u; bounded between 0 and 1
  for (k in 1:2){
    y[ ,k]~poisson(Z[ ,k]);
  }
}

And the r script:

# Squeezing the data into a form that Stan gets
Stoch_Data_TypeII = read.csv('/Users/mjarviscross/Desktop/GitHub/FcnalRespProj/data/Type II/FcnalRespProj_Data/Stoch_Data_TypeII.csv')
N <- length(Stoch_Data_TypeII$t)-1 # df is 2923; N is 2922
ts <- 1:N
y_init <- c(10,350) # Initial states, P = 10; H = 350
y <- as.matrix(Stoch_Data_TypeII[2:(N+1),3:4])
y <- cbind(y[ ,1],y[ ,2]); # This worked, sick; where y[,1] is P, and y[,2] is H
Stan_StochData_TypeII <- list(N=N,ts=ts,y_init=y_init,y=y)

# Fitting the data to the model
Stan_Model_TypeII <- stan_model("Stan_Model_TypeII.stan")
fit3 <- sampling(Stan_Model_TypeII,
                 data = Stan_StochData_TypeII,
                 warmup = 200,
                 iter = 400,
                 chains = 2,
                 cores = 2,
                 thin = 1,
                 # control = list(max_treedepth=15,
                                # adapt_delta=0.99),
                 check_data = TRUE,
                 diagnostic_file = "/Users/mjarviscross/Desktop/GitHub/FcnalRespProj/Fit3.csv",
                 show_messages = TRUE,
                 verbose = TRUE)

stijn · April 19, 2020, 4:23am

Sorry to hear that. Your problem is far outside my comfort zone. I am not sure I can help you.

If I am right about the identifiability (big if!), then that would mean that the system has too many free parameters, to be estimated from the data. One way to test this is to see whether you can recover data that is generated based on your priors and model.

nhuurre · April 19, 2020, 7:44am

This cannot be correct.

real dP_dt = P*r - H*(O*P/1 + O*P*h); // Mechanistic model
real dH_dt = b + H*(c*(O*P/1 + O*P*h)-u);

Remember the operator precedence rules. Division happens before addition. If the mechanistic model is meant to be \frac{dP}{dt} = Pr + H\times\frac{OP}{1+hOP} then the denominator must be enclosed in parenthesis. I would also combine common subexpressions.

real OP = O*P;
real OP_h = OP / (1 + h*OP);
real dP_dt = P*r - H*OP_h; // Mechanistic model
real dH_dt = b + H*(c*OP_h-u);

MadelineJC · April 19, 2020, 7:10pm

I’ll try that––thanks!

MadelineJC · April 19, 2020, 7:11pm

I think you could be right about this! I’m trying to feed the model the initial values rather than having it estimate them, but I think I’m messing something up with the code…

stijn · April 20, 2020, 3:54am

Can you give the mechanistic model in mathematical notation? That might help us check whether Niko is right.

MadelineJC · April 20, 2020, 4:03am

It should be as follows:

I’ve implemented some changes, including @nhuurre’s, and it looks like identifiability may be a big part of the issue; the model usually fits, but the chains don’t converge, and convergence gets worse as I add more iterations and more chains.

I’m also trying to add a generated quantities block, which is proving problematic. Do you have any advice?

Here’s watch I’m working with now:

write("// Stan Model;
functions{
  real[] dZ_dt(
    real t, // Time
    real[] Z, // System state {Parasite, Host}
    real[] theta, // Parms
    real[] x_r, // Real data; below is integer data
    int[] x_i){
    real P = Z[1]; // System state coded as an array, such that Z = (P,H)
    real H = Z[2];

    real r = theta[1];  // Parameters of the system, in order they appear
    real O = theta[2];
    real h = theta[3];
    real b = theta[4];
    real c = theta[5];
    real u = theta[6];

    real FR_II = O*P/(1 + O*P*h);
    real dP_dt = P*r - H*FR_II; // Mechanistic model
    real dH_dt = b + H*(c*FR_II-u);
    return({dP_dt,dH_dt}); // Return the system state
  }
}

data{
  int<lower=0>N; // Define N as non-negative integer
  real ts[N]; // Assigns time points to N
  real y0[2]; // Initial conditions for ODE
  real t0; // Initial time point
  int<lower=0>y[N,2]; // Define y as real and non-negative
}

parameters{
  real<lower=0>r;
  real<lower=0,upper=1>O;
  real<lower=0>h;
  real<lower=0>b;
  real<lower=0>c;
  real<lower=0,upper=1>u;
}

transformed parameters{
  // Stiff solver for ODE
  real Z[N,2]
  = integrate_ode_bdf(dZ_dt,y0,t0,ts,{r, O, h, b, c, u},rep_array(0.0,2),rep_array(0,2),1e-10,1e-10,2e4);
  for (i in 1:N) {
    Z[i,1] = Z[i,1] + 1e-6;
    Z[i,2] = Z[i,2] + 1e-6;
  }
}

model{ // Ignore the means/variances for now...
  r~normal(2.5,1); // r
  O~beta(2,2); // O; bounded between 0 and 1
  h~normal(0.06,1); // h
  b~normal(35,1); // b
  c~normal(0.2,1); // c
  u~beta(2,2); // u; bounded between 0 and 1
  for (k in 1:2){
    y[ ,k]~poisson(Z[ ,k]);
  }
}",
"Stan_Model_TypeII.stan")

stanc("Stan_Model_TypeII.stan") # To check that we wrote a file

Stan_Model_TypeII <- stan_model("Stan_Model_TypeII.stan")

# Squeezing the data into a form that Stan gets
Stoch_Data_TypeII = read.csv('/Users/mjarviscross/Desktop/GitHub/FcnalRespProj/data/Type II/FcnalRespProj_Data/Stoch_Data_TypeII.csv')
N <- length(Stoch_Data_TypeII$t)-1 # df is 2923; N is 2922
ts <- 1:N
y_init <- c(10,350) # Initial states, P = 10; H = 350
y0 <- array(y_init) # For the ODE solver
t0 <- 0.0
y <- as.matrix(Stoch_Data_TypeII[2:(N+1),3:4])
y <- cbind(y[ ,1],y[ ,2]); # This worked, sick; where y[,1] is P, and y[,2] is H
Stan_StochData_TypeII <- list(N=N,ts=ts,y0=y0,t0=t0,y=y)

# Fitting the data to the model
Stan_Model_TypeII <- stan_model("Stan_Model_TypeII.stan")
fit5 <- sampling(Stan_Model_TypeII,
                 data = Stan_StochData_TypeII,
                 warmup = 200,
                 iter = 400,
                 chains = 2,
                 cores = 2,
                 thin = 1,
                 control = list(max_treedepth=15,
                                adapt_delta=0.99),
                 seed = 123,
                 check_data = TRUE,
                 diagnostic_file = "/Users/mjarviscross/Desktop/GitHub/FcnalRespProj/Fit5.csv",
                 show_messages = TRUE,
                 verbose = TRUE)

And the generated quantities block I want to add looks like this:

generated quantities {
  int y_rep[N, 2];
  for (k in 1:2) {
    for (n in 1:N)
      y_rep[n, k] = poisson_rng(Z[ ,k]);
  }
}

But am getting the following error:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Dimension mismatch in assignment; variable name = y_rep, type = int; right-hand side type = int[ ].
Illegal statement beginning with non-void expression parsed as
  y_rep[[n, k]]
Not a legal assignment, sampling, or function statement.  Note that
  * Assignment statements only allow variables (with optional indexes) on the left;
  * Sampling statements allow arbitrary value-denoting expressions on the left.
  * Functions used as statements must be declared to have void returns

stijn · April 20, 2020, 6:35am

This line probably needs to be

y_rep[n, k] = poisson_rng(Z[n,k]);

The syntax error message is saying that y_rep[n,k] is an integer while poisson_rng(Z[,k]) is an array of integers and those can not be assigned to each other.

As a sketch for how I think you can look for non-identifiable parameters see below. But this is probably not a useful way of thinking about it, I just write it down as how I tried to help but probably failed. You really want someone who actually knows how ODEs work. I just put my simplistic economist hat on and look at what would happen if the system stabilises. That is \frac{dP}{dt} = 0 and \frac{dH}{dt} = 0. If you rewrite everything a little bit you can then write H as H = (b - r) \times f(P, \theta, c, h, \mu) where f is not a function of b or r. That means that if(!) most of the data is from observations close to a stable system b and r are poorly identified but their difference is well defined. In my simpler models I would then reparametrise (b, r) to (\Delta = b - r, M = b/2 + r/2) which sometimes works.

MadelineJC · May 12, 2020, 2:10pm

That’s a really interesting idea! Thank you so much––I really appreciate all of your help! :)

LRmodel · July 17, 2021, 11:24pm

Hi @MadelineJC ,
I am experiencing a similar issue as you had. I am fitting an ODE model and I often get stuck in some chains, while other chains could run quickly. Could you please let me know how did you fix your problem? Thank you!

martinmodrak · July 18, 2021, 3:19pm

ODE models can exhibit all sorts of different pathologies, many of which can manifest as some chains getting stuck. Unless your ODE is very similar to the one in this thread, I would suggest to start a new topic where you would describe your ODEs in detail. New topics also tend to get more attention than revivals of old topics.

MadelineJC · September 20, 2022, 8:01pm

Hello! I’m just coming back to this project now, and I’m pretty sure that the problem was (1) an identifiability issue caused by two of my parameters being multiplicative, and/or (2) a difference between the data generating model, and the stochastic data I generated using the model + Gillespie algorithm. In other words, the deterministic system generated very different-looking data than the stochastic system, despite the same underlying model. I know this likely isn’t very useful, so I’m sorry for that, in addition to this very late response.

MadelineJC · September 20, 2022, 8:02pm

Hello! I’m just getting back into working on a version of this project again. This is good advice. I’ll work on crafting a more detailed topic.

Topic		Replies	Views
Rstan output warning Modeling	6	751	June 14, 2020
Divergent transitions and exceed max treedepth warning General rstan , fitting-issues	5	3946	August 17, 2020
Strange behavior with lognormal() sampling statement Modeling fitting-issues	15	815	May 20, 2020
Divergent transitions after warmup Modeling rstan , fitting-issues , performance	1	621	February 21, 2024
Variable running speeds across chains during warmup - possible causes Modeling	4	646	May 27, 2019

Model fitting and sampling issue: Only 1 chain sampling properly

Related topics