Sampling function taking too long to run on chain 4

Hi all,

I have been running the Stan code below. At first I ran it with fewer iterations and just one chain; the model gave plausible results, but the MCMC trace plots showed that it had not converged. I have since been running the same model with 4 chains, 3000 iterations, and a 500-iteration warm-up, but it has now been running for 2 days and chain 4 seems to be the slowest. I am wondering whether there is something wrong in the Stan or R code that makes the sampling take so long?

Here is the R script code:
library(rstan)

nobs = nrow(data14NOMISSING)
t0 = 0.0
zAB0 <- data14NOMISSING$AB0
ts <- data14NOMISSING$Time
indivs = length(unique(data14NOMISSING$ID))
zASC0 <- rnorm(nobs, 654, 0.5)
antibodies = data14NOMISSING$AB
subj = data14NOMISSING$ID
zinitials = cbind(zAB0, zASC0)
datalist2 <- list(nobs = nobs, t0 = t0, y0 = zinitials,
                  ts = ts, indivs = indivs, antib = antibodies, subj = subj)
init = function() {
  list(pAB0 = -2, uAB = 0.053, uASC0 = -2, AB0 = 7, ASC0 = 3, logsigma = 0,
       logsigmapAB = -3, logsigmauASC = -2, logsigmaAB0 = -1, logsigmaASC0 = -1,
       rpAB = rnorm(nobs, 0, 2.5), ruASC = rnorm(nobs, 0, 2.5),
       rAB0 = rnorm(nobs, 0, 2.5), rASC0 = rnorm(nobs, 0, 2.5))
}
Modelnosorted <- stan_model("C:/Users/IGarciaFogeda/Documents/R/Rstan/HIERARCHICAL/Longitudinal/longitudinalmodel.stan")
fit_modelnosorted <- sampling(Modelnosorted, data = datalist2, init = init,
                              iter = 3000, chains = 4, warmup = 500, seed = 0,
                              control = list(stepsize = 0.1))

And here is the Stan model (longitudinalmodel.stan):

functions {
  vector model3(real t, vector y, real pAB, real uAB, real uASC) {
    vector[2] dydt;  
    dydt[1] = pAB*y[2]-uAB*y[1];
    dydt[2] = -uASC*y[2];
    
   return dydt;
  }
}
data {
  int <lower=1> nobs;
  real t0;
  vector[2] y0[nobs];
  real ts[nobs];
  int <lower=1> indivs;
  real <lower=0> antib[nobs];
  int <lower=1> subj[nobs];
}
parameters {
  real pAB0;
  real <lower=0, upper=1> uAB;
  real uASC0;
  real AB0;
  real ASC0;
  real logsigma;
  real logsigmapAB;
  real logsigmauASC;
  real logsigmaAB0;
  real logsigmaASC0;
  real rpAB[nobs];
  real ruASC[nobs];
  real rAB0[nobs];
  real rASC0[nobs];
}
model {
  vector[2] yhat[nobs];
  vector[2] yhatmu[nobs];
  real eval_time[1];
  real sigma = exp(logsigma);
  real sigmapAB = exp(logsigmapAB);
  real sigmauASC = exp(logsigmauASC);
  real sigmaAB0 = exp(logsigmaAB0);
  real sigmaASC0 = exp(logsigmaASC0);
  //prior distributions
  pAB0 ~ normal(0, 2.5);
  uASC0 ~ normal(0, 2.5);
  uAB ~ normal(0, 2.5);
  AB0 ~ uniform(5, 10);
  ASC0 ~ uniform(2,4);
  logsigmapAB ~ normal(0, 2.5);
  logsigmauASC ~ normal(0, 2.5);
  logsigmaAB0 ~ normal(0, 2.5);
  logsigmaASC0 ~ normal(0, 2.5);
  logsigma ~ normal(0, 2.5);
  for (j in 1:nobs) {
    real pAB = exp(pAB0)*exp(rpAB[subj[j]]);
    real uABt = uAB;
    real uASC = exp(uASC0)*exp(ruASC[subj[j]]);
    vector[2] zinitials;
    zinitials[1] = exp(AB0)*exp(rAB0[subj[j]]);
    //are we sure this is not yielding to negative values? is it hierarchical?
    zinitials[2] = exp(ASC0)*exp(rASC0[subj[j]]);
    // where mu of the r.e. is 0 or -sigmapAB/2
    rpAB[subj[j]] ~ normal(-sigmapAB/2, sigmapAB);
    ruASC[subj[j]] ~ normal(-sigmauASC/2, sigmauASC);
    rAB0[subj[j]] ~ normal(-sigmaAB0/2, sigmaAB0);
    rASC0[subj[j]] ~ normal(-sigmaASC0/2, sigmaASC0);
    eval_time[1] = ts[j];
    if (eval_time[1] == 0){
      yhatmu[j,1] = zinitials[1] - pow(sigma, 2.0)/2;
      yhatmu[j,2] = zinitials[2] - pow(sigma, 2.0)/2;}
    else{
      yhat = ode_rk45(model3, zinitials, t0, eval_time, pAB, uABt, uASC);
      yhatmu[j,2] = yhat[1,1] - pow(sigma, 2.0)/2;
    }  
  //likelihood
    antib[j] ~ lognormal(yhatmu[j,2], sigma); 
  }
}
generated quantities {
  real z_pred[nobs];
  real sigma2 = exp(logsigma);
  for (t in 1:nobs){
    z_pred[t] = lognormal_rng(antib[t], sigma2);
  }
}

I would recommend running more warmup iterations, or at least running warmup in multiple chains until all the chains converge to the same step size and roughly the same metric. You rarely want more sampling iterations than warmup iterations.

Are your initializations for all parameters reasonable? Sometimes what will happen is that a given chain will get stuck out in the tail in a region where the Hamiltonian dynamics become stiff or the nested ODE solver becomes stiff. The latter can sometimes be solved without careful initialization by plugging in a stiff solver, or at least reducing the size of the initialization interval (usually uniform(-2, 2) on the unconstrained scale, which can be really far out into the tail in high dimensions). I see you’re using normal(0, 2.5) for inits, but cutting down the variance there can help.
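
For example, a minimal sketch of the stiff-solver swap, assuming you are on a Stan version with the newer ODE interface (which your ode_rk45 call suggests). ode_bdf takes exactly the same arguments, so the only change in the model block would be:

yhat = ode_bdf(model3, zinitials, t0, eval_time, pAB, uABt, uASC);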

I think it’d probably be a bit clearer, though also a bit slower, to parameterize with positive-constrained sigma values and lognormal priors. The model doesn’t change, but it gets a lot shorter because you don’t need to define new exponentiated versions of the log scales.
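
A rough sketch of what I mean, keeping your names; the lognormal(0, 2.5) priors are only placeholders you would want to tune:

parameters {
  real<lower=0> sigma;       // replaces exp(logsigma)
  real<lower=0> sigmapAB;    // replaces exp(logsigmapAB)
  // ... likewise for sigmauASC, sigmaAB0, sigmaASC0
}
model {
  sigma ~ lognormal(0, 2.5);      // placeholder prior; tune to your scale
  sigmapAB ~ lognormal(0, 2.5);
  // ... the rest of the model then uses sigma, sigmapAB, etc. directly
}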

I don’t think you need eval_time as a variable, as you can just replace it with ts[j] in the one conditional where it’s used. But the real question is why solve the ODE in every iteration for a different time rather than solving it once for all the times?
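
As a rough sketch of the latter, assuming the rows are grouped by subject, each subject's times are strictly increasing and later than t0, and you pass hypothetical start and end index arrays in the data block:

for (i in 1:indivs) {
  int n_i = end[i] - start[i] + 1;
  vector[2] yhat_i[n_i] = ode_rk45(model3, zinit_i, t0, ts[start[i]:end[i]],
                                   pAB_i, uAB, uASC_i);
  // then read off yhat_i[k, 1] for each of subject i's observations
}

Here zinit_i, pAB_i, and uASC_i stand for the subject-level quantities you currently compute inside the observation loop.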

Given that the likelihood only involves yhatmu[, 2], why do you define yhatmu[j, 1]? Also, you can write pow(sigma, 2.0) as simply sigma^2 or, slightly more efficiently, as square(sigma).

I don’t think these lines are right:

for (j in 1:nobs)
  rpAB[subj[j]] ~ normal(-sigmapAB/2, sigmapAB);

It gives rpAB the same prior multiple times unless subj has no repeated entries. I think you just want this outside of the loop:

rpAB ~ normal(-sigmapAB/2, sigmapAB);

This applies the prior once for each subject.

exp(a) * exp(b) is much faster rewritten as exp(a + b). And no, you can’t generate negative values with exp(). You can generate 0 values and +infinity values from underflow and overflow, but not negative.
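
In your loop that would be, with the same variable names:

real pAB = exp(pAB0 + rpAB[subj[j]]);
real uASC = exp(uASC0 + ruASC[subj[j]]);
zinitials[1] = exp(AB0 + rAB0[subj[j]]);
zinitials[2] = exp(ASC0 + rASC0[subj[j]]);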

Everything is much faster vectorized. So it's better to have the loop fill a vector of length nobs called E_antib or something, and then use antib ~ lognormal(E_antib, sigma); once outside the loop.
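
Something along these lines, where E_antib is just a name I made up; the loop still fills it element by element, but the likelihood becomes a single vectorized statement:

vector[nobs] E_antib;
for (j in 1:nobs) {
  // ... ODE solve / t == 0 handling for observation j as before ...
  E_antib[j] = yhat[1, 1] - square(sigma) / 2;  // what you now store in yhatmu[j, 2]
}
antib ~ lognormal(E_antib, sigma);  // one statement outside the loop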

Hi Bob,

Thanks for all the advice. However, I do not understand what you mean about my likelihood only using yhatmu[, 2]; do you think one of the dimensions there is defined wrong?
Also, how could I create an E_antib vector that does not have the (2, nobs) dimensions, if I only define it with dimension (1, nobs)?