# One chain always slower (x10) than other chains

Hi,

I am new to Stan. I created the following beta-binomial model for daily samples: it assumes the success rate of the binomial distribution is drawn from a beta distribution each day, and the mean of that beta distribution is estimated with a logistic regression. I am using it to estimate the success rates of multiple binomial distributions (hence the `mapping` variable in the Stan code). The problem I encountered is that with 4 MCMC chains, one chain always takes much longer to finish than the others (more than 10x). I am wondering if this is a model specification issue and whether there is any way to improve it.

```stan
data {
  int<lower=1> L;
  int<lower=0> ref_block[L, 2];

  int<lower=0> N;
  int<lower=0> M;

  int<lower=0> n[N];
  int<lower=0> y[N];
  matrix[N, M] X;  // design matrix
  int mapping[N];
}
transformed data {
  int num_cell;
  num_cell = max(mapping);
}
parameters {
  // beta params
  vector[M] beta;
  vector<lower=0>[num_cell] phi;
  real<lower=0> rho_sq[L];
  real<lower=0, upper=1> theta[N];  // note: not used in the model block below
}
transformed parameters {
  vector[num_cell] mu;              // the linear predictor
  vector<lower=0>[num_cell] Alpha;  // the first shape parameter for the beta distribution
  vector<lower=0>[num_cell] Beta;   // the second shape parameter for the beta distribution

  for (nn in 1:N) {
    mu[mapping[nn]] = inv_logit(X[nn, :] * beta);  // using logit link
    Alpha[mapping[nn]] = mu[mapping[nn]] * phi[mapping[nn]];
    Beta[mapping[nn]] = (1 - mu[mapping[nn]]) * phi[mapping[nn]];
  }
}
model {
  for (nn in 1:N) {
    y[nn] ~ beta_binomial(n[nn], Alpha[mapping[nn]], Beta[mapping[nn]]);
  }

  // Specify intercept priors
  beta ~ cauchy(0, 2.5);

  for (l in 1:L) {
    // ref_block[l, 1] / ref_block[l, 2] give the start / end of the l-th coefficient block
    beta[(ref_block[l, 1] + 1):(ref_block[l, 2] + 1)] ~ normal(0, rho_sq[l]);
  }
  rho_sq ~ cauchy(0, 10);

  phi ~ inv_gamma(1, 1);
}
generated quantities {
  real<lower=0, upper=1> theta_pred[N];
  for (nn in 1:N) {
    theta_pred[nn] = beta_rng(Alpha[mapping[nn]], Beta[mapping[nn]]);
  }
}
```

(Until people with more experience reply:) I would suggest looking at the traceplots, including the warmup draws. Check whether, for any parameter, there is one chain that explores extremely large or small values, which would make that chain less efficient.

Sometimes applying more informative priors helps to better constrain the model. Sometimes reparametrising the model (e.g., using normal rather than fatter-tailed distributions) helps too.
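To see why fat-tailed priors such as the `cauchy(0, 10)` on `rho_sq` can send a chain into extreme territory, here is a quick stdlib-only Python sketch (an illustration, not anything Stan itself runs) comparing upper quantiles of a Cauchy(0, 10) against a Normal(0, 10):

```python
import math
from statistics import NormalDist

def cauchy_quantile(p, loc=0.0, scale=1.0):
    # Inverse CDF of the Cauchy distribution
    return loc + scale * math.tan(math.pi * (p - 0.5))

scale = 10.0
for p in (0.95, 0.99, 0.999):
    c = cauchy_quantile(p, scale=scale)
    n = NormalDist(0.0, scale).inv_cdf(p)
    print(f"p={p}: cauchy={c:8.1f}  normal={n:5.1f}")
```

The Cauchy puts non-negligible mass out past 300 at the 99th percentile, where the normal stays near 23, so a chain that wanders into the Cauchy tail can take a long time to find its way back.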


Would a lower `init_r` be likely to help? Ref: One MCMC chain not moving.

Update: I tried setting `init = 1` in `$sample()` (using `cmdstanr`), and it looks like it completely solves the problem on my end! The slow chain is easily 20x faster now and keeps up with the other chains.

My only concern is that I do not fully understand what `init` actually does. (I read the docs, but I have only a superficial, intuitive grasp of what HMC is doing.) I am concerned about the potential consequences of setting `init` too low (comment: One MCMC chain not moving).


I’ll try to explain a bit more what this line in the CmdStanR docs actually means:

> A real number x>0 initializes randomly between [-x,x] (on the unconstrained parameter space);

If you have a parameter declared with no constraints (e.g., `real theta`), then setting `init=1` will result in Stan generating an initial value uniformly at random between -1 and 1. If, on the other hand, you have a parameter with constraints like `real<lower=0> sigma`, then the same thing applies except that the initialization takes place between -1 and 1 on the log scale, because Stan internally uses the log transformation to map a strictly positive variable to an unconstrained one (under the hood we need a space with no constraints for HMC). Depending on what the constraints are, different transformations are used to unconstrain the parameter (if you're interested, you can find all the different transformations here: https://mc-stan.org/docs/2_25/reference-manual/variable-transforms-chapter.html).
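As a concrete illustration (a stdlib-only Python sketch of the idea, not Stan's actual implementation): with `init=1`, a `real<lower=0> sigma` ends up initialized as `exp(u)` with `u` drawn uniformly from [-1, 1], and a `real<lower=0, upper=1> theta` as `inv_logit(u)`:

```python
import math

def inv_logit(u):
    return 1.0 / (1.0 + math.exp(-u))

init = 1.0  # CmdStan's default is 2.0

# real<lower=0> sigma: unconstrained u in [-init, init] maps to exp(u)
sigma_range = (math.exp(-init), math.exp(init))
# real<lower=0, upper=1> theta: u maps to inv_logit(u)
theta_range = (inv_logit(-init), inv_logit(init))

print(f"sigma initialized in ({sigma_range[0]:.3f}, {sigma_range[1]:.3f})")
print(f"theta initialized in ({theta_range[0]:.3f}, {theta_range[1]:.3f})")
```

So `init=1` starts a positive parameter somewhere in roughly (0.37, 2.72), and `init=2` in roughly (0.14, 7.39), which shows why shrinking the range keeps initial values away from the extremes.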

Currently the default value used by CmdStan is `init=2` (and CmdStanR uses the same default). It is not a magic number, but it is typically enough to ensure sufficiently dispersed initial values. However, in some cases you will need to reduce the range of initialization (as in your case) because you can get extreme initial values that particular models have a hard time recovering from (this is likely what happened with your slow chain, and it might be visually detectable in a traceplot that includes warmup). So while it's not ideal to narrow the initialization range if you don't have to, it's not frowned upon when it's necessary; in some cases `init` even needs to be set substantially lower than in your case. As long as you're running multiple chains and checking that they're mixing well and that the other diagnostics look OK, I think you should be fine.


I was originally taught that, in order to assess mixing, the starting values need to be more dispersed across chains than the posterior itself. In grad school, when I was custom-coding high-dimensional models from scratch, I always struggled with this. But from your answer, I take it `Uniform(-2, 2)^p` or even `Uniform(-1, 1)^p` is likely to be dispersed enough even for large `p`. So it's not a problem if the tails of the actual posterior reach beyond the uniform bounds?