# Strange convergence behaviour

Hi,
Has anyone come across a situation where the same model and data converge (all Rhat == 1), but when you run exactly the same model and data again it doesn’t converge, and again it doesn’t converge, and only about 1 in 5 runs converge? When it converges, everything looks OK and the answers are very sensible. When it doesn’t converge, the posteriors are almost the same as the priors. The ‘variational’ method always gives close to the same results as the sampling method does when it converges.
Thanks,
Mel

Yes but not sure from your description. You might have priors that are quite different from the posterior and a lot of data so the posterior is sharp and never gets found. When the model does converge what are parameter values like? If they’re dramatically different from zero it’s worth rescaling the model.

Hi, thanks very much for looking at this. When it converges everything looks fine, and gives results that I would expect. The parameters are ok, the variances are within tight ranges, etc. I’m trying to make a latent variable, and when it converges the estimates I get for the latent variable are completely sensible. Here is the code, sorry it is a bit long:

```stan
data {
  int<lower=0> n;                  // number of observations
  int<lower=2> k;                  // Likert scale 1, 2, ..., k
  vector[n] w;                     // binary factor
  int<lower=1, upper=k> mir[n];    // ordered response variables
  int<lower=1, upper=k> age[n];
  int<lower=1, upper=k> myb[n];
  int<lower=1, upper=k> mya[n];
  vector[n] gui;                   // response
  vector[n] sha;                   // response
  int<lower=0, upper=1> apo[n];    // binary response
}

parameters {
  vector[2] b_mir;
  ordered[k-1] c_mir;
  vector[2] b_age;
  ordered[k-1] c_age;
  vector[2] b_myb;
  ordered[k-1] c_myb;
  vector[2] b_mya;
  ordered[k-1] c_mya;
  vector[n] Ship;                  // latent variable
  vector[3] b_sha;
  vector[3] b_gui;
  vector[4] b_apol;
  real<lower=0> sigmasha;
  real<lower=0> sigmagui;
}

model {
  for (i in 1:2) {
    b_mir[i] ~ normal(0, 5);
    b_age[i] ~ normal(0, 5);
    b_myb[i] ~ normal(0, 5);
    b_mya[i] ~ normal(0, 5);
    b_sha[i] ~ normal(0, 5);
    b_gui[i] ~ normal(0, 5);
    b_apol[i] ~ normal(0, 5);
  }
  b_sha[3] ~ normal(0, 5);
  b_gui[3] ~ normal(0, 5);
  b_apol[3] ~ normal(0, 5);
  b_apol[4] ~ normal(0, 5);
  sigmasha ~ cauchy(0, 2.5);
  sigmagui ~ cauchy(0, 2.5);

  Ship ~ normal(0, 5);             // latent variable prior (vectorized)

  for (i in 1:n) {
    mir[i] ~ ordered_logistic(b_mir[1] + b_mir[2] * Ship[i], c_mir);
    age[i] ~ ordered_logistic(b_age[1] + b_age[2] * Ship[i], c_age);
    myb[i] ~ ordered_logistic(b_myb[1] + b_myb[2] * Ship[i], c_myb);
    mya[i] ~ ordered_logistic(b_mya[1] + b_mya[2] * Ship[i], c_mya);

    sha[i] ~ normal(b_sha[1] + b_sha[2] * Ship[i] + b_sha[3] * w[i], sigmasha);
    gui[i] ~ normal(b_gui[1] + b_gui[2] * Ship[i] + b_gui[3] * w[i], sigmagui);

    apo[i] ~ bernoulli_logit(b_apol[1] + b_apol[2] * Ship[i]
                             + b_apol[3] * gui[i] + b_apol[4] * sha[i]);
  }
}
```

One funny thing about this model is that the model for `mir` is completely separate from the model for `apo/gui/sha`, completely separate from the model for `age`, etc… so you might want to break them out and run them separately to find the problem. If you’d be willing to split the models up and check which one(s) have the problem I’d be happy to look further.
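As a sketch, the `mir` piece on its own would look like the following (pulled straight from the full model above, keeping its own copy of the latent `Ship`):

```stan
// Sub-model for mir alone, extracted from the full model above,
// to check whether this piece converges on its own.
data {
  int<lower=0> n;
  int<lower=2> k;
  int<lower=1, upper=k> mir[n];
}
parameters {
  vector[2] b_mir;
  ordered[k-1] c_mir;
  vector[n] Ship;
}
model {
  b_mir ~ normal(0, 5);
  Ship ~ normal(0, 5);
  for (i in 1:n)
    mir[i] ~ ordered_logistic(b_mir[1] + b_mir[2] * Ship[i], c_mir);
}
```

The same extraction works for `age`, `myb`, `mya`, and the `apo`/`gui`/`sha` block.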

I frequently have similar issues. As @sakrejda says, one way this can happen is when you are stuck in the fat tails of a spike-and-tail distribution. If the process never finds the spike, it just drifts around aimlessly. Awkwardly, posteriors seem to become harder to fit when they are very well identified, particularly in high dimensions. You basically have to find, in an enormous space, the one tiny region where all the probability mass is…

One thing that might help is to first do an optimization step, and then take the output and use it as the initial point for your regular sampling statement.
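A sketch of that workflow with rstan (`stan_model`, `optimizing`, and `sampling` are real rstan functions; the file name and data list are placeholders):

```r
library(rstan)

# Compile once, then reuse for both optimization and sampling.
mod <- stan_model("ship_model.stan")   # placeholder file name

# Find a rough posterior mode first; as_vector = FALSE returns
# the parameters as a properly shaped named list.
opt <- optimizing(mod, data = stan_data, as_vector = FALSE)

# Hand the optimum to every chain as its initial point.
fit <- sampling(mod, data = stan_data, chains = 4,
                init = function() opt$par)
```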

I should add that if you have some idea of what the effect sizes will be, scaling everything (especially the spikes) to be about N(0,1) works great.
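As a minimal sketch (the parameter names here are made up), you can do that rescaling inside the Stan program by sampling a unit-scale parameter and multiplying by your guess at the scale:

```stan
// Sketch: keep the sampled parameter on a unit scale and
// rescale it to the expected effect size afterwards.
parameters {
  real b_raw;             // sampled on roughly N(0, 1) scale
}
transformed parameters {
  real b = 5 * b_raw;     // 5 = rough guess at the effect scale
}
model {
  b_raw ~ normal(0, 1);   // equivalent to b ~ normal(0, 5)
}
```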

I often have an idea of how big the effect size will be, but no idea whatsoever how big the posterior standard deviation will be (and it will depend on how big a sub-sample of the data I decide to use!). So it’s typically quite possible to make, say, the median value O(1), but very tricky to get more than that.

Also, I am working with simplexes these days and of course they are not really rescalable.

Also, I have had some luck on some models using the variational Bayes approximation and then grabbing an initial vector from that. Typically these initial vectors are better starting places than whatever random junk you get otherwise, but still often not that great.

```r
vbsamps <- vb(...)

get_inits(vbsamps, iter = 100)
```

for example

Hi, thanks again to everyone.
This worked as suggested - first fitting using the ‘variational’ method, and then using the means of the posterior distributions of the parameters as initial values in the sampling method.
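For anyone landing here later, a sketch of that workflow in rstan (the file and data names are placeholders; `vb`, `extract`, and `sampling` are real rstan functions):

```r
library(rstan)

mod <- stan_model("ship_model.stan")        # placeholder file name

# Quick variational fit first...
vbfit <- vb(mod, data = stan_data)

# ...take the posterior mean of each parameter as its initial value...
post_means <- lapply(extract(vbfit), function(draws) {
  if (is.matrix(draws)) colMeans(draws) else mean(draws)
})
post_means$lp__ <- NULL                     # drop the log-density term

# ...and hand them to every chain of the sampler.
fit <- sampling(mod, data = stan_data, chains = 4,
                init = function() post_means)
```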
Thanks!