Hierarchical model: different seeds lead to very different run times with a Bernoulli likelihood

I have the following model:

model {
  // priors
  tau ~ cauchy(0, 1);
  for (j in 1:M) {
    indRE[j] ~ normal(0, tau);
  }
  // weighted likelihood
  for (i in 1:N) {
    target += N * omega[i] * bernoulli_logit_lpmf(Z[i] | X[i, ] * alpha + indRE[I[i]]);
  }
}

Depending on the seed I set, the run time for a single dataset ranges from 30 seconds to 60 minutes. Has anyone seen this problem? Is it possible to fix it?

Thanks

EDIT: @maxbiostat edited this post for syntax highlighting and formatting.

It’s not super surprising.

With time differences like that there are presumably dramatic differences in what the chains are doing. I’m guessing they don’t mix?

From the code you posted, two things:

  1. Do a non-centered parameterization for indRE, see here.

  2. Tighten the prior on tau. Maybe a half-normal or an exponential.
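
A minimal sketch of both suggestions, not a tested drop-in replacement. The original post only shows the model block, so the data and parameters declarations here are assumptions (including the name K for the number of covariates and the types of omega and I), as are the helper name indRE_raw and the half-normal scale of 1. It uses the `array[...]` syntax, which needs a reasonably recent Stan version.

```stan
data {
  int<lower=1> N;                      // number of observations
  int<lower=1> M;                      // number of individuals (assumed meaning)
  int<lower=1> K;                      // number of covariates (assumed name)
  matrix[N, K] X;                      // design matrix
  array[N] int<lower=0, upper=1> Z;    // binary outcome
  array[N] int<lower=1, upper=M> I;    // individual index for each observation
  vector<lower=0>[N] omega;            // likelihood weights (assumed type)
}
parameters {
  vector[K] alpha;                     // keeps the implicit flat prior of the original
  real<lower=0> tau;                   // random-effect scale
  vector[M] indRE_raw;                 // standardized random effects (hypothetical name)
}
transformed parameters {
  // non-centered parameterization: indRE is implied to be normal(0, tau)
  vector[M] indRE = tau * indRE_raw;
}
model {
  // half-normal prior on tau because of the lower=0 constraint (scale 1 is an assumption)
  tau ~ normal(0, 1);
  indRE_raw ~ std_normal();

  // weighted likelihood, same weighting as in the original post
  vector[N] eta = X * alpha + indRE[I];
  for (i in 1:N) {
    target += N * omega[i] * bernoulli_logit_lpmf(Z[i] | eta[i]);
  }
}
```

The sampler now works on indRE_raw, whose prior scale does not depend on tau, which is usually what removes the funnel-shaped geometry that makes some chains crawl.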

But since you’ve already run the code a few times, you should look at the output and see whether the inferences from the fast and slow chains differ in some application-specific way (maybe the results are interpretable and give you clues about what is going on).

Also you can do code blocks on discourse by surrounding code on each side by three backticks (or there’s a little code formatting block in the toolbar of the post). Something like this: ```write code here```

Thank you very much for the answer!

I also guessed that it was something related to mixing.

I ran two chains, and when this happens the first chain usually finishes quickly while the second takes much longer.
For instance, with 10000 iterations, a burn-in of 2000, and thinning of 8, both chains reached iteration 3000 at about the same time, but then the second chain stalled for a while before its next update (sometimes until the first chain had reached iteration 8000). After that, the time from 4000 to 5000, from 5000 to 6000, and so on kept increasing. That made me suspect a mixing problem, but I was not sure.

Thank you also for the hints about formatting the post. This was my first one, and I was not quite sure how to do it.

For the most part we don’t encourage thinning posterior samples. There usually aren’t so many draws that keeping all of them is a problem. Just use the effective sample size (ESS/N_eff) estimates from the output.