Variation in elapsed time of parallel chains and best practices for computing expectations from multiple chains

Only the number of iterations, the number of warmup iterations, the initial step size, adapt_delta, and the other parameters that control or initialize adaptation are shared across chains. During adaptation, each chain ends up with its own step size and mass matrix, so the per-chain run times can differ.
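Different run times do not change how expectations are computed. Once the chains agree (R-hat close to 1), a posterior expectation is estimated by averaging all post-warmup draws from all chains together, with no weighting by elapsed time. The sketch below is a minimal illustration in Python with NumPy. It implements the classic split-R-hat from BDA3, not the rank-normalized version that current Stan reports, and the simulated draws are hypothetical.

```python
import numpy as np

def split_rhat(draws):
    """Classic split-R-hat for a (n_chains, n_draws) array of one quantity.

    Each chain is cut in half so that within-chain trends also inflate R-hat.
    """
    draws = np.asarray(draws, dtype=float)
    n_chains, n_draws = draws.shape
    half = n_draws // 2
    # Drop the middle draw if n_draws is odd, then split each chain in two.
    split = np.concatenate([draws[:, :half], draws[:, n_draws - half:]], axis=0)
    m, n = split.shape
    chain_means = split.mean(axis=1)
    W = split.var(axis=1, ddof=1).mean()   # mean within-chain variance
    B = n * chain_means.var(ddof=1)        # between-chain variance
    var_plus = (n - 1) / n * W + B / n     # pooled variance estimate
    return np.sqrt(var_plus / W)

def pooled_mean(draws):
    """Posterior mean estimated from all post-warmup draws of all chains."""
    return np.asarray(draws, dtype=float).mean()

# Hypothetical draws: 4 chains x 1000 post-warmup iterations.
rng = np.random.default_rng(1)
good = rng.normal(0.0, 1.0, size=(4, 1000))
# Same draws, but the last chain is stuck in a different mode.
bad = good + np.array([[0.0], [0.0], [0.0], [3.0]])

print(split_rhat(good), pooled_mean(good))  # R-hat close to 1
print(split_rhat(bad))                      # R-hat well above 1
```

When R-hat is far from 1, as for `bad`, the pooled mean is not a trustworthy expectation, and no reweighting of chains fixes that.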

This will not solve the convergence problem, but you will get some speedup by using

y1a ~ neg_binomial_2_log(f1 + logNormFact1a, alphaVector);
...

where logNormFact1a = log(normFact1a), which can be computed once in the transformed data block if normFact1a is data.
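Adding log(normFact1a) on the log scale gives the same likelihood as multiplying the mean by normFact1a, assuming that is what the original model did (the original line is not shown here). The Python sketch below checks this numerically. It reimplements the NB2 log pmf (mean mu, variance mu + mu^2 / phi, as in Stan's neg_binomial_2) and a log-mean version; both helper functions and all numeric values are hypothetical, written for illustration only.

```python
import math

def neg_binomial_2_lpmf(y, mu, phi):
    """Log pmf of NB2(mu, phi): mean mu, variance mu + mu**2 / phi."""
    return (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
            + phi * (math.log(phi) - math.log(mu + phi))
            + y * (math.log(mu) - math.log(mu + phi)))

def neg_binomial_2_log_lpmf(y, eta, phi):
    """Same distribution parameterized by the log mean eta = log(mu).

    log(mu + phi) is computed as a stable log-sum-exp, so exp(eta) is never formed.
    """
    log_phi = math.log(phi)
    log_mu_plus_phi = max(eta, log_phi) + math.log1p(math.exp(-abs(eta - log_phi)))
    return (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
            + phi * (log_phi - log_mu_plus_phi)
            + y * (eta - log_mu_plus_phi))

# Hypothetical values for a single observation.
y, f1, norm_fact, alpha = 7, 0.4, 2.5, 3.0
log_norm_fact = math.log(norm_fact)  # plays the role of logNormFact1a

a = neg_binomial_2_lpmf(y, norm_fact * math.exp(f1), alpha)  # scale the mean
b = neg_binomial_2_log_lpmf(y, f1 + log_norm_fact, alpha)    # add a log offset
print(a, b)  # equal up to floating-point rounding
```

In Stan the gain comes from using the built-in log parameterization directly, instead of exponentiating and multiplying inside the model block.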

Do you have a reference for the “variance parameter for each other channel”?

alpha ~ uniform(0, 1e9);

Are you sure a uniform prior is a good choice for alpha? Maybe a hierarchical prior would be better.
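One way to see why uniform(0, 1e9) is a strong choice rather than a vague one: in neg_binomial_2 the second argument phi enters the variance as mu + mu^2 / phi, so large values mean almost no overdispersion. The short Python sketch below (hypothetical helper names, with an example mean of 50) computes the variance-to-mean ratio for a few values of phi and the prior mass that uniform(0, 1e9) puts below 1000.

```python
def nb2_variance(mu, phi):
    """Variance of Stan's neg_binomial_2(mu, phi)."""
    return mu + mu**2 / phi

def uniform_prior_prob_below(x, upper=1e9):
    """P(alpha < x) under alpha ~ uniform(0, upper)."""
    return min(max(x / upper, 0.0), 1.0)

mu = 50.0  # hypothetical expected count
for phi in (1.0, 10.0, 1e3, 5e8):  # 5e8 is the median of uniform(0, 1e9)
    print(phi, nb2_variance(mu, phi) / mu)  # variance-to-mean ratio

print(uniform_prior_prob_below(1e3))  # prior mass on alpha < 1000
```

At the prior median the variance-to-mean ratio is essentially 1, so this prior favors nearly Poisson behavior a priori, while only about one in a million prior draws falls below 1000.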

The negative binomial can produce a multimodal posterior for certain values of alpha, so that could also explain the behavior.

Depending on how the distances between the observations are distributed,

ell ~ gamma(2,2);

might also be too vague.
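A quick check is to compare the prior on ell with the spacing of the inputs. Stan's gamma(alpha, beta) uses shape and rate, so gamma(2, 2) has mean 1 and puts about 98% of its mass between roughly 0.07 and 3.3. Length scales much shorter than the smallest spacing or much longer than the largest spacing are poorly identified by the data. The Python sketch below uses the closed-form CDF of a shape-2 gamma; the input locations `x` are hypothetical and should be replaced by the model's actual inputs.

```python
import itertools
import math

def gamma2_cdf(x, rate=2.0):
    """CDF of Gamma(shape=2, rate); shape 2 has the closed form 1 - e^-z (1 + z)."""
    if x <= 0:
        return 0.0
    z = rate * x
    return 1.0 - math.exp(-z) * (1.0 + z)

def gamma2_quantile(p, rate=2.0, tol=1e-12):
    """Invert gamma2_cdf by bisection."""
    lo, hi = 0.0, 1.0
    while gamma2_cdf(hi, rate) < p:  # expand until the quantile is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gamma2_cdf(mid, rate) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical 1-D input locations; replace with the model's inputs.
x = [0.0, 0.3, 1.1, 2.5, 7.0, 12.0]
dists = [abs(a - b) for a, b in itertools.combinations(x, 2)]

print("prior 1%-99% for ell:", gamma2_quantile(0.01), gamma2_quantile(0.99))
print("pairwise distance range:", min(dists), max(dists))
```

If the distance range is much wider or narrower than the prior interval, a prior that is informative at both ends of the observed spacing (for example an inverse gamma tuned to those distances) may behave better.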

I recommend looking at pair plots of ell, kfDiag, kfTril, alpha and lp__ to check for possible multimodality or funnels.
