Variation in elapsed time of parallel chains and best practices for computing expectations from multiple chains

Yes, I realize, but if you have within-chain but not between-chain convergence, that’s exactly the figure you’d expect from the permuted samples, right? (Assuming this “permutation” just concatenates the thinned chains, and that’s why there are 8 “steps” here.)
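Just to make that point concrete, a throwaway numpy illustration with made-up numbers: chains that each mix well around different values produce exactly such a stepped trace once concatenated.

```python
import numpy as np

rng = np.random.default_rng(0)
# 8 chains, each well mixed around its own level: within-chain but not
# between-chain convergence.
chains = rng.normal(loc=np.arange(8)[:, None], scale=0.1, size=(8, 1000))
# Concatenating ("permuting") them gives a trace with 8 visible steps.
pooled = chains.reshape(-1)
```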
Shouldn’t the algorithmic parameters then be shared between chains, with convergence left to the likelihood and the stochastically drawn momenta? I didn’t see any tree depth or divergence warnings, only positive semi-definite rejection warnings, which are kind of expected.

The model is a multi-channel Gaussian process of the form (for M = 2 channels):

y \sim \mathcal{N}(\boldsymbol{\mu}, K), \qquad K = \begin{bmatrix} \sigma_{11} C_{11} & \sigma_{12} C_{12} \\ \sigma_{21} C_{21} & \sigma_{22} C_{22} \end{bmatrix}

where \sigma_{ij} are the signal variance parameters for the different channels, C_{ij} are the correlations between the data points under whatever kernel is chosen, and \mu is just a vector of M concatenated means, one per channel. The likelihood is negative binomial, so I’m sampling two latent functions (two experimental replicates) that share all other parameters.
Each channel has ~50 data points, so the total number of data points is \sim 50M, and I get to choose M, with 2 \leq M \lesssim 10^4 (but I won’t go that high, because there are \mathcal{O}(M^2) parameters).
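For concreteness, here is a toy numpy sketch of how K is assembled from the \sigma_{ij} and C_{ij} blocks (placeholder names and kernel choice, not the actual Stan code):

```python
import numpy as np

def block_cov(sigma, C):
    """Assemble the (M*N) x (M*N) covariance from the M x M signal
    (co)variances sigma[i, j] and the N x N kernel correlation blocks
    C[i][j]. Illustrative names only."""
    M = sigma.shape[0]
    return np.block([[sigma[i, j] * C[i][j] for j in range(M)]
                     for i in range(M)])

# Toy numbers: M = 2 channels, N = 50 points each, one shared
# squared-exponential kernel for every correlation block C_ij.
M, N, ell = 2, 50, 0.3
x = np.linspace(0, 1, N)
C_se = np.exp(-0.5 * ((x[:, None] - x[None, :]) / ell) ** 2)
C = [[C_se for _ in range(M)] for _ in range(M)]
sigma = np.array([[1.0, 0.4],
                  [0.4, 1.5]])   # must keep K positive (semi-)definite
K = block_cov(sigma, C)
print(K.shape)   # (100, 100): 50*M data points, O(M^2) sigma parameters
```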

There’s not much more in the Stan code, I think, but if you think it will really be helpful I can post it. The model is large but simple, and I don’t think there’s much to be done with the priors: I started with uniform priors on the off-diagonal block parameters (we have no idea what these could be) but then switched to zero-mean normal priors with a variance vaguely related to the data (which can’t really be known either).
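By “variance vaguely related to the data” I mean something along these lines (a purely hypothetical numpy sketch, not what’s actually in the model):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.negative_binomial(5, 0.5, size=(2, 50))   # stand-in counts, M = 2

# e.g. give sigma_ij a zero-mean normal prior whose scale is the geometric
# mean of the two channels' log-scale sample variances: still vague, but
# on a scale the data at least hints at.
v = np.log1p(y).var(axis=1)
prior_sd = np.sqrt(np.outer(v, v))   # sigma_ij ~ Normal(0, prior_sd[i, j])
```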

As far as I understand, this could be fixed with a longer warmup (burn-in) and raising adapt_delta above its default. That’s my main question here.
If that is possible, everything should converge and all the ad hoc stuff goes away. Otherwise there is a finite probability that the sampler’s initial adaptation ends up with different values in some chains (the more chains, the worse), runs are bound to have “outlier” chains that screw up the whole analysis, and I’d be stuck in the ridiculous situation of rerunning the same analysis over and over until all chains converge.
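Concretely, something like the following is what I have in mind (a cmdstanpy sketch with placeholder file and data names, not my actual setup):

```python
from cmdstanpy import CmdStanModel

# Placeholder names; only the sampler settings matter here.
model = CmdStanModel(stan_file="multichannel_gp.stan")
fit = model.sample(
    data="gp_data.json",
    chains=8,
    iter_warmup=2000,    # longer adaptation than the default 1000
    iter_sampling=1000,
    adapt_delta=0.95,    # raised from the default 0.8
    seed=1234,
)
print(fit.diagnose())    # per-run check: divergences, treedepth, E-BFMI
```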

Thanks again.