Recently I have been fitting multivariate probit mixture models (both binary and ordinal) to analyze correlated diagnostic test accuracy data for meta-analysis.

At first I extended the sample code given in the Stan manual, but found it to be very slow. For example, on one of the larger meta-analyses I was looking at, with a total of 11,000 observations, I still had bad R-hats after 5 hours. I then extended @bgoodri’s parametrization to the mixture and ordinal cases. This improves things a lot - the model now runs in around 3 hours with good R-hats. I also switched from using R within Windows to using R through Ubuntu with WSL, which almost cut the run time in half.
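For anyone following along, here is a small Python sketch of the core trick in that parametrization, as I understand it (this is my own illustration, not @bgoodri’s code): each truncated latent normal is written as a smooth function of a (0, 1) nuisance uniform, so the sampler works with unconstrained uniforms and the log-likelihood accumulates the truncation probabilities, exactly as in a GHK-style decomposition.

```python
# Illustrative GHK-style reparameterization of a binary multivariate probit
# (a sketch under my own naming; mu, L, y, u are assumptions, not from a paper).
import math
from statistics import NormalDist

std = NormalDist()  # standard normal, with .cdf and .inv_cdf

def ghk_latents(mu, L, y, u):
    """Map nuisance uniforms u to latent normals consistent with outcomes y.

    mu : length-D list of linear predictors
    L  : D x D lower-triangular Cholesky factor of the correlation matrix
    y  : binary outcomes (0/1)
    u  : nuisance uniforms in (0, 1), one per dimension
    Returns (z, log_lik), where z are the standardized latents and log_lik
    is the log-probability of the observed pattern y.
    """
    D = len(mu)
    z = [0.0] * D
    log_lik = 0.0
    for d in range(D):
        prev = sum(L[d][j] * z[j] for j in range(d))
        # P(latent_d < 0 | earlier latents), on the standard-normal scale
        bound = std.cdf(-(mu[d] + prev) / L[d][d])
        if y[d] == 1:
            # draw z[d] from the upper tail so that mu[d] + (L z)[d] > 0
            z[d] = std.inv_cdf(bound + (1.0 - bound) * u[d])
            log_lik += math.log1p(-bound)
        else:
            # draw z[d] from the lower tail so that mu[d] + (L z)[d] < 0
            z[d] = std.inv_cdf(bound * u[d])
            log_lik += math.log(bound)
    return z, log_lik
```

With an identity correlation matrix and zero predictors, `log_lik` is constant in `u` and recovers the product of marginal probabilities, which is a useful sanity check when porting this to Stan.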

I am looking for further ways to speed up fitting these models in Stan. I recently saw a paper which discussed parameter-expanded data augmentation algorithms for MVP models to improve mixing. These work by extending the model with non-identifiable parameters, which yields a Gibbs sampling step for a covariance matrix and circumvents a Metropolis-Hastings step for the correlation matrix. My question is: would it be worth implementing these parameterizations in Stan to speed up sampling, or would this likely make no difference for a gradient-based HMC sampler like Stan?
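To make the "non-identifiable parameters" idea concrete, here is a minimal sketch of the rescaling step these algorithms rely on (my own illustration of the general idea, not code from the paper): the expanded model works with an unrestricted covariance matrix Sigma, whose full conditional is easy to sample in a Gibbs scheme, and the identified correlation matrix is recovered deterministically by dividing out the latent scales.

```python
# Sketch of the parameter-expansion rescaling R = D^{-1} Sigma D^{-1},
# where D is diagonal with the latent standard deviations (not identified
# by binary/ordinal data). Function name and layout are my own assumptions.
import math

def cov_to_corr(sigma):
    """Map a covariance matrix (list of lists) to (correlation matrix, scales)."""
    n = len(sigma)
    d = [math.sqrt(sigma[i][i]) for i in range(n)]  # latent scales
    r = [[sigma[i][j] / (d[i] * d[j]) for j in range(n)] for i in range(n)]
    return r, d
```

The Gibbs sampler never has to propose on the awkward correlation-matrix space; it samples Sigma freely and this deterministic map gives back the correlations.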

Apologies if these are stupid questions - is this an example of where a Gibbs sampler like JAGS might be faster than HMC, or would the GHK parametrization in Stan likely perform similarly?

It might be worth looking into other programs and other parametrizations for these models in the future - e.g. calibrated data augmentation (CDA) - although I’m not sure how I’d go about applying CDA; the programs/code used don’t seem to be given in the paper. They also mention using Stan v2.17 (I don’t think they used the GHK parametrization) for comparison with CDA, and found CDA was faster for large datasets.

edit: I found the R code for the CDA here - I will have a go at running it and comparing to GHK in Stan on the same datasets.

Anything can be fast if you are willing to have an effective sample size of zero. Data augmentation is fairly efficient for bivariate probit, but a probit model without data augmentation is similarly efficient in Stan. Data augmentation for multinomial probit is less efficient than for bivariate probit.

More generally, it is difficult to come up with a (differentiable) joint posterior density that is difficult for Stan to sample from efficiently but is easy for a Gibbs sampler to sample from efficiently. If it is difficult for Stan due to funnels and whatnot, then how are the full-conditional distributions magically going to have a variance that is close to the marginal posterior variance for that parameter?

By the way, I posted R code for the CDA algorithm above, from the paper I linked to - it seems to run extremely fast for the MVP, with low autocorrelation, in the example given.