I have run an MCMC calibration of a multilevel 2PL IRT model. The calibration converged, but I got the following plots, which I find somewhat concerning and am not sure how to explain. Any help is much appreciated.
Traceplot:
alpha[1] and alpha[8] 10_10_0.6_TwoPL_Alpha_Traceplot_1.pdf (26.3 KB)
Thank you, Max, for the feedback. I am concerned because I still see some autocorrelation in the lags even though I have thinned at every 10th iteration. The hairy caterpillar is concerning too, because the two chains do not seem all that stationary (they still go up and down quite a bit). Am I over-concerned? The other diagnostics look OK to me. Could you help me take a quick look? Please find attached: 10_10_0.6_TwoPL_Diagnostics_1.txt (3.8 KB)
The Rhat for each individual parameter estimate looks good. Is there any way to get an overall estimate of Rhat across all the chains we run?
I am also not sure how big the effective sample size needs to be. Please see here (partial results): 10_10_0.6_TwoPL_Par_Ests_1.pdf (63.8 KB) I ran 2 chains of 1,000 post-warmup iterations each, for a total sample size of 2,000, but my effective sample size is only around 200 for each parameter. Is that enough to get stable estimates? The MC error and the standard deviation of all the estimates look decent. Thanks.
That all looks good. The Neff is pretty low, though…
But first: Rhats look fine. There is no overall measure for it afaik. Just make sure that all Rhat < 1.1. This indicates that the chains converged - it's a necessary, but not sufficient condition!
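For example, assuming your fit object is called fit1, you can pull every Rhat out of the rstan summary and just look at the worst one (a quick sketch, not from your output):
rhats <- summary(fit1)$summary[, "Rhat"]   # Rhat for every parameter (including lp__)
max(rhats, na.rm = TRUE)                   # the single worst Rhat
all(rhats < 1.1, na.rm = TRUE)             # TRUE if every parameter passes the 1.1 check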
The movement in the plot indicates autocorrelation, and the relatively low Neff is also indicative of that. The autocorrelation plots, on the other hand, do not look that bad. And while the Neff is not great, it's not super terrible - but this really depends on your application! If you want to compute extreme quantiles of the posterior, then a Neff of around 200 is certainly too low. If you're interested in posterior means/medians, you should be fine.
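If you want to see which parameters have the lowest Neff (again assuming the stanfit object is called fit1 - just a sketch):
neff <- summary(fit1)$summary[, "n_eff"]   # effective sample size per parameter
head(sort(neff))                           # the parameters the sampler struggles with most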
You can run longer chains (and more!!) and just crank up the Neff in a "brute force" way to the level you need for your analysis. I'd only use thinning if the number of posterior draws gets too big to handle conveniently - otherwise you're throwing away information.
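In rstan that just means bumping iter (and chains) and dropping thin, e.g. (a sketch with made-up numbers; the file name and data are placeholders for your own objects):
fit_long <- stan(
  file = "TwoPL.stan",   # your Stan program
  data = data,           # same data list as before
  chains = 4,            # more chains...
  warmup = 1000,
  iter = 5000,           # ...and longer chains; no thin argument, so all post-warmup draws are kept
  cores = 4              # run the chains in parallel
)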
The more elegant way is to parameterize your model so that you don't have these autocorrelations in the first place. I'm afraid I don't know enough about these kinds of models to be of help here. :(
Thank you for the quick feedback, Max. This is super reassuring!!! I will try to run a longer chain to increase my Neff, which hopefully will also reduce the autocorrelation. Why is thinning not recommended with the HMC sampler? I do not quite get it, because I did notice a difference between thinning and not thinning with the same parameter settings: there is some very high autocorrelation without thinning, while the AC looks much more acceptable after thinning. Maybe it is because my chain is not long enough. BTW, how many chains do you usually run? How many iterations do you use in your field? Any knowledge about typical effective sample sizes in education/sociology/psychology? Thank you very much.
I think whether to thin or not is a dilemma. Anyway, maybe HMC is so efficient that it actually self-corrects autocorrelation? Thank you for the extra resources; I found them all useful. Another neat tool is shinystan for visualizing model diagnostics and estimates. Thank you very much. :-)
4 chains is the default in rstan and what I normally use. Most of the time the number of CPU cores is the most reasonable pick, for obvious reasons, but in any case I think two is not enough. You also want more chains to check whether they really converged: each chain starts from different initial values, and if they all converge to the same region of the parameter space, you can be a bit more confident about overall convergence.
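In rstan you can just give each chain its own core, e.g. (a sketch; the file and data names stand in for your own objects):
options(mc.cores = parallel::detectCores())   # one chain per core, as suggested when rstan loads
fit4 <- stan(file = "TwoPL.stan", data = data, chains = 4)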
Quick question: Are you using HMC or NUTS? (The default in rstan is NUTS.) I'm asking because you really should use NUTS if you are not already.
Thanks, Max. I think I am using the default setting since I did not change anything, so it should be NUTS. I am running 2 chains to start with because it takes forever to run 2 chains of 1,000 post-warmup iterations. Is there any way to speed up the estimation? I tried a desktop computer as well, but it was only somewhat faster than my laptop. It is a simulation study. How many replications would you use if you ran a simulation study? Thanks. :-)
That's good. If your laptop (or desktop) has more than 2 physical cores, then running more chains will probably not take that much longer, though. Did you specify thin = 10, or did I get that wrong? Did you get warnings when not using thinning?
HMC has low autocorrelation because it explores the parameter space efficiently. NUTS even more so. Check out this great lecture by Richard McElreath for intuition about why NUTS works so well. This presentation by Michael Betancourt is pretty awesome as well (check after ca. 30 min. especially). HOWEVER, it's possible (loosely speaking) that the way you implemented the model leads to a posterior geometry that induces autocorrelation - it is hard for the chains to reach new/different regions of the posterior quickly.
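You can plot the autocorrelations straight from the fit, e.g. for the two parameters in your traceplot (assuming the object is called fit1):
library(rstan)
stan_ac(fit1, pars = c("alpha[1]", "alpha[8]"))   # autocorrelation by lag for the selected parameters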
If you are at liberty to disclose your model code, I would suggest starting a new thread posting your code with a request for help making the model more efficient.
My computer and laptop both have 4 cores, so running 4 chains works just fine. But I am not sure why it takes forever - no chain of 1,000+ iterations finishes in less than 5 hours. I did specify thin = 10 in the rstan call, as follows:
fit1 <- stan(
  file = paste0(path, "TwoPL.stan"),   # Stan program
  data = data,                         # named list of data
  chains = 2,                          # number of Markov chains
  warmup = 1000,                       # number of warmup iterations per chain
  iter = 2000,                         # total number of iterations per chain
  thin = 10,
  cores = 4,                           # number of cores for parallel chains
  refresh = 1,                         # show progress every 'refresh' iterations
  control = list(max_treedepth = 20, adapt_delta = 0.99)
)
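For reference, that thinning setting means each chain only keeps a fraction of the draws:
(2000 - 1000) / 10       # 100 retained post-warmup draws per chain
2 * (2000 - 1000) / 10   # 200 retained draws in total, which is roughly where my Neff sits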
As my model is hierarchical, maybe I really should think about reparameterizing it to make it more efficient?
I assume speeding up the model and getting convergence are pretty much the same problem here (this more recent talk by @betanalpha also gives good intuition for that), and reparameterizing is most probably the solution (to both).