I used the following code to run 3 chains in parallel, and their progress is printed to a website, as shown in the snapshots below.
The 20000 - 100 sampling iterations alone (not including warmup) take 2438.24 s for chain 2, but 76191.3 s for chain 3! Chain 1 has not finished yet.
All three chains run the same model. Why can this happen?
I had that happen with a large Gaussian process model and the default adapt_delta=0.8, the biggest problem being that the posteriors did not mix properly. The chains would take anywhere from 1 to 10 days.
From the discussion here I gathered that the value was too low, so the chains were not equally tuned after the burn-in period (or something along those lines; I'm not familiar with all the details of the NUTS tuning). Maybe you can check whether the different chains are mixing as expected and the only difference between them is the time they take, e.g. with the diagnostics sketched below.
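Something like this is what I have in mind, as a minimal sketch assuming you run the model from Python with CmdStanPy (the file names `model.stan` and `data.json` are placeholders for whatever you actually use):

```python
from cmdstanpy import CmdStanModel

# Placeholder file names; substitute your own model and data.
model = CmdStanModel(stan_file="model.stan")
fit = model.sample(data="data.json", chains=3, parallel_chains=3)

# R_hat near 1 and similar effective sample sizes across parameters suggest
# the chains are exploring the same posterior and only differ in speed.
print(fit.summary())

# CmdStan's diagnose utility also flags divergences and max-treedepth hits,
# which are typical symptoms of a too-low adapt_delta or difficult geometry.
print(fit.diagnose())
```

If R_hat is close to 1 everywhere and there are no divergences, the chains may simply have adapted to different step sizes, which by itself would explain the difference in elapsed time.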
I increased the value to adapt_delta=0.9 and mixing got better (it may still need to go up to 0.99, and/or a longer chain), and the variation in between-chain elapsed times decreased to a range of something like 2-3 days (although I ended up changing other parts of the model, so I can't compare directly).
Maybe try that first; given your elapsed times you should know within a few hours whether it helps.
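In CmdStanPy that would be roughly the following (a sketch under the same assumptions as above; the iteration counts are only illustrative):

```python
from cmdstanpy import CmdStanModel

model = CmdStanModel(stan_file="model.stan")  # placeholder file name
fit = model.sample(
    data="data.json",        # placeholder data file
    chains=3,
    parallel_chains=3,
    iter_warmup=100,         # illustrative counts; adjust to your run
    iter_sampling=19900,
    adapt_delta=0.9,         # default is 0.8; try 0.99 if divergences remain
)
print(fit.summary())
```

Raising adapt_delta makes the sampler adapt to a smaller step size, so each iteration gets somewhat slower, but divergences and erratic per-chain behaviour should become less frequent.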
Did it reduce the variation, at least? It may be a deeper problem; even HMC can suffer from this if the posterior has a weird shape (like the apparently infamous "funnel"). That may be fixed if you are able to constrain the parameter space through a meaningful specification of the priors (assuming no identifiability issues remain in the likelihood despite that).
If you can describe the model itself and post the Stan model code, other people here may be able to help (I don't always find Stan code intuitive to read, so I'm not sure I have the intuition to give advice based on that alone, but depending on the type of model and the actual mathematical description, maybe I could as well).