I am running CmdStan, and I was told (in this forum) that a slight change in the model can produce drastic changes in convergence time, even with the same dataset, same number of samples, etc. The reason is that a slight change in the model produces a different chain evolution, and some chain may get badly delayed somewhere.
So, is there a way to tell the sampler that if it spends too much time sampling in one place, it should move the chain somewhere else?
Thank you, Ezequiel.
I think you can control this partially by choosing appropriate priors and initial conditions for your chains.
Hi! I was referring to something more dynamic. For instance: if a chain gets stalled and produces no new sample for 10 minutes, then change its location by some algorithm and forget about the previous location.
I say so because I’ve seen incredible differences in sampling time from very slight changes, which suggests there might be some way to stop insisting once a chain has been stalled for some given time.
Does what I’m saying make sense? If yes, perhaps there is already a setting for this; I just didn’t find anything.
The computational cost per gradient evaluation is pretty much constant no matter where in parameter space a chain sits. If a chain takes a long time to return a single sample, that is because sampling in that region of parameter space, with the step size the sampler is using, requires a large number of gradient evaluations to hit the dynamic stopping criterion. If you REALLY want to force the sampler to avoid these long integrations, you can specify a max_treedepth of less than the default 10, or you can turn off warmup and supply a large step size. It is useful to tinker with this stuff to get a better feel for how Stan works, but if your main goal is to get results quickly, then I strongly recommend against doing either of these things, because you need to make sure that your sampler adapts adequately to your posterior. Some posteriors are hard.
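In CmdStan these knobs are passed on the command line. A rough sketch, assuming a hypothetical compiled model binary `./model` and data file `data.json` (your names will differ):

```shell
# Cap the NUTS tree depth at 8 instead of the default 10,
# which bounds the number of gradient evaluations per iteration
./model sample algorithm=hmc engine=nuts max_depth=8 \
        data file=data.json output file=samples.csv

# Or skip warmup adaptation entirely and fix a step size by hand
./model sample num_warmup=0 adapt engaged=0 \
        algorithm=hmc stepsize=0.5 \
        data file=data.json output file=samples.csv
```

Again, both of these trade correctness for speed; iterations that hit the tree-depth cap are truncated rather than completed.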
This will not lead to valid inference. The whole point of MCMC methods is to do a (hopefully) unbiased exploration of the posterior. If there’s some region of the posterior that requires long integration times, then you will poison your inference by injecting a rule that avoids returning samples from this region.
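As a toy illustration of that point (plain Python, not Stan itself): if you draw from a distribution but systematically discard draws from one region, which is hypothetically what a "give up on hard regions" rule would do, the resulting estimates are biased:

```python
import random

random.seed(0)

# Draws from a standard normal "posterior" (true mean = 0)
samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]

# A rule that avoids one region: drop everything above 1.5,
# mimicking a sampler that never returns samples from there
kept = [x for x in samples if x < 1.5]

full_mean = sum(samples) / len(samples)  # close to the true mean, 0
biased_mean = sum(kept) / len(kept)      # pulled downward, roughly -0.14

print(full_mean, biased_mean)
```

The "fast" estimate is systematically wrong, and no amount of extra sampling under the same rule fixes it.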
Thanks Jacob, I think I understand the gist of what you mean.
(Nevertheless, it still puzzles me that the same model goes from taking 3 days to 8 hours simply because I randomly made a small change… But I guess this is just luck)
I think it’s called “experience”