This may be a frequently repeated question from newbies like myself, and it might even be a bit annoying. However, here I go.
I have found in papers on Bayesian inference the use of anywhere from a few thousand iterations for warm-up and sampling up to huge numbers (>100,000).
In this post from the old Google group, Bob suggested checking n_eff to decide whether it is necessary to increase the number of iterations (e.g., by doubling it).
Using shinystan, I have noticed that one can set the warning threshold value (ranging from 0 to 100%, with a default of 10%, if I am not mistaken).
So, my questions are:
Is there any guidance on how to pick this threshold?
In the post mentioned above, someone asked whether it is possible to restart a new simulation from the output of a previous one, and the answer at the time was that it wasn't possible. Is it possible now?
I have some models where the Monte Carlo SE / posterior SD and the Rhat statistic show no warnings at all, while the n_eff/N warning appears for some estimates. Is this a sign that I should increase the number of simulations even though the chains seem to have converged?
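For concreteness, here is a minimal sketch (assuming an existing rstan fit object named `fit`; the 10% cutoff is just shinystan's default) of computing the n_eff/N ratio that the warning is based on:

```r
library(rstan)

# Total post-warmup draws across all chains (lp__ is always saved)
N <- length(rstan::extract(fit, pars = "lp__")$lp__)

# Per-parameter effective sample sizes from the fit summary
s <- summary(fit)$summary
ratio <- s[, "n_eff"] / N

# Flag parameters whose ratio falls below the default 10% threshold
flagged <- rownames(s)[ratio < 0.10]
print(flagged)
```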
Specifically for the Stan implementation of HMC, you can expect most well-parameterized models to yield more than one effective sample per 10 iterations, so the 10% threshold is a good one. If you're getting fewer than that, you should consider other options for parameterization (unless you're getting a big enough effective sample size anyway; in that case, carry on).
I think it's either close or already possible; I'm not sure if it's made it into the interfaces yet. It's not much help, though.
Nah, if your sample is big enough and you meet the convergence criteria, don't worry about it. OTOH, if you're running Stan for hours/days to get your samples, you are (a) wasting your time and (b) likely using a terrible parameterization (or fitting a very hard model), and you could do much better.
All that stuff about running a million iterations and thinning by 10k is irrelevant for Stan/HMC; don't do that.
This varies tremendously with the geometry of the problem. What you should be seeing is that if you run for twice as many iterations, you get twice as large an n_eff. The chain may be mixing slowly, but that check will show you whether it's mixing at all.
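A rough sketch of that doubling check, with hypothetical model and data names:

```r
library(rstan)

# Fit the same model twice, doubling the number of iterations
fit1 <- stan("model.stan", data = stan_data, iter = 2000, chains = 4)
fit2 <- stan("model.stan", data = stan_data, iter = 4000, chains = 4)

n_eff1 <- summary(fit1)$summary[, "n_eff"]
n_eff2 <- summary(fit2)$summary[, "n_eff"]

# Ratios near 2 suggest the chain is mixing (even if slowly);
# ratios well below 2 suggest it isn't really mixing.
summary(n_eff2 / n_eff1)
```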
In the works for Stan 3. Mitzi coded up the basic I/O, but we're waiting on refactoring some of the interface code so we don't have to code this all up twice.
This is only going to matter if (a) you have really long autocorrelation times in an otherwise well-behaved model, and (b) you don't have enough memory. You always lose information by thinning.
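A toy illustration of that last point, using a synthetic AR(1) chain and the coda package rather than actual Stan output:

```r
library(coda)

# Simulate an autocorrelated AR(1) "chain"
set.seed(1)
n <- 1e5
rho <- 0.5
x <- numeric(n)
for (i in 2:n) x[i] <- rho * x[i - 1] + rnorm(1, sd = sqrt(1 - rho^2))

effectiveSize(x)                      # full chain: ESS roughly 33,000
effectiveSize(x[seq(1, n, by = 10)])  # thinned by 10: ESS roughly 10,000
```

The thinned chain can never hold more information than the full one; here it retains only about a third of the effective samples.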
Give users a chance not to learn everything at once: for lots of people doing fairly standard models, it'll be a loooooong time before they run into something that's well behaved yet yields only a few n_eff per 100 iterations… :)
Depends on what they're doing. Lots of users jump in with really complex models. I think it'd benefit most of these users to start more gradually, as there are indeed lots of different things to learn when dealing with a new programming language, and a really big program isn't the best way to do that. (Andrew disagrees here; he hates the "hello world" approach as much as I love it.)
Unfortunately, some relatively simple-looking models, like stochastic volatility time series or even just a high-dimensional multivariate normal, will require quite a high ratio of iterations to effective sample size.
That's a good point. I guess my bias is towards leading people in the right direction a step at a time, especially if they haven't demonstrated that they're unreasonable to begin with. OTOH, if I tell somebody to start with small components and they can't let their giant model go for a while, well, you can't save everyone. It's like the tough guy in the zombie movie who wants to go off on their own. ¯\_(ツ)_/¯
Or the dumb kid in the horror movie. I like the analogy :-)
I had to search for that on Google; it's not exactly clear from looking at it that it's supposed to be a shrug emoji! Very appropriate, as I'm on vacation in France, the land of the perfect shrug (I can't quite coordinate the pursed lips, exhalation, and slight shrug that's so expressive). Mitzi has an awesome book of French gestures, my favorite of which is the bullshit gesture, though I've never seen it used in France.
I lived in France from age 6 till 9 (?), I think; I can never keep the dates straight. Long enough to start acting like a little French child by the time we returned to the wrong side of the Berlin Wall. Maybe this is why I feel like I should be able to gesture instead of talking sometimes (and why nobody here understands!) :)
For example, I have 10,000 data points and an ESS of ~300 for a few estimates; these would be picked up by the 10% threshold but not by a 5% one. Otherwise, the chains converge, the MCSE looks okay, pp_checks are reasonable, and the estimates look sensible. When I increase the iterations, the ESS scales upwards proportionally (or more), but otherwise there are no changes to the model estimates.
What are the pros and cons of selecting either n_eff/N threshold? I'd be keen to save time!
If your Markov chain behaves well enough, then the effective sample size controls the error of your MCMC estimators, such as the mean (see https://betanalpha.github.io/assets/case_studies/rstan_workflow.html for more details). So you want to generate a large enough effective sample for your application; in general there is no unique answer.
For example, if all I want to do is crudely locate the mean within the marginal posterior distribution, then error[f] ≈ sqrt(Var[f]) / 3 should be sufficient, and since the MCMC standard error scales as sqrt(Var[f] / ESS), that implies an effective sample size of only about ESS ≈ 9.
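Spelled out as a quick sketch with hypothetical numbers (rstan reports this standard error as the se_mean column in summary(fit)):

```r
# MCMC standard error of a posterior-mean estimate is roughly
# sd / sqrt(ESS), so targeting an error of sd / 3 needs ESS ~ 9
post_sd <- 2.0                  # hypothetical marginal posterior sd
ess     <- 9
mcmc_se <- post_sd / sqrt(ess)  # equals post_sd / 3
mcmc_se
```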