I have the use of a virtual machine with a lot more processing power (32 GB RAM and 16 cores) than my local machine, which has got me thinking about how to use it optimally. I found this website about running multiple models in brms using the future package, but I have a question about a single model.
Imagine that I wanted 40,000 total post-warmup samples. Now I could run 4 chains with 10,000 samples each and set the cores argument to 4. Would this be the same as running 8 chains with 5,000 samples each? My intuition says no, as in the latter case each chain hasn’t explored the posterior distribution for as long, though I’m not very sure about this intuition… I would appreciate any insight in this regard. Thank you!
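In brms terms, the two configurations I have in mind would look roughly like this (a sketch; the formula `y ~ x` and data `d` are just placeholders):

```r
library(brms)

# 4 chains x 10,000 post-warmup draws each = 40,000 total
fit_4chains <- brm(y ~ x, data = d,
                   chains = 4, cores = 4,
                   iter = 12000, warmup = 2000)

# 8 chains x 5,000 post-warmup draws each = 40,000 total
fit_8chains <- brm(y ~ x, data = d,
                   chains = 8, cores = 8,
                   iter = 7000, warmup = 2000)
```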
In my experience, it’s seldom necessary to run more than 4 chains. The discussion of whether to use one long chain or several shorter ones is not new in the community, but at least for Stan, the default of 4 chains, together with the accompanying diagnostics, has worked well for me.
But why 40K samples? That sounds like a lot to me; with a good model and decent data, a few thousand are very often enough. Do you have a particular reason for such a large number of samples?
Of course, if you have many cores then within-chain parallelism could be an option to look at. I know @paul.buerkner et al. have started looking into how this should be done in brms, but it’s still early days (you can have a look in the reduce_sum branch of brms). I guess, for now, you need to do it in Stan.
I wish to do model comparison via Bayes factors, and have read in a couple of places that one should have “proper priors” and a “plentiful posterior” (at least 10 times the default 4,000 post-warmup samples), so that was my reasoning for going for such a large number of samples.
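For reference, this is roughly what I had in mind (a sketch; the formulas and priors are placeholders, and `save_pars(all = TRUE)` is the newer spelling of the older `save_all_pars = TRUE` argument):

```r
library(brms)

# Proper priors on the coefficients; save all draws for bridge sampling
fit1 <- brm(y ~ x1, data = d,
            prior = prior(normal(0, 1), class = b),
            save_pars = save_pars(all = TRUE),
            chains = 4, cores = 4, iter = 12000, warmup = 2000)

fit2 <- brm(y ~ x1 + x2, data = d,
            prior = prior(normal(0, 1), class = b),
            save_pars = save_pars(all = TRUE),
            chains = 4, cores = 4, iter = 12000, warmup = 2000)

bayes_factor(fit1, fit2)  # bridge-sampling estimate of the Bayes factor
```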
Ah, that’s the term - “within-chain parallelism”. That is what I wish to do, but from the sounds of it, it is not possible in brms currently. I might at least run multiple models in parallel then, as in the sketch below…
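Something along these lines, using futures explicitly (a sketch; the formulas are placeholders):

```r
library(brms)
library(future)

# Run two models at a time; each model still gets 4 cores for its chains
plan(multisession, workers = 2)

fit_a %<-% brm(y ~ x1,      data = d, chains = 4, cores = 4)
fit_b %<-% brm(y ~ x1 + x2, data = d, chains = 4, cores = 4)
```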
Well, first of all I’d try to run it a few times with smaller sample size since it might be stable anyways :) Second, I would probably use LOO for model comparison and not Bayes Factor.
I’m new to Bayesian models in general, so don’t mind my question - I’m guessing that’s meant to read “since it might be unstable anyways”? What would I do if it was unstable with a smaller sample size? That is, would I tweak my priors, reduce the number of parameters in my model, etc.?
Thanks for that suggestion and it looks like I can run loo_compare on the model fits. Is there a benefit LOO has over BF?
I think we’re mixing concepts now :) You can run your model with, e.g., 4000 iterations instead of 40K, do that 3-5 times, and compute the Bayes factor each time. If you see that the Bayes factor is stable and doesn’t move much, that’s an indication that you don’t have to run 40K iterations (of course, I’m assuming that running 40K iterations takes a very long time - if that’s not the case, you might as well run it with 40K).
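As a sketch (assuming two already-compiled fits `fit1` and `fit2`, and that the returned Bayes-factor object exposes its value as `$bf`):

```r
# Refit a handful of times at 4000 iterations and watch the Bayes factor
bfs <- replicate(5, {
  f1 <- update(fit1, iter = 4000, recompile = FALSE)
  f2 <- update(fit2, iter = 4000, recompile = FALSE)
  bayes_factor(f1, f2)$bf
})
bfs  # if these values barely move, 40K iterations are probably overkill
```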
I would say LOO is the current state of the art for model comparison. Just like WAIC, it gives you an indication of how much the models differ in out-of-sample predictive performance (relative to each other). However, one good thing with LOO is that it comes with very good diagnostics.
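In brms that could look like this (a sketch; `fit1` and `fit2` as before):

```r
# Attach PSIS-LOO to each fit, then compare
fit1 <- add_criterion(fit1, "loo")
fit2 <- add_criterion(fit2, "loo")
loo_compare(fit1, fit2, criterion = "loo")

# The Pareto-k diagnostics flag observations where the approximation is shaky
loo(fit1)
```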
@torkar, I tried adding loo via the add_criterion function, but I get an error about not being able to allocate a vector of size 14.0 GB… I’m running this on Linux, and AFAIK there is no RAM limit for R on Linux. Also, I have 32 GB RAM and ~24 GB free before I execute the add_criterion command.
This is probably something @avehtari would be interested in hearing about (he has a lot of teaching atm, but I’m sure he’ll reply as soon as he can).
That’s a lot. PSIS-LOO needs far fewer draws than Bayes factors. Try with 4000 draws and tell me if that works.
Also, as the number of observations is that large, the computation can be slow even with 4000 draws and PSIS. See also Approximate LOO-CV using PSIS-LOO and subsampling for a faster estimate; it has a diagnostic that tells you whether the subsample is large enough.
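If your brms version supports it, subsampled LOO could look like this (a sketch; `fit` and the subsample size are just placeholders):

```r
# PSIS-LOO on a random subsample of observations; the printed
# subsampling SE indicates whether the subsample is large enough
loo_ss <- loo_subsample(fit, observations = 1000)
loo_ss
```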
Perhaps to slightly clarify: that error does mean that some call resulted in an allocation of 14 GB when that much memory wasn’t available, but it does not mean that add_criterion needed only 14 GB. Functions can make many allocations, filling up RAM, and you only get an error for the allocation for which there is no space. E.g. if you have 24 GB free and add_criterion has to make two 14 GB allocations, you would get your error message on the second one.
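For a sense of scale, the pointwise log-likelihood matrix that LOO needs is draws × observations doubles. A back-of-the-envelope (the observation count here is hypothetical, chosen to match the 14 GB figure):

```r
draws <- 40000   # 40K post-warmup draws
n_obs <- 44000   # hypothetical number of observations
draws * n_obs * 8 / 1e9  # 8 bytes per double: ~14.1 GB for a single copy
```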
Worked like a charm @avehtari. I didn’t have to subsample either.
Ah, that’s interesting to know. Thanks for that @Ax3man!
I had a question about using loo_compare - are there any guidelines for how elpd_diff should be interpreted in light of se_diff? (I have read somewhere that the former should be some “multiple” of the latter, but that seems a bit vague… Or am I confusing this with elpd_estimate?)