Hi, I have been fitting models with a fairly large number of parameters (~40,000). Fitting the models is not the problem; they just need a lot of draws, and those draws take up a lot of space. I have good hardware and storage is not an issue, but because of memory constraints I can only run 4 chains with RStan, even though I have 16 cores.
This seems to be because RStan keeps every chain's draws in memory so that it can assemble them into the fit object once sampling is done. In my setting I would rather run lots of chains and worry about building the fit object later (maybe using virtual memory, or by only looking at the chains for certain subsets of parameters). Fortunately, CmdStan allows exactly this, since it writes the draws to disk. I can run 16 chains with no problem!
The downside of CmdStan, for me, is losing RStan's friendly compilation interface, which recompiles the model automatically when necessary and doesn't expose me to a pile of makefiles.
I can’t seem to find a way to save the compiled Stan program in the RStan interface. Am I missing it?
You can call stan_model to compile the Stan program and then call sampling on the resulting object, but I don’t think that will help you much here.
If you call stan or sampling with pars = character() and sample_file set to some path, the draws go to disk instead of into the fit object, so you are basically doing CmdStan. Then you have the problem of reading the draws back off the disk with the memory you have. The read_stan_csv function does not have an option to read only a subset of the parameters.
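Here is a rough sketch of what a subset-aware reader could look like. It is written in Python only to illustrate the streaming idea; it is not an rstan function, and read_stan_csv_subset is a hypothetical name. The reader goes through the CSV one line at a time and skips the '#' comment lines that Stan writes for configuration, adaptation and timing. It treats the first remaining line as the header and converts only the requested columns. It assumes plain comma-separated, unquoted fields, which is how Stan's CSV output looks as far as I know.

```python
from typing import Iterable, List, Tuple


def read_stan_csv_subset(lines: Iterable[str], keep: List[str]) -> Tuple[List[str], List[List[float]]]:
    # Hypothetical helper: stream Stan CSV output and keep only the named columns.
    pos = None  # column name -> position, built from the header row
    idx: List[int] = []
    draws: List[List[float]] = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and '#' comments (config, adaptation info, timing).
        if not line or line.startswith("#"):
            continue
        fields = line.split(",")
        if pos is None:
            # The first non-comment line is the header, e.g. lp__,accept_stat__,theta.1,...
            pos = {name: i for i, name in enumerate(fields)}
            missing = [k for k in keep if k not in pos]
            if missing:
                raise KeyError(f"columns not in header: {missing}")
            idx = [pos[k] for k in keep]
            continue
        # Every later non-comment line is one draw; convert only the kept fields.
        draws.append([float(fields[i]) for i in idx])
    return list(keep), draws
```

For one chain you would pass an open file handle for its CSV (for example open("output_1.csv"), a hypothetical file name) with keep = ["lp__", "theta.1"]. Each line is still split in full, but only the kept values are stored. Memory therefore grows with the number of requested columns rather than with all ~40,000 parameters.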
Thanks. That should do what I want. I think I can hack around the limitations of read_stan_csv. Would it be OK if I added a pars argument to read_stan_csv?
Yeah, and an include flag. The read_stan_csv function should (and probably will, in rstan3) take a stanmodel so that it knows the dimensions of things. That would make it easier to include or exclude whole containers of parameters.
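The sketch below shows the pars/include idea working purely from the CSV header, without a stanmodel. It is again in Python and is not an rstan API. The names select_columns, container_of and container_dims are hypothetical. It assumes CmdStan-style flattened column names, such as beta.2.1 for beta[2,1], and treats everything before the first '.' as the container name.

```python
from typing import Dict, List, Sequence, Tuple


def container_of(column: str) -> str:
    # Stan identifiers cannot contain '.', so with CmdStan-style names
    # (beta.2.1 for beta[2,1]) the container is everything before the first '.'.
    return column.split(".", 1)[0]


def select_columns(header: Sequence[str], pars: Sequence[str], include: bool = True) -> List[str]:
    # include=True: keep the columns whose container is listed in pars.
    # include=False: drop those columns and keep everything else.
    # Header order is preserved.
    wanted = set(pars)
    return [c for c in header if (container_of(c) in wanted) == include]


def container_dims(header: Sequence[str]) -> Dict[str, Tuple[int, ...]]:
    # Infer each container's shape from the largest index seen in each position.
    # Scalars (and sampler columns such as lp__) get the empty shape ().
    # Assumes every element of each container appears in the header.
    dims: Dict[str, Tuple[int, ...]] = {}
    for col in header:
        name, *rest = col.split(".")
        idx = tuple(int(i) for i in rest)
        prev = dims.get(name)
        dims[name] = idx if prev is None else tuple(max(a, b) for a, b in zip(prev, idx))
    return dims
```

Matching on the container name rather than a string prefix means pars = ["beta"] does not accidentally pick up a beta_raw container. Inferring shapes from the largest index is exactly the guesswork that a stanmodel-aware read_stan_csv would avoid, because the model already knows the declared dimensions.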