I’m attempting to estimate a model in RStan on simulated data. The program is several hundred lines, so I’m omitting it for brevity. I can obtain posterior draws from 10 parallel chains (500 warmup iterations, 1,000 total iterations) when I set the number of simulated observations to 1,000. However, when I increase the number of simulated observations to 10,000, I receive the following error:
Error in FUN(X[[i]], ...) :
trying to get slot "mode" from an object of a basic class ("NULL") with no slots
Calls: stan ... sampling -> sampling -> .local -> sapply -> lapply -> FUN
Execution halted
The error is triggered after 6 chains complete. However, the other 4 chains appear to never start. Any idea why changing the sample size would trigger this behavior?
Anything defined in parameters, transformed parameters, or generated quantities gets stored on every output iteration.
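An int takes 4 bytes and a real takes 8 bytes. If you have 3 parameters and 1 generated quantity that is an integer, then that’s (3 * 8 + 4) * N_post_warmup_iters * N_chains bytes you’ll need to allocate. Each element of a vector or array counts separately, so anything whose length depends on the number of observations grows with your sample size.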
Or something like that. It also sounds like it happens halfway through something, so you could just open up your computer’s system monitor or whatever and watch the memory usage.
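As a rough sketch of the arithmetic above: this uses the 4-byte and 8-byte sizes quoted in this post and ignores any overhead R adds on top. `estimate_draw_bytes` and the example counts are made up for illustration and are not part of RStan.

```r
# Rough estimate of the memory needed to hold saved draws, using the
# sizes quoted above (8 bytes per real, 4 bytes per int).
# estimate_draw_bytes() is a hypothetical helper, not an RStan function.
estimate_draw_bytes <- function(n_real, n_int, n_post_warmup_iters, n_chains) {
  bytes_per_iter <- n_real * 8 + n_int * 4
  bytes_per_iter * n_post_warmup_iters * n_chains
}

# 3 real parameters and 1 integer generated quantity,
# 500 post-warmup iterations on each of 10 chains.
small <- estimate_draw_bytes(n_real = 3, n_int = 1,
                             n_post_warmup_iters = 500, n_chains = 10)

# Same model plus a hypothetical length-N real vector in generated
# quantities (one value per observation), with N = 10000.
large <- estimate_draw_bytes(n_real = 3 + 10000, n_int = 1,
                             n_post_warmup_iters = 500, n_chains = 10)

cat(format(small, big.mark = ","), "bytes\n")
cat(round(large / 1e6), "MB\n")
```

With these made-up counts the first case comes to 140,000 bytes, while adding a single per-observation vector pushes the estimate to roughly 400 MB, which is one way a jump from 1,000 to 10,000 observations could matter.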
Check the ‘pars’ argument of the stan function in the RStan manual. I think it lets you save only the quantities you want, which helps if the model has a lot of them.
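A minimal sketch of using `pars`, assuming a model file `model.stan`, a data list `sim_data`, and parameters named `alpha` and `beta`. All of these are placeholders, since the original program isn’t shown.

```r
# Arguments for rstan::stan(); the file name, data, and parameter names
# are placeholders for illustration only.
sim_data <- list(N = 10000)

stan_args <- list(
  file    = "model.stan",
  data    = sim_data,
  chains  = 10,
  iter    = 1000,
  warmup  = 500,
  pars    = c("alpha", "beta"),  # only these quantities are saved
  include = TRUE                 # FALSE would instead drop the names in pars
)

# Only run if rstan is installed and the placeholder model file exists.
if (requireNamespace("rstan", quietly = TRUE) && file.exists(stan_args$file)) {
  fit <- do.call(rstan::stan, stan_args)
}
```

Setting `include = FALSE` with `pars` naming the large per-observation quantities is the other way around: everything is kept except the names listed.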
RStan uses a lot of memory (presumably due to vagaries of R that we can’t fix, given how often this comes up).
You can use CmdStan to stream data out.
@bgoodri or @jonah (not @jgabry; these multiple identities on different systems are confusing): Is there a mode in R that only streams to a file rather than storing all the draws in memory? If so, would writing everything to file and then reading it back in require less overall memory? If so, some standing instructions on how to scale R would be great. (If they already exist, a pointer would be great!)
We could totally fix this if we changed to just streaming output. R has some problems but the C++ interface is very flexible… after the next round of services changes it’ll be worth a shot.
I don’t see why it’d need to wait other than to avoid coding things multiple times. I didn’t know there were any services changes in the works—I know there’s been a flurry of discussion, but I didn’t know there were any concrete plans. Is there a wiki page or something with a design document?
Does it not also store the draws in memory? So the return is no longer a fit object with draws in it?