Error triggered by changing sample size?


#1

I’m attempting to estimate a model in RStan on simulated data. The program is several hundred lines long, so I’m leaving it out for brevity. I’m able to obtain posterior draws from 10 parallel chains (500 warmup iterations, 1000 total iterations) whenever I set the number of simulated observations to 1,000. However, when I increase the number of simulated observations to 10,000, I receive the following error:

Error in FUN(X[[i]], ...) : 
  trying to get slot "mode" from an object of a basic class ("NULL") with no slots
Calls: stan ... sampling -> sampling -> .local -> sapply -> lapply -> FUN
Execution halted

The error is triggered after 6 chains complete. However, the other 4 chains appear to never start. Any idea why changing the sample size would trigger this behavior?


#2

Do you know how much RAM it consumed? Sounds like you just ran out.


#3

Hi sakrejda, how might I check?


#4

Anything defined in parameters, transformed parameters, or generated quantities gets stored on every output iteration.

Any int is 4 bytes and real is 8 bytes. If you have 3 parameters and 1 generated quantity that is an integer, then that’s (3 * 8 + 4) * N_post_warmup_iters * N_chains bytes you’ll need to allocate.
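As a rough sketch of that arithmetic in R (the helper name `estimate_draw_bytes` and the per-observation scenario are made up for illustration, and R’s own overhead is not counted, so treat the result as a lower bound):

```r
# Rough lower bound on the bytes needed to hold the saved draws.
# Assumes 8 bytes per real and 4 bytes per int, as stated above.
estimate_draw_bytes <- function(n_real, n_int, n_post_warmup_iters, n_chains) {
  (n_real * 8 + n_int * 4) * n_post_warmup_iters * n_chains
}

# The example from this post (3 reals, 1 int), using the 500 post-warmup
# iterations and 10 chains from the original question.
small <- estimate_draw_bytes(3, 1, 500, 10)
small  # 140000 bytes

# Hypothetical scenario: one real-valued generated quantity per observation,
# with 10,000 observations. This is an assumption, not the poster's model.
big <- estimate_draw_bytes(10000, 0, 500, 10)
big / 1024^2  # size in MiB
```

If anything in the model scales with the number of observations and gets saved, going from 1,000 to 10,000 observations multiplies that part of the footprint by ten.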

Or something like that. It also sounds like it happens halfway through something, so you could just open up your computer’s system monitor or whatever and watch the memory usage.

Check the ‘pars’ argument of the stan function in the RStan manual. I think it lets you save only the things you want (if you have a lot of stuff).
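A minimal sketch of what that could look like, assuming a model file and data list already exist. The wrapper name `fit_selected_pars` and the parameter names `alpha` and `beta` are placeholders, not anything from the original program:

```r
# Sketch: keep only selected quantities in the fitted object.
# `stan_file`, `stan_data`, and the names in `keep` are placeholders.
fit_selected_pars <- function(stan_file, stan_data,
                              keep = c("alpha", "beta")) {
  rstan::stan(
    file    = stan_file,
    data    = stan_data,
    chains  = 10,
    warmup  = 500,
    iter    = 1000,
    pars    = keep,  # quantities to save
    include = TRUE   # TRUE saves only `pars`; FALSE saves everything except `pars`
  )
}
```

Setting `include = FALSE` and listing the large per-observation quantities in `pars` is the other way around: it drops just those and keeps the rest.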


#5

RStan uses a lot of memory (presumably due to the vagaries of R that we can’t fix given how much this comes up).

You can use CmdStan to stream data out.

@bgoodri or @jonah (not @jgabry; having multiple identities on different systems is confusing): Is there a mode in R that only streams to a file rather than storing all the draws in memory? If so, would writing everything to a file and then reading it back in require less overall memory? If so, some standing instructions on how to scale R would be great. (If they already exist, a pointer would be great!)


#6

We could totally fix this if we changed to just streaming output. R has some problems, but the C++ interface is very flexible… after the next round of services changes it’ll be worth a shot.


#7

I think the sample_file argument does this, but I agree that we could benefit from a guide for doing this.
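A hedged sketch of how that might be used, assuming `sample_file` takes a file name and that per-chain files can be read back with `read_stan_csv`. The wrapper names, the file name `draws.csv`, and the use of `tempdir()` are placeholders. Whether this actually lowers peak memory is the open question in this thread; the sketch only shows how to write the files and read them back:

```r
# Sketch: write draws to CSV via `sample_file`, then read them back later.
# `stan_file`, `stan_data`, and the file name "draws.csv" are placeholders.
fit_to_csv <- function(stan_file, stan_data, out_dir = tempdir()) {
  rstan::stan(
    file        = stan_file,
    data        = stan_data,
    chains      = 10,
    warmup      = 500,
    iter        = 1000,
    sample_file = file.path(out_dir, "draws.csv")  # a file name, not a directory
  )
}

# With several chains, one file per chain is written, so collect them by pattern.
read_draws <- function(out_dir = tempdir()) {
  csv_files <- list.files(out_dir, pattern = "^draws.*\\.csv$", full.names = TRUE)
  rstan::read_stan_csv(csv_files)
}
```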

I didn’t even realize I’m jonah here and jgabry on GitHub. I should have used jgabry here too.


#8

Absolutely.

I don’t see why it’d need to wait other than to avoid coding things multiple times. I didn’t know there were any services changes in the works—I know there’s been a flurry of discussion, but I didn’t know there were any concrete plans. Is there a wiki page or something with a design document?

Does it not also store the draws in memory? So the return is no longer a fit object with draws in it?


#9

You’re right, it doesn’t have to wait. Actually, in R there’s the memmap package, which should let us write to file transparently if it works well.


#10

Just remember that as you add dependencies, maintenance costs grow at least quadratically. Here’s how I tried to explain it on Andrew’s blog:


#11

Can’t argue there.


#12

I might have run into the same problem. Is the workaround just to use sample_file=tempdir()? Is it really that simple?