Hello,
This is my first post in the Stan forums, so I apologize if I'm in the wrong space. I'm running a brms model using variational inference and using posterior_epred() to extract draws for predictions. Ideally, we'd want ~1,000 draws, but I am running out of RAM when generating the matrix of posterior draws (20 years' worth of predictions at 1,000 draws for 23,452 observations).
I'm running this in an Azure Virtual Machine with 20 cores and 140 GB of RAM, and every time I try, it crashes my R kernel without any error to trace back, whether I'm in Jupyter or in VSCode (we don't have RStudio installed in Azure, unfortunately).
I ran it locally and got the classic:
Error: cannot allocate vector of size 174.7 Gb
I am seeking advice on working around these memory issues with posterior_epred(). I know the easy solutions are to ask for more RAM (which costs more money in Azure, sadly) or to reduce the number of draws, but I figured I'd do my due diligence.
If more information is needed, let me know and I can provide it; I know this is kind of a general question.
You could run multiple epred calls, each with a subset of your observations in the newdata argument.
Thank you for the response, this makes sense to me. Just to confirm, would it be something along the lines of:
# split the expanded data by prediction year
dat_agg_expanded.1 <- subset(dat_agg_expanded, year == 2024)
dat_agg_expanded.2 <- subset(dat_agg_expanded, year == 2025)
dat_agg_expanded.3 <- subset(dat_agg_expanded, year == 2026)
etc.

prediction.1 <- brms::posterior_epred(mod.1,
                                      newdata = dat_agg_expanded.1,
                                      ndraws = N_DRAWS_IN_MODEL)
prediction.2 <- brms::posterior_epred(mod.1,
                                      newdata = dat_agg_expanded.2,
                                      ndraws = N_DRAWS_IN_MODEL)
prediction.3 <- brms::posterior_epred(mod.1,
                                      newdata = dat_agg_expanded.3,
                                      ndraws = N_DRAWS_IN_MODEL)
Then rbind / just write straight to SQL?
Silly question, but this wouldn't affect the predictions in any way, would it?
That looks like what I imagined.
The only difference between this and calling epred once could be how the random seed is updated, I think, which shouldn't matter at all.
Just make sure to use the correct rbind/cbind. IIRC, the observations are the columns and the posterior draws are the rows, so you would have to cbind the prediction.n matrices.
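For example, with the chunks from your snippet, something along these lines (a sketch using your object names):

# each prediction.n is a draws x observations matrix for one year,
# so bind the observation subsets column-wise
prediction_full <- cbind(prediction.1, prediction.2, prediction.3)

# or, if you collect the chunks in a list:
# prediction_full <- do.call(cbind, prediction_list)

(If the combined matrix itself is too large to hold in memory, you can skip the cbind and write each chunk out directly, like you mentioned.)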
This ended up being super helpful and got me going in a good direction, so thank you! I ended up using a for loop over the unique IDs from my expanded data table, running predictions for each, and then writing them to our database.
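Roughly, the loop looked like this (a sketch; the connection object con, the table name, and the id column are placeholders for our actual setup):

library(DBI)

ids <- unique(dat_agg_expanded$id)

for (current_id in ids) {
  newdata_chunk <- subset(dat_agg_expanded, id == current_id)

  # draws x observations matrix for this chunk only
  pred_chunk <- brms::posterior_epred(mod.1,
                                      newdata = newdata_chunk,
                                      ndraws = N_DRAWS_IN_MODEL)

  # one row per observation, one column per draw, appended to the table
  DBI::dbWriteTable(con, "predictions", as.data.frame(t(pred_chunk)),
                    append = TRUE)

  # free the chunk before the next iteration
  rm(pred_chunk)
  gc()
}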
Would this be fine to use on posterior_predict() as well? Thank you so much for your help!
Should be fine for posterior_predict() as well.