Hello,
This is my first post in the Stan forums, so I apologize if I'm in the wrong space. I'm running a brms model using variational inference and using posterior_epred() to extract draws for predictions. Ideally, we'd want ~1,000 draws, but I am running out of RAM when generating the matrix of posterior draws (20 years' worth of predictions at 1,000 draws for 23,452 observations).
I'm running this in an Azure Virtual Machine with 20 cores and 140 GB of RAM, and every time I try, it crashes my R kernel without any error to trace back, whether I'm in Jupyter or in VSCode (we don't have RStudio installed in Azure, unfortunately).
I ran it locally and got the classic:
Error: cannot allocate vector of size 174.7 Gb
I am seeking advice on working around these memory issues with posterior_epred(). I know the easy solutions are to ask for more RAM (which costs more money in Azure, sadly) or to reduce the number of draws, but I figured I'd do my due diligence.
If more information is needed, let me know and I can provide it; I know this is kind of a general question.
You could run multiple epred calls, each with a subset of your observations in the newdata argument.
Thank you for the response, this makes sense to me. Just to confirm, would it be something along the lines of:
# split the expanded data by prediction year
dat_agg_expanded.1 <- subset(dat_agg_expanded, year == 2024)
dat_agg_expanded.2 <- subset(dat_agg_expanded, year == 2025)
dat_agg_expanded.3 <- subset(dat_agg_expanded, year == 2026)
etc.

prediction.1 <- brms::posterior_epred(mod.1,
                                      newdata = dat_agg_expanded.1,
                                      ndraws = N_DRAWS_IN_MODEL)
prediction.2 <- brms::posterior_epred(mod.1,
                                      newdata = dat_agg_expanded.2,
                                      ndraws = N_DRAWS_IN_MODEL)
prediction.3 <- brms::posterior_epred(mod.1,
                                      newdata = dat_agg_expanded.3,
                                      ndraws = N_DRAWS_IN_MODEL)
Then rbind / just write straight to SQL?
Silly question, but this wouldn't affect the predictions in any way, would it?
That looks like what I imagined.
The only difference between this and calling epred once could be how the random seed is updated, I think, which shouldn't matter at all.
Just make sure to use the correct rbind/cbind. IIRC, the observations are the columns and the posterior draws are the rows, so you would have to cbind the prediction.n matrices.
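For example, with the chunks from your snippet, something along these lines (a sketch using your object names):

# each prediction.n is a draws x observations matrix for one year,
# so bind the observation subsets column-wise
prediction_full <- cbind(prediction.1, prediction.2, prediction.3)

# or, if you collect the chunks in a list:
# prediction_full <- do.call(cbind, prediction_list)

(If the combined matrix itself is too large to hold in memory, you can skip the cbind and write each chunk out directly, like you mentioned.)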
This ended up being super helpful and got me going in a good direction, so thank you! I ended up using a for loop over the unique IDs from my expanded data table, running predictions for each, and then writing them to our database.
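Roughly, the loop looked like this (a sketch; the connection object con, the table name, and the id column are placeholders for our actual setup):

library(DBI)

ids <- unique(dat_agg_expanded$id)

for (current_id in ids) {
  newdata_chunk <- subset(dat_agg_expanded, id == current_id)

  # draws x observations matrix for this chunk only
  pred_chunk <- brms::posterior_epred(mod.1,
                                      newdata = newdata_chunk,
                                      ndraws = N_DRAWS_IN_MODEL)

  # one row per observation, one column per draw, appended to the table
  DBI::dbWriteTable(con, "predictions", as.data.frame(t(pred_chunk)),
                    append = TRUE)

  # free the chunk before the next iteration
  rm(pred_chunk)
  gc()
}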
Would this be fine to use on posterior_predict() as well? Thank you so much for your help!
Should be fine for posterior_predict() as well.