May I ask how much memory the loo() function is supposed to use for a log_lik matrix of known size? In my case, log_lik is an 8G matrix. I use only one core to calculate loo(log_lik, save_psis = FALSE), and it aborts with Error: cannot allocate vector of size 8.1 G. My PC has 32G of RAM; when I launched loo, RAM usage went up to 30G within a few seconds, and after a few minutes it ran out of memory.

I also tried on an HPC with 256G of RAM. The calculation finished without errors or warnings, but when I ran it for another model with a log_lik of the same size, it ran out of memory. I switched the order of the two models to make sure the problem is not model-specific, and it always fails on the second one. I printed the memory usage after calculating loo: it is only 12G, as shown below, and the loo result itself is very small, about 34M, which makes sense.

[1] "2022-03-16 21:20:56 EDT"
loo net:1: 2612.792 sec elapsed
[1] "memory used:"
12.1 GB
             used    (Mb) gc trigger    (Mb)    max used    (Mb)
Ncells    3141003   167.8    9253544   494.2    11566930   617.8
Vcells 1496597583 11418.2 7875944800 60088.7 12306163748 93888.6
[1] "loo restuls used 34.584888MB"
[1] "------------------------------------------------------------------------"

As you can see, I also called gc() between the calculations to free memory, but it didn’t help.

Is there a way to solve this issue? Or could we mimic the way cmdstan manages RAM? Model fitting never runs into problems, even with all the log_lik values and posteriors. I am not a CS person, so I am not sure what’s going on here.

A side question: loo took almost 1h to finish. Do you think that is normal? I suspect that most of the time was spent loading the csv files in cmdstanr. I will try extracting log_lik first and report the actual calculation time.

Hi,
Can you also tell me the number of rows and columns in your log_lik matrix (that is, the number of posterior draws and the number of observations)? I assume you are using cmdstanr and hand-written Stan code?

Thanks, @avehtari. dim(log_lik) returns 4000 by 270165, i.e. 1000 samples × 4 chains and 270165 data points.
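
For reference, the in-memory size of a dense double-precision matrix with these dimensions can be worked out directly. The sketch below is plain arithmetic (8 bytes per double) and assumes nothing about how loo allocates memory internally.

```r
# Size of a dense numeric (double) matrix: rows * cols * 8 bytes.
n_draws <- 4000      # 1000 draws x 4 chains
n_obs   <- 270165    # data points
bytes   <- n_draws * n_obs * 8
gib     <- bytes / 1024^3   # R reports sizes in binary units
cat(sprintf("%.0f bytes = %.1f GiB\n", bytes, gib))
```

The result agrees with the 8.1 G in the error message, which suggests the failed allocation was one more full-size copy of log_lik.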

Btw, to correct the total memory I have on the HPC: it is 125G, not 250G. I requested 2 nodes, so in total it should be 250G, but for some reason R is not using all of the memory.

Below is the memory profiling log from my latest test. It seems that memory usage went up to 111G during the loo calculation, which took almost 1h. Note that freeram is 92G after the loo calculation for the first model. That might explain why the second one ran out of memory: the peak RAM usage is 111G.

Sorry for replying in multiple posts, I just got the other test results. Regarding my previous side question:

It seems that most of the computation time in fit$loo() is indeed spent reading the csv files. I first loaded the csv with data.table and extracted the log_lik matrix, then used the loo.matrix method, which took only 1186s instead of the 3394s with fit$loo(). Please see my dirty code and the output log below.

Ok, that’s a lot. How many parameters do you have? If the number of observations per parameter is very big, you may not need loo at all. If you still think loo would be useful, I recommend using subsampling loo, as discussed in the vignette “Using Leave-one-out cross-validation for large data”.
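
As a rough illustration of what subsampling loo looks like in code, here is a minimal sketch using loo::loo_subsample() with a log-likelihood function instead of a full log_lik matrix. The toy data, the toy draws, and the model (y ~ normal(mu, 1)) are all assumptions made up for the example, not the model discussed in this thread; the vignette remains the authoritative description.

```r
library(loo)

set.seed(1)
# Toy data and toy "posterior" draws; both are stand-ins for a real fit.
N <- 2000
toy_data  <- data.frame(y = rnorm(N, mean = 0.5, sd = 1))
toy_draws <- matrix(rnorm(400, mean = 0.5, sd = 0.02), ncol = 1,
                    dimnames = list(NULL, "mu"))

# Log-likelihood of ONE observation evaluated for every draw.
# Hypothetical model: y ~ normal(mu, 1); column 1 of `draws` is mu.
llfun_normal <- function(data_i, draws, ...) {
  dnorm(data_i$y, mean = draws[, 1], sd = 1, log = TRUE)
}

# Only `observations` data points get the full PSIS treatment, so the
# full draws x observations log_lik matrix is never built.
loo_ss <- loo_subsample(
  llfun_normal,
  data = toy_data,
  draws = toy_draws,
  observations = 200
)
print(loo_ss)
```

Because the draws here are independent toy values, no r_eff is supplied; for real MCMC output the vignette computes it with relative_eff() first.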

Some Stan developers are examining faster options, but it may take some time before that helps you.

Thanks, I have 4780 parameters. I have been reading this vignette these days and will need some time to digest it :)

In my experience, using data.table::fread() can speed up the reading process.

Although there might be better solutions such as subsampling loo, this still bothers me: may I ask how to free the memory after one loo computation? As you can see, the final loo result is only 30M, but R still holds 12G of RAM. I wonder how I can free this 12G after loo has finished; gc() does not seem to help. Sorry, I am not a CS person, so I am not good at programming. Thanks.
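
A minimal sketch of that reading step, for illustration only (it is not the code used in this thread). It assumes a Unix-like shell with grep available, and it writes a tiny made-up file that mimics a CmdStan output CSV, where comment lines start with '#'. The name LLmat follows the posts here; everything else is hypothetical.

```r
library(data.table)

# Tiny stand-in for a CmdStan output CSV (comment lines start with '#').
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  "# model = toy",
  "lp__,mu,log_lik.1,log_lik.2,log_lik.3",
  "# Adaptation terminated",
  "-1.0,0.1,-0.5,-0.6,-0.7",
  "-1.1,0.2,-0.4,-0.5,-0.8",
  "# Elapsed Time: 1.0 seconds"
), csv_file)

# Drop comment lines with grep, then read only the header to find columns.
cmd <- paste("grep -v '^#'", shQuote(csv_file))
all_cols <- names(fread(cmd = cmd, nrows = 0))
ll_cols  <- grep("^log_lik", all_cols, value = TRUE)

# Read just the log_lik columns into a draws x observations matrix.
LLmat <- as.matrix(fread(cmd = cmd, select = ll_cols))
dim(LLmat)
```

With several chains, the same steps would be repeated per file and the matrices stacked with rbind(), keeping track of which rows belong to which chain for relative_eff().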

Yes, sure. I actually described the model in this post: How to calculate log_lik in generated quantities of a multivariate regression model - #6 by Michelle. The model is not very complex. For data, I have ~350 individuals, each with ~2000 observations. The 2000 observations of one individual are not independent but have some level of autocorrelation, so I modeled them with a linear variate-covariate model with a known design matrix, plus AR(1) and Gaussian residuals. We also know that the parameters for the design matrix across individuals follow a kind of hierarchical structure. I expect this hierarchical structure to help regularize the individual parameter estimates, since the 2000 observations are rather noisy. Therefore, the data matrix is modeled as a multivariate regression in which the design matrix is known from our domain knowledge and follows a hierarchical structure. That’s why I have more than 4000 parameters: the design-matrix and AR(1) parameters per individual, plus the higher group-level parameters.

Since we can make different assumptions about the hierarchical structure, I have several different models, and I hope model comparison can answer the question of which model generalizes better or has better predictive ability. I also have one model that does massive univariate regression without a hierarchical structure, to see whether each individual’s data are actually enough to estimate the parameters of interest well, so that we would not need a multilevel model. You previously suggested considering K-fold validation, but given the hierarchical structure I am not sure I can fit the model on a subset of individuals, since it is hard to decide how to partition the data. Or maybe I should take a subset of the observations and keep all individuals in each fold, but that means I will have to fit the model multiple times, and I am not sure whether the autocorrelation would cause problems.

May I ask the reason why we don’t need loo at all in this case? What should I consider if I would like to do model comparison?

Thank you in advance, @jonah. Based on what I checked, memory usage was only a few MB before the computation; during the loo computation it could go up to ~90G, and 12-20G was still in use afterwards, even though the final loo result is very small, a few tens of MB. I cannot free this 12-20G in R after the calculation: I tried gc() and it did not work. I also used ls() to list all objects in the session and printed their sizes, but none of them is larger than 1G. So I guess it might be related to loo, or is it an inherent issue of R? The problem is that I cannot run loo for several models in one script, because this memory usage accumulates and finally aborts the session. For instance, with 4 models, the first 3 will accumulate around 60G, and loo itself needs 90G to process an 8G log_lik (so using more cores is basically not possible in this case). Note that I did call rm() followed by gc() to free LLmat and rel_n_eff.

Hmm, I would have suggested gc(), which you already tried.

Is it possible that most of the 12G is taken up by the CmdStanMCMC objects (CmdStanR’s fitted model objects)? If you call any methods that result in the CSV files being read in (e.g., summary(), draws()) then log_lik may be read into memory.
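
One reason ls() combined with object.size() can miss memory like this: object.size() does not count the contents of environments, and R6 objects (which CmdStanR fit objects are) are environments. The sketch below shows the effect with a plain environment; `holder` and `cache` are hypothetical names, and this is not actual CmdStanR code.

```r
# An environment holding a large vector: a stand-in for an object that
# caches data internally.
holder <- new.env()
holder$cache <- numeric(1e6)          # about 8 MB of doubles

print(object.size(holder), units = "Kb")        # tiny: contents not counted
print(object.size(holder$cache), units = "Mb")  # the real footprint

# gc() can only reclaim the memory once nothing references the data,
# so the cached vector has to be removed (or the whole holder dropped).
rm("cache", envir = holder)
invisible(gc())
```

If the fit object is the culprit, removing the fit itself with rm() before calling gc() would be the analogous step.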

Thanks @jonah, and sorry for the late reply. I was running some more tests and am reporting them here in case it helps.

Briefly, I did not call summary() or draws() in the code. Please find the code and output log below, where I ran out of memory when using cores = 2 for loo on an HPC node with 125G of RAM. Before this chunk of code, I only loaded the cmdstanr output file names, not the file contents; as you can see from the output of the first mem_used() and Sys.meminfo(), there was not much RAM usage. I am not sure exactly how fit$loo() works, but it took 88G during the calculation. The size of all four csv files is around 8G. With 125G of RAM and cores = 2, I can only iterate once before running out of memory.

Then I tried extracting log_lik using data.table::fread and calculated loo with the loo.matrix method. I also added rm(list = c("LLmat", "rel_n_eff")) to free some memory. As you can see in the code and output log below, LLmat is a matrix of around 8 to 14G. In this case, gc() seems to have helped free the memory, and the freeram reported by Sys.meminfo() did not drop much from iter to iter. However, in iter 4, freeram dropped to 83G, whereas after the previous 3 iters it was 91G. The 5th iter was then aborted, since it has a huge LLmat of 14G and I assume it would require more than 100G for the loo calculation. An interesting difference from the fit$loo() method is that the computation time dropped from 2335s to 702s, since reading the csv is faster with data.table::fread.

So far, my summary is: 1) compared to fit$loo(), the data.table::fread method reduces computation time a lot by reading the csv faster; 2) gc() seems to free the memory in the data.table::fread method, but I am not sure why freeram suddenly dropped to 83G after the 4th iter; 3) we are not sure whether gc() can free memory in the fit$loo() method.

So I reduced to cores = 1, hoping to see whether freeram stays similar after each iter in the fit$loo() method; please see the output log below. Basically, using 1 core reduced RAM usage during the loo computation, so the R session was aborted only after 2 iters. There are several interesting observations: 1) gc() does not seem to help free the memory, since freeram decreases from iter to iter, from 87G to 67G; 2) the computation time is not very different from the cores = 2 case, since most of the time is spent reading the csv; 3) the loo results object is larger than in the cores = 2 case, for instance 51.5Mb vs. 14.4Mb for the 1st iter, which I don’t understand.

To verify whether the size of the loo results object changes with the number of cores, I also ran the cores = 1 case with the data.table::fread method. Please see the output log below. All 5 iters finished without error, since cores = 1 used less RAM during loo. The sizes of the loo results objects for the first 4 iters are identical for cores = 1 and cores = 2. Using 2 cores also reduced the computation time substantially, from 995s to 702s.

All in all: 1) the data.table::fread method may be useful, especially when log_lik is large. 2) I was wondering why cores = 4 did not improve the loo computation time much in my previous work with a much smaller log_lik; it now seems the underlying reason is csv reading. 3) fit$loo() seems to load all the fit csv files for the loo computation, but after the calculation gc() cannot free this RAM, so it accumulates from iter to iter. 4) The loo computation in general uses about 10 times the size of the log_lik matrix in memory. I don’t know the details of the computation itself, but do you think there is a way to free some memory during the calculation?

I apologize for writing so much. Thanks a lot for your help.
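
One possible way to cap peak memory, sketched here under assumptions rather than as a recommended loo workflow: PSIS-LOO is computed observation by observation, so the columns of log_lik can be processed in blocks and the pointwise results combined afterwards. The matrix LL, the chain layout, and the block size of 250 are all made up for the example.

```r
library(loo)

set.seed(2)
S <- 400; N <- 1000                       # toy sizes: 4 chains x 100 draws
LL <- matrix(rnorm(S * N, mean = -1, sd = 0.3), nrow = S)  # stand-in log_lik
chain_id <- rep(1:4, each = S / 4)

# Split the observation indices into blocks (block size is arbitrary).
blocks <- split(seq_len(N), ceiling(seq_len(N) / 250))

# Run loo on one block of columns at a time and keep only the small
# pointwise table from each run.
pointwise <- do.call(rbind, lapply(blocks, function(idx) {
  ll_block <- LL[, idx, drop = FALSE]
  r_eff <- relative_eff(exp(ll_block), chain_id = chain_id)
  loo(ll_block, r_eff = r_eff)$pointwise
}))

# elpd_loo is a sum over observations; its SE uses the pointwise variance.
elpd_loo <- sum(pointwise[, "elpd_loo"])
se_elpd  <- sqrt(N * var(pointwise[, "elpd_loo"]))
c(elpd_loo = elpd_loo, se = se_elpd)
```

In this toy example the full matrix is still held in memory; to actually lower the footprint, each block would have to be read from disk separately (for example by selecting only that block's columns when reading the csv) and removed before the next one is loaded.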