Calculate log_lik from model using less memory

Tom_C · August 23, 2019, 12:20pm

To calculate the log likelihood from a model, I’m aware of this method in the generated quantities block:

vector[N] log_lik;

for (n in 1:N) log_lik[n] = bernoulli_lpmf(y[n] | p[n]);

The log_lik quantity can then be used in the loo package. However, this is very memory-intensive. In my case, N = 50000 and I have 2000 posterior draws after thinning. Including log_lik in generated quantities increases the size of the stanfit object from 3 MB (when only the intercept and coefficients are extracted) to 1.5 GB. As the chains are thinned by a factor of four, there is presumably ~6 GB in RAM during model fitting.

Is there a less memory-intensive method? Another thread mentions the “foo.function method”, but I can’t find an explanation of it.

The same thread also mentions only taking every nth observation, presumably:

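// in transformed data:
int M = N / 100;  // integer division, rounds down

// in generated quantities (Stan uses % for the integer modulus, not R's %%):
vector[M] log_lik;
for (n in 1:N) if (n % 100 == 0) log_lik[n / 100] = bernoulli_lpmf(y[n] | p[n]);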

What is recommended?

avehtari · August 23, 2019, 12:52pm

In addition to using the function approach, you would benefit from subsampling loo.
See the paper http://proceedings.mlr.press/v97/magnusson19a.html
and the vignette
https://github.com/stan-dev/loo/blob/master/vignettes/loo2-large-data.Rmd
Currently you need to install loo from GitHub to get the subsampling loo (it will get to CRAN later):

if (!require(devtools)) install.packages("devtools")
devtools::install_github("stan-dev/loo")

jjramsey · August 23, 2019, 2:04pm

I think you can find something in the documentation for the loo function, particularly in the section “Methods (by class)”. The “foo.function method” probably corresponds to the third method described in this section, where the first argument to loo() is a function that (more or less) calculates log_lik[n].

Tom_C · August 26, 2019, 10:49am

Using the advice from @avehtari I can define a function llfun_logistic that takes in the explanatory variables as a model.matrix (data_i) and the posterior draws (draws) and, after matrix multiplication to obtain predicted y values, returns the log likelihood calculated by dbinom. This is then fed into the loo_i or the loo_subsample function.
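A minimal sketch of such a function, assuming the data are stored as a numeric matrix with the 0/1 outcome in a column named "y" and the model matrix in the remaining columns. That column layout, the simulated data, and the stand-in "posterior draws" below are illustrative assumptions, not taken from the original model:

```r
# Per-observation log-likelihood for a logistic regression (illustrative).
# data_i: one row of a numeric matrix; column "y" is the 0/1 outcome and the
#         other columns are the model matrix (assumed layout).
# draws:  S x K matrix of posterior draws of the coefficients.
llfun_logistic <- function(data_i, draws) {
  x_i <- data_i[, colnames(data_i) != "y", drop = FALSE]  # 1 x K predictors
  eta <- draws %*% t(x_i)                                  # S x 1 linear predictor
  dbinom(x = data_i[, "y"], size = 1, prob = plogis(eta), log = TRUE)
}

# Simulated data, only to check that the function behaves as expected.
set.seed(1)
N <- 200; K <- 3; S <- 50
X <- cbind(1, matrix(rnorm(N * (K - 1)), N, K - 1))
colnames(X) <- paste0("x", seq_len(K))
beta_true <- c(-0.5, 1, 0.3)
y <- rbinom(N, 1, plogis(X %*% beta_true))
dat <- cbind(y = y, X)

# Stand-in for posterior draws (NOT from a real fit): noise around beta_true.
draws <- matrix(rnorm(S * K, mean = rep(beta_true, each = S), sd = 0.1), S, K)

# Log-likelihood of observation 1 under each of the S draws.
ll_1 <- llfun_logistic(dat[1, , drop = FALSE], draws)
length(ll_1)  # one value per draw
```

Because the function returns only S values for one observation at a time, the full N x S log_lik matrix never has to be stored. Following the vignette, the function can be checked on a single observation with loo_i(i = 1, llfun_logistic, data = dat, draws = draws) and then passed as the first argument to loo_subsample() together with the same data and draws arguments.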

mans_magnusson · August 28, 2019, 8:55am

Hi!

The llfun_logistic() is exactly the same as in ordinary loo. It takes in a row of data (if the data is a matrix; brms has another system) and returns a value. You can use loo_i() to test that your function works as expected for one observation.

Then you just use loo_subsample() as specified in the vignette. Did this help you out?
