Calculate log_lik from model using less memory

To calculate the log likelihood from a model, I’m aware of this method in the generated quantities block:

vector[N] log_lik;

for (n in 1:N) log_lik[n] = bernoulli_lpmf(y[n] | p[n]);

The log_lik quantity can then be used in the loo package. However, this is very memory intensive. In my case, N = 50000 and I have 2000 posterior draws after thinning. Including log_lik in generated quantities increases the storage size of the rstan fit object from 3 MB (when only the intercept and coefficients are extracted) to 1.5 GB. As the chains are thinned by a factor of four, there is presumably ~6 GB in RAM during model fitting.

Is there a less memory-intensive method? A thread mentions the “foo.function method”, but I can’t find an explanation of this.

The same thread also mentions only taking every nth observation, presumably:

int M = N / 100;  // integer division; M must be data-only, e.g. declared in transformed data
vector[M] log_lik;

for (n in 1:N) if (n % 100 == 0) { log_lik[n / 100] = bernoulli_lpmf(y[n] | p[n]); }

What is recommended?

In addition to using the function approach, you would benefit from subsampling loo.
See the paper http://proceedings.mlr.press/v97/magnusson19a.html
and the vignette
https://github.com/stan-dev/loo/blob/master/vignettes/loo2-large-data.Rmd
Currently you need to install loo from GitHub to get subsampling loo (it will get to CRAN later):

if (!require(devtools)) install.packages("devtools")
devtools::install_github("stan-dev/loo")

I think you can find something in the documentation for the loo function, particularly in the section “Methods (by class)”. The “foo.function method” probably corresponds to the third method described in this section, where the first argument to loo() is a function that (more or less) calculates log_lik[n].

Using the advice from @avehtari I can define a function llfun_logistic that takes in the explanatory variables as a model.matrix (data_i) and the posterior draws (draws) and, after matrix multiplication to obtain predicted y values, returns the log likelihood calculated by dbinom. This is then fed into the loo_i or the loo_subsample function.
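
For anyone following along, here is a minimal sketch of what that could look like, adapted from the loo large-data vignette linked above. The simulated data, the X1/X2 column names, and the random stand-in for the posterior draws are assumptions for illustration only; with a real fit, draws would be the matrix of intercept and coefficient draws (for example from as.matrix() on the fitted model) and data would hold the outcome plus the model.matrix columns.

```r
# Hypothetical simulated data standing in for the real model matrix and draws.
set.seed(1)
N <- 1000  # observations (N = 50000 in the question)
S <- 200   # posterior draws (2000 in the question)
X <- cbind(X1 = 1, X2 = rnorm(N))  # intercept column plus one covariate
beta_true <- c(-0.5, 1)
y <- rbinom(N, size = 1, prob = plogis(X %*% beta_true))
data <- data.frame(y = y, X)       # columns: y, X1, X2

# Stand-in for the S x 2 matrix of coefficient draws from a fitted model.
draws <- cbind(rnorm(S, beta_true[1], 0.1), rnorm(S, beta_true[2], 0.1))

# Pointwise log-likelihood: one row of data and all draws in,
# one log-likelihood value per draw out.
llfun_logistic <- function(data_i, draws, log = TRUE) {
  x_i <- as.matrix(data_i[, grepl("^X", colnames(data_i)), drop = FALSE])
  logit_pred <- draws %*% t(x_i)  # S x 1 linear predictor
  dbinom(x = data_i$y, size = 1, prob = plogis(logit_pred), log = log)
}

# Sanity check on the first observation without involving loo at all.
ll_1 <- llfun_logistic(data[1, , drop = FALSE], draws)

if (requireNamespace("loo", quietly = TRUE)) {
  # Test the function on a single observation first.
  loo_1 <- loo::loo_i(i = 1, llfun = llfun_logistic, data = data, draws = draws)
  # PSIS-LOO estimated from a random subsample of 100 observations.
  loo_ss <- loo::loo_subsample(llfun_logistic, data = data, draws = draws,
                               observations = 100)
  print(loo_ss)
}
```

Because the function is only evaluated for the observations that are needed, the full N x draws log_lik matrix is never stored in the fit object. Passing r_eff is left out here for brevity; the vignette shows how to compute and supply it.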

Hi!

The llfun_logistic() is exactly the same as in ordinary loo. It takes in one row of data (if the data is a matrix; brms has another system) and the draws, and returns the log-likelihood values for that observation. You can use loo_i() to test that your function works as expected for one observation.

Then you just use loo_subsample() as specified in the vignette. Did this help you out?