Loo::add_criterion() aborts R session for cmdstanr model

Hello everybody,

I am currently trying to add the loo-criterion to my models, but the session aborts every time I run the loo:add_criterion() function. As I am currently not able to fit my models with rstan (see GitHub-issue), I am using cmdstan as a backend. I suspect that the different backend could be an issue, or loo runs into the very same problem as rstan. Both have the same characteristic: the R session just aborts.

My model is the following, as output of brms::make_stancode():

functions {
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int Kc = K - 1;
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b;  // population-level effects
  real Intercept;  // temporary intercept for centered predictors
  real<lower=0> sigma;  // residual SD
}
transformed parameters {
}
model {
  // likelihood including constants
  if (!prior_only) {
    target += normal_id_glm_lpdf(Y | Xc, Intercept, b, sigma);
  }
  // priors including constants
  target += normal_lpdf(b | 0, 10);
  target += normal_lpdf(Intercept | 80, 30);
  target += student_t_lpdf(sigma | 3, 0, 44.5)
    - 1 * student_t_lccdf(0 | 3, 0, 44.5);
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
}

Does anybody have a clue what the underlying error could be? Really looking forward to your responses, and thank you in advance. If I should post any more information, please let me know.

This might not be anything, but I’ve had similar issues in RStudio when doing post-processing from a brms model. I’ve found that the issue is almost always resolved by specifying options(mc.cores = 1) after fitting a model and before requesting the post-processing. Not sure if that’s a fix specific to RStudio or not, though – just wanted to see if something as easy as that might fix your problem

Unfortunately this does not do the job. It aborts just the same as before, both with setting mc.cores while calling the package, and while setting it with the add_criterion/loo function.

This may be a brms issue, but I’m not sure (the add_criterion function is from brms not loo although it uses loo). One thing you can try is fitting the model without brms and doing loo without add_criterion:

library(brms)
library(cmdstanr)
code <- make_stancode(formula, data) # use whatever brms formula and data arguments you're using
data <- make_standata(formula, data)
mod <- cmdstan_model(write_stan_file(code))
fit <- mod$sample(data = data)
loo_results <- fit$loo()

If this works then the problem likely isn’t cmdstanr or loo itself but perhaps something to do with add_criterion. In that case I would open an issue/bug report on github for brms. If, on the other hand, you get an unexpected error when you do fit$loo() then I would open an issue/bug report on github for cmdstanr (if it turns out the problem is with the loo package itself we can transfer the issue from cmdstanr to loo).

Thanks for the reply, I will definitely try that out. Right now I have problems replicating the code you proposed. Can you help me with that as well?

> fit <- mod$sample(data = data)
Fehler: No method asJSON S3 class: standata

I guess I am missing something obvious, but what is it? But there is a further observation:
It seems that other, smaller models do work. With a subset of the data (30.000 instead of 600.000 observations) and using only one out of 26 populations (factor with many levels that cannot be broken down any further) might put some limitation on loo. I tried running loo(model) and it aborts the R session just the same as add_criterion. Do we conclude from this that I should post the problem to the cmdstanr github?

Oh sorry, can you try data = as.list(data)?

Hmm, I wonder if maybe you’re running out of memory. With your fitted brms model can you try

# this is assuming 'model' is a brmsfit object
loo(model, pointwise = TRUE) 

and see if that avoids crashing R? Setting pointwise=TRUE might slow things down but should use less memory.

Make sure you have the latest versions of brms and cmdstanr. See here

here

Oh sorry, can you try data = as.list(data) ?

This just leads to the same error somehow.

Make sure you have the latest versions of brms and cmdstanr. See here

I am using cmdstanr version 0.4.0 and brms 2.15, which, according to my knowledge, are the most recent ones.

and see if that avoids crashing R? Setting pointwise=TRUE might slow things down but should use less memory.

This works! So at least there is a temporary workaround, great, thank you very much! How do I proceed from here? Report the problem to the cmdstanr GitHub?

The issue is that make_standata from brms adds a class attribute and cmdstanr.
For me the following fixes it:

data <- make_standata(formula, data)
class(data) <- NULL

For me the following fixes it:

That worked! But only until the next step. When trying to access fit$loo(), I receive the following error:
Can't find the following variable(s) in the output: log_lik the error remains.

What am I missing in the function call? The log_lik is definitely part of the model:
formals(fit[["loo"]])[["variables"]] gives me “log_lik” as a result. But even when specifying variables for log_lik

So, the loo-call works with the pointwise call, and small models work without such modifications. Where could the error really have its roots? It’s a shame the R session just aborts and gives me no error message. I heavily feel the downside of not having learned enough about STAN yet and only interfacing through brms, thank you all for your patience leading me through this debugging process. I guess as a consequence I want to learn about more STAN itself.

Ah, I forgot about this problem, sorry! The issue is that to compute loo directly from a Stan program the model needs to calculate the log-likelihood in the generated quantities block (or transformed parameters). brms doesn’t do this, instead calculating the required log likelihood terms in R. So this error is just because the brms Stan program doesn’t include the log likelihood terms in the Stan program. I should have foreseen this but I forgot that brms does it this way.

formals() gives you the function arguments, not what’s in your model. So this just means that the default value of variables is log_lik (by default loo looks for a variable called log_lik), it doesn’t mean that log_lik is in your model.

Ok great, I’m glad that works! My guess is that this means that the problem is that you’re running out of memory computing loo without using the pointwise argument. So I don’t think there’s a bug, you’ll just need to keep using the pointwise argument when you have a ton of observation in the data (the more observations in the data the more memory intensive the loo computation are).

How would I be able to do this then? I would still like to check if maybe the issue is only due to brms, as I had the similar problem with rstan as a backend as well. So maybe it will work when running everything directly with STAN.

If not, however, thanks a lot for your time and answers. I am using pointwise = True for now and it works smoothly. Compared to the model, it doesn’t actually take that long either.

1 Like

You could modify the Stan code that brms creates and add the log likelihood computations in the generated quantities block. I haven’t tested it but judging from the Stan code you shared in the original post it looks like you could add something like this in generated quantities:

  vector[N] log_lik; 
  for (n in 1:N) {
    log_lik[n] = normal_lpdf(Y[n] | Intercept + Xc[n] * b, sigma);
  }

And here’s a vignette from the loo package that discusses how to write Stan programs compatible with the loo package: