Problem with loo function for model comparison

Andrew_fg · February 27, 2023, 8:25am

Hello there, I am going to compare different models specifically,

i) one model estimated using brms,

fit_brm <- brm(y ~ x,...)

ii) and an additional one using classical stan function i.e.

rstan_model=stan("model.stan",model_code=stan_mod, data=data...)

obtaining the stan script leveraging the brms function make_stancode() and modifying it.

When I am trying to compare models using loo function, it works for brms object (fit_brm). Nevertheless, I got this message when I use the function on rstan_model.

loo(rstan_model)
Error in check_pars(allpars, pars) : no parameter log_lik

Any cue or suggestion?

avehtari · February 28, 2023, 1:26pm

See Writing Stan programs for use with the loo package • loo

Andrew_fg · February 28, 2023, 2:24pm

Thanks for your reply, unfortunately I get the same error

> log_lik_1 <- extract_log_lik(rstan_model, merge_chains = FALSE)
Error in check_pars(allpars, pars) : no parameter log_lik

avehtari · February 28, 2023, 2:33pm

Did you follow the instructions to add that log_lik computation in generated quantities block?

Andrew_fg · February 28, 2023, 4:24pm

Thanks for your support. Nevertheless, I am working in a Dirichlet framework with no intercept, so it is quite difficult for me. Here is the Stan code and a data simulation.

Hope you can help me, I would really benefit from that

Stan code:

// generated with brms 2.18.0
functions {
  /* dirichlet-logit log-PDF
   * Args:
   *   y: vector of real response values
   *   mu: vector of category logit probabilities
   *   phi: precision parameter
   * Returns:
   *   a scalar to be added to the log posterior
   */
   real dirichlet_logit_lpdf(vector y, vector mu, real phi) {
     return dirichlet_lpdf(y | softmax(mu) * phi);
   }
}
data {
  int<lower=1> N;  // total number of observations
  int<lower=2> ncat;  // number of categories
  vector[ncat] Y[N];  // response array
  int<lower=1> K_muy2;  // number of population-level effects
  matrix[N, K_muy2] X_muy2;  // population-level design matrix
  int<lower=1> K_muy3;  // number of population-level effects
  matrix[N, K_muy3] X_muy3;  // population-level design matrix
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
}
parameters {
  vector[K_muy2] b_muy2;  // population-level effects
  vector[K_muy3] b_muy3;  // population-level effects
  real<lower=0> phi;  // precision parameter
}
transformed parameters {
  real lprior = 0;  // prior contributions to the log posterior
  lprior += gamma_lpdf(phi | 0.01, 0.01);
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] muy2 = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] muy3 = rep_vector(0.0, N);
    // linear predictor matrix
    vector[ncat] mu[N];
    muy2 += X_muy2 * b_muy2;
    muy3 += X_muy3 * b_muy3;
    for (n in 1:N) {
      mu[n] = transpose([0, muy2[n], muy3[n]]);
    }
    for (n in 1:N) {
      target += dirichlet_logit_lpdf(Y[n] | mu[n], phi);
    }
  }
  // priors including constants
  target += lprior;
}
generated quantities {
}

Data simulationsing brms:

#install.packages("brms")
library(brms)
library(rstan)
library(dplyr)

bind <- function(...) cbind(...)

#Data simulation
N <- length(2000:2009)*length(1:10)
df <- data.frame(
  y1 = rbinom(N, 10, 0.5), y2 = rbinom(N, 10, 0.7), 
  y3 = rbinom(N, 10, 0.9), Age = as.factor(1:10),Year = as.factor(rep(2000:2009,each=length(1:10)))
) %>%
  mutate(
    size = y1 + y2 + y3,
    y1 = y1 / size,
    y2 = y2 / size,
    y3 = y3 / size
  )
df$y <- with(df, cbind(y1, y2, y3))
str(df)


stan_mod <- make_stancode(bind(y1, y2, y3) ~ 0+Age+Year, df, dirichlet(),save_model = "example_model.stan")
brms_standata <- make_standata(bind(y1, y2, y3) ~ 0+Age+Year, df, dirichlet(),save_model = "data_exmple.stan")

# design matrix 
brms_standata$X_muy2
brms_standata$X_muy3

stan_mod_a <- stan_model(file="example_model.stan")
rstan_model = stan("data_exmple.stan",model_code=stan_mod_a,
                   data=brms_standata)

avehtari · February 28, 2023, 5:25pm

This line in the model block computes the log likelihood

You just copy that and collect the outputs. Although, you need to repeat also the computation for mu unless you move that to the transformed parameters. So it should look something like this

generated quantities {
    vector[N] muy2;
    vector[N] muy3;
    vector[ncat] mu[N];
    vector[N] log_lik;
    muy2 = X_muy2 * b_muy2;
    muy3 = X_muy3 * b_muy3;
    for (n in 1:N) {
      mu[n] = transpose([0, muy2[n], muy3[n]]);
      lok_lik[n] = dirichlet_logit_lpdf(Y[n] | mu[n], phi);
    }
}

Sorry, I didn’t have time to test that it runs, but I think it should.

Topic		Replies	Views
Model comparison (LOO?) from cmdstanr objects CmdStan loo , cmdstanr	4	444	February 8, 2024
Stan, bym2 RStan rstan , specification , loo , brms	2	415	December 14, 2022
Error in extract_log_lik from the loo package RStan rstan , loo	4	1476	June 25, 2021
(brms) Loo pit overlay differences when mu and sigma are in the transformed parameters block and loglik/yrep are in the generated quantities brms loo	2	47	October 5, 2024
Using BRMS Functionality for Data Generated in CmdStan brms	4	1343	September 5, 2019

Problem with loo function for model comparison

Related topics