Obtaining log likelihood for mixed effects models

After fitting a logistic generalized linear mixed-effects model, I want to assess its performance using loo. To do that, I need to supply the pointwise log-likelihood as input to the loo function.

In tutorials, I have seen people use the conditional log-likelihood (conditioned on the random effects), as follows:

generated quantities {
  vector[N] log_lik;
  for (i in 1:N) {
    // conditional log-likelihood: conditions on the random effect u[i]
    log_lik[i] = bernoulli_logit_lpmf(y1[i] | alpha1 + x1[i] * beta + u[i]);
  }
}

where u \sim N(0, \sigma^2_u).

Instead of using the conditional log-likelihood, is there a way to provide the marginal log-likelihood (integrating out the random effects, given the posterior distribution of \sigma^2_u)?

I don’t think there is a simple way to do this during model estimation, because you need to approximate the integral in some way. Below is a paper describing the issue, along with a quadrature method that is carried out after model estimation. The method makes use of the posterior estimates of the random effects, so it can’t be included in generated quantities during model estimation. Instead, you could compute log-likelihood values after model estimation, then send those values to loo().
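To make the post-estimation step concrete, here is a minimal sketch in plain Python of the marginal log-likelihood for one cluster and one posterior draw. It is not the method from the paper: it swaps in a simple trapezoid rule on a fixed grid for the quadrature described there, and the function names (`marginal_log_lik`, `bernoulli_logit_lpmf`, `log_sum_exp`) and the grid settings (`n_grid`, `z_max`) are my own assumptions for illustration. It assumes a single scalar random intercept per cluster with u ~ N(0, sigma_u^2).

```python
import math

def log_sum_exp(values):
    # Numerically stable log(sum(exp(v))) for a list of log-scale terms.
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def bernoulli_logit_lpmf(y, eta):
    # log p(y | eta) for y in {0, 1}, with eta on the logit scale.
    s = eta if y == 1 else -eta
    if s >= 0:
        return -math.log1p(math.exp(-s))
    return s - math.log1p(math.exp(s))

def marginal_log_lik(y, x, alpha, beta, sigma_u, n_grid=201, z_max=8.0):
    """Marginal log-likelihood of one cluster for ONE posterior draw.

    Integrates the random intercept u = sigma_u * z over z ~ N(0, 1)
    with a trapezoid rule on [-z_max, z_max] (an illustrative choice).
    """
    step = 2.0 * z_max / (n_grid - 1)
    terms = []
    for k in range(n_grid):
        z = -z_max + k * step
        # log of (standard normal density at z) * (trapezoid weight)
        log_w = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) + math.log(step)
        if k == 0 or k == n_grid - 1:
            log_w += math.log(0.5)
        # conditional log-likelihood of the whole cluster given this u
        cond = sum(
            bernoulli_logit_lpmf(yi, alpha + xi * beta + sigma_u * z)
            for yi, xi in zip(y, x)
        )
        terms.append(log_w + cond)
    return log_sum_exp(terms)
```

You would call this once per posterior draw of (alpha, beta, sigma_u) and per cluster. Note that when the random effect is shared by several observations, the marginal likelihood factorizes over clusters rather than over individual observations, so the "data points" passed to loo() become clusters.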

Hi @edm, thank you for your answer. I will read the paper you attached. I am familiar with Gaussian quadrature. When I provide the log-likelihood to loo, it has to be in a certain format. I know it is in the required format when I compute it in Stan. However, I am not sure whether it will be in that same format when I calculate it outside of Stan (by integrating out the random effects).

You can arrange the log-likelihoods in an array or matrix. From ?loo:

    • ‘array’: An I by C by N array, where I is the number of MCMC
      iterations per chain, C is the number of chains, and N is the
      number of data points.

    • ‘matrix’: An S by N matrix, where S is the size of the
      posterior sample (with all chains merged) and N is the number
      of data points.
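
As a rough illustration of the two layouts, the sketch below (plain Python, nested lists) flattens an I by C by N array into an S by N matrix with S = I * C. The function name `array_to_matrix` is hypothetical, and stacking the rows chain by chain is just one convenient convention I am assuming here; the returned `chain_id` list records which chain each row came from, which is the kind of information loo's relative_eff() asks for when it is given a matrix.

```python
def array_to_matrix(log_lik_array):
    """Flatten an I x C x N nested list into an S x N matrix (S = I * C).

    log_lik_array[i][c][n] is the log-likelihood of data point n at
    iteration i of chain c. Rows are stacked chain by chain (an assumed
    convention), and chain_id gives the 1-based chain of every row.
    """
    n_iter = len(log_lik_array)
    n_chains = len(log_lik_array[0])
    matrix = []
    chain_id = []
    for c in range(n_chains):
        for i in range(n_iter):
            matrix.append(list(log_lik_array[i][c]))
            chain_id.append(c + 1)
    return matrix, chain_id

# Toy example: I = 2 iterations, C = 2 chains, N = 2 data points.
toy = [
    [[-0.1, -0.2], [-0.3, -0.4]],
    [[-0.5, -0.6], [-0.7, -0.8]],
]
toy_matrix, toy_chain_id = array_to_matrix(toy)
```

Each row of the matrix holds the N log-likelihood values for one posterior draw, so values computed outside of Stan (one per draw and per data point or cluster) can be filled in the same way before being written out and passed to loo().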