Loo performance (large k-pareto and p_loo) with linear random effects model on longitudinal data


I am trying to use the “loo” package, including both waic() and loo(), to compare different functional forms (e.g., linear vs. quadratic) on longitudinal data. I am using “rjags” to fit the model (3 chains, 30K burn-in, 30K sampling, thin = 15) and then extracting the array of log density values to use in loo (my code to do this is based on this blog post).

To test the translation from rjags to loo (which is integrated with Stan), I simulated data from a linear random effects model (LREM, equation below). The data included 200 individuals with 7 timepoints, which led to 1400 total observations. Then, I fit the true model to the data and the parameters recovered well.

Y_{ij}=\beta_{0i} + \beta_{1i} + e_{ij}

If you are unfamiliar with this model, each regression parameter is assumed to be random so it includes a fixed effect (mean parameter, across individuals) and each individual has a random effect (denoted by subscript i):

\beta_{0i} = \beta_0 + b_{0i} \\ \beta_{1i} = \beta_1 + b_{1i}

When I use waic() I get the following output:

> LREM_waic <- waic(LREM_loglik)
Warning message:
173 (12.4%) p_waic estimates greater than 0.4. We recommend trying loo instead.

> LREM_waic
Computed from 6000 by 1400 log-likelihood matrix

          Estimate   SE
elpd_waic  -5062.1 25.6
p_waic       259.3  8.8
waic       10124.1 51.2

173 (12.4%) p_waic estimates greater than 0.4. We recommend trying loo instead.

When I use loo() (with r_eff) I get the following output:

> LREM_loo <- loo(LREM_loglik, r_eff = LREM_r_eff, cores=5)
Warning message:
Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.

> LREM_loo
Computed from 6000 by 1400 log-likelihood matrix

         Estimate   SE
elpd_loo  -5087.7 26.3
p_loo       284.9 10.2
looic     10175.4 52.7
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     1220  87.1%   695       
 (0.5, 0.7]   (ok)        136   9.7%   198       
   (0.7, 1]   (bad)        43   3.1%   25        
   (1, Inf)   (very bad)    1   0.1%   28        
See help('pareto-k-diagnostic') for details.

Based on previous posts (e.g., Fitting the true model and getting bad k-pareto from loo, Reinforcement learning model) and the loo package glossary, I know it is possible to get large k-hat values even when fitting the true model if my model is “flexible.” However, I am 1) not sure whether a LREM is considered flexible, and 2) concerned about the large value for my effective number of parameters (p_waic, p_loo).

So, my questions are:

  1. From a Bayesian perspective, what is the number of parameters in my model (p)?
  • 6 → 2 fixed effects, 2 random effect variances, 1 random effect covariance, 1 error variance… OR…
  • 6 + 400 → 200 individuals with random effects for 2 parameters - intercept and slope. In this case, I could see why my model would have issues based on the loo package glossary (p = 406 > N/5 = 1400/5 = 280).
  1. Based on the previous answer, should I be concerned about the large p_waic and p_loo values?
  2. Given all this, is there a way to improve the model to make it more usable with loo? For example, would more informative priors improve results (Vehtari et al., 2017)? My other theory was to increase the sampling iterations, but this this post (Good PP check and R square but large Pareto k values) leads me to think that won’t make a meaningful difference.
  3. I know PSIS-LOO+ has also been mentioned as an alternative to loo for Stan. But I’m guessing a version of this does not exist for rjags?

I am also planning on using k-fold CV, but I am hoping there is still a way to use loo for my analysis to save computation time moving forward.

Any help is appreciated! Thank you!

It is flexible as you have two parameters for each individual. In total you have more than 400 parameters (would need to see the model code for the exact number). The effective number of parameters is estimated with p_loo to be about 285 which is quite high compared to the total number of parameters. It seems your priors on the “random effects” are very weak, or if you are using a hierarchical model, the data generating process has very wide population distribution.

It’s the total number of parameters. In addition, we can estimate the effective number of parameters which takes into account the constraining effect of priors and different dependencies between parameters.



Yes, thaat would improve the model, too. If you want to evaluate also the weak prior model, you can use brute-force, k-fold-cv, or integrated-LOO (for this one see Roaches cross-validation demo)

Probably would require a huge or infinite number of iterations

I don’t know what you mean by + here. There is a moment matching LOO approach that requires that we can re-evaluate the log density with transformed parameters, and we don’t have support for JAGS.