Loo_moment_match for jagsUI models

Hi everyone!
I know this forum is mostly dedicated to stan, but maybe you can help me out. I want to use loo for model ranking but my models are a jagsUI object. Also, all models are returning a warning for the pareto-k diagnostic.

In summary, I have 7 hierarchical models, each one with a single covariate. I have 115 samples and I am essentially using logistic regression to fit those models. As an output from JAGS I am storing the beta coefficients (matrix with columns as the number of coefficients and rows for each iteration) and the log likelihood (matrix with 115 columns and nrows=n_iterations).

This is the output of loo for most models

loo(model$sims.list$log.like, r_eff = NA)

Computed from 24000 by 115 log-likelihood matrix

          Estimate    SE
elpd_loo  1246.0    95.6
p_loo     228.3     26.6
looic     2491.9   191.3

Monte Carlo SE of elpd_loo is NA.

 Pareto k diagnostic values:
                       Count    Pct.  Min. n_eff
(-Inf, 0.5]   (good)     84    73.0%   1206      
 (0.5, 0.7]   (ok)       19    16.5%   272       
  (0.7, 1]   (bad)       6     5.2%   19        
  (1, Inf)   (very bad)  6     5.2%   2         
See help('pareto-k-diagnostic') for details.
Warning message:
Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.

I checked the documentation and saw that loo_moment_match could be used to solve this issue. However, for JAGS models I would have to specify the functions for post_draws, log_lik_i, unconstrain_pars, log_prob_upars, log_lik_i_upars. I know there was a similar question previously for how to specify this for rjags objects, however, since some time has pass I was wondering if there were any new developments and maybe someone has some example code of how to specify these functions.

Sorry if this is too of topic, but I haven’t found much information for model selection and these issues when using JAGS.

Cheers!

Hi @Ana_Barros, welcome to the Stan forum. Currently I think loo_moment_match would be difficult to use with a JAGS model, but maybe @topipa or @avehtari has an idea. But there are other options:

How long does it take to run each model? Depending on that you could maybe do exact calculations (instead of approximate PSIS-LOO) for the observations with high Pareto k (that would require fitting each of the models once per high Pareto value, leaving out the corresponding observation).

K-fold cross validation is another option. In the second half of this vignette by @bnicenboim there is an example of K-fold cross validation using Stan + loo package but in your case you could do it with JAGS + loo package following the same steps:

If you’re interested, we could get you up-and-running with one of the formula-based interfaces to Stan (brms or rstanarm) pretty quickly, which would allow you to do loo with moment matching more-or-less automatically, without any of that pesky need to wrangle unconstrained parameters manually.

1 Like

Thank you for the reply. I am trying the k-fold CV approach, but I need to adapt the code a bit because the models I am using are fairly complex. Hopefully it works out!

In this case I think it would be a lot of work to adapt the code I already have. I am using multi-species occupancy models, so although it uses logistic regression it gets a bit complicated with all the possible interactions and levels. But thank you so much anyway. Maybe in a future I need to rely more on STAN than JAGS.

Sounds good. Just as an aside, for using loo with occupancy models you almost certainly want make sure to form the likelihood at the level of sites (species-sites in multispecies models) rather than at the level of visits. For k-fold, make sure that the withholds are blocks of species-sites (rather than blocks of visits that potentially cut across species sites).

Additionally, you almost certainly want to compute the likelihood for a species-site in its marginalized form, i.e. not conditional on the fitted value of the latent occupancy state Z, but instead conditional only on the fitted occupancy probability \psi and the fitted detection probabilities \theta.

I don’t mean to suggest you aren’t already doing exactly this, but I’ve seen people get confused about this frequently enough. A bit more on this topic here: Occupancy model likelihoods: advanced topics

Finally, just a note that if an occupancy model does not contain detection covariates that vary across the repeated visits within a site (or species-site in the multispecies case), then the occupancy model is simply a zero-inflated binomial regression, which is supported natively in brms and would allow you to use moment matching easily.

2 Likes

Thank you so much for the advice! I guess I wasn’t calculating the likelihood properly. This improved the pareto-k diagnostics, although for most models I still get 2 or 3 locations above the 0.7 threshold. I guess it’s just too many possibilities of species combinations for the low sample size. Thanks once again!

1 Like