Marginal predictions and cross-validation with a latent variable model

Hi all,

I wanted to follow up on this a bit to check my understanding of the PSIS-LOO results I’m getting for this model. I’ve extended the model above to take a matrix of predictors X, with correlated varying effects for each predictor within each species. At a quick count, with 6 predictor variables (and no species-specific intercept) there are 209 parameters, which I understand is a lot relative to the number of data points (336).
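For concreteness, here is one purely illustrative breakdown that lands on that order of magnitude (the species count is a guess for illustration, and the exact decomposition depends on the parameterisation):

```r
# Illustrative arithmetic only: one breakdown that sums to 209,
# assuming a hypothetical 30 species
S <- 30  # species (hypothetical number)
P <- 6   # predictors
S * P +              # species-level slopes                  180
  P +                # population-level slopes                 6
  P +                # sds of the varying effects              6
  P * (P - 1) / 2 +  # free elements of the 6x6 correlation   15
  1 +                # global intercept                        1
  1                  # observation-level sd                    1
# = 209
```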

Running model$loo() gives the following, with a good number of Pareto k values > 0.7 (the intercept-only model above gives all points > 0.7…!):

Computed from 4000 by 336 log-likelihood matrix

         Estimate   SE
elpd_loo   -674.7 34.5
p_loo        99.3  7.7
looic      1349.4 69.0
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     180   53.6%   208       
 (0.5, 0.7]   (ok)       103   30.7%   70        
   (0.7, 1]   (bad)       47   14.0%   13        
   (1, Inf)   (very bad)   6    1.8%   6         
See help('pareto-k-diagnostic') for details.
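For reference, the flagged observations can be pulled out with loo’s helpers, e.g. (minimal sketch; `loo_fit` is the object returned by model$loo()):

```r
library(loo)

loo_fit <- model$loo()                             # PSIS-LOO as above
high_k  <- pareto_k_ids(loo_fit, threshold = 0.7)  # indices of points with k > 0.7
length(high_k)                                     # 53 here (47 bad + 6 very bad)
pareto_k_values(loo_fit)[high_k]                   # the k estimates themselves
```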

Having done a bit of searching around Discourse, I found @avehtari’s comment below, which I think sums up what’s going on here:

Here, p_loo is about half the actual number of parameters, which is nice (and suggests some regularisation is happening), but the actual number of parameters is much bigger than n/5 (67.2), which I take to mean that the model is highly flexible.
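Spelling that heuristic out with the numbers above:

```r
n     <- 336   # observations
p     <- 209   # parameters
p_loo <- 99.3  # from the PSIS-LOO output above

p_loo / p  # ~0.48: p_loo is roughly half the parameter count
n / 5      # 67.2: p = 209 is well above this, flagging a very flexible model
```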

To test this, I put more strongly regularising priors on the variance components (going from exponential(1) to exponential(2)), which slightly reduced the number of observations with Pareto k > 0.7 (from 53 to 51), suggesting that the problems do relate to model flexibility:

Computed from 4000 by 336 log-likelihood matrix

         Estimate   SE
elpd_loo   -676.6 34.4
p_loo        98.7  7.7
looic      1353.1 68.9
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     174   51.8%   212       
 (0.5, 0.7]   (ok)       111   33.0%   46        
   (0.7, 1]   (bad)       49   14.6%   8         
   (1, Inf)   (very bad)   2    0.6%   12        
See help('pareto-k-diagnostic') for details.
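To be clear about what that prior change does: an exponential(rate) prior on a standard deviation has mean 1/rate, so the change halves the prior mean of the sds.

```r
# Prior mean of the sds under the two choices: 1/rate
c(exponential_1 = 1 / 1, exponential_2 = 1 / 2)  # 1.0 vs 0.5

# Quick visual comparison of the two prior densities
curve(dexp(x, rate = 1), from = 0, to = 3, xlab = "sd", ylab = "prior density")
curve(dexp(x, rate = 2), add = TRUE, lty = 2)  # tighter prior, dashed
```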

However, increasing these rate parameters further seems in general a poor solution to the problem: 1) pushing them to very high values doesn’t remove the high Pareto k values, and it also produces large numbers of divergent transitions; and 2) strongly constrained variances wouldn’t, to my mind, fit with domain-specific knowledge - species often differ strongly in their responses to the environment, and the model needs the flexibility to estimate those differences accurately.

Is my understanding correct that this is a problem of model flexibility? For a too-flexible model, can these problems with PSIS-LOO be side-stepped by using other cross-validation techniques (e.g. K-fold CV), or are those also invalid under these conditions? More generally (outside of my solution above using a permanently held-out sample), it would be good to be able to compare the predictive ability of these models as covariates and complexity increase (one option I’d like to explore is phylogenetic varying effects), in situations where the data perhaps don’t so easily permit a hold-out set.
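In case it’s useful, this is roughly the K-fold setup I have in mind (a sketch only: it assumes the Stan program accepts a 0/1 `holdout` vector, drops held-out rows from the likelihood, and computes `log_lik` for every row in generated quantities; `mod`, `stan_data`, and `species_id` are placeholders for my objects):

```r
library(cmdstanr)
library(loo)

K <- 10
N <- 336

# Assign folds; grouping by species holds out whole species at a time,
# or use kfold_split_random() for fully random folds
folds <- kfold_split_grouped(K = K, x = species_id)

# 4000 posterior draws assumed (4 chains x 1000), as in the loo output above
log_lik_heldout <- matrix(NA, nrow = 4000, ncol = N)

for (k in 1:K) {
  # mark fold k as held out; the Stan program skips these in the likelihood
  data_k <- c(stan_data, list(holdout = as.integer(folds == k)))
  fit_k  <- mod$sample(data = data_k, parallel_chains = 4)
  ll_k   <- fit_k$draws("log_lik", format = "matrix")
  # keep only the columns for this fold's held-out observations
  log_lik_heldout[, folds == k] <- ll_k[, folds == k]
}

# elpd() sums the pointwise held-out elpds and reports an SE,
# directly comparable to elpd_loo above
elpd(log_lik_heldout)
```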
