Estimated p_loo > p

rstanarm 2.18.2
loo.1.0
Windows

Using rstanarm I obtained a stan_lm model with 37 predictors and 3 auxiliary parameters, N = 1104358. The R2 prior was 0.5.

The loo object returns:

Computed from 16000 by 1104358 log-likelihood matrix

           Estimate     SE
elpd_loo -1090026.8  995.6
p_loo          41.6    0.1
looic     2180053.6 1991.1

Monte Carlo SE of elpd_loo is 0.1.

All Pareto k estimates are good (k < 0.5).

The largest k value was 0.136.

The loo glossary states that p_loo > p indicates that the model has weak predictive capability and may be misspecified.

How concerned should I be?

Are there other relevant loo model checks?

Nathan

I would upgrade to loo 2.x. You can do all the regular posterior predictive checks / plots, but it is going to be unwieldy and memory-intensive with over a million observations.

I incorrectly entered the loo version.

I am using 2.1.0.

Nathan

With that model and N very much bigger than the number of parameters, you don’t need loo for model comparison.

I should probably change that to p_loo > p + 1, as in cases like this p_loo can be slightly larger than p. I guess that in addition to the 37 predictors and 3 auxiliary parameters there is also an intercept term, so that p = 41.

No need for concern.

Btw, there will soon be a new loo version that computes loo for this kind of problem 1000 times faster, using ideas presented in "Bayesian leave-one-out cross-validation for large data".

Aki, I appreciate your comments. I have been reading (heavy going for me) your recent papers on importance sampling. I have also read Gelman, Carlin (2013) on the derivation of effective parameter estimates. I still don't get why the lppd informs one about the parameter count. I'll keep reading.

The 37 predictors included the intercept, but I have stopped being concerned.

I will read your new paper about loo for large data. I can run the current model in 20-30 minutes. Running loo takes 5+ longer. Is there an ETA for the next loo package update?

I'm not certain about the CRAN release, but there should at least be a GitHub version with a couple of exciting new features, hopefully before the end of May (though I often seem to be a bit optimistic).
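On the earlier question of why the lppd says anything about the parameter count: loo defines p_loo = lppd - elpd_loo. The lppd scores each observation with a posterior that has already seen that observation, while elpd_loo scores it with a posterior that has not, so the gap measures how much the fit adapts to each point. For a well-specified regular model with N much larger than p, that gap adds up to roughly p. The sketch below is only an illustration under stated assumptions, not what loo does internally: it uses a toy model y_i ~ Normal(mu, 1) with a flat prior on mu (so p = 1), and exact closed-form leave-one-out predictive densities instead of PSIS. The function names and the simulated data are hypothetical.

```python
import math
import random


def log_normal_pdf(y, mean, var):
    """Log density of Normal(mean, var) evaluated at y."""
    return -0.5 * math.log(2.0 * math.pi * var) - (y - mean) ** 2 / (2.0 * var)


def p_loo_known_variance(y):
    """Exact p_loo = lppd - elpd_loo for y_i ~ Normal(mu, 1), flat prior on mu.

    Assumed toy model: the posterior for mu given n points is
    Normal(mean(y), 1/n), so every predictive density is available in
    closed form and no importance sampling is needed.
    """
    n = len(y)
    total = sum(y)
    ybar = total / n
    lppd = 0.0
    elpd_loo = 0.0
    for yi in y:
        # Posterior predictive using all n points: Normal(ybar, 1 + 1/n).
        lppd += log_normal_pdf(yi, ybar, 1.0 + 1.0 / n)
        # Predictive with point i left out: Normal(mean of the rest, 1 + 1/(n-1)).
        mean_without_i = (total - yi) / (n - 1)
        elpd_loo += log_normal_pdf(yi, mean_without_i, 1.0 + 1.0 / (n - 1))
    return lppd - elpd_loo


rng = random.Random(1)
n = 2000
# Simulated data whose sd matches the model's assumed sd of 1.
well_specified = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Simulated data whose sd is twice what the model assumes.
misspecified = [rng.gauss(0.0, 2.0) for _ in range(n)]

p_loo_ok = p_loo_known_variance(well_specified)
p_loo_bad = p_loo_known_variance(misspecified)
print(f"well specified: p_loo = {p_loo_ok:.2f} (p = 1)")
print(f"misspecified:   p_loo = {p_loo_bad:.2f} (p = 1)")
```

In this toy setting p_loo works out to approximately the sample variance of the data divided by the variance the model assumes, so it lands close to p = 1 when the model is right and clearly above p when the data are more dispersed than the model allows. That is the sense in which p_loo noticeably larger than p hints at misspecification, while a small excess like 41.6 against roughly 41 does not.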