I saw that the loo v.2.0.0 has built-in ways to proceed when loo gives warnings about high k values beyond k_threshold, however this loo version only applies to a Stanreg object. I used the rstan package to write up my own model, which produces a stanfit object. Currently, I do the following to obtain loo:
log_lik <- rstan::extract(fit,"log_lik_marg") # fit: a stanfit object; log_lik_marg: the marginal log likelihood
looic <- loo::loo(log_lik)
loo(log_lik, save_psis=TRUE, k_treshold=0.7)
The results look like this:
Computed from 1500 by 48 log-likelihood matrix
Estimate SE
elpd_loo -534.1 44.1
p_loo 32.4 7.4
looic 1068.1 88.2
------
Monte Carlo SE of elpd_loo is NA.
Pareto k diagnostic values:
Count Pct. Min. n_eff
(-Inf, 0.5] (good) 40 83.3% 251
(0.5, 0.7] (ok) 6 12.5% 61
(0.7, 1] (bad) 1 2.1% 67
(1, Inf) (very bad) 1 2.1% 6
See help('pareto-k-diagnostic') for details.
I could also use loo v.2.0.0 by using loo(fit), which however gives completely different (worse) results:
Computed from 1500 by 48 log-likelihood matrix
Estimate SE
elpd_loo -434.1 37.5
p_loo 123.7 9.9
looic 868.2 75.0
------
Monte Carlo SE of elpd_loo is NA.
Pareto k diagnostic values:
Count Pct. Min. n_eff
(-Inf, 0.5] (good) 0 0.0% <NA>
(0.5, 0.7] (ok) 3 6.2% 43
(0.7, 1] (bad) 25 52.1% 10
(1, Inf) (very bad) 20 41.7% 2
See help('pareto-k-diagnostic') for details.
Interestingly, I found that loo(fit) produces similar results if I extract the conditional log likelihood.
log_lik_con <- rstan::extract(fit,"log_lik")[[1]] # similar to loo(fit) results
loo(log_lik_con)
Computed from 1500 by 48 log-likelihood matrix
Estimate SE
elpd_loo -433.9 37.5
p_loo 123.6 9.9
looic 867.9 75.0
------
Monte Carlo SE of elpd_loo is NA.
Pareto k diagnostic values:
Count Pct. Min. n_eff
(-Inf, 0.5] (good) 0 0.0% <NA>
(0.5, 0.7] (ok) 3 6.2% 80
(0.7, 1] (bad) 25 52.1% 18
(1, Inf) (very bad) 20 41.7% 2
See help('pareto-k-diagnostic') for details.
My first questions is that whether anyone have any insight on why the results look so? And which result should I trust? My second question is how I can call loo to refit the model J times when there are J observations with high k estimates. I tried loo(log_lik, save_psis=TRUE, k_treshold=0.7) or loo(fit, save_psis = TRUE, k_threshold = 0.7), but nothing happened.
Below are the details of computing marginal & conditional log likelihood:
log_lik_marg[n] = multi_normal_lpdf(Y[subject_starts[n]: subject_stops[n]] | zero_v[subject_starts[n]: subject_stops[n]],
to_matrix(Sigma[cov_starts[n]: cov_stops[n]], V[n], V[n]));
log_lik[n] = multi_normal_lpdf(Y[subject_starts[n]: subject_stops[n]] | B[subject_starts[n]: subject_stops[n]]*theta_mu +
B[subject_starts[n]: subject_stops[n]] * Theta * alpha[n], diag_matrix(rep_vector(sigma_eps^2,V[n])));
Thanks!