Thanks for posting the questions. It seems we should clarify the documentation a bit regarding convergence diagnostics and loo.
See also the loo-glossary help page in the loo package.
From the glossary:
- If p_loo > p, then the model is likely to be badly misspecified. If the number of parameters p << N, then PPCs are also likely to detect the problem. See the case study at Roaches cross-validation demo for an example. If p is relatively large compared to the number of observations, say p > N/5 (more accurately we should count the number of observations influencing each parameter, as in hierarchical models some groups may have few observations and other groups many), it is possible that PPCs won't detect the problem.
You have p_loo=285 > p=262.
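As a concrete sketch, this is how p_loo can be read off a loo object in R (log_lik here is a hypothetical draws-by-observations matrix of pointwise log-likelihoods):

```r
library(loo)

# log_lik: S x N matrix of pointwise log-likelihood draws (hypothetical name)
loo_fit <- loo(log_lik)

# The estimates table has rows elpd_loo, p_loo, and looic
print(loo_fit$estimates)
p_loo <- loo_fit$estimates["p_loo", "Estimate"]

# Compare the effective number of parameters to the actual count
p <- 262   # total number of parameters in your model
p_loo > p  # TRUE in your case, which triggers the criterion above
```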
From the glossary:
- If k > 0.7, then importance sampling is not able to provide a useful estimate for that component/observation. Pareto k is also useful as a measure of the influence of an observation. Highly influential observations have high k values. Very high k values often indicate model misspecification, outliers, or mistakes in data processing. See Section 6 of Gabry et al. (2019) for an example.
You have several k > 0.7, that is, importance sampling is failing because the full posterior and the leave-one-out posteriors are too different.
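To see which observations are failing, the loo package has helpers for exactly this (a sketch, reusing the hypothetical loo_fit from above):

```r
# Counts of observations in each Pareto k range
pareto_k_table(loo_fit)

# Indices of the observations with k > 0.7
bad <- pareto_k_ids(loo_fit, threshold = 0.7)
print(bad)

# Plot all k values; points above 0.7 are where PSIS fails
plot(loo_fit)
```

Inspecting the data rows listed in `bad` is often the fastest way to spot outliers or data-processing mistakes.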
It is likely that the problem is now mostly in importance sampling and not in MCMC.
That applies mostly to MCMC, making it more likely that the Rhat and n_eff computations are reliable. You could compute Rhats and n_eff's for exp(log_lik) (see the loo function relative_eff) if you think you have a problem with MCMC sampling.
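A sketch of that check, assuming log_lik_arr is a hypothetical iterations x chains x observations array of log-likelihood draws and a recent rstan for the Rhat function:

```r
library(loo)

# Relative efficiency of exp(log_lik) for each observation;
# values far below 1 indicate poor mixing for that term
r_eff <- relative_eff(exp(log_lik_arr))

# Pass r_eff to loo so the PSIS diagnostics account for it
loo_fit <- loo(log_lik_arr, r_eff = r_eff)

# Rhat for a single observation's exp(log_lik), e.g. the first one
rstan::Rhat(exp(log_lik_arr[, , 1]))
```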
Before using loo, it is recommended that you check that sampling works with Rhat, n_eff, divergences, E-BFMI, etc. loo itself checks only the combined n_eff and Pareto khat, but if the combined n_eff's are large and the Pareto k's are small, there is no need to check Rhat for each exp(log_lik) separately.
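For completeness, a sketch of those pre-loo checks for an rstan fit (fit is a hypothetical stanfit object; adapt to your interface):

```r
library(rstan)

# Divergences, treedepth, and E-BFMI in one call
check_hmc_diagnostics(fit)

# Worst-case Rhat and n_eff across all parameters
s <- summary(fit)$summary
max(s[, "Rhat"], na.rm = TRUE)
min(s[, "n_eff"], na.rm = TRUE)
```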
For discrete models, elpd_loo can be interpreted as log probabilities. For continuous models, elpd_loo can be compared to a baseline model. A large SE indicates problems. If the Monte Carlo SE of elpd_loo is NA, then the result is very unreliable.
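The comparison to a baseline is easiest with loo_compare (a sketch; loo_baseline is a hypothetical loo object for the simpler model):

```r
# Differences in elpd_loo and their standard errors
loo_compare(loo_fit, loo_baseline)
```

Look at elpd_diff together with se_diff rather than the raw elpd_loo values.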
It can be used and it works sometimes, but since you have a latent variable model with n latent variables, it seems that in your case you would need to marginalize over the latent variables in order to get a reliable result. If you are using the latent variables just to add overdispersion, consider using an overdispersed observation model instead.
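Not knowing your exact model, here is one generic illustration of that swap using brms (the formula, data, and variable names are all hypothetical):

```r
library(brms)

# Latent-variable overdispersion: one normal latent per observation
fit_latent <- brm(y ~ x + (1 | obs_id), data = d, family = poisson())

# Marginalized alternative: a negative binomial observation model
# integrates the extra variation into the likelihood, so the
# pointwise log_lik (and hence loo) behaves much better
fit_nb <- brm(y ~ x, data = d, family = negbinomial())
```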
I’m not familiar with blavaan.
Yes.