I have a model written in cmdstanr with C clusters. My aim is to fit the model for several values of C and select the number of clusters that maximises model performance. For MCMC-fitted models I would lean towards using elpd (from the loo package), but since I am fitting a mixture model I decided to fit it via VI instead.
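For context, this is roughly the elpd comparison I would do with an MCMC fit (a minimal sketch: the pointwise log-likelihood draws here are simulated random values purely to show the call shape; with cmdstanr they would come from `fit$draws("log_lik")` instead):

```r
library(loo)

# Fake pointwise log-likelihood draws standing in for fit$draws("log_lik"):
# S posterior draws (4 chains x 100) by N observations.
set.seed(1)
S <- 400
N <- 50
log_lik <- matrix(rnorm(S * N, mean = -1), nrow = S, ncol = N)

# The matrix method of loo() wants a relative effective sample size estimate.
r_eff <- relative_eff(exp(log_lik), chain_id = rep(1:4, each = 100))

loo_result <- loo(log_lik, r_eff = r_eff, cores = 1)
print(loo_result$estimates)  # elpd_loo, p_loo, looic with standard errors
```

I would then compare `elpd_loo` across the fits with different C.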
In my model, I specify:
- the base parameters of my model
- a transformed set of parameters
- generated quantities for posterior predictive checks. This block also includes generated quantities for the log likelihood, labelled `log_lik`.
Following some searches online and reading this previous post, I tried to approximate the loo posterior using the following snippet (with my VI-fitted model called `fit`, and the input data labelled `data_list`):
```r
log_p <- fit$lp()
log_g <- fit$lp_approx()

loo_approximate_posterior(
  fit$draws('log_lik'),
  draws = as_draws_matrix(fit$draws()),
  data  = data_list,
  log_p = log_p,
  log_g = log_g,
  cores = 4
)
```
The output had atrocious Pareto k diagnostics, which surprised me, as a glance at the PPC suggested I got decent results:
```
Pareto k diagnostic values:
                        Count   Pct.  Min. n_eff
(-Inf, 0.5] (good)          0   0.0%  <NA>
 (0.5, 0.7] (ok)            0   0.0%  <NA>
   (0.7, 1] (bad)           0   0.0%  <NA>
   (1, Inf) (very bad)  20570 100.0%  1

See help('pareto-k-diagnostic') for details.
```
Am I doing the right thing here? Does the log likelihood need to be calculated outside the draws object, as the author of the linked post does? Is there an alternative approach that works?
Any help would be much appreciated!