I have a model written in `cmdstanr` with C clusters. My aim is to fit the model for various values of C and select the number of clusters that maximises model performance. For MCMC-fitted models I would lean towards using elpd (from the `loo` package), but since I am fitting a mixture model I decided to fit it via VI instead.

In my model, I specify:

- the base parameters of my model
- a transformed set of parameters
- generated quantities for posterior predictive checks. This block also computes the pointwise log likelihood, labelled `log_lik`.
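
For reference, the `generated quantities` block looks roughly like this. This is a minimal sketch for a C-component Gaussian mixture, not my actual model; `theta`, `mu`, `sigma`, `y`, and `N` are placeholder names:

```
generated quantities {
  vector[N] log_lik;
  for (n in 1:N) {
    // log mixing weights plus component log densities
    vector[C] lps = log(theta);
    for (c in 1:C)
      lps[c] += normal_lpdf(y[n] | mu[c], sigma[c]);
    // marginalise over the cluster assignment
    log_lik[n] = log_sum_exp(lps);
  }
}
```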

Following some searches online and reading this previous post, I tried to approximate the loo posterior using the following snippet (with my VI-fitted model called `fit` and the input data labelled `data_list`):

```
log_p <- fit$lp()         # log density of the model at each draw
log_g <- fit$lp_approx()  # log density of the variational approximation
loo_approximate_posterior(fit$draws("log_lik"),
                          draws = as_draws_matrix(fit$draws()),
                          data  = data_list,
                          log_p = log_p,
                          log_g = log_g,
                          cores = 4)
```

The output of the function had atrocious Pareto k diagnostics, which surprised me, as a glance at the PPC suggested I was getting decent results:

```
Pareto k diagnostic values:
                         Count   Pct.   Min. n_eff
(-Inf, 0.5]   (good)         0   0.0%   <NA>
 (0.5, 0.7]   (ok)           0   0.0%   <NA>
   (0.7, 1]   (bad)          0   0.0%   <NA>
   (1, Inf)   (very bad) 20570 100.0%   1
See help('pareto-k-diagnostic') for details.
```

Am I doing the right thing here? Does the log likelihood need to be calculated outside the `draws` object, as the author of the linked post does? Is there an alternative approach that works?

Any help would be much appreciated!