I am developing [another] R package with Stan as the estimation backend. As always, I am trying to implement a loo method for it. The model is a latent multivariate mixed-effects location scale model (LMMELSM). The details of the model aren't terribly important: it estimates fine, recovers parameters, and shows no divergences and good Rhats. I am not going to paste the model code here, because it is complicated and not particularly relevant to the problem.
The model is hierarchical, and involves latent variables.
Let J be the number of observed variables, and let Y_i be the data matrix for person i.
Let n_i be the number of rows of observations for person i.
Let F be the number of latent factors (\eta), and let k index the k-th observation of person i.
\eta_i is itself hierarchically modeled; e.g., in the case where F = 1 [a unidimensional latent structure].
In essence, it is a location-scale model, but on a latent quantity instead of observed data; the latent quantity then projects to the observed data as per normal in a latent measurement model.
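For concreteness, one common way to write the F = 1 case is the following sketch (hypothetical symbols, not necessarily the package's exact parameterization): the latent score for person i at occasion k follows a location-scale structure with person-specific location and (log) scale, and each observed variable j loads on the latent score.

$$
\eta_{ik} \sim \mathcal{N}(\mu_i, \sigma_i), \qquad
\mu_i = \beta_0 + u_{0i}, \qquad
\log \sigma_i = \alpha_0 + u_{1i},
$$

$$
y_{ikj} = \nu_j + \lambda_j \eta_{ik} + \epsilon_{ikj}.
$$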
For loo, there can be several things ‘left out’.
The simplest is leave-row-out; in this case, the log_lik of each row is computed: get the log_lik of each score on each row, then sum across the columns to obtain the row-wise log_lik.
Another variant is leave-person-out; in this case, the log_lik of each person is computed: get the row-wise log_lik as above, then sum those values within each person. If you have K persons, you have K log_liks.
Leave-row-out yields generally "ok" Pareto-k values; sometimes there are 1 or 2 "bad" k values. Unfortunately, I don't think this is the LOO of interest.
I would intuitively think that the LOO of interest is leave-group-out (akin to how leave-group-out may be more useful for mixed models in general).
But the leave-person-out Pareto-k values are, and I am not exaggerating, 100% bad.
How can 100% of the PSIS k-values be bad? Am I computing LGO properly for a mixed, multivariate-outcome model (i.e., compute the log_lik for each multivariate occasion by summing the log_liks across outcomes, then sum these within person to yield the joint log_lik of a person's observed data)? Is there a way of diagnosing why these k-values are so, so bad?
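One heuristic I have been considering (a toy simulation, not the actual model): the person-level log_lik is a sum of n_i row-wise terms, so its variance across posterior draws grows with n_i, and the raw importance ratios r_s = 1/p(y_i | theta_s) become far heavier-tailed at the person level than at the row level. A sketch of that effect:

```python
import numpy as np

rng = np.random.default_rng(3)
S, n_i = 4000, 20  # posterior draws, rows per person
# toy per-row log_liks with modest across-draw variation
log_lik_rows = rng.normal(-1.0, 0.5, size=(S, n_i))
log_lik_person = log_lik_rows.sum(axis=1)  # variance ~ n_i times larger

def tail_dominance(log_r):
    """Fraction of total importance weight carried by the single largest ratio."""
    w = np.exp(log_r - log_r.max())  # stabilized weights
    return w.max() / w.sum()

# log importance ratios are -log_lik for LOO
row_dom = tail_dominance(-log_lik_rows[:, 0])
person_dom = tail_dominance(-log_lik_person)
print(row_dom, person_dom)  # person-level weight mass concentrates on few draws
```

If something like this is what is happening, the few largest-weight draws dominate the person-level importance sampling, which would push every Pareto-k diagnostic into the bad range at once.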