Continuing the discussion from Feedback request: Multivariate Probit Regression with GP:

Brief problem summary: @bgoodri has developed a really handy implementation of the multivariate probit (code below). Unfortunately, it universally gets terrible k-hats in LOO. @martinmodrak made an excellent point in another thread, and I wanted to continue the discussion here to see whether his idea is a viable way to get a robust LOO estimate (or whether this model is simply a poor candidate for LOO):

> Pareto-k is almost guaranteed to be bad if you just add up the contributions to `target` from this parametrization. In this case, you don't compute the log likelihood of the observed values given the linear predictors; you actually compute the log likelihood given the linear predictors *and* the nuisance parameters. Since the observed values have huge influence on the associated nuisance parameters, loo correctly treats them as having large influence on the model and thus as having high k-hat.

Martin goes on to suggest:

> It could also make some kind of weird sense to compute the multivariate normal log likelihood of the nuisance parameters given the linear predictors and feed that to loo, but I can't think completely clearly about whether that would correspond to a meaningful quantity or not.

I can very much see the logic behind this, but I don't understand either LOO or the multivariate probit implementation well enough to determine whether such a version of LOO would make sense (maybe @avehtari could comment?).

Thanks in advance!

Here is @bgoodri's code (taken from example-models/probit-multi-good.stan at master · stan-dev/example-models · GitHub):

```stan
data {
  int<lower=1> K;
  int<lower=1> D;
  int<lower=0> N;
  array[N, D] int<lower=0, upper=1> y;
  array[N] vector[K] x;
}
parameters {
  matrix[D, K] beta;
  cholesky_factor_corr[D] L_Omega;
  array[N, D] real<lower=0, upper=1> u; // nuisance that absorbs inequality constraints
}
model {
  L_Omega ~ lkj_corr_cholesky(4);
  to_vector(beta) ~ normal(0, 5);
  // implicit: u is iid standard uniform a priori
  {
    // likelihood
    for (n in 1 : N) {
      vector[D] mu;
      vector[D] z;
      real prev;
      mu = beta * x[n];
      prev = 0;
      for (d in 1 : D) {
        // Phi and inv_Phi may overflow and / or be numerically inaccurate
        real bound; // threshold at which utility = 0
        bound = Phi(-(mu[d] + prev) / L_Omega[d, d]);
        if (y[n, d] == 1) {
          real t;
          t = bound + (1 - bound) * u[n, d];
          z[d] = inv_Phi(t); // implies utility is positive
          target += log1m(bound); // Jacobian adjustment
        } else {
          real t;
          t = bound * u[n, d];
          z[d] = inv_Phi(t); // implies utility is negative
          target += log(bound); // Jacobian adjustment
        }
        if (d < D) {
          prev = L_Omega[d + 1, 1 : d] * head(z, d);
        }
        // Jacobian adjustments imply z is truncated standard normal
        // thus utility --- mu + L_Omega * z --- is truncated multivariate normal
      }
    }
  }
}
generated quantities {
  corr_matrix[D] Omega;
  Omega = multiply_lower_tri_self_transpose(L_Omega);
}
```
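For concreteness, here is a sketch of my own (not from @bgoodri's file, so treat it as an assumption about what the pointwise log-likelihoods would look like) of a replacement `generated quantities` block that exposes the two quantities under discussion to `loo`. `log_lik_cond` reproduces the per-observation increments the model block adds to `target`, i.e. the likelihood conditional on the nuisance `u`, which is exactly what Martin says produces the bad k-hats. `log_lik_mvn` is one reading of Martin's alternative: the multivariate normal log density of the reconstructed latent utilities given the linear predictors.

```stan
generated quantities {
  corr_matrix[D] Omega = multiply_lower_tri_self_transpose(L_Omega);
  vector[N] log_lik_cond; // likelihood conditional on the nuisance u
  vector[N] log_lik_mvn;  // one reading of Martin's suggestion
  for (n in 1 : N) {
    vector[D] mu = beta * x[n];
    vector[D] z;
    real prev = 0;
    log_lik_cond[n] = 0;
    for (d in 1 : D) {
      // same sequential conditioning as in the model block
      real bound = Phi(-(mu[d] + prev) / L_Omega[d, d]);
      if (y[n, d] == 1) {
        z[d] = inv_Phi(bound + (1 - bound) * u[n, d]);
        log_lik_cond[n] += log1m(bound);
      } else {
        z[d] = inv_Phi(bound * u[n, d]);
        log_lik_cond[n] += log(bound);
      }
      if (d < D) {
        prev = L_Omega[d + 1, 1 : d] * head(z, d);
      }
    }
    // MVN log density of the latent utility mu + L_Omega * z
    // given the linear predictor mu
    log_lik_mvn[n] = multi_normal_cholesky_lpdf(mu + L_Omega * z | mu, L_Omega);
  }
}
```

Whether `log_lik_mvn` corresponds to a meaningful predictive quantity is exactly the open question; note that it still depends on `u` through `z`, so I am not sure it escapes the high-influence problem either.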