Information criteria for multivariate probit


I’m working on a multivariate probit and thinking about model comparison using information criteria (LOO, maybe WAIC).

In past collaborations – using WinBUGS – calculating likelihoods for multivariate probit models was rather dodgy, giving results that were the opposite of what we expected.

Using RStan, is there reason to think that LOOIC would provide principled insights?

P.S. Model code and simulations are a work in progress. I will provide an update when they have reached a shareable state.

If your data and model are such that there are several observations, each of which can belong to several categories modeled with the multivariate probit, then leave-one-out cross-validation is valid and easy to implement in Stan. Without seeing the code you used before, I can’t say anything about why you got unexpected results. PSIS-LOO is recommended, as it has better diagnostics than WAIC. If the diagnostics indicate failure of PSIS-LOO, then WAIC will fail too, and you need to use k-fold-CV. I also recommend making some posterior predictive probability calibration plots (a form of posterior predictive checking).
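For intuition, the quantity PSIS-LOO needs is the pointwise log-likelihood log p(y_n | θ), stored per observation and per posterior draw. In the univariate probit case this is just log Φ((2y−1)·η). A minimal pure-Python sketch (the function names `phi` and `probit_log_lik` are hypothetical, not from Stan or the loo package):

```python
import math

def phi(x):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def probit_log_lik(y, eta):
    """Pointwise probit log-likelihood log p(y | eta) for y in {0, 1}.

    This is the per-observation, per-draw quantity one would save
    (e.g. in a generated quantities block) to feed into PSIS-LOO.
    """
    sign = 1.0 if y == 1 else -1.0
    return math.log(phi(sign * eta))

# With linear predictor eta = 0, both outcomes have probability 0.5.
print(round(probit_log_lik(1, 0.0), 4))  # log(0.5) ≈ -0.6931
```

The multivariate case is harder precisely because the analogue of Φ is a multivariate normal orthant probability, which is what makes the latent-variable parameterization attractive.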


Thanks, Aki.

For starters, can we assume a model like the multivariate probit described on page 160 of the rstan manual?

model {
  vector[D] beta_x[N];
  L_Omega ~ lkj_corr_cholesky(4);
  to_vector(beta) ~ normal(0, 5);
  for (n in 1:N)
    beta_x[n] = beta * x[n];
  z ~ multi_normal_cholesky(beta_x, L_Omega);
}

Would that parameterization lend itself to LOOIC comparisons?

Can you add the declaration of z?

z holds the latent real values:

parameters {
  matrix[D, K] beta;
  cholesky_factor_corr[D] L_Omega;
  vector<lower=0>[N_pos] z_pos;
  vector<upper=0>[N_neg] z_neg;
}

transformed parameters {
  vector[D] z[N];
  for (n in 1:N_pos)
    z[n_pos[n], d_pos[n]] = z_pos[n];
  for (n in 1:N_neg)
    z[n_neg[n], d_neg[n]] = z_neg[n];
}
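The transformed parameters block just scatters the sign-constrained latent values back into an N × D array. A small Python sketch of the same bookkeeping (the helper `assemble_z` is hypothetical, written only to check the indexing; it keeps Stan's 1-based indices):

```python
def assemble_z(N, D, z_pos, n_pos, d_pos, z_neg, n_neg, d_neg):
    """Scatter positive and negative latent values into an N x D array,
    mirroring the transformed parameters block (1-based indices as in Stan)."""
    z = [[None] * D for _ in range(N)]
    for i, v in enumerate(z_pos):
        z[n_pos[i] - 1][d_pos[i] - 1] = v
    for j, v in enumerate(z_neg):
        z[n_neg[j] - 1][d_neg[j] - 1] = v
    return z

# Two observations, two dimensions: positives at (1,1) and (2,2),
# negatives at (1,2) and (2,1).
z = assemble_z(2, 2, [0.7, 1.2], [1, 2], [1, 2], [-0.3, -0.9], [1, 2], [2, 1])
print(z)  # [[0.7, -0.3], [-0.9, 1.2]]
```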

And as I suspected, I can’t understand this without seeing the data block, too…

It’s a tricky one, Aki. The parameterization of the multivariate probit in the manual is a way to circumvent some of the challenges that typify a more conventional approach. Here are the functions, data, and transformed data blocks (though the manual provides some important background):

functions {
  int sum(int[,] a) {
    int s = 0;
    for (i in 1:size(a))
      s += sum(a[i]);
    return s;
  }
}
data {
  int<lower=1> K;               // number of covariates
  int<lower=1> D;               // number of response dimensions
  int<lower=0> N;               // number of observations
  int<lower=0,upper=1> y[N,D];
  vector[K] x[N];
}
transformed data {
  int<lower=0> N_pos;
  int<lower=1,upper=N> n_pos[sum(y)];
  int<lower=1,upper=D> d_pos[size(n_pos)];
  int<lower=0> N_neg;
  int<lower=1,upper=N> n_neg[(N * D) - size(n_pos)];
  int<lower=1,upper=D> d_neg[size(n_neg)];
  int i;
  int j;
  N_pos = size(n_pos);
  N_neg = size(n_neg);
  i = 1;
  j = 1;
  for (n in 1:N) {
    for (d in 1:D) {
      if (y[n,d] == 1) {
        n_pos[i] = n;
        d_pos[i] = d;
        i += 1;
      } else {
        n_neg[j] = n;
        d_neg[j] = d;
        j += 1;
      }
    }
  }
}
I agree that this is tricky.

The manual is missing the important part, that is, how `z` or `beta_x` and `L_Omega` are used to compute the class probabilities. If you add that part, it would be possible to see what the generated quantities block needs.
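For what it’s worth, the per-observation likelihood p(y_n | beta, L_Omega) here is a multivariate normal orthant probability: the chance that each latent z[n,d] falls on the side of zero dictated by y[n,d]. A crude Monte Carlo sketch in Python (the helper `orthant_prob` is hypothetical; this is just to show what a generated quantities block would have to approximate, not an efficient method):

```python
import random

def orthant_prob(mu, L, y, draws=20000, seed=1):
    """Monte Carlo estimate of P(sign pattern y) for z ~ N(mu, L L').

    mu: mean vector (beta * x[n]); L: lower Cholesky factor of the
    correlation matrix; y: 0/1 outcome pattern. This probability is
    the per-observation likelihood needed for LOO.
    """
    rng = random.Random(seed)
    D = len(mu)
    hits = 0
    for _ in range(draws):
        e = [rng.gauss(0.0, 1.0) for _ in range(D)]
        # z = mu + L e, using only the lower triangle of L.
        z = [mu[d] + sum(L[d][k] * e[k] for k in range(d + 1)) for d in range(D)]
        if all((z[d] > 0) == (y[d] == 1) for d in range(D)):
            hits += 1
    return hits / draws

# Independent case: identity Cholesky factor and zero means, so each of
# the four orthants should have probability about 0.25.
L_eye = [[1.0, 0.0], [0.0, 1.0]]
print(orthant_prob([0.0, 0.0], L_eye, [1, 1]))  # ≈ 0.25
```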

Note also that there is Stan code with a comment saying it has a better parameterization than the one in the manual.

edit: added `beta_x` and `L_Omega`