Hello,
The approach I have tried so far is based on the assumption that the observed ordinal data y is generated from an underlying latent multivariate normal distribution. Hence, the Stan model simulates a multivariate normal distribution (using a LKJ prior) and uses an ordered probit link to model the ordinal outcomes.
Unfortunately, I cannot recover the correct correlation matrix from simulated data using the following model:
functions {
// for probit approach
vector calc_theta(real eta, vector cp, int K) {
vector[K] theta;
theta[1] = 1 - Phi(eta - cp[1]);
for (j in 2:(K-1))
theta[j] = Phi(eta - cp[j-1]) - Phi(eta - cp[j]);
theta[K] = Phi(eta - cp[K-1]);
return theta;
}
}
data {
int N; // number of cases
int K; // number of variables
int nlevels; // number of levels for oridnal outcomes
int y[N,K]; // responses on ordinal scale
}
transformed data {
vector[K] zeros; // mean of underlying multivariate normal
vector[K] L_sigma; // sd of underlying multivariate normal
zeros = rep_vector(0,K);
L_sigma = rep_vector(1,K);
}
parameters {
cholesky_factor_corr[K] L_Omega;
vector[K] y_cont[N]; // continuous "trait" assumed to underly
// responses on ordinal scale
ordered[nlevels - 1] cp[K]; // cut points for ordered probit
}
model {
L_Omega ~ lkj_corr_cholesky(2);
y_cont ~ multi_normal_cholesky(zeros,L_Omega);
for (k in 1:K) {
cp[k] ~ normal(0,3);
for (n in 1:N) {
y[n][k] ~ categorical(calc_theta(y_cont[n][k], cp[k], nlevels));
}
}
}
generated quantities {
corr_matrix[K] Omega;
Omega = multiply_lower_tri_self_transpose(L_Omega);
}
Instead, I get the warning that the “estimated Bayesian Fraction of Missing Information (bfmi) was low”. The entries in the Cholesky factor of the correlation matrix are negatively correlated with the energy__ (strongest r = -.36). I also found that large correlations between data-columns resulted in highly correlated posteriors of the simulated continuous traits underlying these variables. However, these checks did not provide me with insights about how I should change the model.
I do not think that what is going on here is shrinkage towards zero for off-diagonal values of the correlation matrix due to LKJ priors, because I can estimate correlations for continuous variables just fine when I use the same prior.
Does anybody have an idea what I am doing wrong here?
Thanks in advance!
Guido