# Unable to recover latent covariance in multivariate multinomial model

EDIT: I just noticed I’m not using the covariances anywhere in the model. I still haven’t figured out how to add them, so any suggestions on how to do that are welcome (I’m currently trying to understand the multivariate probit regression example in the manual, which I feel might be the right way).

I am building a model that will be used to analyze 13 Likert items that supposedly measure the same latent construct.

For now, I just have two simulated items. The observations are produced by taking samples from a multivariate normal and then cutting them at arbitrary points. I set a high correlation (0.9) between the two items, and the cross-tabulation of the simulated responses reflects this:
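The simulation step can be sketched like this (a minimal Python version; the exact mean, sd, and cutpoint values are illustrative assumptions, not taken from my actual script):

```python
import math
import random

def simulate_items(n, rho=0.9, mu=2.5, sigma=1.0, cuts=(1.5, 2.5, 3.5), seed=1):
    """Draw n correlated latent pairs and cut them into 1..4 responses.
    The mean, sd, and cutpoints here are illustrative assumptions."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)
        z1 = mu + sigma * e1
        # Cholesky trick: corr(z1, z2) = rho
        z2 = mu + sigma * (rho * e1 + math.sqrt(1 - rho**2) * e2)
        # response = 1 + number of cutpoints below the latent value
        out.append(tuple(1 + sum(z > c for c in cuts) for z in (z1, z2)))
    return out
```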

```
   1  2  3  4
1  4  3  0  0
2  0  3  0  0
3  0 15  4  3
4  0  1  3 14
```

The model runs without issue* and recovers most parameters (latent means, cutpoint positions, variances) just fine, but the covariance remains at zero.

Here’s the model:

```stan
data {
  int<lower=0> N;           // number of observations
  int<lower=2> K;           // number of item options
  int<lower=0> D;           // number of items
  int y[N, D];              // answers
}
parameters {
  ordered[K-3] ci[D];       // free interior cutpoints
  vector[D] mu;             // latent means
  cov_matrix[D] Sigma;      // latent covariance
}
transformed parameters {
  vector[K-1] c[D];         // full cutpoints; endpoints fixed for identification
  for (d in 1:D) {
    c[d, 1] = 1.5;
    c[d, K-1] = K - 0.5;
    for (k in 1:(K-3)) {
      c[d, 1+k] = ci[d, k];
    }
  }
}
model {
  matrix[N, K] p[D];        // per-item category probabilities

  // priors
  mu ~ normal(rep_vector(0.5 * (K+1), D), K * 0.5);
  for (d in 1:D) {
    for (k in 1:(K-3)) {
      ci[d, k] ~ normal(1.5 + k, 1);
    }
  }

  // category probabilities: standardize by the marginal sd, sqrt(Sigma[d,d]),
  // not the variance -- note nothing here touches the off-diagonal of Sigma
  for (d in 1:D) {
    for (n in 1:N) {
      p[d, n, 1] = Phi((c[d, 1] - mu[d]) / sqrt(Sigma[d, d]));
      p[d, n, K] = 1 - Phi((c[d, K-1] - mu[d]) / sqrt(Sigma[d, d]));
      for (k in 3:K) {
        p[d, n, k-1] = Phi((c[d, k-1] - mu[d]) / sqrt(Sigma[d, d]))
                       - Phi((c[d, k-2] - mu[d]) / sqrt(Sigma[d, d]));
      }
      y[n, d] ~ categorical(p[d, n]');
    }
  }
}
```

* I get divergent transitions when the sample size is small and the correlation is low; I’m not sure whether that’s something to worry about just yet.
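To make the EDIT above concrete, here is the per-item probability calculation in plain Python (a sketch mirroring the model block): only each item’s own mean and marginal sd enter, so the likelihood factorizes over items and the off-diagonal of `Sigma` is never informed by the data.

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def item_probs(mu_d, sd_d, c):
    """Category probabilities for one item under the cumulative-probit model.
    Only this item's own mean and sd appear -- nothing couples the items."""
    K = len(c) + 1
    p = [Phi((c[0] - mu_d) / sd_d)]
    for k in range(1, K - 1):
        p.append(Phi((c[k] - mu_d) / sd_d) - Phi((c[k - 1] - mu_d) / sd_d))
    p.append(1 - Phi((c[-1] - mu_d) / sd_d))
    return p
```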

You want to use Cholesky factors. We recommend scaling Cholesky factors of correlation matrices and describe that approach in the regression chapter of the manual.

That model should blow up with an unconstrained covariance matrix.

If you’re OK with logistic, we have a built-in ordinal logistic. We also have a `Phi_approx` function that uses a logistic approximation of the probit to keep things on the same scale but make them more efficient.
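For reference, `Phi_approx(x)` is `inv_logit(0.07056 * x^3 + 1.5976 * x)`. A quick Python check of how close the approximation stays to the exact normal CDF:

```python
import math

def Phi(x):
    """Exact standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Phi_approx(x):
    """Logistic approximation to the normal CDF, as in Stan's Phi_approx."""
    t = 0.07056 * x**3 + 1.5976 * x
    return 1.0 / (1.0 + math.exp(-t))
```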

Are you talking about `stan_polr`? It doesn’t seem to accept multivariate outcomes.

I was trying to extend the multivariate probit example in section 9.15 of the manual, but the number of levels is hardcoded there and I haven’t been able to figure out how to express the Albert & Chib trick for an arbitrary number of response levels, since I would either need a ragged array or a data-dependent number of parameters.

Oh, I was talking about in Stan itself, not in rstanarm or a higher-level interface.

It’s possible to hand-code ragged arrays (there’s a chapter in the manual), but it’s not elegant. Built-in ragged arrays are in the works, but we’re way backed up on our to-do list, so they’re not going to happen this year.


Sorry to keep pestering, but still stuck on this one…

I checked out `ordered_logistic` but I can’t think of a way to introduce correlations there. As I understand it, working directly with a cumulative distribution link is equivalent to having marginalized out the latent variable.
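Concretely, `ordered_logistic` computes the category probabilities directly from differences of the cumulative link, so no per-observation latent variable survives that could be correlated across items. A Python sketch of that computation:

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def ordered_logistic_probs(eta, cuts):
    """Category probabilities as in Stan's ordered_logistic:
    P(y = k) = inv_logit(eta - c[k-1]) - inv_logit(eta - c[k]),
    with the endpoints fixed at 1 and 0. The latent logistic variable
    is already integrated out."""
    upper = [1.0] + [inv_logit(eta - c) for c in cuts] + [0.0]  # P(y >= k)
    return [upper[k] - upper[k + 1] for k in range(len(cuts) + 1)]
```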

Also, I realized that in order to use the Albert & Chib trick one must introduce bounded parameters. I didn’t want to waste more time trying to figure out how to reconstruct the general structure, so I just went ahead and hardcoded it for a 4-level item (the parameters and transformed parameters blocks are where the “trick” happens, and also where I wish I knew of a better way to execute the idea):

```stan
data {
  int<lower=1> K;   // number of levels
  int<lower=1> D;   // number of items
  int<lower=0> N;   // number of observations

  int<lower=0,upper=N> Ksizes[D, K];   // responses at each of K levels for D items
}
parameters {
  vector[D] mu;
  vector[D] ci;                        // free cutpoint, used as a bound below
  cholesky_factor_corr[D] L_Omega;
  vector<lower=0>[D] L_sigma;

  // Albert & Chib: one latent value per response, bounded by the cutpoints
  // bracketing its observed level (endpoints fixed at 0 and 4.5 here)
  vector<upper=0>[Ksizes[1,1]] k11;
  vector<lower=0,upper=ci[1]>[Ksizes[1,2]] k12;
  vector<lower=ci[1],upper=4.5>[Ksizes[1,3]] k13;
  vector<lower=4.5>[Ksizes[1,4]] k14;
  vector<upper=0>[Ksizes[2,1]] k21;
  vector<lower=0,upper=ci[2]>[Ksizes[2,2]] k22;
  vector<lower=ci[2],upper=4.5>[Ksizes[2,3]] k23;
  vector<lower=4.5>[Ksizes[2,4]] k24;
}
transformed parameters {
  vector[D] z[N];   // latent responses, reassembled in observation order

  // item 1: copy each level's latent block into place
  for (i in 1:Ksizes[1,1])
    z[i, 1] = k11[i];
  for (i in 1:Ksizes[1,2])
    z[i + Ksizes[1,1], 1] = k12[i];
  for (i in 1:Ksizes[1,3])
    z[i + Ksizes[1,1] + Ksizes[1,2], 1] = k13[i];
  for (i in 1:Ksizes[1,4])
    z[i + Ksizes[1,1] + Ksizes[1,2] + Ksizes[1,3], 1] = k14[i];

  // item 2
  for (i in 1:Ksizes[2,1])
    z[i, 2] = k21[i];
  for (i in 1:Ksizes[2,2])
    z[i + Ksizes[2,1], 2] = k22[i];
  for (i in 1:Ksizes[2,3])
    z[i + Ksizes[2,1] + Ksizes[2,2], 2] = k23[i];
  for (i in 1:Ksizes[2,4])
    z[i + Ksizes[2,1] + Ksizes[2,2] + Ksizes[2,3], 2] = k24[i];
}
model {
  matrix[D, D] L_Sigma;

  ci ~ normal(2.5, 1);

  L_Sigma = diag_pre_multiply(L_sigma, L_Omega);
  L_Omega ~ lkj_corr_cholesky(4);
  L_sigma ~ cauchy(0, 2.5);

  mu ~ normal(3, 2);

  z ~ multi_normal_cholesky(mu, L_Sigma);
}
generated quantities {
  corr_matrix[D] Omega;   // recovered correlation matrix
  Omega = multiply_lower_tri_self_transpose(L_Omega);
}
```

This is very ugly, but it actually works. Occasionally I get an error like:

```
[1] "Error in sampler$call_sampler(args_list[[i]]) : "
[2] "  lub_constrain: lb is 0, but must be less than 1.00504"
[1] "error occurred during calling the sampler; sampling not done"
```

There isn’t much info online about this error message, but I assume “lub” stands for lower/upper bounds and that some value is being initialized outside the given limits. I have fixed, appropriately placed initial values for all my parameters, though, so its origin remains a mystery.

I’d appreciate any light that can be shed on the cause of this error, or on how to troubleshoot it further, as well as any suggestions for letting the number and bounds of the parameters vary with the data so that hardcoding them isn’t necessary.
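For completeness, here is a sketch of how the `Ksizes` table fed to the model could be built (a hypothetical Python helper; note the model also assumes each item’s responses arrive grouped by level, to match the indexing in transformed parameters):

```python
def ksizes(y, K):
    """Count responses per level for each item:
    counts[d][k] = #{n : y[n][d] == k + 1}. Hypothetical data-prep helper."""
    D = len(y[0])
    counts = [[0] * K for _ in range(D)]
    for row in y:
        for d, resp in enumerate(row):
            counts[d][resp - 1] += 1
    return counts
```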

That is a confusing error message, but I’d draw the same conclusion you did. Assuming the function name at least is right, the issue is probably in a lower-bound constraint. Are you supplying initial values yourself or letting Stan initialize? Do all the parameters have appropriate constraints, so that any parameter values satisfying the constraints have a finite log density?
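For reference, the lower/upper-bound (“lub”) transform Stan applies to a `<lower=...,upper=...>` parameter is `lb + (ub - lb) * inv_logit(y)`, which only makes sense when `lb < ub`. A Python sketch of the transform and its failure mode (the exact wording of the check in the C++ is an assumption here):

```python
import math

def lub_constrain(y, lb, ub):
    """Map an unconstrained y into (lb, ub), as in Stan's lub transform.
    Fails when the bounds are inverted or degenerate, e.g. when a bound
    that depends on another parameter (like ci[1]) collapses onto the
    fixed bound."""
    if not lb < ub:
        raise ValueError(f"lub_constrain: lb is {lb}, but must be less than {ub}")
    return lb + (ub - lb) / (1.0 + math.exp(-y))
```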