Unable to recover latent covariance in multivariate multinomial model

EDIT: Just noticed I’m not actually using the covariances anywhere in the model. Still, I haven’t figured out how to add that yet; any suggestions are welcome (I’m currently trying to understand the multivariate probit regression example in the manual, which feels like it might be the right way).


I am building a model that will be used to analyze 13 Likert items that supposedly measure the same latent construct.

For now, I just have 2 simulated items. The observations are produced by taking samples from a multivariate normal and then cutting at arbitrary points. I set a high correlation (0.9) between the two items, and the simulated data reflects this (cross-tabulation of the responses to the two items):

     1  2  3  4
  1  4  3  0  0
  2  0  3  0  0
  3  0 15  4  3
  4  0  1  3 14
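In case it’s useful, the simulation scheme can be sketched in Python; the cutpoints, sample size, and seed below are illustrative choices, not the exact ones behind the table above:

```python
import numpy as np

rng = np.random.default_rng(1)

# Latent bivariate normal: one mean per item, unit sds, correlation 0.9
mu = np.array([2.5, 2.5])
rho = 0.9
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])
z = rng.multivariate_normal(mu, Sigma, size=40)

# Cut each latent draw at arbitrary thresholds to get 4-level responses
cutpoints = np.array([1.5, 2.5, 3.5])
y = 1 + (z[:, :, None] > cutpoints).sum(axis=-1)   # values in 1..4

# Cross-tabulate the two items, as in the table above
table = np.zeros((4, 4), dtype=int)
for a, b in y:
    table[a - 1, b - 1] += 1
```

With a latent correlation this high, the counts concentrate on the diagonal of the cross-tab, as in the table above.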

The model runs without issue* and recovers most parameters (latent means, cutpoint positions, variances) just fine, but the covariance stays at zero.

Here’s the model:

data {
  int<lower=0> N;        // # obs
  int<lower=2> K;        // # item options
  int<lower=0> D;        // # items
  int y[N, D];           // answers
}
parameters {
  ordered[K-3] ci[D];       // unknown cutpoints
  vector[D] mu;             // latent means
  cov_matrix[D] Sigma;      // latent covariance
}
transformed parameters{
  vector[K-1] c[D];
  for(d in 1:D){
    c[d, 1] = 1.5;
    c[d, K-1] = K-0.5;
    for(k in 1:(K-3)){
      c[d, 1+k] = ci[d, k];
    }
  }
}
model {
  vector[K] p[D];         // item probabilities

  // priors
  mu ~ normal(rep_vector(0.5 * (K + 1), D), K * 0.5);
  for (d in 1:D) {
    for (k in 1:(K - 3)) {
      ci[d, k] ~ normal(1.5 + k, 1);
    }
  }

  // calculate probabilities; they are the same for every observation,
  // so compute them once per item rather than once per (item, obs) pair
  for (d in 1:D) {
    real sigma_d;
    sigma_d = sqrt(Sigma[d, d]);  // standardize by the sd, not the variance
    p[d, 1] = Phi((c[d, 1] - mu[d]) / sigma_d);
    p[d, K] = 1 - Phi((c[d, K - 1] - mu[d]) / sigma_d);
    for (k in 2:(K - 1)) {
      p[d, k] = Phi((c[d, k] - mu[d]) / sigma_d) - Phi((c[d, k - 1] - mu[d]) / sigma_d);
    }
    for (n in 1:N) {
      y[n, d] ~ categorical(p[d]);
    }
  }
}
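As a sanity check on the probability loop: the per-category masses are just differences of normal CDFs. Here is a standalone Python version of the same computation (scaling by the sd; the cutpoints, mean, and sd are illustrative):

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def ordinal_probit_probs(cutpoints, mu, sigma):
    """Category probabilities for an ordinal probit item.

    cutpoints: K-1 increasing thresholds; mu, sigma: latent mean and sd.
    """
    K = len(cutpoints) + 1
    p = [Phi((cutpoints[0] - mu) / sigma)]
    for k in range(1, K - 1):
        p.append(Phi((cutpoints[k] - mu) / sigma)
                 - Phi((cutpoints[k - 1] - mu) / sigma))
    p.append(1.0 - Phi((cutpoints[-1] - mu) / sigma))
    return p

# Cutpoints centered on the latent mean, so the probabilities are symmetric
p = ordinal_probit_probs([1.5, 2.5, 3.5], 2.5, 1.0)
```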

* I do get divergent transitions when the sample size is small and the correlation low; I’m not sure whether that’s something to worry about just yet.

You want to use Cholesky factors. We recommend scaling Cholesky factors of correlation matrices and describe that approach in the regression chapter of the manual.

That model should blow up with an unconstrained covariance matrix.

If you’re OK with logistic, we have a built-in ordinal logistic. We also have a Phi_approx function that uses a logistic approximation to the probit to keep things on the same scale while making them more efficient.
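If I recall the formula correctly, Phi_approx is an inverse logit of a cubic; a quick Python check of how close that is to the exact probit:

```python
from math import erf, exp, sqrt

def inv_logit(x):
    return 1.0 / (1.0 + exp(-x))

def Phi(x):
    """Exact standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def Phi_approx(x):
    # Logistic approximation to the standard normal CDF
    # (the cubic form I believe Stan's Phi_approx uses)
    return inv_logit(0.07056 * x**3 + 1.5976 * x)

# Maximum absolute error over a grid on [-5, 5]
max_err = max(abs(Phi_approx(x / 10.0) - Phi(x / 10.0))
              for x in range(-50, 51))
```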

Are you talking about stan_polr? It doesn’t seem to accept multivariate outcomes.

I was trying to extend the multivariate probit example in section 9.15 of the manual, but the number of levels is hardcoded there and I haven’t been able to figure out how to express the Albert & Chib trick for an arbitrary number of response levels, since I would either need a ragged array or a data-dependent number of parameters.

Oh, I was talking about in Stan itself, not in rstanarm or a higher-level interface.

It’s possible to hand-code ragged arrays; there’s a chapter on that in the manual. But it’s not elegant. Built-in ragged arrays are in the works, but we’re way backed up on our to-do list, so they’re not going to happen this year.
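The hand-coded approach boils down to storing everything in one flat vector plus a vector of segment sizes. A small Python illustration of that bookkeeping (toy data, not from the manual):

```python
import numpy as np

# Flat storage for three groups of unequal length (the "ragged" data)
vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
sizes = np.array([2, 3, 1])   # length of each group

# Starting position of each group inside the flat vector
starts = np.concatenate(([0], np.cumsum(sizes)[:-1]))

# Recover the groups by slicing the flat vector
groups = [vals[s:s + n] for s, n in zip(starts, sizes)]
```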


Sorry to keep pestering, but still stuck on this one…

I checked out ordered_logistic, but I can’t think of a way to introduce correlations there. As I understand it, working directly with a cumulative-distribution link is equivalent to having already marginalized out the latent variable.

Also, I realized that in order to use the Albert &amp; Chib trick, one must introduce bounded parameters. I didn’t want to spend more time figuring out how to reconstruct the general structure, so I just went ahead and hardcoded it for a 4-level item (parameters + transformed parameters is where the “trick” happens, and also where I wish I knew a better way to execute the idea):

data {
  int<lower=1> K;   // Num levels
  int<lower=1> D;   // Num items
  int<lower=0> N;   // Num obs
  
  int<lower=0,upper=N> Ksizes[D,K];    // Responses for each of K levels from D items
}
parameters {
  vector[D] mu;
  vector[D] ci;
  cholesky_factor_corr[D] L_Omega;
  vector<lower=0>[D] L_sigma;
  
  vector<upper=0>[Ksizes[1,1]] k11;
  vector<lower=0,upper=ci[1]>[Ksizes[1,2]] k12;
  vector<lower=ci[1],upper=4.5>[Ksizes[1,3]] k13;
  vector<lower=4.5>[Ksizes[1,4]] k14;
  vector<upper=0>[Ksizes[2,1]] k21;
  vector<lower=0,upper=ci[2]>[Ksizes[2,2]] k22;
  vector<lower=ci[2],upper=4.5>[Ksizes[2,3]] k23;
  vector<lower=4.5>[Ksizes[2,4]] k24;
}
transformed parameters{
  vector[D] z[N];
  for(i in 1:Ksizes[1,1]){
    z[i,1] = k11[i];
  }
  for(i in 1:Ksizes[1,2]){
    z[i+Ksizes[1,1],1] = k12[i];
  }
  for(i in 1:Ksizes[1,3]){
    z[i+Ksizes[1,1]+Ksizes[1,2],1] = k13[i];
  }
  for(i in 1:Ksizes[1,4]){
    z[i+Ksizes[1,1]+Ksizes[1,2]+Ksizes[1,3],1] = k14[i];
  }
  for(i in 1:Ksizes[2,1]){
    z[i,2] = k21[i];
  }
  for(i in 1:Ksizes[2,2]){
    z[i+Ksizes[2,1],2] = k22[i];
  }
  for(i in 1:Ksizes[2,3]){
    z[i+Ksizes[2,1]+Ksizes[2,2],2] = k23[i];
  }
  for(i in 1:Ksizes[2,4]){
    z[i+Ksizes[2,1]+Ksizes[2,2]+Ksizes[2,3],2] = k24[i];
  }
}
model {
  matrix[D, D] L_Sigma;
  
  ci ~ normal(2.5, 1);

  L_Sigma = diag_pre_multiply(L_sigma, L_Omega);
  L_Omega ~ lkj_corr_cholesky(4);
  L_sigma ~ cauchy(0, 2.5);
  
  mu ~ normal(3,2);
  
  z ~ multi_normal_cholesky(mu, L_Sigma);
}
generated quantities {
  corr_matrix[D] Omega;  // L_Omega is a correlation Cholesky factor, so this is a correlation matrix
  Omega = multiply_lower_tri_self_transpose(L_Omega);
}

This is very ugly, but it actually works. Occasionally I get an error like:

[1] "Error in sampler$call_sampler(args_list[[i]]) : "         
[2] "  lub_constrain: lb is 0, but must be less than 1.00504"
[1] "error occurred during calling the sampler; sampling not done"

There isn’t a lot of info online about the error message, but I assume lub stands for lower/upper bounds, and that some value is being initialized outside the given limits. I’ve got fixed and appropriately placed initial values for all my parameters, though, so its origin remains a mystery.

I’d appreciate it if some light could be shed on the cause of the error, or on how to troubleshoot further, as well as any suggestions for letting the number and bounds of parameters vary with the data so that hardcoding them isn’t necessary.
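For what it’s worth, the index arithmetic that the transformed parameters block hardcodes can be written generically given Ksizes. Here is a Python sketch of that bookkeeping (fill_latents is a hypothetical helper and the numbers are toy values, just to show the offsets):

```python
import numpy as np

def fill_latents(Ksizes, segments):
    """Reassemble per-(item, level) segments into an N x D latent matrix.

    Ksizes: D x K array of response counts, each row summing to N.
    segments: segments[d][k] holds the latent values for item d, level k,
    in the same order the hardcoded transformed parameters block expects.
    """
    Ksizes = np.asarray(Ksizes)
    D, K = Ksizes.shape
    N = Ksizes[0].sum()
    z = np.empty((N, D))
    for d in range(D):
        offset = 0                     # running offset = cumulative Ksizes
        for k in range(K):
            n = Ksizes[d, k]
            z[offset:offset + n, d] = segments[d][k]
            offset += n
    return z

# Two items, four levels, N = 5 (toy counts)
Ksizes = [[2, 1, 1, 1], [1, 2, 1, 1]]
segments = [
    [[-0.5, -0.2], [0.3], [1.1], [5.0]],
    [[-1.0], [0.1, 0.4], [2.0], [4.8]],
]
z = fill_latents(Ksizes, segments)
```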

That is a confusing error message, but I’d draw the same conclusion as you. Assuming the function name at least is right, the issue is probably in a lower-bound constraint. Are you initializing yourself, or letting Stan initialize? Do all the parameters have appropriate constraints, so that any values satisfying the constraints have a finite log density?