Improving speed of multilevel ordinal factor model

Hello,

I am working on a multilevel factor model with 6 ordinal indicators with 4 categories (y1-y6). The factor scores are predicted by an observed variable (x). Eventually, I would like to use this model in a simulation study, but it is very slow. The main issue is that with the default settings, the maximum treedepth is exceeded for all transitions. If I increase the maximum treedepth to 15 (I could start at 11 and move up until I get no warnings, but for a simulation study I would prefer a conservative setting to avoid having to rerun many replications), the model takes almost 4.5 hours to run for 233 observations in 10 groups. This would be one of the smaller samples considered in the simulation.
I think the main issue is the ordered_logistic part of the model; if I dichotomise the data and use bernoulli_logit, the model runs almost 50 times faster. Still, this is my first time working with ordinal data in Stan, so there might be some obvious tricks I missed to speed up the model. It might be important to note that the number of observations can differ between groups, which is why I use the group indicator iig to select the group corresponding to each observation.
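For reference, the dichotomized version I benchmarked just swaps each ordinal likelihood for bernoulli_logit with the same linear predictor as in the full model below. A minimal sketch for the first item (y1b is a hypothetical 0/1 recoding of y1, done outside Stan with an arbitrary cutoff):

// hypothetical dichotomized check for item 1; y1b would be declared in the
// data block as int<lower = 0, upper = 1> y1b[N] and recoded from y1 before
// fitting, e.g. y1b[i] = (y1[i] > 2)
for(i in 1:N){
  y1b[i] ~ bernoulli_logit(mu[iig[i], 1] + lambda[1] * etaW[i, iig[i]]);
}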

Thanks in advance for any suggestions or ideas!

data {
  int<lower = 0> N; // number of observations
  int<lower = 0> J; // number of items
  int<lower = 0> G; // number of groups
  int<lower = 1, upper = G> iig[N]; // group indicator
  // ordinal item responses
  int<lower = 2> K; // number of categories
  int<lower = 1, upper = K> y1[N]; // responses take values in 1:K
  int<lower = 1, upper = K> y2[N];
  int<lower = 1, upper = K> y3[N];
  int<lower = 1, upper = K> y4[N];
  int<lower = 1, upper = K> y5[N];
  int<lower = 1, upper = K> y6[N];
  vector[N] x; 
  vector[G] xmean; // group means predictor
}
parameters {
  vector<lower = 0>[J-1] lambda_free; 
  vector[J-1] nu_free;
  // thresholds
  ordered[K-1] c1; 
  ordered[K-1] c2; 
  ordered[K-1] c3; 
  ordered[K-1] c4; 
  ordered[K-1] c5; 
  ordered[K-1] c6; 
  // measurement model between
  vector[G] etaB_raw;
  real<lower = 0> omegaB;
  matrix[G, J] mu_raw;
  // measurement model within
  matrix[N, G] etaW_raw;
  real<lower = 0> omegaW;
  // structural model
  real betaW;
  real alpha;
  real betaB;
  real<lower = 0> sigmaYB[J];
}
transformed parameters {
  vector[J] lambda;
  vector[J] nu;
  vector[G] etaB;
  matrix[N, G] etaW;
  matrix[G, J] mu;
  vector[J] icc;
  // identification constraints
  lambda[1] = 1;
  lambda[2:J] = lambda_free;
  nu[1] = 0;
  nu[2:J] = nu_free;
  // non-centered parametrization
  etaB = (alpha + betaB * xmean) + omegaB * etaB_raw;
  for(g in 1:G){
    etaW[, g] = (betaW * x) + omegaW * etaW_raw[, g];
  }
  for(j in 1:J){
    mu[, j] = (nu[j] + lambda[j] * etaB) + sigmaYB[j] * mu_raw[, j];
    // compute icc; omegaB, omegaW, and sigmaYB are standard deviations, so
    // they enter the variance decomposition squared, and pi^2 / 3 is the
    // residual variance of the standard logistic distribution
    icc[j] = (lambda[j]^2 * omegaB^2 + sigmaYB[j]^2) /
             ( (lambda[j]^2 * omegaB^2 + sigmaYB[j]^2) + (lambda[j]^2 * omegaW^2 + pi()^2 / 3) );
  }
}
model {
  // priors standard deviations are half-normal(0, 1)
  omegaB ~ normal(0, 1);
  omegaW ~ normal(0, 1);
  sigmaYB ~ normal(0, 1);
  // priors nuisance parameters
  lambda_free ~ normal(0, 10);
  nu_free ~ normal(0, 100);
  betaW ~ normal(0, 10);
  alpha ~ normal(0, 100);
  betaB ~ normal(0, 10);
  // uniform priors thresholds
  // likelihood
  etaB_raw ~ std_normal(); // implies etaB ~ N(alpha + betaB * xmean, omegaB)
  for(g in 1:G){
    mu_raw[g, ] ~ std_normal(); // implies mu ~ N(nu[j] + lambda[j] * etaB, sigmaYB[j])
    etaW_raw[, g] ~ std_normal(); // implies etaW ~ N(betaW * x, omegaW)  
  }
  for(i in 1:N){
    y1[i] ~ ordered_logistic(mu[iig[i], 1] + lambda[1] * etaW[i, iig[i]], c1);
    y2[i] ~ ordered_logistic(mu[iig[i], 2] + lambda[2] * etaW[i, iig[i]], c2);
    y3[i] ~ ordered_logistic(mu[iig[i], 3] + lambda[3] * etaW[i, iig[i]], c3);
    y4[i] ~ ordered_logistic(mu[iig[i], 4] + lambda[4] * etaW[i, iig[i]], c4);
    y5[i] ~ ordered_logistic(mu[iig[i], 5] + lambda[5] * etaW[i, iig[i]], c5);
    y6[i] ~ ordered_logistic(mu[iig[i], 6] + lambda[6] * etaW[i, iig[i]], c6);
  }
}

You should get some speedup (but no change in hitting the maximum treedepth) by calling ordered_logistic six times instead of 6N times, like this:

y1 ~ ordered_logistic(mu[iig, 1] + lambda[1] * etaW[, iig], c1);

and similarly for the other outcomes. It is also going to be a bit faster to do

to_vector(mu_raw) ~ std_normal(); // implies ...
to_vector(etaW_raw) ~ std_normal(); // implies ...

You can ignore any parser warnings about Jacobians for those two lines because to_vector is an identity transformation.

Thank you for the quick reply!

Unfortunately, the model will no longer parse when I use this line:

y1 ~ ordered_logistic(mu[iig, 1] + lambda[1] * etaW[, iig], c1);

It gives me the error that there are no matches for vector + matrix, because mu[iig, 1] is a vector while etaW[, iig] is a matrix.

Ah, you may need something more like

matrix[N, J] utility = mu[iig, ] + etaW[, iig] * lambda;
y1 ~ ordered_logistic(utility[, 1], c1);
...

whatever makes the matrix dimensions come out right.
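For example, something along these lines should make the dimensions match (a sketch; the local names etaW_obs and utility are just placeholders): first pull each observation's own-group column out of etaW, then the outer product with lambda' gives an N-by-J matrix of utilities.

vector[N] etaW_obs;   // each observation's own-group factor score
matrix[N, J] utility;
for (i in 1:N)
  etaW_obs[i] = etaW[i, iig[i]];
utility = mu[iig] + etaW_obs * lambda'; // mu[iig] copies row iig[i] of mu into row i
y1 ~ ordered_logistic(utility[, 1], c1);
y2 ~ ordered_logistic(utility[, 2], c2);
// ... and likewise for y3-y6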

Of course, I hadn’t thought of that.

I will try this out; thank you!

You could also try using the development version of Stan (or StanHeaders if you’re using RStan). The ordered_logistic distribution now has analytic gradients, which has provided some nice speed-ups for my own models.

Thank you for this suggestion!
Just to be sure: what do you mean by using the development version of StanHeaders when using RStan? Can’t I just install the development version of RStan as follows:
remotes::install_github("stan-dev/rstan", ref = "develop", subdir = "rstan/rstan", build_opts = "")

That doesn’t bring in the development version of StanHeaders. Actually, I am not sure if everything will build at the moment. It might be easier to just download
https://raw.githubusercontent.com/stan-dev/math/develop/stan/math/prim/mat/prob/ordered_logistic_lpmf.hpp
and overwrite whatever file

system.file("include", "stan", "math", "prim", "mat", "prob", 
            "ordered_logistic_lpmf.hpp", 
            package = "StanHeaders")

returns.

That worked!
I discovered some errors in my original model as well. Fixing those, together with using the development version, has made a tremendous difference: I no longer need the increased maximum treedepth (most likely because my original model was not fully identified). I’m posting the final model here for anyone who comes across this post and is interested:

data {
  int<lower = 0> N; // number of observations
  int<lower = 0> J; // number of items
  int<lower = 0> G; // number of groups
  int<lower = 1, upper = G> iig[N]; // group indicator
  // ordinal item responses
  int<lower = 2> K; // number of categories
  int<lower = 1, upper = K> y1[N]; // responses take values in 1:K
  int<lower = 1, upper = K> y2[N];
  int<lower = 1, upper = K> y3[N];
  int<lower = 1, upper = K> y4[N];
  int<lower = 1, upper = K> y5[N];
  int<lower = 1, upper = K> y6[N];
  vector[N] x; 
  vector[G] xmean; // group means predictor
}
parameters {
  vector<lower = 0>[J-1] lambda_free; 
  // thresholds
  ordered[K-1] c1; 
  ordered[K-1] c2; 
  ordered[K-1] c3; 
  ordered[K-1] c4; 
  ordered[K-1] c5; 
  ordered[K-1] c6; 
  // measurement model between
  vector[G] etaB_raw;
  real<lower = 0> omegaB;
  matrix[G, J] mu_raw;
  // measurement model within
  vector[N] etaW_raw;
  real<lower = 0> omegaW;
  // structural model
  real betaW;
  real betaB;
  real<lower = 0> sigmaYB[J];
}
transformed parameters {
  vector[J] lambda;
  vector[G] etaB;
  vector[N] etaW;
  matrix[G, J] mu;
  // identification constraints
  lambda[1] = 1;
  lambda[2:J] = lambda_free;
  // non-centered parametrization
  etaB = (betaB * xmean) + omegaB * etaB_raw;
  etaW = (betaW * x) + omegaW * etaW_raw;
  for(j in 1:J){
    mu[, j] = (lambda[j] * etaB) + sigmaYB[j] * mu_raw[, j];
  }
}
model {
  // priors standard deviations are half-normal(0, 1)
  omegaB ~ normal(0, 1);
  omegaW ~ normal(0, 1);
  sigmaYB ~ normal(0, 1);
  // priors nuisance parameters
  lambda_free ~ normal(0, 10);
  betaW ~ normal(0, 10);
  betaB ~ normal(0, 10);
  // uniform priors thresholds
  // likelihood
  etaB_raw ~ std_normal(); // implies etaB ~ N(betaB * xmean, omegaB)
  to_vector(mu_raw) ~ std_normal(); // implies mu ~ N(lambda[j] * etaB, sigmaYB[j]) 
  etaW_raw ~ std_normal(); // implies etaW ~ N(betaW * x, omegaW) 
  for(i in 1:N){
    y1[i] ~ ordered_logistic(mu[iig[i], 1] + lambda[1] * etaW[i], c1);
    y2[i] ~ ordered_logistic(mu[iig[i], 2] + lambda[2] * etaW[i], c2);
    y3[i] ~ ordered_logistic(mu[iig[i], 3] + lambda[3] * etaW[i], c3);
    y4[i] ~ ordered_logistic(mu[iig[i], 4] + lambda[4] * etaW[i], c4);
    y5[i] ~ ordered_logistic(mu[iig[i], 5] + lambda[5] * etaW[i], c5);
    y6[i] ~ ordered_logistic(mu[iig[i], 6] + lambda[6] * etaW[i], c6);
  }
}

Unfortunately, it was not possible to get rid of the final for loop calling ordered_logistic. Changing that to:

y1 ~ ordered_logistic(mu[iig, 1] + lambda[1] * etaW, c1);

for every item resulted in a model that would not converge at all.
Still, the model is already much faster now, so thank you!
