I'm a new convert to Stan. My model runs extremely slowly. Advice appreciated!

I set up an item response theory model based on this post. That model worked well, but I then needed to expand it to estimate a number of latent factors each based on a different set of items. I also needed an “overall” latent factor that some of the other latent factors were functions of. The model still works, but is extremely slow - hours to fit on my relatively modern hardware with just hundreds of rows of data.

I would like to vectorize the code, but I can’t have vectors or matrices of ordered elements, so I don’t think it’s possible. Happy to learn I’m wrong, of course :) Code is below. Any advice at all is appreciated. I can’t share the data due to privacy policy, but if it would really help, I’m happy to make some synthetic data.

Any help is appreciated very much!


data {
  // numbers of things
  int<lower=1> I;  // items
  int<lower=1> S;  // subjects
  int<lower=1> R;  // responses
  int<lower=1> G;  // grades
  // training data
  int<lower=1,upper=I> item[R];
  int<lower=1,upper=S> subject[R];
  int<lower=1,upper=G> grade[R];
  vector[8] factor_map[R]; // indicates which latent factor determines each response
}
parameters {
  // parameters
  ordered[G-1] difficulty_communication[I];
  ordered[G-1] difficulty_dei[I];
  ordered[G-1] difficulty_leadership_and_admin[I];
  ordered[G-1] difficulty_people_experience[I];
  ordered[G-1] difficulty_performance_and_productivity[I];
  ordered[G-1] difficulty_productivity[I];
  ordered[G-1] difficulty_satisfaction[I];
  ordered[G-1] difficulty_wellness[I];
  vector[S] ability_communication;
  vector[S] ability_dei;
  vector[S] ability_leadership_and_admin;
  vector[S] ability_people_experience;
  vector[S] ability_performance_and_productivity;
  vector[S] ability_productivity;
  vector[S] ability_satisfaction;
  vector[S] ability_wellness;
  vector[S] ability_overall;
  real<lower=0> beta_communication;
  real<lower=0> beta_dei;
  real<lower=0> beta_leadership_and_admin;
  real<lower=0> beta_people_experience;
  real<lower=0> beta_performance_and_productivity;
  // hyperparameters
  real mu_first_difficulty_communication;
  real mu_first_difficulty_dei;
  real mu_first_difficulty_leadership_and_admin;
  real mu_first_difficulty_people_experience;
  real mu_first_difficulty_performance_and_productivity;
  real mu_first_difficulty_productivity;
  real mu_first_difficulty_satisfaction;
  real mu_first_difficulty_wellness;
  real<lower=0> sigma_ability_overall;
  real<lower=0> sigma_first_difficulty_communication;
  real<lower=0> sigma_first_difficulty_dei;
  real<lower=0> sigma_first_difficulty_leadership_and_admin;
  real<lower=0> sigma_first_difficulty_people_experience;
  real<lower=0> sigma_first_difficulty_performance_and_productivity;
  real<lower=0> sigma_first_difficulty_productivity;
  real<lower=0> sigma_first_difficulty_satisfaction;
  real<lower=0> sigma_first_difficulty_wellness;
  real<lower=0> sigma_step;
  real<lower=0> sigma_beta_communication;
  real<lower=0> sigma_beta_dei;
  real<lower=0> sigma_beta_leadership_and_admin;
  real<lower=0> sigma_beta_people_experience;
  real<lower=0> sigma_beta_performance_and_productivity;
}
model {
  // priors
  ability_overall ~ normal(0, sigma_ability_overall);
  beta_communication ~ normal(0, sigma_beta_communication);
  beta_dei ~ normal(0, sigma_beta_dei);
  beta_leadership_and_admin ~ normal(0, sigma_beta_leadership_and_admin);
  beta_people_experience ~ normal(0, sigma_beta_people_experience);
  beta_performance_and_productivity ~ normal(0, sigma_beta_performance_and_productivity);
  difficulty_communication[1:I, 1] ~ normal(mu_first_difficulty_communication, sigma_first_difficulty_communication);
  difficulty_dei[1:I, 1] ~ normal(mu_first_difficulty_dei, sigma_first_difficulty_dei);
  difficulty_leadership_and_admin[1:I, 1] ~ normal(mu_first_difficulty_leadership_and_admin, sigma_first_difficulty_leadership_and_admin);
  difficulty_people_experience[1:I, 1] ~ normal(mu_first_difficulty_people_experience, sigma_first_difficulty_people_experience);
  difficulty_performance_and_productivity[1:I, 1] ~ normal(mu_first_difficulty_performance_and_productivity, sigma_first_difficulty_performance_and_productivity);
  difficulty_productivity[1:I, 1] ~ normal(mu_first_difficulty_productivity, sigma_first_difficulty_productivity);
  difficulty_satisfaction[1:I, 1] ~ normal(mu_first_difficulty_satisfaction, sigma_first_difficulty_satisfaction);
  difficulty_wellness[1:I, 1] ~ normal(mu_first_difficulty_wellness, sigma_first_difficulty_wellness);
  for (i in 1:I){
    difficulty_communication[i, 2:G-1] - difficulty_communication[i, 1:G-2] ~ normal(0, sigma_step);
    difficulty_dei[i, 2:G-1] - difficulty_dei[i, 1:G-2] ~ normal(0, sigma_step);
    difficulty_leadership_and_admin[i, 2:G-1] - difficulty_leadership_and_admin[i, 1:G-2] ~ normal(0, sigma_step);
    difficulty_people_experience[i, 2:G-1] - difficulty_people_experience[i, 1:G-2] ~ normal(0, sigma_step);
    difficulty_performance_and_productivity[i, 2:G-1] - difficulty_performance_and_productivity[i, 1:G-2] ~ normal(0, sigma_step);
    difficulty_productivity[i, 2:G-1] - difficulty_productivity[i, 1:G-2] ~ normal(0, sigma_step);
    difficulty_satisfaction[i, 2:G-1] - difficulty_satisfaction[i, 1:G-2] ~ normal(0, sigma_step);
    difficulty_wellness[i, 2:G-1] - difficulty_wellness[i, 1:G-2] ~ normal(0, sigma_step);
  }
  // data model
  for (response in 1:R){
    grade[response] ~ ordered_logistic(ability_communication[subject[response]] * factor_map[response, 1] +
                                       ability_dei[subject[response]] * factor_map[response, 2] +
                                       ability_leadership_and_admin[subject[response]] * factor_map[response, 3] +
                                       ability_people_experience[subject[response]] * factor_map[response, 4] +
                                       ability_performance_and_productivity[subject[response]] * factor_map[response, 5] +
                                       ability_productivity[subject[response]] * factor_map[response, 6] +
                                       ability_satisfaction[subject[response]] * factor_map[response, 7] +
                                       ability_wellness[subject[response]] * factor_map[response, 8],
                                       difficulty_communication[item[response]] * factor_map[response, 1] +
                                       difficulty_dei[item[response]] * factor_map[response, 2] +
                                       difficulty_leadership_and_admin[item[response]] * factor_map[response, 3] +
                                       difficulty_people_experience[item[response]] * factor_map[response, 4] +
                                       difficulty_performance_and_productivity[item[response]] * factor_map[response, 5] +
                                       difficulty_productivity[item[response]] * factor_map[response, 6] +
                                       difficulty_satisfaction[item[response]] * factor_map[response, 7] +
                                       difficulty_wellness[item[response]] * factor_map[response, 8]);
    
  }
  ability_communication ~ normal(beta_communication * ability_overall, 1);
  ability_dei ~ normal(beta_dei * ability_overall, 1);
  ability_leadership_and_admin ~ normal(beta_leadership_and_admin * ability_overall, 1);
  ability_people_experience ~ normal(beta_people_experience * ability_overall, 1);
  ability_performance_and_productivity ~ normal(beta_performance_and_productivity * ability_overall, 1);
}

Yikes, that’s a lot of compute!

I’m guessing you have multiple responses associated with each subject, and multiple responses associated with each item. In that case you should be able to pre-compute two sets of dot products (one for subjects, one for items) then index into these for each response to get the terms you need to sum for that response. And you should be able to use columns_dot_product to do all the products at once, summing at the end.
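
For the ability part of the linear predictor, that could look roughly like this (untested sketch; ability_mat and fm are hypothetical names for the eight ability vectors stacked as an 8 x S matrix and for factor_map passed in as an 8 x R matrix, one column per response):

matrix[8, R] fm;           // data: factor_map, one column per response
matrix[8, S] ability_mat;  // parameters: one ability vector per row
...
// pick out each subject's column, then do all R dot products in one call
row_vector[R] eta = columns_dot_product(ability_mat[1:8, subject], fm);

That only handles the ability term though; the cutpoints are a different story, which is why the indexing approach below is probably the better route anyway.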

Wait, is factor_map just filled with ones and zeroes, mostly zeroes? i.e. is it being used to determine, for example, which ability-type should contribute for each response? If that’s the case, then you shouldn’t be using it as a multiplier, but as an indexer into an ability variable that’s a matrix:

data{
   ...
   int<lower=1,upper=9> which_ability[R];
   ...
}
parameters{
   ...
   matrix[S,9] ability ;
   ...
}
model{
   ...
   grade[response] ~ ordered_logistic( ability[subject[response],which_ability[response]] ... ) ;
}

With the same treatment of difficulty/items.

And with that structure, you can vectorize the likelihood:

   grade ~ ordered_logistic( ability[subject,which_ability] ... ) ;

While priors rarely cause much of a compute bottleneck, yours seem rather arcane, and I suspect you might find similar vectorization opportunities there too.


Oh, and note that you’re missing priors on your mus. Also, your loop over items that sets the 2:G elements (you have G-1, but that leaves the last element without a prior) can probably be vectorized by broadcasting the 1st element to be G-1 long.
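
For the mus, even something weakly informative would do, e.g. (the scales here are just placeholders; adapt them to your grade scale):

mu_first_difficulty_communication ~ normal(0, 5);
mu_first_difficulty_dei ~ normal(0, 5);
// ... and likewise for the remaining mu_first_difficulty_* parameters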


Thanks for all the help! It’s improved now and runs in more like an hour. I think there may still be headroom though.

I couldn’t quite vectorize the likelihood - I think because difficulty needs to be an array rather than a matrix, as I believe I can’t make a matrix of ordered[] types. This is the best I could do:

data {
  // numbers of things
  int<lower=1> I;  // items
  int<lower=1> S;  // subjects
  int<lower=1> R;  // responses
  int<lower=1> G;  // grades

  // training data
  int<lower=1,upper=I> item[R];
  int<lower=1,upper=S> subject[R];
  int<lower=1,upper=G> grade[R];
  int<lower=1,upper=8> which_ability[R];
}
parameters {
  // parameters
  ordered[G-1] difficulty[I];
  matrix[S,8] ability;
  vector[S] ability_overall;
  vector<lower=0>[5] beta;

  // hyperparameters
  real mu_first_difficulty;
  real<lower=0> sigma_ability_overall;
  real<lower=0> sigma_first_difficulty;
  real<lower=0> sigma_step;
  real<lower=0> sigma_beta[5];
}
model {
  // priors
  ability_overall ~ normal(0, sigma_ability_overall);
  beta ~ normal(0, sigma_beta);
  difficulty[1:I, 1] ~ normal(mu_first_difficulty, sigma_first_difficulty);
  for (i in 1:I) { 
      difficulty[i, 2:G-1] - difficulty[i, 1:G-2] ~ normal(0, sigma_step);
  }
  // I think the loop above is necessary as difficulty needs to be an array
  // as I can't have a matrix of ordered[] and it seems I can't do elementwise
  // subtraction between two arrays without a loop.
  for (response in 1:R){
    grade[response] ~ ordered_logistic(ability[subject[response],which_ability[response]], difficulty[item[response]]) ;
  }
  // I think I can't vectorize the above due to difficulty being an array?
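  // (untested idea) if ordered_logistic accepts an array of cutpoint vectors
  // (I think newer Stan versions do, but I haven't checked), the loop above
  // might collapse to a single vectorized statement:
  //   vector[R] eta;
  //   for (r in 1:R) eta[r] = ability[subject[r], which_ability[r]];
  //   grade ~ ordered_logistic(eta, difficulty[item]);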
  for (a in 1:5) {
    ability[1:S, a] ~ normal(beta[a] * ability_overall, 1);
  }
  // I'd like to replace the above by saying something like ability ~ normal(beta * ability_overall, 1)
  // where the betas specify the means for each column of the ability matrix and within a column the rows are IID
  /// but I'm not sure how.
}
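
For that last loop over the ability columns, the closest thing I could come up with (untested) was flattening both sides column-major:

  to_vector(ability[1:S, 1:5]) ~ normal(to_vector(ability_overall * beta'), 1);

but I wasn’t confident about the ordering, so I left the loop in for now.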

Also, which mus am I missing priors on? And if you can clarify what you mean by my priors being arcane, I’m happy to work on improving them.

Also, I should mention that this model now assumes all the first difficulties are drawn from the same distribution - previously they each had their own mu and sigma. I thought that was probably acceptable for my purposes.