Hi,

I’m trying to speed up the modified hierarchical multinomial logit model shown below. Is there anything I can do to optimize it such as vectorizing the nested for loops? Right now using rstan on my data, 100 iterations with 1 chain takes 340 seconds, and reducing the iterations would have a negative impact on the results. I am comparing the performance against a commercial product (Sawtooth software) which uses Gibbs sampling, and it takes less than a minute to run 20000 iterations. Surely I must be doing something suboptimal with stan that makes it much slower. Any help would be greatly appreciated!

```
data {
int<lower=2> C; // Number of alternatives (choices) in each scenario
int<lower=1> K; // Number of covariates of alternatives
int<lower=1> R; // Number of respondents
int<lower=1> S; // Number of scenarios per respondent
int<lower=0> G; // Number of respondent covariates
int<lower=1,upper=C> YB[R, S]; // best choices
int<lower=1,upper=C> YW[R, S]; // worst choices
matrix[C, K] X[R, S]; // matrix of attributes for each obs
matrix[G, R] Z; // vector of covariates for each respondent
}
parameters {
matrix[K, R] Beta;
matrix[K, G] Theta;
corr_matrix[K] Omega;
vector<lower=0>[K] tau;
}
transformed parameters {
cov_matrix[K] Sigma = quad_form_diag(Omega, tau);
}
model {
//priors
to_vector(Theta) ~ normal(0, 10);
tau ~ cauchy(0, 2.5);
Omega ~ lkj_corr(2);
//likelihood
for (r in 1:R) {
Beta[,r] ~ multi_normal(Theta*Z[,r], Sigma);
for (s in 1:S) {
YB[r,s] ~ categorical_logit(X[r,s] * Beta[,r]);
YW[r,s] ~ categorical_logit(-X[r,s] * Beta[,r]);
}
}
}
```

This code is modified from the example in https://github.com/ksvanhorn/ART-Forum-2017-Stan-Tutorial/blob/master/3_hmnl/hmnl.stan