Troubleshooting a "MaxDiff" hierarchical multinomial regression

This question is related to the question here: Speeding up a hierarchical multinomial logit model

I am using the Stan code from that post to run a hierarchical multinomial logistic regression to analyze “MaxDiff” data (also known as “object case best-worst scaling,” see reference here).

The post above says it was taking 340 seconds to run 100 iterations, but for my data (350 respondents, 13 items displayed across 13 blocks with 4 alternatives per block) it is taking multiple hours to run the default 2000 iterations, and then I get lots of problems that suggest to change adapt delta, max tree depth, and up the number of iterations.

Something tells me that I am likely doing something wrong with either the model (but it is essentially entirely copy-pasted over from the post above) or preparing the data in R and feeding it to rstan.

I have attached the data as a .csv below, along with the .stan file and the .R code that prepares the data and feeds it to Stan via rstan.

Could someone here help me figure out how to make this model run in a number of minutes, instead of a number of hours? It doesn’t have to be blazing fast, it just has to be usable. (And the slowness and various warning messages makes me think something is specified incorrectly.)

Thank you!

mnl_q.csv (302.5 KB) mnl_q.R (1.9 KB) mnl_q.stan (1.9 KB)

I did not run the code; 340 seconds doesn’t look very attractive :-).

However, I had a look at the code and I have some thoughts that might help you looking for the issue. First of all, the code uses the Cholesky parameterisation of the multi_normal with L_Omega as the Cholesky factor of the correlation matrix of the individual level effects and L_Sigma as the Cholesky factor of the covariance matrix. I saw that was not entirely clear from the comments. The Stan user guide has some good explanations of this parameterisation. This code also uses the centered parameterisation for the individual level effects Beta, you could look into the non-centered parameterisation in the Stan user guide.

On to the code, the divergences (adapt delta) and the treedepth warnings indicate that your model is not a good fit for the data. There is multiple reasons why that might be the case. One thing could be that the model imposes quite a strong restriction in that whatever variable X that determines what an individual things is the best option will have the same but exact opposite effect for the worst option. Is that a reasonable assumption for your data? If not it might be worthwile to see whether you can make the model work with the YB[r,s] ~ ... or YW[r,s] ~ ... line commented out.

The code uses quite broad priors the implicit prior on L_sigma is cauchy(0, 2.5) and the Theta_raw priors are also quite wide for logit specification. Do these make sense for your data? Simulations from the prior might help with figuring this out.

I ran your code with 100 iterations and it took me 13.5 minutes which seems a bit slow but not too unusual. I can see you using the default of 2000 iterations and taking hours to run. I did get quite a few warning messages which may be due to insufficient iterations. I would suggest you try out Displayr ( which also allows you to run a “latent class analysis” on your data for comparison, and also allows you to visualise the solution to diagnose issues with your data.

I realize this post is old at this point, but did you end up resolving this issue? I’m an R user that is trying to get acquainted with rstan as well.

1 Like

I did not, but I developed an R package using other methods for calculating individual best-worst scaling scores in balanced incomplete block designs. See the page for it here: You’ll see that one of the issues, #10, references this post and is welcoming help to anyone that can speed it up and wants to contribute to the package.

I have found that analytical estimation of an empirical Bayes model, proposed by Lipovetsky & Conklin (2015), works quite well. See

Lipovetsky, S., & Conklin, M. (2015). MaxDiff priority estimations with and without HB-MNL. Advances in Adaptive Data Analysis, 7(1). doi: 10.1142/S1793536915500028