Simple Multinomial Logistic Regression Performance


I’m new to Stan and I’m trying to fit a simple multinomial logistic model using rstan. I want to know whether I have coded the model efficiently and whether the performance I’m seeing is normal. I’m using a Windows machine and the Stan version is 2.21.0.

I’ve looked around the forums for a bit and I came across this post, and my problem is kind of similar in that the performance that I’m seeing is not that great and I’m not sure if there is something wrong with the way I coded the model up. I’ve also enabled parallel cores using:

options(mc.cores = parallel::detectCores())

and configured the C++ toolchain as suggested in the getting started page of RStan.

The data I’m trying to fit is fairly small: conjoint survey data from about 340 units, each doing 16 tasks with 5 choices per task. The choices include an outside option of all zeros, and there are 10 variables in total.

The data is organized so that the rows correspond to the number of units x the number of tasks each unit does x the number of choices (n * t * p). I’ve collapsed the number of units x the number of tasks into a single index (n * t = N), so x has N * p rows and na columns, where na is the number of variables.

The stan model code for it is given below:

model_code = "
data {
  int p; //number of choice alternatives
  int na; //number of alternative-specific vars
  int N; //number of choice tasks (units x tasks)
  int y[N]; //chosen alternative (1..p) for each task
  matrix[N * p, na] x; //dataset, p consecutive rows per task
}
parameters {
  vector[na] beta;
}
model {
  vector[N * p] x_beta = x * beta;

  matrix[N, p] x_beta2;
  x_beta2 = to_matrix(x_beta, N, p, 0); //convert to matrix N x p, filled row by row
  beta ~ normal(0, 100); //specify prior
  for (n in 1:N)
    y[n] ~ categorical_logit(x_beta2[n]');
}
"

From what I’ve seen in the documentation, categorical_logit has not been vectorized, so I have to resort to a loop. I ran the model a couple of times to draw 20,000 samples of the beta parameters, and on average it takes ~30 minutes to complete using just 1 chain. I’ve also tried a more reasonable prior for logistic regression, normal(0, 1), but performance was only very slightly better.


Any reason you’re trying to obtain so many samples? The defaults of 1000 post-warmup samples are sufficient for inference even in the tails of the distribution, especially if there are multiple chains. Are you not getting a good effective sample size at the default?

Also, folks have reported recently that best performance on Windows is achieved by using WSL.

There’s not really a reason for the number of samples I’m trying to obtain. Since I’m new to Stan, I just want to gauge how fast Stan is at producing samples and whether I’m doing something inefficient in my model code.

Ok, then you should definitely stick with the default of 1000 post-warmup samples and only explore more if you are getting low effective sample sizes. Look into using WSL, and if your system has 2*N or more CPU cores (where N is the number of chains desired, usually 4), take a look at reduce_sum() for within-chain parallelization. Oh, and certainly don’t use that normal(0, 100) prior.
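As a rough sketch of what running with the defaults could look like in rstan: the wrapper name fit_with_defaults and the stan_data argument (a list holding p, na, N, y and x) are hypothetical names for illustration, and this assumes the model string compiles as posted. With 4 chains and iter = 2000, rstan uses half of the iterations as warmup, which leaves 1000 post-warmup draws per chain.

```r
# Hypothetical helper: fit with rstan's default sampling settings and
# print the summary for beta so n_eff and Rhat can be inspected.
fit_with_defaults <- function(model_code, stan_data) {
  # Run the chains in parallel, as in the original post.
  options(mc.cores = parallel::detectCores())

  fit <- rstan::stan(
    model_code = model_code,
    data = stan_data,  # assumed: list(p = ..., na = ..., N = ..., y = ..., x = ...)
    chains = 4,        # rstan default
    iter = 2000        # rstan default; the first 1000 are warmup
  )

  # The printed table includes n_eff and Rhat for each element of beta.
  print(fit, pars = "beta")
  fit
}
```

If n_eff for beta is comfortably large at these settings, there is little reason to pay for 20,000 draws.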

Now, it does look like you’re doing a bunch of transformations (casting to matrix, transpose) that might be affecting computation speed, but I get out of my depth very quickly in that realm. @Bob_Carpenter could you take a peek at the OP and comment on whether those transformations are going to be very costly?
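For reference, here is a sketch of how the reshape and transpose could be avoided, assuming the same data layout as in the original post (the p rows of x for each task are consecutive, which is what the row-major to_matrix call implies). It slices the linear predictor with segment() instead of building x_beta2. The name model_code_segment, the added bounds on the data, and the normal(0, 1) prior are choices made for this illustration. It has not been benchmarked, so whether it changes the run time noticeably is an open question.

```r
# Sketch: same model, but each task's p utilities are taken directly
# from the stacked vector x_beta instead of an N x p matrix.
model_code_segment <- "
data {
  int<lower=2> p;              // number of choice alternatives
  int<lower=1> na;             // number of alternative-specific vars
  int<lower=1> N;              // number of choice tasks
  int<lower=1, upper=p> y[N];  // chosen alternative for each task
  matrix[N * p, na] x;         // p consecutive rows per task
}
parameters {
  vector[na] beta;
}
model {
  vector[N * p] x_beta = x * beta;
  beta ~ normal(0, 1);         // tighter prior than normal(0, 100)
  for (n in 1:N)
    // elements (n-1)*p + 1 .. n*p are the utilities for task n
    y[n] ~ categorical_logit(segment(x_beta, (n - 1) * p + 1, p));
}
"
```

Element (n, j) of the row-major x_beta2 equals x_beta[(n - 1) * p + j], so the segment passed to categorical_logit holds the same values as x_beta2[n]' in the original model.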