Hierarchical multi-logit regression

For these two lines:

vector[K * (D - 1)] mu_vec = to_vector(mu);
vector[K * (D - 1)] sigma_vec = to_vector(sigma);

Should I put them in the model{} block? Or should I put them in the transformed parameters{} block?

Thanks again.


Thanks for all the support so far. My model is running and I am just tuning my hyperpriors. However, now I have another programming-related question.

parameters {
matrix[K, D - 1] mu;
matrix<lower=0>[K, D - 1] sigma;
matrix[K, D - 1] beta[L];
in order to align beta[i] with mu and sigma.

In my setup, beta is a K by (D-1) matrix of parameters. Now, suppose I want to restrict some of the elements of beta to be 0. The easiest thing to do this in my mind is to define another K by (D-1) matrix that consists of only 1’s and 0’s. I can even supply this binary matrix as data input. Then I can do an element-by-element matrix multiplication between beta and this user-supplied matrix in the transformed parameters block. But is this doable in Stan? If not, do you have any work-around to recommend? Thanks again!

That is not a good way to do it in Stan as you won’t be able to identify the multiplied parameters that get forgotten. Instead, read the missing data chapter of the manual. You need a parameter vector the size of the number of non-fixed values, then you need to copy those into a matrix and fill in the known values. Then you won’t have redundant and unconstrained parameters.

Thanks. I understand that I do not want to have redundant parameters.

But can STAN do element-by-element multiplication between two matrices of the same size? I am just trying to figure out programmatically how to fill in the desired matrix with known and unknown values. But I also prepare that I may have to rethink my whole setup, too…

Yes, it is the .* operator.

And even if it couldn’t, it’s a simple loop. Don’t be afraid of loops in Stan—especially when there aren’t any densities inside. It’s not slow to loop like an interpreted language such as R or Python or JAGS or BUGS.

It’s good to know about the loop. I will give it a shot!

I have another clarifying question.

parameters {
matrix[K,D] beta;
model {
for (k in 1:K)
beta[k] ~ normal(0, 5);
for (n in 1:N)
y[n] ~ categorical(softmax(beta * x[n]));

This is from the manual. It is understood to have omitted the transformed parameters block for the zero coefficients.

But my question is hopefully silly. How does the categorical function (and also categorical_logit) know that the Pr(y=1) is aligned with beta[1]? There is no a priori sorting done on the data of y, so y[1], the first observation can be anything from 1,…,K.

Thanks again.

Yes, y[1] can be any of 1:K—I don’t understand why you think that’s a problem. beta[1] is a D-vector representing the coefficients for the outcome 1 by definition. You can look up the definition of categorical_lpmf in the manual. beta * x[n] is just multiplying a matrix beta by a vector x[n].

Dear Stan Board,

I have a follow up to this discussion about left hand side sampling statements when working with multidimensional arrays. I have a multivariate hierarchical distribution across seasons (S). For each season I want to have a hierarchical distribution with seasonal means, but a common covariance matrix. Where I am getting stuck is how to make sure for each season S, the parameter vector theta is drawn from a multivariate normal distribution. I have been trying the to_vector command, but without much success. Ignoring the seasonal piece, the following works fine where theta is declared as a vector[K] theta[G], and mu_theta as a vector [K], e.g.,

theta ~ multi_normal_cholesky(mu_theta, diag_pre_multiply(sigma, Lcorr));

Now, however, with the seasons, I have theta declared as matrix[G, K] theta [S]. I have played around with the to_vector command suggested above, but haven’t found the right way to make it work. Below is the relevant code for my model. Any suggestions would be much appreciated.

cholesky_factor_corr[K] Lcorr;
  vector<lower=0> [K] tau; 
  matrix[G, K] theta [S]; // Parameters for Each Year.  
  vector<lower=0> [K] mu_theta[S]; // Means for Parameters by Season

model {
  tau ~ student_t(4, 0, 1); //Prior on Standard Deviations
  Lcorr ~ lkj_corr_cholesky(1);
  //prior for hierarchical means
  for(i in 1:S){
  mu_theta[i,1] ~ normal(.2, 0.1);  
  mu_theta[i,2] ~ normal(.5, 0.2);
  mu_theta[i,3] ~ normal(225, 15);
  mu_theta[i,4] ~ lognormal(3, 0.5);
  mu_theta[i,5] ~ lognormal(3, 0.5);

  for(j in 1:K){
  for(i in 1:S){
  theta[i,,j] ~ multi_normal_cholesky(mu_theta[i], diag_pre_multiply(tau, Lcorr));

I figured out the issue and below is the solution in case anyone else runs into the same problem with indexing over a 3d array and using the multivariate normal density.

  for(i in 1:S){
  for(j in 1:G){
  row(theta[i], j) ~ multi_normal_cholesky(mu_theta[i], diag_pre_multiply(tau, Lcorr));

I’m not sure the type of theta here, but you can probably just use theta[i, j] ~ ....

Hi Bob,

Thanks for the note. Theta is defined as matrix[G, K] theta [S]. I was thinking the cost of loops was low in Stan given it compiles in C++, but will give your suggestion a shot.


That was just a suggestion to replace row(theta[i], j) with theta[i, j] — they’ll resolve to the same thing.

But now that I see what you’re doing, I can suggest what you really want to do. Let K = size(tau), and code this as:

  vector[K] theta_flat[S * G];
  int i = 1;
  for (s in 1:S) {
    for (g in 1:G) {
      theta_flat[i] = theta[s, g] - mu_theta[s];
      i += 1;
  theta_flat ~ multi_normal_cholesky(rep_vector(0, K), diag_pre_mutiply(tau, Lcorr));

Loops are fast, but having to continually diag-pre-multiply and factor isn’t. This makes sure you only factor the matrix once. This should be a lot faster than what you were doing.

Excellent. Thanks a bunch for the suggestion. The model was taking about 18 hours to run before, so the speed increase will be a big help.


Hi All,

I am a new stan user working with a multinomial regression. I encountered the same K vs. K-1 issue. Initially I was using the softmax to normalize my probabilities, however following the advice here and from an older discussion (https://github.com/stan-dev/stan/issues/266) I switched to categorical_logit.

When I run the model I see that it does not contain any samples, yet it does not throw an errors so I’m not sure how to diagnose the issue. Perhaps I am misunderstanding how categorical_logit is implemented in stan?

int<lower=0> P; // number of subjects
int<lower=0> T; // number of dates
int<lower=0> N; // number of observations

int<lower=0> y[N, 3]; // vector of multinomial counts
int<lower=0> n[N]; // vector of samples
int<lower=0> t[N]; // vector of date indices
int<lower=0> id[N]; // vector of subject indices

matrix[T, 2] theta; // parameter associated with each date
matrix[P, 2] epsilon; // bias parameter associated with each subject

for(ii in 1:N) y[ii,] ~ categorical_logit( c(theta[t[ii],]+epsilon[id[ii],], 0) );

I think you want multinomial, not categorical, if I understand your comments. The difference in what they expect for y is explained in the manual (categorical wants a single outcome, multinomial a vector of counts).

Yes that seems to do the trick. Thanks for the help!


Has anyone tried to code a hierarchical multi-logit regression??? I am in marketing and am doing Conjoint Analysis. I was given some legacy WinBugs codes by my employer, but I am trying to switch to STAN.

Here are some links that you might find helpful:



If you’re looking for a straightforward multilevel, multinomial model, this paper might be useful:

Note that it doesn’t tackle the conditional logits (or mixed logits).

1 Like