Interpretation of multinomial regression coefficients

Hi all,

Thanks to those who assisted me in estimating a multinomial logit model. I am familiar with the conventional interpretation of these coefficients, in which one category of the outcome is left out as the reference. For this analysis, I am unclear what the regression coefficients are conveying. The model is very simple, with a 4-category outcome and a dichotomous predictor (male/female). Here is the code:

modelString <- "

data {
  int<lower=2> K;   // This is 4, the number of outcome categories
  int<lower=0> N;   // Number of observations
  int<lower=1> D;   // This is the number of columns in the design matrix: 2
  int<lower=1, upper=K> ASBR07A[N];
  matrix[N, D] x;   // This will be an N by 2 matrix of data
}

parameters {
  matrix[D, K] beta;     // This is a 2 x 4 matrix of betas
}

transformed parameters {
  matrix[N, K] x_beta = x * beta;   //  N x 2  * 2  x 4 
}

model {
  to_vector(beta) ~ normal(0, 5);

  for (i in 1:N)
    ASBR07A[i] ~ categorical_logit(x_beta[i]');
}

generated quantities {
  int<lower=1, upper=K> ASBR07A_rep[N];
  for (i in 1:N) {
    ASBR07A_rep[i] = categorical_logit_rng(x_beta[i]');
  }
}

"

And here is the output:

Inference for Stan model: 439376e657c9f05511f724d396ba31c0.
1 chains, each with iter=10000; warmup=5000; thin=10; 
post-warmup draws per chain=500, total post-warmup draws=500.

           mean se_mean   sd  2.5%   25%   50%  75% 97.5% n_eff Rhat
beta[1,1]  0.20    0.11 2.51 -4.21 -1.58  0.16 1.85  5.32   497    1
beta[1,2]  0.06    0.11 2.51 -4.32 -1.74  0.03 1.72  5.14   500    1
beta[1,3] -0.28    0.11 2.52 -4.79 -2.10 -0.31 1.37  4.99   501    1
beta[1,4] -0.27    0.11 2.51 -4.59 -2.08 -0.31 1.44  4.98   496    1
beta[2,1]  0.58    0.11 2.47 -4.24 -1.04  0.69 2.23  5.33   485    1
beta[2,2]  0.07    0.11 2.48 -4.73 -1.56  0.21 1.66  4.92   486    1
beta[2,3] -0.22    0.11 2.47 -5.12 -1.80  0.00 1.37  4.54   478    1
beta[2,4] -0.14    0.11 2.47 -5.00 -1.78  0.06 1.45  4.49   481    1

Samples were drawn using NUTS(diag_e) at Tue Nov 10 21:42:29 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

So, for example, what is beta[1,1] telling me?

Thanks in advance.

I think you should actually leave out a category, e.g. by fixing x_beta[i][1] = 0 for all i.
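A minimal sketch of what that could look like, assuming category 1 is used as the reference. The name `beta_raw` is my own, not something from your model, and I am using the newer `array[N] int` declaration (older Stan versions write it as in your code). Treat it as a starting point rather than a tested drop-in replacement.

```stan
data {
  int<lower=2> K;                          // number of outcome categories
  int<lower=0> N;                          // number of observations
  int<lower=1> D;                          // number of design-matrix columns
  array[N] int<lower=1, upper=K> ASBR07A;  // older Stan: int<lower=1, upper=K> ASBR07A[N];
  matrix[N, D] x;
}

parameters {
  matrix[D, K - 1] beta_raw;               // free coefficients for categories 2..K (hypothetical name)
}

transformed parameters {
  // Category 1 is the reference: its column of coefficients is fixed at zero.
  matrix[D, K] beta = append_col(rep_vector(0, D), beta_raw);
}

model {
  to_vector(beta_raw) ~ normal(0, 5);

  for (i in 1:N)
    ASBR07A[i] ~ categorical_logit((x[i] * beta)');
}
```

With this parameterization, beta[d, k] for k > 1 has the usual reading: the change in the log-odds of category k versus category 1 for a one-unit change in column d of x.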

I discussed a similar issue recently in the thread "Two questions: ① Rejecting initial value but still sampling. ② Regarding divergent transitions" (note that categorical_logit and multinomial_logit actually include a softmax transformation). Feel free to ask for clarifications if it is hard to understand.
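To spell out why the softmax matters here, the identity below may help. The symbols are my own notation: \eta is the K-vector of linear predictors for one observation (one row of x_beta) and c is any scalar.

```latex
\operatorname{softmax}(\eta + c\,\mathbf{1})_k
  = \frac{\exp(\eta_k + c)}{\sum_{j=1}^{K} \exp(\eta_j + c)}
  = \frac{e^{c}\exp(\eta_k)}{e^{c}\sum_{j=1}^{K} \exp(\eta_j)}
  = \operatorname{softmax}(\eta)_k
```

So adding the same amount to every column within a row of beta leaves the likelihood unchanged. With all K columns free, only the prior constrains that direction.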

What seems to happen is that Stan is able to sample the posterior despite it being ill-behaved. Because the model has too many degrees of freedom, however, you don’t learn much about your individual parameters, as the very wide marginal posteriors show. I would expect, though, that if you looked at samples of the transformed values from softmax(beta[1]'), the posterior would be reasonably narrow.
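One way to check this is to compute the category probabilities in generated quantities. The fragment below is only a sketch to add to your existing block. It assumes that column 1 of x is an intercept and column 2 is a 0/1 indicator for sex, which I am guessing from D = 2. The names p_ref and p_alt are hypothetical.

```stan
generated quantities {
  // Assumption: x = (1, 0) for one group and x = (1, 1) for the other.
  vector[K] p_ref = softmax(beta[1]');              // category probabilities when the indicator is 0
  vector[K] p_alt = softmax((beta[1] + beta[2])');  // category probabilities when the indicator is 1
}
```

If the model is otherwise fine, the summaries for p_ref and p_alt should be much tighter than those for the individual beta entries, and they are directly interpretable as predicted probabilities of each outcome category.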

Best of luck with your model!