# Interpretation of multinomial regression coefficients

Hi all,

Thanks to those who assisted me in estimating a multinomial logit model. I am familiar with the interpretation of these coefficients in the conventional fashion, in which one category of the outcome is left out as the reference. For this analysis, I am unclear about what the regression coefficients are conveying. The model is very simple, with a 4-category outcome and a dichotomous predictor (male/female). Here is the code

```stan
modelString <- "

data {
  int<lower=2> K;                    // number of outcome categories: 4
  int<lower=0> N;                    // number of observations
  int<lower=1> D;                    // number of columns in the design matrix: 2
  int<lower=1, upper=K> ASBR07A[N];  // observed outcome category
  matrix[N, D] x;                    // N x D (here N x 2) design matrix
}

parameters {
  matrix[D, K] beta;                 // 2 x 4 matrix of coefficients
}

transformed parameters {
  matrix[N, K] x_beta = x * beta;    // (N x 2) * (2 x 4) = N x 4 matrix of logits
}

model {
  to_vector(beta) ~ normal(0, 5);

  for (i in 1:N)
    ASBR07A[i] ~ categorical_logit(x_beta[i]');
}

generated quantities {
  int<lower=1, upper=K> ASBR07A_rep[N];
  for (i in 1:N) {
    ASBR07A_rep[i] = categorical_logit_rng(x_beta[i]');
  }
}

"
```

And here is the output:

```
Inference for Stan model: 439376e657c9f05511f724d396ba31c0.
1 chains, each with iter=10000; warmup=5000; thin=10;
post-warmup draws per chain=500, total post-warmup draws=500.

           mean se_mean   sd  2.5%   25%   50%  75% 97.5% n_eff Rhat
beta[1,1]  0.20    0.11 2.51 -4.21 -1.58  0.16 1.85  5.32   497    1
beta[1,2]  0.06    0.11 2.51 -4.32 -1.74  0.03 1.72  5.14   500    1
beta[1,3] -0.28    0.11 2.52 -4.79 -2.10 -0.31 1.37  4.99   501    1
beta[1,4] -0.27    0.11 2.51 -4.59 -2.08 -0.31 1.44  4.98   496    1
beta[2,1]  0.58    0.11 2.47 -4.24 -1.04  0.69 2.23  5.33   485    1
beta[2,2]  0.07    0.11 2.48 -4.73 -1.56  0.21 1.66  4.92   486    1
beta[2,3] -0.22    0.11 2.47 -5.12 -1.80  0.00 1.37  4.54   478    1
beta[2,4] -0.14    0.11 2.47 -5.00 -1.78  0.06 1.45  4.49   481    1

Samples were drawn using NUTS(diag_e) at Tue Nov 10 21:42:29 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
```

So, for example, what is `beta[1,1]` telling me?

I think you actually should be leaving out a category, e.g. by fixing `x_beta[i][1] = 0` for all `i`, along the lines of the sketch below.
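A minimal sketch of one way to impose that constraint, reusing the `data` block from your model: estimate only `K - 1` free columns of coefficients and prepend a column of zeros, so category 1 becomes the reference and `x_beta[i][1] = 0` holds by construction.

```stan
parameters {
  matrix[D, K - 1] beta_raw;   // free coefficients for categories 2..K
}

transformed parameters {
  // pin category 1's coefficients at zero so it serves as the reference
  matrix[D, K] beta = append_col(rep_vector(0, D), beta_raw);
  matrix[N, K] x_beta = x * beta;
}

model {
  to_vector(beta_raw) ~ normal(0, 5);
  for (i in 1:N)
    ASBR07A[i] ~ categorical_logit(x_beta[i]');
}
```

The remaining columns of `beta` are then interpretable as log-odds contrasts against category 1, just as in the conventional multinomial logit you are familiar with.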
I discussed a similar issue recently in the topic “Two questions: ①Rejecting initial value but still sampling ②regarding divergent transitions” (note that `categorical_logit` and `multinomial_logit` both apply a `softmax` transformation internally). Feel free to ask for clarification if anything is hard to follow.
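To spell out the mechanism: `softmax` is invariant to adding the same constant to every logit,

$$
\operatorname{softmax}(\eta + c\mathbf{1})_k
  = \frac{e^{\eta_k + c}}{\sum_{j=1}^{K} e^{\eta_j + c}}
  = \frac{e^{\eta_k}}{\sum_{j=1}^{K} e^{\eta_j}}
  = \operatorname{softmax}(\eta)_k ,
$$

so shifting an entire row of `beta` by a constant leaves the likelihood unchanged; only contrasts between categories are identified.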
What seems to happen is that Stan is able to sample the posterior despite it being ill-behaved, but because the model has too many degrees of freedom you don’t learn much about your individual parameters, as witnessed by the very wide marginal posteriors. I would, however, expect that if you looked at samples of the transformed values `softmax(beta[1]')`, the posterior would be reasonably narrow.
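For instance, a `generated quantities` block along these lines would let you monitor the implied category probabilities directly (a sketch; the assumption that column 1 of `x` is an intercept and column 2 the male/female dummy is mine, so adjust to your coding):

```stan
generated quantities {
  // implied category probabilities for the two covariate patterns,
  // assuming x[i] = [1, 0] for the reference group and [1, 1] otherwise
  simplex[K] theta_ref   = softmax(beta[1]');              // x = [1, 0]
  simplex[K] theta_other = softmax((beta[1] + beta[2])');  // x = [1, 1]
}
```

These probabilities are identified even though the individual coefficients are not, so their marginal posteriors should be much narrower than those of `beta`.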