Interpreting multinomial logit coefficients

Hi all,

I’m wondering how to interpret the coefficients of a multinomial logit regression and if there is a better way to parameterize the model to make it more interpretable. I have a 4-category outcome variable. I have three predictors: a dichotomous gender indicator (1=male), a 5-category variable that defines ranges for the number of books in the home, and a continuous scale of motivation to read. The code follows and runs just fine with no problems.

# Read in relevant packages #

library(rstan)
library(bayesplot)
library(magrittr)
library(dplyr)
library(loo)
options(mc.cores = 4)

# male:  (1=male)

# booksHome: # of books in home 1) 0-10, 2) 11-25, 3) 26-100, 4) 101-200 5) More than 200

# ASBGSMR:  Students motivated to read (4pt, strongly agree to strongly disagree)

# ASBR07A:  I do well in reading  (4pt, strongly agree to strongly disagree)

# Some recoding to reduce the sample size, rename variables, remove missing data (listwise) and 
# recode sex

Canadareg <-read.csv("~/desktop/Canada.csv",header=TRUE)
Canadareg <- Canadareg[1:5000,]  # Just to cut down the sample size.
Canadareg[Canadareg==999999]=NA
Canadareg <- na.omit(Canadareg)
Male <- ifelse(Canadareg$ITSEX==2,0,1)
Canadareg1 <- cbind(Canadareg,Male)
Canadareg2 <- Canadareg1 %>% 
  select(ASBR07A, Male, ASBG04,ASBGSMR) %>%
  rename(DOWELL = ASBR07A,
         Male=Male,
         booksHome = ASBG04,
         motivRead = ASBGSMR)

attach(Canadareg2)
DOWELL <- as.factor(DOWELL)
Canadareg2 <- data.frame(Canadareg2)

n <- nrow(Canadareg2)
f <- as.formula("DOWELL ~ Male + booksHome + motivRead")
m <- model.matrix(f,Canadareg2)
data.list <- list(n=nrow(m), 
                  k=length(unique(Canadareg2[,1])),
                  d=ncol(m),
                  x=m,
                  DOWELL=as.numeric(Canadareg2[,1]))

modelString <- "
data {
  int<lower = 2> k;
  int<lower = 0> n;
  int<lower = 1> d; 
  int <lower=1, upper=k> DOWELL[n];
  matrix[n, d] x;
}
parameters {
  matrix[d, k] beta;
}
transformed parameters {
  matrix[n, k] x_beta= x * beta;
}
model {
  to_vector(beta) ~ normal (0,2);
  for (i in 1:n) {
  DOWELL[i] ~ categorical_logit(x_beta[i]');
  }
}
 generated quantities {
  int <lower=1, upper=k> DOWELL_rep[n];
  vector[n] log_lik;
  for (i in 1:n) {
    DOWELL_rep[i] = categorical_logit_rng(x_beta[i]');
    log_lik[i] = categorical_logit_lpmf(DOWELL[i] |x_beta[i]');
  }
}
"
# Start estimation
nChains = 4
nIter= 10000
thinSteps = 10
burnInSteps = floor(nIter/2)

DOWELL = data.list$DOWELL

MultiNomRegFit = stan(data=data.list,model_code=modelString,
              chains=nChains,control = list(adapt_delta = 0.99),
              iter=nIter,warmup=burnInSteps,thin=thinSteps)

# Stan plots

stan_plot(MultiNomRegFit, pars=c("beta"))
stan_trace(MultiNomRegFit,inc_warmup=T, pars=c("beta"))
stan_dens(MultiNomRegFit, pars=c("beta"))
stan_ac(MultiNomRegFit, pars=c("beta"))

# Stan output

print(MultiNomRegFit,pars=c("beta"))

# Posterior predictive checks

DOWELL_rep = rstan::extract(MultiNomRegFit,"DOWELL_rep ")$DOWELL_rep
ppc_bars(y=DOWELL, yrep= DOWELL_rep)

log_lik1 = loo::extract_log_lik(MultiNomRegFit, merge_chains = FALSE)
loo_catLogit_1 = loo(log_lik1)
print(loo_catLogit_1)

The output is as follows

Parameter Mean SEMean SD 95% PPI ESS rhat
beta[1,1]  -2.14 0.02 1.04 -4.16– -0.11 1867 1
beta[1,2]  1.09 0.02 1.04 -0.96– 3.16 1836 1
beta[1,3]  0.98 0.02 1.04 -1.08 – 3.09 1883 1
beta[1,4]  -0.07 0.02 1.06 -2.11 – 1.98 1865 1
beta[2,1]  0.54 0.03 1.04 -1.46 – 2.51 1665 1
beta[2,2]  0.02 0.03 1.04 -1.96 – 2.02 1706 1
beta[2,3] -0.28 0.03 1.05 -2.28 – 1.73 1700 1
beta[2,4] -0.23 0.03 1.05 -2.24 – 1.78 1670 1
beta[3,1] 0.19 0.02 0.99 -1.72 – 2.11 1880 1
beta[3,2] -0.04 0.02 0.99 -1.99 – 1.88 1873 1
beta[3,3] -0.05 0.02 0.99 -1.96 – 1.88 1896 1
beta[3,4] -0.38 0.02 0.99 -2.30 – 1.51 1865 1
beta[4,1] 0.17 0.02 1.01 -1.80 – 2.15 1774 1
beta[4,2] -0.10 0.02 1.01 -2.06 – 1.88 1767 1
beta[4,3] -0.12 0.02 1.01 -2.12 – 1.85 1770 1
beta[4,4] 0.10 0.02 1.01 -1.85 – 2.06 1773 1

I understand the meaning of betas in multinomial logit regression, but in this case I am unsure what the betas indices (e.g. beta[1,1] and so on) are referring to. Is there a better way to parameterize this model so that the betas are more meaningful.

Thanks

David

First, I’d think about rearranging x and beta so you don’t have to transpose in the density. It’d be even better to collect an array of the x_beta[i] and of DOWELL and then vectorize the whole thing. But that’s just efficiency, not interpretability.

It sounds like you have discrete predictors, but you seem to be using something like a continuous encoding of them in x. The usual way to code your predictors for regression would be this way:

data {
  int<lower=0> N;  // observations
  // predictors
  array[N] int<lower=0, upper=1> male;  // assuming 1 if male, 0 if female
  array[N] int<lower= 1, upper=5> books;
  vector[N] motivation;

and then the parameters look like this:

parameters {
  vector[4] alpha;
  vector[4] beta_sex;
  array[5] vector[4] beta_books;
  vector[4] beta_motivation;
  array[N] int<lower=1, upper=4> dowell;

and then the likelihood looks like this:

for (n in 1:N)
  dowell[n] ~ categorical_logit(alpha + beta_sex * sex[n] + beta_books[books[n]] + beta_motivation * x[n]);

There are lots of choices here for parameterizations when you get into multiple outcomes. For more efficiency, you can pack the first three terms into a matrix indexed by sex and books.

Now the alpha is an intercept, beta_sex the male effect (female rolled into intercept), and beta_books[j] the factors for having j units of books, and so on.

The problem is that categorical_logit is only identified up to an additive constant. You can handle this by pinning one of the categories to 0 and having everything else be relative, or you can let the priors sort it out. Or you can set the coefficients for the random effects so that they sum to 0 and everything else gets pushed into the intercept.

1 Like

Thanks, Bob,

So, I’m trying reconcile the code you sent me with the discussion in the Stan Users guide
1.6 Multi-logit regression | Stan User’s Guide on identifiability in the multinomial logit regression. The code I’m trying is this

modelString <- "
data {
  int<lower = 2> k;
  int<lower = 0> n;
  int<lower = 1> d; 
  int <lower=1, upper=k> DOWELL[n];
  matrix[n, d] x;
}
transformed data {
  row_vector[d] zeros = rep_row_vector(0, d);
}
parameters {
  matrix[k - 1, d] beta_raw;
}
transformed parameters {
  matrix[k, d] beta;
  beta = append_row(beta_raw, zeros);
}
model {
  to_vector(beta) ~ normal (0,2);
  for (i in 1:n) {
  DOWELL[i] ~ categorical_logit(x_beta[i]');
  }
}
 
generated quantities {
  int <lower=1, upper=k> DOWELL_rep[n];
  vector[n] log_lik;
  for (i in 1:n) {
    DOWELL_rep[i] = categorical_logit_rng(x_beta[i]');
    log_lik[i] = categorical_logit_lpmf(DOWELL[i] |x_beta[i]');
  }
}
"

The argument for the categorical_logit call is incorrect, an throwing an error message that x_beta[i]’ doesn’t exist. I’m not sure what I put into the argument.

David