Question about output from multinomial logistic regression

Greetings, everyone,

I’m running a multinomial logistic regression. The outcome has four categories and there are three predictors: Male (coded 1 = male), a measure of the number of books in the home on a 4-point scale, and a measure of motivation to read, also on a 4-point scale. Here are the essential parts of the code.

library(rstan)

DOWELL <- as.factor(DOWELL)
Canadareg2 <- data.frame(Canadareg2)

n <- nrow(Canadareg2)
f <- as.formula("DOWELL ~ Male + booksHome + motivRead")
m <- model.matrix(f, Canadareg2)  # design matrix: intercept column plus the three predictors
data.list <- list(n = nrow(Canadareg2),
                  k = length(unique(Canadareg2[, 1])),  # number of outcome categories
                  d = ncol(m), x = m, Male = Male, booksHome = booksHome,
                  motivRead = motivRead, DOWELL = as.numeric(Canadareg2[, 1]))

ReadMultiNom <- "
data {
int<lower = 2> k; // The variable has at least two categories
int<lower = 1> d; // number of predictors
int<lower = 0> n;
vector[n] Male;
vector[n] booksHome;
vector[n] motivRead;
int <lower=1, upper=k> DOWELL[n];
matrix[n, d] x;
}


parameters {
matrix[d, k] beta;
}

transformed parameters {
  matrix[n, k] x_beta= x * beta;
}

model {
  to_vector(beta) ~ normal(0,2); // vectorizes beta and assigns same prior
  for (i in 1:n) {
    DOWELL[i] ~ categorical_logit(x_beta[i]');
  }
}
 
generated quantities {
  int <lower=1, upper=k> DOWELL_rep[n];
  vector[n] log_lik;
  for (i in 1:n) {
    DOWELL_rep[i] = categorical_logit_rng(x_beta[i]');
    log_lik[i] = categorical_logit_lpmf(DOWELL[i] |x_beta[i]');
  }
}
"
nChains <- 4
nIter <- 10000
thinSteps <- 10
burnInSteps <- floor(nIter / 2)

DOWELL <- data.list$DOWELL

MultiNomRegFit <- stan(data = data.list, model_code = ReadMultiNom,
                       chains = nChains, control = list(adapt_delta = 0.99),
                       iter = nIter, warmup = burnInSteps, thin = thinSteps)

Everything runs beautifully and all convergence criteria are met. However, the output does not seem to show information for the Male variable; it seems to cover only the two predictors that are 4-point scales. It would seem to me that each beta should have three elements, one for each predictor, e.g. beta[1,1,1], beta[1,2,1], etc. Here is the output.

Inference for Stan model: 7be33c603bd35d82ad7f6b200ccee16f.
## 4 chains, each with iter=10000; warmup=5000; thin=10; 
## post-warmup draws per chain=500, total post-warmup draws=2000.
## 
##            mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
## beta[1,1] -2.07    0.03 1.05 -4.16 -2.80 -2.08 -1.36  0.00  1705    1
## beta[1,2]  1.16    0.03 1.04 -0.97  0.47  1.18  1.84  3.25  1652    1
## beta[1,3]  1.06    0.03 1.07 -1.03  0.36  1.06  1.76  3.15  1703    1
## beta[1,4] -0.01    0.03 1.07 -2.17 -0.70 -0.01  0.68  2.06  1657    1
## beta[2,1]  0.51    0.03 1.04 -1.52 -0.18  0.52  1.20  2.51  1618    1
## beta[2,2] -0.01    0.03 1.04 -2.08 -0.72 -0.02  0.67  2.07  1636    1
## beta[2,3] -0.31    0.03 1.03 -2.31 -1.00 -0.31  0.38  1.70  1617    1
## beta[2,4] -0.26    0.03 1.04 -2.28 -0.94 -0.26  0.44  1.72  1648    1
## beta[3,1]  0.26    0.03 1.01 -1.73 -0.40  0.27  0.95  2.17  1525    1
## beta[3,2]  0.03    0.03 1.01 -2.01 -0.65  0.05  0.70  1.93  1525    1
## beta[3,3]  0.02    0.03 1.01 -2.04 -0.64  0.04  0.70  1.95  1522    1
## beta[3,4] -0.30    0.03 1.01 -2.36 -0.99 -0.29  0.38  1.59  1528    1
## beta[4,1]  0.18    0.02 0.98 -1.70 -0.49  0.14  0.85  2.17  1561    1
## beta[4,2] -0.09    0.02 0.98 -1.98 -0.77 -0.14  0.58  1.89  1568    1
## beta[4,3] -0.12    0.02 0.98 -2.01 -0.80 -0.15  0.56  1.85  1566    1
## beta[4,4]  0.11    0.02 0.99 -1.79 -0.56  0.07  0.79  2.10  1559    1
## 
## Samples were drawn using NUTS(diag_e) at Tue Oct 18 09:58:48 2022.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

I’m not quite sure what is going on, so any advice would be appreciated.

Thanks,

David

You haven’t declared any parameters to use as regression coefficients with Male, and you haven’t included Male in your likelihood at any point, so there aren’t any regression coefficient parameters for the output to show.


Hi Andrew,

Thanks for the quick reply. If I didn’t include Male as you say, then is the output that I am getting referring to the other two predictors? I just don’t see what I’m supposed to add to the code to get the Male category as well as the other two. By the way, I’m following the example in 1.6 Multi-logit regression | Stan User’s Guide.

Thanks

David

I might be misunderstanding the code, but I think the coefficients for Male are beta[2, k]. The predictor matrix x has d = 4 columns: intercept, Male, booksHome, motivRead. So the second column of x and the second row of beta are the data and coefficient for Male.
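If you want to double-check that mapping, you can look at the column names of the design matrix and pull out just the Male row from the fit. A minimal sketch, assuming the f, Canadareg2, and MultiNomRegFit objects from your post (and that Male, booksHome, and motivRead are numeric columns):

colnames(model.matrix(f, Canadareg2))
# columns line up with the rows of beta, e.g.
# "(Intercept)" "Male" "booksHome" "motivRead"

beta_summary <- summary(MultiNomRegFit, pars = "beta")$summary
beta_summary[grep("^beta\\[2,", rownames(beta_summary)), ]  # Male coefficients only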


Thanks for your reply, stijn. Just so I understand: beta[1, k] are the intercepts for each category of the outcome, beta[2, k] is the Male effect for each category, beta[3, k] is the booksHome effect for each k, and beta[4, k] is the motivRead effect for each k. That would make sense, given the order of terms in the as.formula() statement. I’ll see if anyone else chimes in, but thanks.

David


Bringing together what others have said: as @andrjohns mentioned, your current model declares Male in the data block but never uses it in the later blocks, so passing it isn’t doing anything in your current model. You could just as well pass any arbitrary n-length vector and you would get the same results (assuming you keep d the same). As @stijn mentioned, the coefficient for Male should be beta[2, ], since model.matrix() adds a column of 1s for the intercept, and that intercept column corresponds to beta[1, ]. You could simplify your code as below.

data {
    int<lower = 2> k; // The variable has at least two categories
    int<lower = 1> d; // number of predictors
    int<lower = 0> n;
    int <lower=1, upper=k> DOWELL[n];
    matrix[n, d] x;
}
parameters {
    matrix[d, k] beta;
}
transformed parameters {
    matrix[n, k] x_beta = x * beta;
}
model {
    to_vector(beta) ~ normal(0,2); // vectorizes beta and assigns same prior
    for (i in 1:n) {
        DOWELL[i] ~ categorical_logit(x_beta[i]');
    }
}
generated quantities {
    int <lower=1, upper=k> DOWELL_rep[n];
    vector[n] log_lik;
    for (i in 1:n) {
        DOWELL_rep[i] = categorical_logit_rng(x_beta[i]');
        log_lik[i] = categorical_logit_lpmf(DOWELL[i] | x_beta[i]');
    }
}
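
Since this version only declares k, d, n, DOWELL, and x in the data block, the data list on the R side can be trimmed to match. A minimal sketch, assuming ReadMultiNom now holds the simplified program and m, Canadareg2, and the sampler settings are as in the original post:

data.list <- list(n = nrow(Canadareg2),
                  k = length(unique(Canadareg2[, 1])),
                  d = ncol(m),
                  x = m,
                  DOWELL = as.numeric(Canadareg2[, 1]))

MultiNomRegFit <- stan(data = data.list, model_code = ReadMultiNom,
                       chains = nChains, control = list(adapt_delta = 0.99),
                       iter = nIter, warmup = burnInSteps, thin = thinSteps)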

You could instead pass the vector for each variable and give the coefficients more specific parameter names. This might make the model clearer for a human reader, but it is less flexible than the matrix version, where d automatically resizes as you add or remove predictors.

data {
    int<lower = 2> k; // The variable has at least two categories
    int<lower = 0> n;
    vector[n] Male;
    vector[n] booksHome;
    vector[n] motivRead;
    int <lower=1, upper=k> DOWELL[n];
}
transformed data{
    vector[n] intercept = rep_vector(1, n);
}
parameters {
    row_vector[k] alpha;
    row_vector[k] beta_male;
    row_vector[k] beta_books;
    row_vector[k] beta_motiv;
}
transformed parameters {
    matrix[n, k] x_beta = intercept * alpha +
                          Male * beta_male +
                          booksHome * beta_books +
                          motivRead * beta_motiv;
}
model {
    alpha ~ normal(0, 2);
    beta_male ~ normal(0, 2);
    beta_books ~ normal(0, 2);
    beta_motiv ~ normal(0, 2);

    for (i in 1:n) {
        DOWELL[i] ~ categorical_logit(x_beta[i]');
    }
}
generated quantities {
    int <lower=1, upper=k> DOWELL_rep[n];
    vector[n] log_lik;
  
    for (i in 1:n) {
        DOWELL_rep[i] = categorical_logit_rng(x_beta[i]');
        log_lik[i] = categorical_logit_lpmf(DOWELL[i] | x_beta[i]');
    }
}
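
For this named-coefficient version the data list changes as well: it passes the raw predictor vectors and no longer needs x or d. A sketch under the same assumptions, taking the vectors from the columns of Canadareg2:

data.list <- list(n = nrow(Canadareg2),
                  k = length(unique(Canadareg2[, 1])),
                  Male = Canadareg2$Male,
                  booksHome = Canadareg2$booksHome,
                  motivRead = Canadareg2$motivRead,
                  DOWELL = as.numeric(Canadareg2[, 1]))

The fit will then report alpha, beta_male, beta_books, and beta_motiv by name, so there is no need to remember which row of beta belongs to which predictor.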