Labeling and order of multinomial regression coefficients

Hi all,

I am estimating a multinomial regression model with a 4-category outcome variable (ASBR07A). The predictors are:

male: (1 = male)

ASBG04: number of books in the home (1 = 0-10, 2 = 11-25, 3 = 26-100, 4 = 101-200, 5 = more than 200)

ASBGSMR: Students motivated to read (4-point scale, strongly agree to strongly disagree)

The model setup is as follows:

N <- nrow(Canadareg1)
f <- as.formula("ASBR07A ~ ASBG04 + ASBGSMR + male")
M <- model.matrix(f,Canadareg1)

data.list <- list(N=nrow(M), 
                  K=length(unique(Canadareg1[,1])),
                  D=ncol(M), 
                  x=M, 
                  ASBR07A=as.numeric(Canadareg1[,1]))


modelString <- "

data {
  int<lower=2> K;                    // number of outcome categories (4 here)
  int<lower=0> N;                    // number of observations
  int<lower=1> D;                    // number of columns in the design matrix (4 here)
  int<lower=1, upper=K> ASBR07A[N];  // observed outcome category for each student
  matrix[N, D] x;                    // N x D design matrix
}

parameters {
  matrix[D, K] beta;     // D x K matrix of coefficients (4 x 4 here)
}

transformed parameters {
  matrix[N, K] x_beta = x * beta;   //  N x 4
}

model {
  to_vector(beta) ~ normal(0, 5);

  for (i in 1:N)
    ASBR07A[i] ~ categorical_logit(x_beta[i]');
}

generated quantities {
  int<lower=1, upper=K> ASBR07A_rep[N];
  for (i in 1:N) {
    ASBR07A_rep[i] = categorical_logit_rng(x_beta[i]');
  }
}

"

Everything runs fine, and the output is as follows.

## Inference for Stan model: 87a49ecccc72c4a5df74a69624386dba.
## 1 chains, each with iter=1000; warmup=500; thin=1; 
## post-warmup draws per chain=500, total post-warmup draws=500.
## 
##            mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
## beta[1,1] -2.33    0.25 2.26 -6.21 -3.99 -2.42 -0.80  2.06    83 1.00
## beta[1,2]  1.00    0.24 2.23 -3.01 -0.61  0.84  2.50  5.34    86 1.00
## beta[1,3]  0.90    0.25 2.26 -3.20 -0.79  0.85  2.44  5.04    84 1.00
## beta[1,4] -0.17    0.24 2.26 -4.18 -1.80 -0.17  1.28  4.33    87 1.00
## beta[2,1]  0.75    0.44 2.69 -4.85 -1.01  1.08  2.68  5.17    37 1.00
## beta[2,2]  0.50    0.44 2.68 -4.98 -1.26  0.87  2.45  5.00    37 1.00
## beta[2,3]  0.49    0.44 2.69 -5.14 -1.28  0.85  2.47  4.98    37 1.00
## beta[2,4]  0.17    0.45 2.69 -5.40 -1.61  0.57  2.15  4.65    36 1.00
## beta[3,1]  0.11    0.44 2.53 -5.02 -1.68  0.09  1.85  5.44    33 1.02
## beta[3,2] -0.17    0.44 2.53 -5.29 -1.96 -0.20  1.57  5.19    33 1.02
## beta[3,3] -0.19    0.44 2.53 -5.28 -1.98 -0.22  1.56  5.13    33 1.02
## beta[3,4]  0.03    0.44 2.53 -5.13 -1.74  0.02  1.76  5.38    33 1.02
## beta[4,1]  1.19    0.39 2.54 -3.81 -0.47  1.17  3.23  5.79    42 1.01
## beta[4,2]  0.65    0.39 2.53 -4.20 -1.12  0.60  2.67  5.22    42 1.01
## beta[4,3]  0.35    0.39 2.53 -4.49 -1.34  0.30  2.38  4.90    42 1.01
## beta[4,4]  0.41    0.39 2.54 -4.48 -1.29  0.33  2.40  5.07    42 1.00
## 
## Samples were drawn using NUTS(diag_e) at Mon Nov 16 21:06:41 2020.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

As you can see, I am getting the results for ASBGSMR for each of the four categories, but not for the other predictors. I tried a different ordering of the predictors but received the same results. I'm not sure what is going on here.

Thanks

This sounds like a problem that can be debugged at the design matrix level (without going into Stan).

I think if you just print the matrix M then it’ll give you a lot of info about what each column means. I suspect there is something surprising going on in the matrix encoding – maybe a group is being encoded as the intercept or something.
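
For example, something like this (using the objects from your code) will show what each column of M actually is:

```
head(M)            # first few rows of the design matrix
colnames(M)        # column names, including the "(Intercept)" column
attr(M, "assign")  # which formula term each column comes from (0 = intercept)
```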

Well, that indeed seems to be the case, but I don't understand why. Following others' code with similar syntax, this is what I wrote:

```{r, echo=TRUE}
N <- nrow(Canadareg2)
f <- as.formula("ASBR07A ~ ASBG04 + ASBGSMR + male")
M <- model.matrix(f,Canadareg2)
data.list <- list(N=nrow(M), 
                  K=length(unique(Canadareg2[,1])),
                  D=ncol(M), 
                  x=M, 
                  ASBR07A=as.numeric(Canadareg2[,1]))
```

When I print out M, this is what I see

> head(M)
   (Intercept) ASBG04  ASBGSMR male
58           1      3  7.93542    0
59           1      3 12.36856    0
61           1      2 12.36856    0
64           1      3 12.36856    1
68           1      3  8.31489    0
70           1      1  8.31489    1
> 

So, the outcome variable ASBR07A is replaced by an intercept. This is all before I get to Stan syntax. Thoughts?

Thanks

David

What does Canadareg2 look like? Does ASBG04 or ASBGSMR need to be converted to a factor?
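
A quick way to check (just a sketch using your data frame name):

```
str(Canadareg2)            # storage type of each column
sapply(Canadareg2, class)  # class of each column at a glance
```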

Here is what Canadareg2 looks like

> head(Canadareg2)
   ASBR07A male ASBG04  ASBGSMR
58       1    0      3  7.93542
59       4    0      3 12.36856
61       4    0      2 12.36856
64       1    1      3 12.36856
68       1    0      3  8.31489
70       1    1      1  8.31489
> 

I did convert ASBG04 to a factor but it didn’t change anything.

Hmm, I think there is something going on here.

Check out this example:

df = data.frame(b = c(1, 2, 3, 4),
                a = as.factor(c(1, 1, 2, 3)))
X = model.matrix(b ~ a, df)

Printing X gives:

> X
  (Intercept) a2 a3
1           1  0  0
2           1  0  0
3           1  1  0
4           1  0  1
attr(,"assign")
[1] 0 1 1
attr(,"contrasts")
attr(,"contrasts")$a
[1] "contr.treatment"

That’s what you’re looking for here right? Like a bunch of columns corresponding to the different ASBG04 values?

Yup. Something is going on with model.matrix. It produces a design matrix, which I don't think one needs when running a multinomial regression. The relevant lines in my code that are producing this problem are:

```{r, echo=TRUE}
N <- nrow(Canadareg2)
f <- as.formula("ASBR07A ~ ASBG04 + ASBGSMR + male")
M <- model.matrix(f, Canadareg2)
data.list <- list(N=nrow(M), 
                  K=length(unique(Canadareg2[,1])),
                  D=ncol(M), 
                  x=M, 
                  ASBR07A=as.numeric(Canadareg2[,1]))
```

Can you save the Canadareg2 dataframe as a csv and attach it?

Here you go. What is strange is that I don't know where the first column is coming from, as it is not in the data set. Canada2.csv (18.4 KB)


This worked for me:

library(dplyr)  # needed for %>% and mutate()

df = read.csv("Canada2.csv") %>%
  mutate(ASBG04 = as.factor(ASBG04))
f <- as.formula("ASBR07A ~ ASBG04 + ASBGSMR + male")
M <- model.matrix(f, df)

Giving:

> M
    (Intercept) ASBG042 ASBG043 ASBG044 ASBG045  ASBGSMR male
1             1       0       1       0       0  7.93542    0
2             1       0       1       0       0 12.36856    0
3             1       1       0       0       0 12.36856    0
4             1       0       1       0       0 12.36856    1
...

Thanks. Yes, but where did ASBR07A go? Also, ASBG04 should not be categorized into dummy variables for this example.

I don’t think model.matrix() includes the outcome variable (on the LHS of the formula), so that’s probably why.

If it’s coded as a factor then model.matrix() will always convert it to dummy variables.

If I'm not mistaken, I think both of these problems are solved by using the model.frame() function instead of model.matrix(). If you try that instead, does it give you what you wanted?
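
To illustrate the difference on a toy data frame (just a sketch, not your data):

```
df <- data.frame(y = c(1, 2, 3, 4),
                 a = as.factor(c(1, 1, 2, 3)))

model.matrix(y ~ a, df)  # drops y, adds an "(Intercept)" column, dummy-codes the factor a
model.frame(y ~ a, df)   # keeps y and a as regular columns, no intercept or dummy coding
```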


Hi Jonah,

I am not sure your suggestion quite worked. If I use model.frame as follows

N <- nrow(Canadareg1)
f <- as.formula(" ~ male + ASBG04 + ASBGSMR")
M <- model.frame(f,Canadareg1)
M

then this is what M looks like

head(M)

   male ASBG04  ASBGSMR
58    0      3  7.93542
59    0      3 12.36856
61    0      2 12.36856
64    1      3 12.36856
68    0      3  8.31489
70    1      1  8.31489
72    1      3  7.93542
73    1      3 10.35004
77    1      4  8.78118
78    1      2  8.31489

and the output looks like this

Inference for Stan model: 8a55851b0cd2a1daa239d76f0d003136.
1 chains, each with iter=500; warmup=250; thin=1; 
post-warmup draws per chain=250, total post-warmup draws=250.

           mean se_mean   sd  2.5%   25%   50%  75% 97.5% n_eff Rhat
beta[1,1]  0.72    0.36 2.17 -3.28 -0.90  0.96 2.37  4.32    36 1.00
beta[1,2]  0.40    0.36 2.15 -3.70 -1.34  0.62 2.05  4.04    36 1.00
beta[1,3]  0.10    0.36 2.18 -3.75 -1.68  0.38 1.76  3.94    36 1.00
beta[1,4]  0.10    0.36 2.16 -3.83 -1.63  0.30 1.86  3.87    35 1.00
beta[2,1] -0.46    0.31 1.97 -4.02 -1.83 -0.64 0.93  3.70    40 1.07
beta[2,2] -0.42    0.31 1.96 -3.99 -1.81 -0.58 0.96  3.63    40 1.07
beta[2,3] -0.44    0.31 1.98 -4.03 -1.88 -0.59 0.99  3.63    40 1.07
beta[2,4] -0.85    0.31 1.97 -4.34 -2.23 -0.97 0.57  3.23    40 1.07
beta[3,1] -0.08    0.27 1.90 -3.74 -1.54 -0.17 1.31  3.16    51 1.00
beta[3,2] -0.13    0.27 1.90 -3.78 -1.59 -0.22 1.28  3.17    51 1.00
beta[3,3] -0.16    0.27 1.90 -3.80 -1.63 -0.23 1.25  3.09    51 1.00
beta[3,4] -0.01    0.27 1.90 -3.67 -1.46 -0.09 1.39  3.27    51 1.00

I don’t understand this output because I don’t have any variable with 3 categories, and I am missing Male and the continuous variable ASBGSMR.

Now if I add the outcome to the formula as follows, then M looks like this

N <- nrow(Canadareg1)
f <- as.formula("ASBR07A ~ male + ASBG04 + ASBGSMR")
M <- model.frame(f,Canadareg1)
M
   ASBR07A male ASBG04  ASBGSMR
58       1    0      3  7.93542
59       4    0      3 12.36856
61       4    0      2 12.36856
64       1    1      3 12.36856
68       1    0      3  8.31489
70       1    1      1  8.31489
72       1    1      3  7.93542
73       1    1      3 10.35004
77       2    1      4  8.78118
78       1    1      2  8.31489

which has all of the data in it. The output of the analysis looks like this:

Inference for Stan model: 8a55851b0cd2a1daa239d76f0d003136.
1 chains, each with iter=1000; warmup=500; thin=1; 
post-warmup draws per chain=500, total post-warmup draws=500.

            mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
beta[1,1] -18.54    0.28 3.28 -25.15 -20.79 -18.57 -16.20 -12.34   133 1.01
beta[1,2]   0.54    0.32 2.76  -4.86  -1.19   0.68   2.39   6.08    77 1.02
beta[1,3]   7.44    0.31 2.79   1.91   5.57   7.56   9.28  12.90    82 1.01
beta[1,4]  10.39    0.31 2.79   4.86   8.48  10.59  12.22  15.75    83 1.01
beta[2,1]   1.50    0.21 2.58  -3.34  -0.18   1.45   3.19   6.51   148 1.00
beta[2,2]  -0.07    0.21 2.44  -4.75  -1.68  -0.05   1.52   4.43   134 1.00
beta[2,3]  -0.94    0.21 2.43  -5.51  -2.51  -0.93   0.75   3.64   133 1.00
beta[2,4]  -1.61    0.21 2.42  -6.37  -3.23  -1.58   0.06   3.05   135 1.00
beta[3,1]   2.16    0.22 2.49  -2.72   0.64   1.95   3.66   7.43   129 1.00
beta[3,2]   0.84    0.23 2.48  -3.56  -0.80   0.70   2.32   6.12   117 1.00
beta[3,3]  -0.60    0.23 2.47  -5.00  -2.26  -0.75   0.80   4.71   118 1.00
beta[3,4]  -1.86    0.23 2.47  -6.34  -3.46  -2.02  -0.42   3.46   117 1.00
beta[4,1]   2.44    0.19 2.50  -2.43   0.68   2.54   4.06   7.37   174 1.00
beta[4,2]   0.04    0.19 2.48  -4.78  -1.58   0.15   1.74   4.90   172 1.00
beta[4,3]  -1.30    0.19 2.47  -6.10  -2.91  -1.23   0.47   3.61   172 1.00
beta[4,4]  -1.92    0.19 2.48  -6.80  -3.58  -1.81  -0.20   2.89   172 1.00

The labeling of the betas looks different, and I don't know where male or ASBGSMR went.

Thanks,

David

In the Stan code beta is declared as matrix[D, K], and in your R code D is computed as the number of columns of the matrix M that you pass in as data. In the first case the M you showed had 3 columns, so beta got 3 rows. In the second case your M had 4 columns, so beta had 4 rows. That's all happening because of the way beta is declared in the Stan code and the way the R code (from the initial post) computes the dimensions. I'm not saying it's what you want, but it's behaving as I would expect given that code.
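
As a quick check with the objects from your R code:

```
ncol(M)                          # D: this is how many rows beta gets
length(unique(Canadareg1[,1]))   # K: the number of columns of beta
```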

But I think I might be confused about what you actually want. What would the columns in the “correct” version of M be? What would the dimensions of beta be if everything were working properly? I think if I can clearly understand that, then we'll be able to figure this out.

Ok. I think that I might have figured this out. If I have three columns in M (male, ASBG04, and ASBGSMR), then beta[1,1] is the effect of male on the first category of the outcome, beta[1,2] is the effect of male on the second category, etc. Likewise, beta[2,1] is the effect of ASBG04 on the first category, beta[2,2] is the effect of ASBG04 on the second category of the outcome, and so on. Not the nicest way to display the output, but I think that's it. Would you agree?

Yeah, the way the model at the beginning is written, beta[2, 1] is the coefficient relating the second input column to the first outcome category.

Depending on how your inputs and outputs are ordered, those coefficients will be interpreted a bit differently. You'll have to carry that information around with you from the design matrix and the way the input data are organized.
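
One way to carry it around (a sketch; fit here is just a placeholder for your stanfit object, and the labels assume the outcome categories are coded 1-4 as in your data) is to attach the names from the design matrix to the extracted draws:

```
library(rstan)

# "fit" is the stanfit object; M is the design matrix that was passed in as x.
beta_draws <- rstan::extract(fit, pars = "beta")$beta  # array: iterations x D x K
dimnames(beta_draws) <- list(iteration = NULL,
                             predictor = colnames(M),              # rows of beta = columns of M
                             category  = paste0("ASBR07A=", 1:4))  # columns of beta = outcome categories

# e.g. posterior draws of the ASBGSMR coefficient for outcome category 2:
summary(beta_draws[, "ASBGSMR", "ASBR07A=2"])
```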

Yeah, unless you code your Stan program to refer to variables by name, you'll have to keep track of this information. There's a tradeoff between generalizability and interpretability. Using a generic beta matrix means you can use the same Stan code with different data; the interpretation of beta changes, but you don't need to change the Stan code. If you instead declare separate parameters for the coefficients and name them after the data variables (e.g., beta_male, beta_ASBG04, etc.), then your parameters are easier to interpret in this particular case, but you can't use the same Stan program with other data because the names won't make sense. You can code it either way; both approaches have pros and cons.

We’re always looking to make improvements. Do you have a recommendation for a better way to display the output?

Jonah, the problem is that I haven't come across Stan code for multinomial regression that declares separate named parameters. When I search around, what shows up is code like what I provided. Perhaps the format of the output using separate parameter declarations would be easy enough to interpret. Do you have examples of that type of set-up for multinomial regression?

Yeah, it's tough without examples. I'm not sure there are examples of doing this explicitly for multinomial regression, but the technique I'm referring to is the same regardless of the type of model: anytime you have a matrix of parameters, you can instead declare separate column or row vectors that you name individually and then (if you want) recombine them to create the original matrix.

So something like this:

parameters {
  // matrix[D, K] beta;   
  row_vector[K] beta_1;  // but use more informative names (e.g. beta_ASBGSMR)  
  row_vector[K] beta_2; 
  row_vector[K] beta_3;
  // keep going for however many betas you have
}
transformed parameters {
  matrix[D, K] beta = [beta_1, beta_2, beta_3];  // recreate beta by stacking the row_vectors (one per row of beta, so D of them)
  matrix[N, K] x_beta = x * beta;   // you can keep this here, but I would move it to the top of the model block so it's not saved in the output
}
model {
  // you can now put priors on the individual beta row_vectors instead of to_vector(beta)
}

I haven’t tested this particular code out but this general technique should work. Hope that helps!
