Labeling and order of multinomial regression coefficients

Hi all,

I am estimating a multinomial regression model with a 4 category outcome variables. The predictors are

male:  (1=male)

ASBG04: # of books in home 1) 0-10, 2) 11-25, 3) 26-100, 4) 101-200 5) More than 200

ASBGSMR:  Students motivated to read (4pt, strongly agree to strongly disagree)

The model set up is as follows

N <- nrow(Canadareg1)
f <- as.formula("ASBR07A ~ ASBG04 + ASBGSMR + male")
M <- model.matrix(f,Canadareg1)

data.list <- list(N=nrow(M), 
                  K=length(unique(Canadareg1[,1])),
                  D=ncol(M), 
                  x=M, 
                  ASBR07A=as.numeric(Canadareg1[,1]))


modelString <- "

data {
  int <lower=2> K;   // This is 4, the number of outcomes categories
  int <lower=0> N;            
  int <lower=1> D;  // This is the number of columns in the design                                     matrix: 4
  int <lower = 1, upper = K> ASBR07A[N];
  matrix[N, D] x;   // This will be N by 2 matrix of data
}

parameters {
  matrix[D, K] beta;     // This is a 4 x 4 matrix of betas
}

transformed parameters {
  matrix[N, K] x_beta = x * beta;   //  N x 4
}

model {
  to_vector(beta) ~ normal(0, 5);

  for (i in 1:N)
    ASBR07A[i] ~ categorical_logit(x_beta[i]');
}

generated quantities {
int<lower=1,upper=K> ASBR07A_rep[N];
for (i in 1:N){
  ASBR07A_rep[i] = categorical_logit_rng(x_beta[i]');
}
}

"

Everything runs find, and the output looks is as follows.

## Inference for Stan model: 87a49ecccc72c4a5df74a69624386dba.
## 1 chains, each with iter=1000; warmup=500; thin=1; 
## post-warmup draws per chain=500, total post-warmup draws=500.
## 
##            mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
## beta[1,1] -2.33    0.25 2.26 -6.21 -3.99 -2.42 -0.80  2.06    83 1.00
## beta[1,2]  1.00    0.24 2.23 -3.01 -0.61  0.84  2.50  5.34    86 1.00
## beta[1,3]  0.90    0.25 2.26 -3.20 -0.79  0.85  2.44  5.04    84 1.00
## beta[1,4] -0.17    0.24 2.26 -4.18 -1.80 -0.17  1.28  4.33    87 1.00
## beta[2,1]  0.75    0.44 2.69 -4.85 -1.01  1.08  2.68  5.17    37 1.00
## beta[2,2]  0.50    0.44 2.68 -4.98 -1.26  0.87  2.45  5.00    37 1.00
## beta[2,3]  0.49    0.44 2.69 -5.14 -1.28  0.85  2.47  4.98    37 1.00
## beta[2,4]  0.17    0.45 2.69 -5.40 -1.61  0.57  2.15  4.65    36 1.00
## beta[3,1]  0.11    0.44 2.53 -5.02 -1.68  0.09  1.85  5.44    33 1.02
## beta[3,2] -0.17    0.44 2.53 -5.29 -1.96 -0.20  1.57  5.19    33 1.02
## beta[3,3] -0.19    0.44 2.53 -5.28 -1.98 -0.22  1.56  5.13    33 1.02
## beta[3,4]  0.03    0.44 2.53 -5.13 -1.74  0.02  1.76  5.38    33 1.02
## beta[4,1]  1.19    0.39 2.54 -3.81 -0.47  1.17  3.23  5.79    42 1.01
## beta[4,2]  0.65    0.39 2.53 -4.20 -1.12  0.60  2.67  5.22    42 1.01
## beta[4,3]  0.35    0.39 2.53 -4.49 -1.34  0.30  2.38  4.90    42 1.01
## beta[4,4]  0.41    0.39 2.54 -4.48 -1.29  0.33  2.40  5.07    42 1.00
## 
## Samples were drawn using NUTS(diag_e) at Mon Nov 16 21:06:41 2020.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

As you can see, I am getting the results for ASBGSMR for each of the four categories, but not the others. I tried a different ordering of the predictors but receive the same results. I’m not sure what is going on here.

Thanks

This sounds like a problem that can be debugged at the design matrix level (without going into Stan).

I think if you just print the matrix M then it’ll give you a lot of info about what each column means. I suspect there is something surprising going on in the matrix encoding – maybe a group is being encoded as the intercept or something.

Well that indeed seems to be the case, but I don’t understand why. In following others’ with similar syntax, this is what I write.

```{r, echo=TRUE}
N <- nrow(Canadareg2)
f <- as.formula("ASBR07A ~ ASBG04 + ASBGSMR + male")
M <- model.matrix(f,Canadareg2)
data.list <- list(N=nrow(M), 
                  K=length(unique(Canadareg2[,1])),
                  D=ncol(M), 
                  x=M, 
                  ASBR07A=as.numeric(Canadareg2[,1]))

When I print out M, this is what I see

> head(M)
   (Intercept) ASBG04  ASBGSMR male
58           1      3  7.93542    0
59           1      3 12.36856    0
61           1      2 12.36856    0
64           1      3 12.36856    1
68           1      3  8.31489    0
70           1      1  8.31489    1
> 

So, the outcome variable ASBR07A is replaced by an intercept. This is all before I get to Stan syntax. Thoughts?

Thanks

David

What does Canadareg2 look like? Does ASBG04 or ASBGSMR need converted to a factor?

Here is what Canadareg2 looks like

> head(Canadareg2)
   ASBR07A male ASBG04  ASBGSMR
58       1    0      3  7.93542
59       4    0      3 12.36856
61       4    0      2 12.36856
64       1    1      3 12.36856
68       1    0      3  8.31489
70       1    1      1  8.31489
> 

I did convert ASBG04 to a factor but it didn’t change anything.

Hmm, I think there is something going on here.

Check out this example:

df = data.frame(b = c(1, 2, 3, 4),
                a = as.factor(c(1, 1, 2, 3)))
X = model.matrix(b ~ a, df)

Printing X gives:

> X
  (Intercept) a2 a3
1           1  0  0
2           1  0  0
3           1  1  0
4           1  0  1
attr(,"assign")
[1] 0 1 1
attr(,"contrasts")
attr(,"contrasts")$a
[1] "contr.treatment"

That’s what you’re looking for here right? Like a bunch of columns corresponding to the different ASBG04 values?

Yup. Something is going on with model.matrix. It produces a design matrix which I don’t think one needs when running a multinomial regression. The relevant lines in my code that are producing this problem are

```{r, echo=TRUE}
N <- nrow(Canadareg2)
f <- as.formula("ASBR07A ~ ASBG04 + ASBGSMR + male")
M <- model.matrix(f, Canadareg2)
data.list <- list(N=nrow(M), 
                  K=length(unique(Canadareg2[,1])),
                  D=ncol(M), 
                  x=M, 
                  ASBR07A=as.numeric(Canadareg2[,1]))

Can you save the Canadareg2 dataframe as a csv and attach it?

Here go. What is strange is that I don’t know where the first column is coming from as it is not in the data set. Canada2.csv (18.4 KB)

1 Like

This worked for me:

df = read.csv("Canada2.csv") %>%
  mutate(ASBG04 = as.factor(ASBG04))
f <- as.formula("ASBR07A ~ ASBG04 + ASBGSMR + male")
M <- model.matrix(f, df)

Giving:

> M
    (Intercept) ASBG042 ASBG043 ASBG044 ASBG045  ASBGSMR male
1             1       0       1       0       0  7.93542    0
2             1       0       1       0       0 12.36856    0
3             1       1       0       0       0 12.36856    0
4             1       0       1       0       0 12.36856    1
...

Thanks. Yes, but where did ASBR07A go? Also, ASBG04 should not be categorized into dummy variables for this example.

I don’t think model.matrix() includes the outcome variable (on the LHS of the formula), so that’s probably why.

If it’s coded as a factor then model.matrix() will always convert it to dummy variables.

If I’m not mistaken, I think both of these problems are solved by the model.frame() function instead of model.matrix(). If you try that instead does that give you what you wanted?

1 Like

Hi Jonah,

I am not sure your suggestion quite worked. If I use model.frame as follows

N <- nrow(Canadareg1)
f <- as.formula(" ~ male + ASBG04 + ASBGSMR")
M <- model.frame(f,Canadareg1)
M

then this is what M looks like

head(M)

      male     ASBG04. ASBGSMR
    <dbl>.       <int>.      <dbl>
58	0	3	7.93542	
59	0	3	12.36856	
61	0	2	12.36856	
64	1	3	12.36856	
68	0	3	8.31489	
70	1	1	8.31489	
72	1	3	7.93542	
73	1	3	10.35004	
77	1	4	8.78118	
78	1	2	8.31489	

and the output looks like this

Inference for Stan model: 8a55851b0cd2a1daa239d76f0d003136.
1 chains, each with iter=500; warmup=250; thin=1; 
post-warmup draws per chain=250, total post-warmup draws=250.

           mean se_mean   sd  2.5%   25%   50%  75% 97.5% n_eff Rhat
beta[1,1]  0.72    0.36 2.17 -3.28 -0.90  0.96 2.37  4.32    36 1.00
beta[1,2]  0.40    0.36 2.15 -3.70 -1.34  0.62 2.05  4.04    36 1.00
beta[1,3]  0.10    0.36 2.18 -3.75 -1.68  0.38 1.76  3.94    36 1.00
beta[1,4]  0.10    0.36 2.16 -3.83 -1.63  0.30 1.86  3.87    35 1.00
beta[2,1] -0.46    0.31 1.97 -4.02 -1.83 -0.64 0.93  3.70    40 1.07
beta[2,2] -0.42    0.31 1.96 -3.99 -1.81 -0.58 0.96  3.63    40 1.07
beta[2,3] -0.44    0.31 1.98 -4.03 -1.88 -0.59 0.99  3.63    40 1.07
beta[2,4] -0.85    0.31 1.97 -4.34 -2.23 -0.97 0.57  3.23    40 1.07
beta[3,1] -0.08    0.27 1.90 -3.74 -1.54 -0.17 1.31  3.16    51 1.00
beta[3,2] -0.13    0.27 1.90 -3.78 -1.59 -0.22 1.28  3.17    51 1.00
beta[3,3] -0.16    0.27 1.90 -3.80 -1.63 -0.23 1.25  3.09    51 1.00
beta[3,4] -0.01    0.27 1.90 -3.67 -1.46 -0.09 1.39  3.27    51 1.00

I don’t understand this output because I don’t have any variable with 3 categories, and I am missing Male and the continuous variable ASBGSMR.

Now if I add the outcome to the formula as follows, then M looks like this

N <- nrow(Canadareg1)
f <- as.formula("ASBR07A ~ male + ASBG04 + ASBGSMR")
M <- model.frame(f,Canadareg1)
M
ASBR07A.  male     ASBG04. ASBGSMR
   <int>                <dbl>.       <int>.      <dbl>
58 1 0 3 7.93542
59 4 0 3 12.36856
61 4 0 2 12.36856
64 1 1 3 12.36856
68 1 0 3 8.31489
70 1 1 1 8.31489
72 1 1 3 7.93542
73 1 1 3 10.35004
77 2 1 4 8.78118
78 1 1 2 8.31489

which as all of the data in it. The output of the analysis looks like this.

Inference for Stan model: 8a55851b0cd2a1daa239d76f0d003136.
1 chains, each with iter=1000; warmup=500; thin=1; 
post-warmup draws per chain=500, total post-warmup draws=500.

            mean se_mean   sd   2.5%    25%    50%    75%  97.5%
beta[1,1] -18.54    0.28 3.28 -25.15 -20.79 -18.57 -16.20 -12.34
beta[1,2]   0.54    0.32 2.76  -4.86  -1.19   0.68   2.39   6.08
beta[1,3]   7.44    0.31 2.79   1.91   5.57   7.56   9.28  12.90
beta[1,4]  10.39    0.31 2.79   4.86   8.48  10.59  12.22  15.75
beta[2,1]   1.50    0.21 2.58  -3.34  -0.18   1.45   3.19   6.51
beta[2,2]  -0.07    0.21 2.44  -4.75  -1.68  -0.05   1.52   4.43
beta[2,3]  -0.94    0.21 2.43  -5.51  -2.51  -0.93   0.75   3.64
beta[2,4]  -1.61    0.21 2.42  -6.37  -3.23  -1.58   0.06   3.05
beta[3,1]   2.16    0.22 2.49  -2.72   0.64   1.95   3.66   7.43
beta[3,2]   0.84    0.23 2.48  -3.56  -0.80   0.70   2.32   6.12
beta[3,3]  -0.60    0.23 2.47  -5.00  -2.26  -0.75   0.80   4.71
beta[3,4]  -1.86    0.23 2.47  -6.34  -3.46  -2.02  -0.42   3.46
beta[4,1]   2.44    0.19 2.50  -2.43   0.68   2.54   4.06   7.37
beta[4,2]   0.04    0.19 2.48  -4.78  -1.58   0.15   1.74   4.90
beta[4,3]  -1.30    0.19 2.47  -6.10  -2.91  -1.23   0.47   3.61
beta[4,4]  -1.92    0.19 2.48  -6.80  -3.58  -1.81  -0.20   2.89
          n_eff Rhat
beta[1,1]   133 1.01
beta[1,2]    77 1.02
beta[1,3]    82 1.01
beta[1,4]    83 1.01
beta[2,1]   148 1.00
beta[2,2]   134 1.00
beta[2,3]   133 1.00
beta[2,4]   135 1.00
beta[3,1]   129 1.00
beta[3,2]   117 1.00
beta[3,3]   118 1.00
beta[3,4]   117 1.00
beta[4,1]   174 1.00
beta[4,2]   172 1.00
beta[4,3]   172 1.00
beta[4,4]   172 1.00

The labeling of the betas looks different, I don’t know where Male or ASBGSMR went.

Thanks,

David

In the Stan code beta is declared as matrix[D,K] and in your R code D is computed as the number of columns in the data M. In this case it looks like the M you showed had 3 columns so that’s why beta would get 3 rows. In the second case your M had 4 columns so your beta was had 4 rows. So that’s all happening just because of the way beta is declared in the Stan code and the way the R code (from the initial post) is computing the dimensions. I’m not saying it’s what you want, but it’s behaving as I would expect based on the code.

But I think I might be confused about what you actually want. What would the columns in the “correct” version of M be? What would the dimensions of beta be if all were going properly? I think if I can clearly understand that then we’ll be able to figure this out.