# Use categorical /multinomial family with binary-ish data based on separate column in brms

{brms} supports multinomial models with the `categorical()` family, and it works well as long as the response variable has 3+ categories in it.

The {mlogit} R package allows you to fit multinomial logitistic models in a frequentist way, and if you feed it data that is structured in a specific way, it can fit a multinomial model using a binary 0/1 outcome

Here’s an example (from this earlier related question)

``````library(dplyr)
library(brms)
library(mlogit)

colClasses = c(seat = "factor", price = "factor", choice="integer")) %>%
mutate(eng = factor(eng, levels = c("gas", "hyb", "elec")))
``````

Here, `resp.id` refers to 200 different respondents, `ques` referrs to 15 different random sets of questions offered to each respondent, `alt` refers to the different options in each set of questions. The `choice` column indicates which `alt` was selected. The other columns are different characteristics of the options being presented.

``````head(cbc_df)
#>   resp.id ques alt carpool seat cargo  eng price choice
#> 1       1    1   1     yes    6   2ft  gas    35      0
#> 2       1    1   2     yes    8   3ft  hyb    30      0
#> 3       1    1   3     yes    6   3ft  gas    30      1
#> 4       1    2   1     yes    6   2ft  gas    30      0
#> 5       1    2   2     yes    7   3ft  gas    35      1
#> 6       1    2   3     yes    6   2ft elec    35      0
``````

With `mlogit()`, you use the binary `choice` column as the response variable, but because of some adjustments to the dataset with `dfidx()`, `mlogit()` instead models the choice of `alt` = 1, `alt` = 2, or `alt` = 3 within each question. You can tell it’s doing multinomial work because it finds that alts 1, 2, and 3 were selected 33% of the time (to be expected, since the options were all shuffled every time)

``````# Modify the data for mlogit()
cbc_df_idx <- cbc_df %>%
# mlogit() needs a column with unique question id numbers
group_by(resp.id, ques) %>%
mutate(choice.id = cur_group_id()) %>%
ungroup() %>%
# Make indexed data frame for mlogit
dfidx(
idx = list(c("choice.id", "resp.id"), "alt"),
choice = "choice",
shape = "long"
)

m1 <- mlogit(choice ~ 0 + seat + cargo + eng + price, data = cbc_df_idx)
summary(m1)
#> Call:
#> mlogit(formula = choice ~ 0 + seat + cargo + eng + price, data = #> cbc_df_idx,
#>     method = "nr")
#>
#> Frequencies of alternatives:choice
#>       1       2       3
#> 0.32700 0.33467 0.33833
#>
#> ...
#>
#> Coefficients :
#>           Estimate Std. Error  z-value  Pr(>|z|)
#> seat7    -0.535280   0.062360  -8.5837 < 2.2e-16 ***
#> seat8    -0.305840   0.061129  -5.0032 5.638e-07 ***
#> cargo3ft  0.477449   0.050888   9.3824 < 2.2e-16 ***
#> enghyb   -0.811282   0.060130 -13.4921 < 2.2e-16 ***
#> engelec  -1.530762   0.067456 -22.6926 < 2.2e-16 ***
#> price35  -0.913656   0.060601 -15.0765 < 2.2e-16 ***
#> price40  -1.725851   0.069631 -24.7856 < 2.2e-16 ***
#> ...
``````

Translating this to {brms} seems like it should be relatively straightforward (again referencing this earlier question):

``````m2 <- brm(
bf(choice ~ 0 + seat + cargo + eng + price + (1 | resp.id)),
data = cbc_df,
family = categorical(),
prior = c(
prior(normal(0, 3), class = b, dpar = mu1),
prior(exponential(1), class = sd, dpar = mu1)
),
chains = 4, cores = 4, iter = 2000, seed = 1234,
)
``````

The model fits just fine and the coefficients are all roughly comparable to the output from `mlogit()`, but I get this message:

``````Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.
``````

Which is true. The `choice` column here only contains 0/1 valies and it’s not technicaly multinomial.

Is there some way to get {brms} to somehow treat the response variable as multinomial with alternatives 1, 2, 3, similar to {mlogit}?

I’ve tried a few possible methods for this:

• Use the `alt` column as the response, like `bf(alt ~ 0 + seat + ...)`. But this isn’t right because it doesn’t include anything about which alternative was actually chosen
• Create a new column called `choice_alt` that is `alt * choice` - if someone selected alternative 3, this value would be 3; if someone selected alternative 2, this value would be 2; and so on. The value is 0 if no choice was made. This works-ish, but it creates four categories (0, 1, 2, and 3) instead of 3
• Somehow pass a three-column matrix of dummy-encoded columns to brms? Like `bf(choice_matrix ~ 0 + seat + ...)`. That doesn’t work, though, because brms isn’t designed for that kind of input data
• Somehow build the formula using nonlinear syntax (via this GitHub issue). This is probably the correct approach though how to implement it eludes me.
• Somehow use multivariate models and subsets? I tried this, but got an error about “Cannot use the same response variable twice in the same model”
``````option1 <- bf(choice | subset(alt == 1) ~ 0 + seat + cargo + eng + price + (1 | resp.id))
option2 <- bf(choice | subset(alt == 2) ~ 0 + seat + cargo + eng + price + (1 | resp.id))
option3 <- bf(choice | subset(alt == 3) ~ 0 + seat + cargo + eng + price + (1 | resp.id))

brm(
option1 + option2 + option3,
data = cbc_df,
family = categorical(),
...
)
``````

Thanks!

The categorical family in brms assumes the response is a univariate vector, so if you are passing in data in the form of a matrix, you probably want to use the multinomial family instead. That said, it seems like what you want is some kind of nested logit model rather than a multinomial count so the non-linear syntax is probably the way to go, but it might also be worth looking into the logit normal family that can handle multivariate binary responses.

As an aside, I ran into the issue about a year ago and if I recall, comparing mlogit to brms is a really bad idea because they are doing fundamentally different things in the backend (as in different models entirely, not just one being the bayesian variant).

1 Like

Somehow pass a three-column matrix of dummy-encoded columns to brms? Like `bf(choice_matrix ~ 0 + seat + ...)`. That doesn’t work, though, because brms isn’t designed for that kind of input data

This should work if you use family = multinomial(). You would need an outcome matrix that is N × 3. It looks like you only have 1 trial, so you’d have 1 trial per row.

See here for an example of multinomial data.

Hope that helps with this specific problem. Also, discrete choice models suck 😀

1 Like

Ooh this is super promising!

brms doesn’t seem happy about forcing one trial per row, though, in cases where there was no choice

For example,

``````cbc_df_with_matrix <- cbc_df %>%
mutate(choice_alt = alt * choice) %>%
mutate(
alt1 = ifelse(choice_alt == 1, 1, 0),
alt2 = ifelse(choice_alt == 2, 1, 0),
alt3 = ifelse(choice_alt == 3, 1, 0),
) %>%
mutate(outcome = cbind(alt1, alt2, alt3)) %>%
mutate(size = 1)
#> # A tibble: 9,000 × 16
#>    resp.id  ques   alt ... choice ... outcome[,"alt1"] [,"alt2"] [,"alt3"]  size
#>      <dbl> <dbl> <dbl> ...  <dbl> ...            <dbl>     <dbl>     <dbl> <dbl>
#>  1       1     1     1 ...      0 ...                0         0         0     1
#>  2       1     1     2 ...      0 ...                0         0         0     1
#>  3       1     1     3 ...      1 ...                0         0         1     1
#>  4       1     2     1 ...      0 ...                0         0         0     1
#>  5       1     2     2 ...      1 ...                0         1         0     1
#>  6       1     2     3 ...      0 ...                0         0         0     1
``````

That creates a matrix column named `outcome` with three columns in it. Running this model:

``````brm(
bf(outcome | trials(size) ~ 0 + seat + cargo + eng + price + (1 | resp.id)),
data = cbc_df_with_matrix,
family = multinomial(refcat = "alt1")
)
``````

gives an error: “Error: Number of trials does not match the number of events.” That’s because in lots of those rows, no selection was ever made (like in rows 1 and 2, for instance—there’s 0 for all three outcomes)

Instead of forcing size to be 1, I can add the number of outcomes in each row:

``````mutate(size = alt1 + alt2 + alt3)
``````

That makes the size either 0 or 1:

``````#> # A tibble: 9,000 × 16
#>    resp.id  ques   alt ... choice ... outcome[,"alt1"] [,"alt2"] [,"alt3"]  size
#>      <dbl> <dbl> <dbl> ...  <dbl> ...            <dbl>     <dbl>     <dbl> <dbl>
#>  1       1     1     1 ...      0 ...                0         0         0     0
#>  2       1     1     2 ...      0 ...                0         0         0     0
#>  3       1     1     3 ...      1 ...                0         0         1     1
#>  4       1     2     1 ...      0 ...                0         0         0     0
#>  5       1     2     2 ...      1 ...                0         1         0     1
#>  6       1     2     3 ...      0 ...                0         0         0     0
``````

and the model fits!

``````brm(
bf(outcome | trials(size) ~ 0 + seat + cargo + eng + price + (1 | resp.id)),
data = cbc_df_with_matrix,
family = multinomial(refcat = "alt1")
)
``````

But the coefficients are wildly off from what I get when using family = bernoulli or with mlogit (which are the same). And I think that’s because all the rows with size = 0 are essentially getting dropped?

Yeah, nested logit feels more logical to me—even just using regular logit with a random intercept for questions embedded in respondents (i.e. `(1 | resp.id / ques)`) seems to work and gives the same results as the marketing textbook example I’m copying,

lol just kidding, the coefficients are wildly off because it’s predicting choice 1 vs. choice 2, vs choice 3, not an overall binary yes/no choice. It’s apparent when plotting the conditional effects with `conditional_effects(model_name, categorical = TRUE)`:

There’s no substantial difference in whether people chose the first vs. second vs. third option if it contained cargo = 2ft or cargo = 3ft, which makes sense because those options were all randomized anyway and were selected 33% of the time

I think this goes to @ajnafa 's point that mlogit is doing completely different things under the hood. It picks up on the 33%-per-alternative structure, but then it’s doing some sort of pooling of the coefficients to get the effect of 0/1 across all the alternatives

Makes sense. One option is to add a fourth column that has 1 if no other options are chosen.

Alternatively, see here for a discussion about transforming brms to try match mlogit.

1 Like

It seems like some kind of item response formulation of a hierarchical logit model might work here depending on the goal?

1 Like

Oh cool, that works and shows differences in outcomes across the different covariates. Like here, 3ft has a higher probability of selection than 2ft, if you lump alternatives 1-3 into one category vs. the 0 option. But then at that point it’s just a binary thing and regular logit is fine

Discrete choice models do indeed suck :( 