Use categorical /multinomial family with binary-ish data based on separate column in brms

{brms} supports multinomial models with the categorical() family, and it works well as long as the response variable has 3+ categories in it.

The {mlogit} R package allows you to fit multinomial logitistic models in a frequentist way, and if you feed it data that is structured in a specific way, it can fit a multinomial model using a binary 0/1 outcome

Here’s an example (from this earlier related question)

library(dplyr)
library(brms)
library(mlogit)

# Read in data
cbc_df <- read.csv("http://goo.gl/5xQObB",
  colClasses = c(seat = "factor", price = "factor", choice="integer")) %>% 
  mutate(eng = factor(eng, levels = c("gas", "hyb", "elec")))

Here, resp.id refers to 200 different respondents, ques referrs to 15 different random sets of questions offered to each respondent, alt refers to the different options in each set of questions. The choice column indicates which alt was selected. The other columns are different characteristics of the options being presented.

head(cbc_df)
#>   resp.id ques alt carpool seat cargo  eng price choice
#> 1       1    1   1     yes    6   2ft  gas    35      0
#> 2       1    1   2     yes    8   3ft  hyb    30      0
#> 3       1    1   3     yes    6   3ft  gas    30      1
#> 4       1    2   1     yes    6   2ft  gas    30      0
#> 5       1    2   2     yes    7   3ft  gas    35      1
#> 6       1    2   3     yes    6   2ft elec    35      0

With mlogit(), you use the binary choice column as the response variable, but because of some adjustments to the dataset with dfidx(), mlogit() instead models the choice of alt = 1, alt = 2, or alt = 3 within each question. You can tell it’s doing multinomial work because it finds that alts 1, 2, and 3 were selected 33% of the time (to be expected, since the options were all shuffled every time)

# Modify the data for mlogit()
cbc_df_idx <- cbc_df %>% 
  # mlogit() needs a column with unique question id numbers
  group_by(resp.id, ques) %>% 
  mutate(choice.id = cur_group_id()) %>% 
  ungroup() %>% 
  # Make indexed data frame for mlogit
  dfidx(
    idx = list(c("choice.id", "resp.id"), "alt"),
    choice = "choice",
    shape = "long"
  )

m1 <- mlogit(choice ~ 0 + seat + cargo + eng + price, data = cbc_df_idx)
summary(m1)
#> Call:
#> mlogit(formula = choice ~ 0 + seat + cargo + eng + price, data = #> cbc_df_idx, 
#>     method = "nr")
#> 
#> Frequencies of alternatives:choice
#>       1       2       3 
#> 0.32700 0.33467 0.33833 
#> 
#> ...
#> 
#> Coefficients :
#>           Estimate Std. Error  z-value  Pr(>|z|)    
#> seat7    -0.535280   0.062360  -8.5837 < 2.2e-16 ***
#> seat8    -0.305840   0.061129  -5.0032 5.638e-07 ***
#> cargo3ft  0.477449   0.050888   9.3824 < 2.2e-16 ***
#> enghyb   -0.811282   0.060130 -13.4921 < 2.2e-16 ***
#> engelec  -1.530762   0.067456 -22.6926 < 2.2e-16 ***
#> price35  -0.913656   0.060601 -15.0765 < 2.2e-16 ***
#> price40  -1.725851   0.069631 -24.7856 < 2.2e-16 ***
#> ...

Translating this to {brms} seems like it should be relatively straightforward (again referencing this earlier question):

m2 <- brm(
  bf(choice ~ 0 + seat + cargo + eng + price + (1 | resp.id)),
  data = cbc_df,
  family = categorical(),
  prior = c(
    prior(normal(0, 3), class = b, dpar = mu1),
    prior(exponential(1), class = sd, dpar = mu1)
  ),
  chains = 4, cores = 4, iter = 2000, seed = 1234,
  backend = "cmdstanr", threads = threading(2)
)

The model fits just fine and the coefficients are all roughly comparable to the output from mlogit(), but I get this message:

Only 2 levels detected so that family 'bernoulli' might be a more efficient choice.

Which is true. The choice column here only contains 0/1 valies and it’s not technicaly multinomial.

Is there some way to get {brms} to somehow treat the response variable as multinomial with alternatives 1, 2, 3, similar to {mlogit}?

I’ve tried a few possible methods for this:

  • Use the alt column as the response, like bf(alt ~ 0 + seat + ...). But this isn’t right because it doesn’t include anything about which alternative was actually chosen
  • Create a new column called choice_alt that is alt * choice - if someone selected alternative 3, this value would be 3; if someone selected alternative 2, this value would be 2; and so on. The value is 0 if no choice was made. This works-ish, but it creates four categories (0, 1, 2, and 3) instead of 3
  • Somehow pass a three-column matrix of dummy-encoded columns to brms? Like bf(choice_matrix ~ 0 + seat + ...). That doesn’t work, though, because brms isn’t designed for that kind of input data
  • Somehow build the formula using nonlinear syntax (via this GitHub issue). This is probably the correct approach though how to implement it eludes me.
  • Somehow use multivariate models and subsets? I tried this, but got an error about “Cannot use the same response variable twice in the same model”
option1 <- bf(choice | subset(alt == 1) ~ 0 + seat + cargo + eng + price + (1 | resp.id))
option2 <- bf(choice | subset(alt == 2) ~ 0 + seat + cargo + eng + price + (1 | resp.id))
option3 <- bf(choice | subset(alt == 3) ~ 0 + seat + cargo + eng + price + (1 | resp.id))

brm(
  option1 + option2 + option3,
  data = cbc_df,
  family = categorical(),
  ...
)

Thanks!

The categorical family in brms assumes the response is a univariate vector, so if you are passing in data in the form of a matrix, you probably want to use the multinomial family instead. That said, it seems like what you want is some kind of nested logit model rather than a multinomial count so the non-linear syntax is probably the way to go, but it might also be worth looking into the logit normal family that can handle multivariate binary responses.

As an aside, I ran into the issue about a year ago and if I recall, comparing mlogit to brms is a really bad idea because they are doing fundamentally different things in the backend (as in different models entirely, not just one being the bayesian variant).

1 Like

Somehow pass a three-column matrix of dummy-encoded columns to brms? Like bf(choice_matrix ~ 0 + seat + ...). That doesn’t work, though, because brms isn’t designed for that kind of input data

This should work if you use family = multinomial(). You would need an outcome matrix that is N × 3. It looks like you only have 1 trial, so you’d have 1 trial per row.

See here for an example of multinomial data.

Hope that helps with this specific problem. Also, discrete choice models suck 😀

1 Like

Ooh this is super promising!

brms doesn’t seem happy about forcing one trial per row, though, in cases where there was no choice

For example,

cbc_df_with_matrix <- cbc_df %>% 
  mutate(choice_alt = alt * choice) %>% 
  mutate(
    alt1 = ifelse(choice_alt == 1, 1, 0),
    alt2 = ifelse(choice_alt == 2, 1, 0),
    alt3 = ifelse(choice_alt == 3, 1, 0),
  ) %>% 
  mutate(outcome = cbind(alt1, alt2, alt3)) %>% 
  mutate(size = 1)
head(cbc_df)
#> # A tibble: 9,000 × 16
#>    resp.id  ques   alt ... choice ... outcome[,"alt1"] [,"alt2"] [,"alt3"]  size
#>      <dbl> <dbl> <dbl> ...  <dbl> ...            <dbl>     <dbl>     <dbl> <dbl>
#>  1       1     1     1 ...      0 ...                0         0         0     1
#>  2       1     1     2 ...      0 ...                0         0         0     1
#>  3       1     1     3 ...      1 ...                0         0         1     1
#>  4       1     2     1 ...      0 ...                0         0         0     1
#>  5       1     2     2 ...      1 ...                0         1         0     1
#>  6       1     2     3 ...      0 ...                0         0         0     1

That creates a matrix column named outcome with three columns in it. Running this model:

brm(
  bf(outcome | trials(size) ~ 0 + seat + cargo + eng + price + (1 | resp.id)),
  data = cbc_df_with_matrix,
  family = multinomial(refcat = "alt1")
)

gives an error: “Error: Number of trials does not match the number of events.” That’s because in lots of those rows, no selection was ever made (like in rows 1 and 2, for instance—there’s 0 for all three outcomes)

Instead of forcing size to be 1, I can add the number of outcomes in each row:

mutate(size = alt1 + alt2 + alt3)

That makes the size either 0 or 1:

#> # A tibble: 9,000 × 16
#>    resp.id  ques   alt ... choice ... outcome[,"alt1"] [,"alt2"] [,"alt3"]  size
#>      <dbl> <dbl> <dbl> ...  <dbl> ...            <dbl>     <dbl>     <dbl> <dbl>
#>  1       1     1     1 ...      0 ...                0         0         0     0
#>  2       1     1     2 ...      0 ...                0         0         0     0
#>  3       1     1     3 ...      1 ...                0         0         1     1
#>  4       1     2     1 ...      0 ...                0         0         0     0
#>  5       1     2     2 ...      1 ...                0         1         0     1
#>  6       1     2     3 ...      0 ...                0         0         0     0

and the model fits!

brm(
  bf(outcome | trials(size) ~ 0 + seat + cargo + eng + price + (1 | resp.id)),
  data = cbc_df_with_matrix,
  family = multinomial(refcat = "alt1")
)

But the coefficients are wildly off from what I get when using family = bernoulli or with mlogit (which are the same). And I think that’s because all the rows with size = 0 are essentially getting dropped?

Yeah, nested logit feels more logical to me—even just using regular logit with a random intercept for questions embedded in respondents (i.e. (1 | resp.id / ques)) seems to work and gives the same results as the marketing textbook example I’m copying,

lol just kidding, the coefficients are wildly off because it’s predicting choice 1 vs. choice 2, vs choice 3, not an overall binary yes/no choice. It’s apparent when plotting the conditional effects with conditional_effects(model_name, categorical = TRUE):

There’s no substantial difference in whether people chose the first vs. second vs. third option if it contained cargo = 2ft or cargo = 3ft, which makes sense because those options were all randomized anyway and were selected 33% of the time

I think this goes to @ajnafa 's point that mlogit is doing completely different things under the hood. It picks up on the 33%-per-alternative structure, but then it’s doing some sort of pooling of the coefficients to get the effect of 0/1 across all the alternatives

Makes sense. One option is to add a fourth column that has 1 if no other options are chosen.

Alternatively, see here for a discussion about transforming brms to try match mlogit.

1 Like

It seems like some kind of item response formulation of a hierarchical logit model might work here depending on the goal?

1 Like

Oh cool, that works and shows differences in outcomes across the different covariates. Like here, 3ft has a higher probability of selection than 2ft, if you lump alternatives 1-3 into one category vs. the 0 option. But then at that point it’s just a binary thing and regular logit is fine

Discrete choice models do indeed suck :(

image