Modelling Discrete Choices as non-linear formulae in brms


I’ve been playing with brms for a couple of weeks now. It all started when I tried fitting a mixed-effects linear model, and I must say, I’m pretty impressed with how easy and efficient the package is to use. While I’ve got some theoretical knowledge about Bayesian estimation, I’ve never found a package that’s so practical for fitting models like this one. Ever since, given my background in fitting models using frequentist methods in R and GAUSS, I’ve been asking myself a lot of “what if” questions, like, “What if I tried Bayesian estimation for that particular model I estimated in GAUSS? Can brms handle it?”

My main expertise is in discrete choice models, or what some might call conjoint analysis. After discovering this package, I became curious about how to implement this kind of modelling in brms, and I stumbled upon a really helpful guide by Andrew Heiss here. He compares conjoint analysis fitted with frequentist methods via the mlogit package against Bayesian methods with brms. His guide focuses on making conjoint analysis robust and compatible with existing structures, which I found really useful.

Inspired by Andrew’s work, I decided to create my own simulated choice data and play around with discrete choice modelling using brms. Below is a sample of six rows, showing travel times, travel costs, and the mode chosen for users considering three modes of transportation (car, bus, rail):

  X timecar costcar timebus costbus timerail costrail female business income choice choicecar choicebus choicerail
1 1     310      40     370      25      140       55      0        0  46705      3         0         0          1
2 2     345      40     345      30      130       55      1        0  50123      2         0         1          0
3 3     310      40     370      25      130       35      1        0  67589      3         0         0          1
4 4     310      40     420      15      140       65      0        0  42987      3         0         0          1
5 5     310      40     300      30      170       35      0        1  24766      3         0         0          1
6 6     310      40     330      35      120       65      1        0  16554      2         0         1          0
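
For context, data of this shape can be simulated along these lines. This is only a hedged sketch with made-up attribute ranges and made-up "true" parameters, not my actual generator:

```r
# Illustrative simulation of choice data in the same shape as the sample above.
# Attribute ranges and the "true" parameters are hypothetical.
set.seed(1)
n <- 500
dat <- data.frame(
  timecar  = sample(seq(300, 360, 5),  n, replace = TRUE),
  costcar  = sample(seq(35, 45, 5),    n, replace = TRUE),
  timebus  = sample(seq(300, 420, 5),  n, replace = TRUE),
  costbus  = sample(seq(15, 35, 5),    n, replace = TRUE),
  timerail = sample(seq(120, 170, 10), n, replace = TRUE),
  costrail = sample(seq(35, 65, 10),   n, replace = TRUE),
  female   = rbinom(n, 1, 0.5),
  business = rbinom(n, 1, 0.2),
  income   = round(runif(n, 15000, 70000))
)

# Random utilities: assumed ASCs and time coefficient plus Gumbel (type-I EV) noise
gumbel <- function(n) -log(-log(runif(n)))
u <- cbind(car  =  0.0 - 0.01 * dat$timecar  + gumbel(n),
           bus  = -0.8 - 0.01 * dat$timebus  + gumbel(n),
           rail = -0.1 - 0.01 * dat$timerail + gumbel(n))

dat$choice     <- max.col(u)  # 1 = car, 2 = bus, 3 = rail
dat$choicecar  <- as.integer(dat$choice == 1)
dat$choicebus  <- as.integer(dat$choice == 2)
dat$choicerail <- as.integer(dat$choice == 3)
head(dat)
```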

I went ahead and experimented with my dataset using both the “categorical” and “multinomial” families. I had my fair share of challenges and experiences, much like what Andrew Heiss described in this post. And yes, as Andrew mentioned in that thread: “Discrete choice models do indeed suck :(”. It’s true that compared to linear models, they’re much trickier to work with in software. Additionally, given the constraints and requirements of discrete choice theory and random utility (such as fixing one of the ASCs, defining availability indicators for certain alternatives, and relaxing the IIA through nestings), there isn’t much flexibility when it comes to playing with parameters in pre-defined links in brms.
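
As a concrete illustration of one of those requirements, here is a small base-R sketch (my own, not brms syntax) of how an availability indicator enters the MNL probabilities: an unavailable alternative contributes nothing to the denominator and gets probability exactly zero. The utility values are hypothetical.

```r
# Availability indicators in MNL probabilities (illustrative values only).
v  <- c(car = -0.5, bus = -1.2, rail = -0.9)  # hypothetical utilities
av <- c(car = 1,    bus = 0,    rail = 1)     # bus unavailable for this person

# Unavailable modes drop out of both numerator and denominator
p <- av * exp(v) / sum(av * exp(v))
p
```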

But then I had an idea. What if I treated the discrete choice model, along with all the specifications I wanted to consider, as a non-linear model in brms? After all, discrete choice models are non-linear, and brms allows you to input your own non-linear formulas.

I apologize for the lengthy explanation, but I wanted to give you some context on where I’m at. My goal is to model any discrete choice model in brms as a non-linear formula. To start off simple, I attempted to write my multinomial logit model for the dataset mentioned above. I understand that it’s possible to fit a categorical family in my dataset and achieve the model I’m aiming for, but I wanted to learn how to do it for a basic multinomial logit model first, and then expand to more complex models like nested logit and cross-nested logit.

For now, the model I am trying to do is a very simple multinomial logit (MNL):

The utility of each travel mode is (intercept + parameter * travel_time). In accordance with random utility theory, I fix the intercept for the car mode to zero, making it the reference for the other intercepts in the model. All population-level effects, nothing fancy yet. Below is the code for the model:

fit1 <- brm(
  bf(choicecar ~ 1 - choicebus - choicerail,
     nlf(choicebus ~ nbus / denom),
     nlf(choicerail ~ nrail / denom),
     nlf(denom ~ ncar + nbus + nrail),    # denominator in the MNL model
     nlf(ncar ~ exp(0 + b1 * timecar)),   # fixing car's ASC to zero as the reference
     nlf(nbus ~ exp(asc1 + b1 * timebus)),
     nlf(nrail ~ exp(asc2 + b1 * timerail)),
     asc1 + asc2 + b1 ~ 1,                # this line declares the parameters to brms
     nl = TRUE),
  data = database,
  family = bernoulli("identity"),
  prior = c(
    prior(normal(0, 4), nlpar = "b1"),
    prior(normal(0, 4), nlpar = "asc1"),
    prior(normal(0, 4), nlpar = "asc2")
  ),
  cores = 8,
  chains = 8,
  warmup = 1000,
  iter = 2000,
  control = list(adapt_delta = 0.9)
)
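
To make explicit what the non-linear formulas are meant to encode, here is the same MNL probability calculation done by hand for the first row of the sample data. The parameter values are purely illustrative, not estimates from any fit:

```r
# MNL probabilities computed by hand for one observation.
# asc_bus, asc_rail, b_tt are hypothetical values, chosen only for illustration.
asc_bus <- -0.8; asc_rail <- -0.1; b_tt <- -0.005

timecar <- 310; timebus <- 370; timerail <- 140  # first row of the sample data

v <- c(car  = 0        + b_tt * timecar,  # car's ASC fixed to zero (reference)
       bus  = asc_bus  + b_tt * timebus,
       rail = asc_rail + b_tt * timerail)

# The softmax the nlf's above are supposed to reproduce
p <- exp(v) / sum(exp(v))
round(p, 3)
sum(p)  # always 1 by construction
```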

The code runs fairly quickly without any warnings and here is a sample outcome:

 Family: bernoulli 
  Links: mu = identity 
Formula: choicecar ~ 1 - choicebus - choicerail 
         choicebus ~ nbus/denom
         choicerail ~ nrail/denom
         denom ~ ncar + nbus + nrail
         ncar ~ exp(0 + b1 * timecar)
         nbus ~ exp(asc1 + b1 * timebus)
         nrail ~ exp(asc2 + b1 * timerail)
         asc1 ~ 1
         asc2 ~ 1
         b1 ~ 1
   Data: database (Number of observations: 500) 
  Draws: 8 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 8000

Population-Level Effects: 
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
asc1_Intercept    -1.63      2.43    -8.35     0.46 1.00     1383     1376
asc2_Intercept    -1.33      2.41    -7.88     0.95 1.00     1482     1554
b1_Intercept       0.00      0.00    -0.00     0.00 1.00     3251     2947

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Everything was going smoothly until I attempted to validate this outcome against a frequentist fit from the “Apollo” package, which is designed specifically for various types of discrete choice models. Since my model is straightforward, with no complex elements involved, Apollo used plain maximum likelihood estimation (no simulation or numerical approximation needed) and converged to a local optimum. Below are the estimates from Apollo after fixing the intercept for the car mode to zero (ASC stands for alternative-specific constant):

            Estimate        s.e.   t.rat.(0)    Rob.s.e. Rob.t.rat.(0)
asc_car      0.00000          NA          NA          NA            NA
asc_bus     -0.82490    0.162207    -5.08544    0.163040      -5.05947
asc_rail    -0.01027    0.305006    -0.03367    0.310409      -0.03308
b_tt      6.0998e-04    0.001690     0.36085    0.001729       0.35272

And finally, here is my question:

I understand that Bayesian and frequentist estimations are not expected to be identical, but should they not generally fall within a similar range?

I noticed on Andrew Heiss’s website that he was able to achieve roughly similar outcomes from both mlogit and brms using the existing links. This makes me suspect there may be a silly mistake in my formulation, or perhaps in the way I am interfacing with brms; that seems plausible, especially considering that I have only been experimenting with it for a couple of weeks.

It appears straightforward, yet I have been stuck with this issue for a few days now. Any insights, suggestions, or assistance would be greatly appreciated :)

Once again, sorry for the long message :)

The way you have defined the formula, choicebus and choicerail are parameters, and so the data for those variables is not used in estimating the model. You can see that by running your code with standata() instead of brm:

[1] 6

[1] 0 0 0 0 0 0

[1] 1

1         1
2         1
3         1
4         1
5         1
6         1
[1] 0

[1] 1

1         1
2         1
3         1
4         1
5         1
6         1
[1] 0

[1] 1

1         1
2         1
3         1
4         1
5         1
6         1
[1] 0

[1] 310 345 310 310 310 310

[1] 370 345 370 420 300 330

[1] 140 130 130 140 170 120

[1] 0

[1] "standata" "list" 

To do what you want you would need to use a multivariate formula, but then it gets tricky: you cannot specify shared non-linear parameters. There might be other ways to do that which I am not aware of.


I see!

I used to think brms would not treat variables as parameters unless I explicitly declared them as such. But you are right: this shows that choicebus and choicerail are merely treated as intermediate parameters, rather than being estimated using the data.

This explains what is happening. I will try to see if there is any trick I can do to get around this in brms.

But, thanks a lot, this answers the questions I had.


For those who might read this in the future…

After Ven’s comment, I took a look at how multivariate modelling works in brms and tried to apply the syntax introduced here. The model runs, but the issue is that the parameters are estimated separately for each equation. For example, in my case, I got replicated ASCs:
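
For concreteness, here is roughly what that multivariate attempt looked like (a sketch based on my reading of the brms multivariate docs; the exact syntax may differ, and this only builds the formula rather than fitting it):

```r
library(brms)  # sketch assumes brms is installed

# One non-linear formula per response, each carrying its own copy of the
# utility definitions (same style as the single-response fit above).
f_bus <- bf(choicebus ~ nbus / denom,
            nlf(ncar  ~ exp(0 + b1 * timecar)),
            nlf(nbus  ~ exp(asc1 + b1 * timebus)),
            nlf(nrail ~ exp(asc2 + b1 * timerail)),
            nlf(denom ~ ncar + nbus + nrail),
            asc1 + asc2 + b1 ~ 1,
            nl = TRUE)

f_rail <- bf(choicerail ~ nrail / denom,
             nlf(ncar  ~ exp(0 + b1 * timecar)),
             nlf(nbus  ~ exp(asc1 + b1 * timebus)),
             nlf(nrail ~ exp(asc2 + b1 * timerail)),
             nlf(denom ~ ncar + nbus + nrail),
             asc1 + asc2 + b1 ~ 1,
             nl = TRUE)

# Combining with `+` gives a multivariate formula, but each response gets its
# own copy of asc1, asc2, and b1 (e.g. b_choicebus_asc1_Intercept and
# b_choicerail_asc1_Intercept) -- the replicated-ASC problem described above.
mv_formula <- f_bus + f_rail + set_rescor(FALSE)
```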


Then I explored options to get around this issue and make some parameters shared, and found this post where someone was trying to implement shared parameters in non-linear multivariate equations. Sadly, it is not possible in the current version.

However, the feature is expected in brms 3.0. Hopefully, I can revisit this quest then :)