Hi!
I’ve been playing with brms for a couple of weeks now. It all started when I tried fitting a mixed-effects linear model, and I must say I’m pretty impressed with how easy and efficient the package is to use. I have some theoretical knowledge of Bayesian estimation, but I’d never found a package this practical for fitting such models. Given my background in fitting models with frequentist methods in R and GAUSS, I’ve since been asking myself a lot of “what if” questions, like: “What if I tried Bayesian estimation for that particular model I estimated in GAUSS? Can brms handle it?”
My main expertise is in discrete choice models, or what some might call conjoint analysis. After discovering this package, I got curious about how to implement this kind of modelling in brms, and I stumbled upon a really helpful guide by Andrew Heiss here. He compares frequentist conjoint analysis using the mlogit package with Bayesian conjoint analysis using brms. His guide focuses on making conjoint analysis robust and compatible with existing structures, which I found really useful.
Inspired by Andrew’s work, I decided to simulate my own choice data and play around with discrete choice modelling in brms. Below is a sample of six rows showing travel time, travel cost, and mode choices for users choosing among three modes of transportation (car, bus, rail):
  X timecar costcar timebus costbus timerail costrail female business income choice choicecar choicebus choicerail
1 1     310      40     370      25      140       55      0        0  46705      3         0         0          1
2 2     345      40     345      30      130       55      1        0  50123      2         0         1          0
3 3     310      40     370      25      130       35      1        0  67589      3         0         0          1
4 4     310      40     420      15      140       65      0        0  42987      3         0         0          1
5 5     310      40     300      30      170       35      0        1  24766      3         0         0          1
6 6     310      40     330      35      120       65      1        0  16554      2         0         1          0
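For context, the kind of simulation behind this data can be sketched roughly as follows (a minimal sketch only: the attribute levels and “true” coefficients here are illustrative placeholders, not my actual generating values):

```r
set.seed(42)
n <- 500

# Alternative-specific attributes and individual covariates (illustrative levels)
database <- data.frame(
  timecar  = sample(c(310, 345), n, replace = TRUE),
  costcar  = 40,
  timebus  = sample(c(300, 330, 345, 370, 420), n, replace = TRUE),
  costbus  = sample(c(15, 25, 30, 35), n, replace = TRUE),
  timerail = sample(c(120, 130, 140, 170), n, replace = TRUE),
  costrail = sample(c(35, 55, 65), n, replace = TRUE),
  female   = rbinom(n, 1, 0.5),
  business = rbinom(n, 1, 0.3),
  income   = round(runif(n, 15000, 70000))
)

# Systematic utilities under assumed true parameters, then MNL choice probabilities
b_tt <- -0.01; asc_bus <- -0.5; asc_rail <- 0.2
V <- cbind(car  = b_tt * database$timecar,
           bus  = asc_bus  + b_tt * database$timebus,
           rail = asc_rail + b_tt * database$timerail)
P <- exp(V) / rowSums(exp(V))

# Draw one choice per row (1 = car, 2 = bus, 3 = rail) and add dummy indicators
database$choice     <- apply(P, 1, function(p) sample(1:3, 1, prob = p))
database$choicecar  <- as.integer(database$choice == 1)
database$choicebus  <- as.integer(database$choice == 2)
database$choicerail <- as.integer(database$choice == 3)
```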
I went ahead and experimented with my dataset using both the “categorical” and “multinomial” families. I had my fair share of challenges, much like what Andrew Heiss described in this post. And yes, as Andrew put it in that thread: “Discrete choice models do indeed suck :(”. Compared to linear models, they really are much trickier to work with in software. Moreover, given the constraints and requirements of discrete choice theory and random utility (such as fixing one of the ASCs, defining availability indicators for certain alternatives, and relaxing IIA through nesting structures), there isn’t much flexibility in the pre-defined link functions in brms.
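For reference, a categorical-family call of the kind I experimented with looks roughly like this (a minimal sketch; as far as I can tell the stock categorical family only takes individual-specific covariates like these, not the alternative-specific times and costs):

```r
library(brms)

# The outcome must be a factor; its first level becomes the reference category
database$choice_f <- factor(database$choice, levels = 1:3,
                            labels = c("car", "bus", "rail"))

fit_cat <- brm(
  choice_f ~ female + business + income,  # individual-specific covariates only
  family = categorical(link = "logit"),
  data = database,
  chains = 4, cores = 4
)
```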
But then I had an idea. What if I treated the discrete choice model, along with all the specifications I wanted to consider, as a non-linear model in brms? After all, discrete choice models are non-linear, and brms allows you to input your own non-linear formulas.
I apologize for the lengthy explanation, but I wanted to give you some context on where I’m at. My goal is to express any discrete choice model in brms as a non-linear formula. To start simple, I attempted to write the multinomial logit model for the dataset above. I understand that fitting the categorical family to my dataset could give me the model I’m aiming for, but I wanted to learn how to do it for a basic multinomial logit first, and then expand to more complex models like nested logit and cross-nested logit.
For now, the model I am trying to fit is a very simple multinomial logit (MNL):
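Written out, the utilities and choice probabilities are the standard MNL form (with the car ASC fixed to zero):

$$
V_{\text{car}} = \beta_{1}\,\text{timecar}, \qquad
V_{\text{bus}} = \text{ASC}_{1} + \beta_{1}\,\text{timebus}, \qquad
V_{\text{rail}} = \text{ASC}_{2} + \beta_{1}\,\text{timerail}
$$

$$
P_{j} = \frac{e^{V_{j}}}{e^{V_{\text{car}}} + e^{V_{\text{bus}}} + e^{V_{\text{rail}}}},
\qquad j \in \{\text{car}, \text{bus}, \text{rail}\}
$$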
The systematic utility of each travel mode is intercept + parameter × travel_time. Following random utility theory, I fix the car intercept to zero, making car the reference alternative for the other intercepts. All population-level effects, nothing fancy yet. Below is the code for the model:
fit1 <- brm(
  brmsformula(
    choicecar ~ 1 - choicebus - choicerail,
    nlf(choicebus ~ nbus / denom),
    nlf(choicerail ~ nrail / denom),
    nlf(denom ~ ncar + nbus + nrail),     # denominator of the MNL probabilities
    nlf(ncar ~ exp(0 + b1 * timecar)),    # car's ASC fixed to zero as the reference
    nlf(nbus ~ exp(asc1 + b1 * timebus)),
    nlf(nrail ~ exp(asc2 + b1 * timerail)),
    asc1 + asc2 + b1 ~ 1,                 # declares the parameters to brms
    nl = TRUE
  ),
  data = database,
  family = bernoulli("identity"),
  prior = c(
    prior(normal(0, 4), nlpar = "b1"),
    prior(normal(0, 4), nlpar = "asc1"),
    prior(normal(0, 4), nlpar = "asc2")
  ),
  cores = 8,
  chains = 8,
  warmup = 1000,
  iter = 2000,
  control = list(adapt_delta = 0.9)
)
The code runs fairly quickly without any warnings and here is a sample outcome:
 Family: bernoulli
  Links: mu = identity
Formula: choicecar ~ 1 - choicebus - choicerail
         choicebus ~ nbus/denom
         choicerail ~ nrail/denom
         denom ~ ncar + nbus + nrail
         ncar ~ exp(0 + b1 * timecar)
         nbus ~ exp(asc1 + b1 * timebus)
         nrail ~ exp(asc2 + b1 * timerail)
         asc1 ~ 1
         asc2 ~ 1
         b1 ~ 1
   Data: database (Number of observations: 500)
  Draws: 8 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 8000

Population-Level Effects:
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
asc1_Intercept    -1.63      2.43    -8.35     0.46 1.00     1383     1376
asc2_Intercept    -1.33      2.41    -7.88     0.95 1.00     1482     1554
b1_Intercept       0.00      0.00    -0.00     0.00 1.00     3251     2947

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Everything was going smoothly until I tried to validate this outcome against a frequentist fit from the “Apollo” package, which is designed specifically for various types of discrete choice models. Since my model is simple, with no complex elements involved, Apollo used plain maximum likelihood estimation (no simulation or numerical approximation needed) and converged to an optimum. Below are the estimates from Apollo after fixing the intercept for the car mode to zero (ASC stands for alternative-specific constant):
Estimates:
           Estimate       s.e.  t.rat.(0)  Rob.s.e.  Rob.t.rat.(0)
asc_car     0.00000         NA         NA        NA             NA
asc_bus    -0.82490   0.162207   -5.08544  0.163040       -5.05947
asc_rail   -0.01027   0.305006   -0.03367  0.310409       -0.03308
b_tt     6.0998e-04   0.001690    0.36085  0.001729        0.35272
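For completeness, my Apollo specification was essentially the textbook MNL setup, roughly like this (sketched from memory, so names may differ slightly from my actual script):

```r
library(apollo)
apollo_initialise()

apollo_control <- list(modelName  = "mnl_ref",
                       modelDescr = "Simple MNL, car as reference",
                       indivID    = "X")

apollo_beta  <- c(asc_car = 0, asc_bus = 0, asc_rail = 0, b_tt = 0)
apollo_fixed <- c("asc_car")  # car's ASC fixed to zero as the reference

apollo_inputs <- apollo_validateInputs()

apollo_probabilities <- function(apollo_beta, apollo_inputs,
                                 functionality = "estimate") {
  apollo_attach(apollo_beta, apollo_inputs)
  on.exit(apollo_detach(apollo_beta, apollo_inputs))

  P <- list()
  V <- list(
    car  = asc_car  + b_tt * timecar,
    bus  = asc_bus  + b_tt * timebus,
    rail = asc_rail + b_tt * timerail
  )
  mnl_settings <- list(
    alternatives = c(car = 1, bus = 2, rail = 3),
    avail        = list(car = 1, bus = 1, rail = 1),
    choiceVar    = choice,
    V            = V
  )
  P[["model"]] <- apollo_mnl(mnl_settings, functionality)
  P <- apollo_prepareProb(P, apollo_inputs, functionality)
  return(P)
}

model <- apollo_estimate(apollo_beta, apollo_fixed,
                         apollo_probabilities, apollo_inputs)
```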
And finally, here is my question:
I understand that Bayesian and frequentist estimations are not expected to be identical, but should they not generally fall within a similar range?
I noticed on Andrew Heiss’s website that he achieved roughly similar outcomes from mlogit and brms using the existing links. This makes me suspect a silly mistake in my formulation, or perhaps in the way I am interfacing with brms, which seems plausible given that I have only been experimenting with it for a couple of weeks.
It appears straightforward, yet I have been stuck on this issue for a few days now. Any insights, suggestions, or assistance would be greatly appreciated :)
Once again, sorry for the long message :)