Mixed Logit Model

Before setting up my own mixed logit model, I wanted to ask whether such a model already exists. By mixed logit I understand a multi-logit regression (see the Stan manual, page 133) that includes both alternative-specific variables and individual-specific variables.
Thanks for any hints,

I am not sure I understand what kind of model you mean (I don’t know what you mean by “alternative-specific” and “individual” variables). I would, however, definitely check whether your use case isn’t covered by brms. At the very least, you could build a model as close to your aims as possible with brms and then use the code generated by brms as a starting point.

There seems to be a lot of confusion about terminology, at least for me. By mixed logit I refer to what others call a general model that combines features of a multinomial logit and a conditional logit. A formulation that may help clarify what I mean is given here (see section 6.3.4).

@martinmodrak I certainly will look into brms.

Hi Tom,

Yep – very straightforward. Here’s a gist that shows an implementation using only choice-level variables that vary across choices.

If you want to include variables that vary by the individual, then you simply need to include on line 37 those variables multiplied through by their coefficient matrix, as with multinomial logit.
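In case it helps to see the shape of that addition in linear-algebra terms, here is a minimal sketch (my own illustration, not the gist’s actual code; all names and dimensions are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
J, K, M = 4, 3, 2                 # alternatives, choice-level attributes, individual covariates

X = rng.normal(size=(J, K))       # attributes of each alternative in one choice task
z = rng.normal(size=M)            # one individual's covariates
beta = rng.normal(size=K)         # tastes for the choice-level attributes
Gamma = rng.normal(size=(J, M))   # alternative-specific coefficients on individual covariates
Gamma[-1, :] = 0.0                # pin one alternative as the reference, for identification

utility = X @ beta + Gamma @ z    # choice-level part plus the individual part
```

The `Gamma @ z` term is the piece that gets added alongside the choice-level utilities; without pinning a reference row of `Gamma` to zero it would not be identified.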

Here’s a slide-deck that might help.



@James_Savage Thanks for the hint. Is there toy data to play around with the model? I’m asking because it is difficult to follow how you subset your data in line 37.

Hi Tom,

Here are a couple of worked examples for you.


If you don’t want to include individual-level parameters on the choice-level attributes you can basically delete the second half of the definition of beta_individual (make sure to remove the tau and L_Omega parameters in that case).

Note that I strongly recommend the second version of the model, as it’s more flexible: both in the types of choice patterns it can fit, and in the fact that in each task/market/comparison set you can have different possible choices characterised only by their attributes and still allow demographics to influence decision probabilities.

Hi James


I had a hard time getting my head around your indexing, so I tried a different style of indexing. If I’m not mistaken, it includes individual data (demographics) as well as attributes of the alternatives. My model looks like this:

It seems to me that this model is much sparser… In other words, what am I missing here?

Hi Tom -

First, this model is not identified. As you have it, individual attributes affect the utility of all choices to the same degree (i.e., they simply shift the utility of every choice without changing the rank ordering). Your beta needs to be a matrix (probably with the P-1 parameterization, which can be achieved by setting the last row to 0).
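To see the identification problem concretely, here’s a quick numeric check (my own sketch, not from the thread): adding the same individual-level term to every alternative’s utility leaves the softmax probabilities untouched, whereas a coefficient matrix with its last row pinned to zero does move relative utilities.

```python
import numpy as np

def softmax(u):
    """Numerically stable softmax over the last axis."""
    e = np.exp(u - u.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

u = np.array([0.5, -1.0, 2.0])       # utilities for J = 3 alternatives
shifted = u + 3.7                    # a scalar individual effect added to every alternative
same = np.allclose(softmax(u), softmax(shifted))   # True: the data can't pin this down

# P-1 parameterization: a (J x K) coefficient matrix with the last row fixed at zero
rng = np.random.default_rng(0)
J, K = 3, 2
beta = np.vstack([rng.normal(size=(J - 1, K)), np.zeros((1, K))])
z = rng.normal(size=K)               # one individual's covariates
probs = softmax(beta @ z)            # now individual covariates shift *relative* utilities
```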

Second, it assumes that all individuals have the same marginal utilities, and so you’ll get the red bus/blue bus problem. That might not be a concern for you.

Third, with logits, normal(0, 5) is an enormously wide prior.
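For intuition on how wide that is, here is a quick prior-predictive sketch of my own: a single normal(0, 5) coefficient pushed through the inverse logit puts a large share of prior mass on near-deterministic choice probabilities.

```python
import numpy as np

rng = np.random.default_rng(1)
beta = rng.normal(0.0, 5.0, size=100_000)   # draws from the normal(0, 5) prior
p = 1.0 / (1.0 + np.exp(-beta))             # implied probability on the inverse-logit scale
extreme = np.mean((p < 0.01) | (p > 0.99))  # mass on near-deterministic probabilities
print(f"{extreme:.0%} of prior draws imply p < 1% or p > 99%")
```

Roughly a third of the prior mass ends up below 1% or above 99%, which is rarely what one believes before seeing the data.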

Hope this helps!

Hi James,

This is great stuff and potentially very useful in a variety of contexts. One thing that social scientists often encounter with their data is repeated observations of the same individuals. To borrow from your beer example (in the slides), imagine that you have data on the age and salary of the individuals consuming the beer. And then imagine that you see their respective beer choices on approximately five separate occasions, perhaps each characterized by time-varying covariates (amount of stress reported by beer drinker i before making the choice).

What are the chances of expanding the example to account for these variables (presumably with random effects for the repeated observations of individuals)?

Hey Jeremy –

That’s precisely the sort of model I fit every day! See the second example here, which says that individual characteristics -> preferences over choice attributes, which, when combined with choice attributes, give us choice probabilities.


Oh cool. One slight hurdle, though. I started working through your simulated data script and hit an error in the creation of the indexes data frame:

Error in data_frame(individual = rep(1:I, each = K * 10), task = rep(1:T, :
could not find function "%>%"

Am I overlooking a needed package? Or is there something else about the syntax that I’m overlooking?

Ah – you’ll need to install.packages("dplyr") and library(dplyr) before you run the script.

Got it. Thanks.

If I’m following along correctly, the attributes of the products vary across tasks (that is, there are time-varying covariates for the products). But the X2 variables for the 50 individuals are assumed to be time-invariant, correct?

How straightforward is it to add time-varying covariates for the individuals – perhaps measures of emotional states or variation in monetary endowments at time t, or something like that?

Hey Jeremy,

Yep – difficult, but in theory at least possible. I guess you could model each individual-time pair, and presumably have an individual-level hyperprior.

Something like

beta_it ~ normal(beta_i + Gamma * Z_it', sigma)
beta_i ~ normal(beta, sigma_2)

I’m pretty sure it would work, but might take some time!
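A quick simulation of that two-level structure (my own sketch; the dimensions and names are invented) to check that the shapes work out:

```python
import numpy as np

rng = np.random.default_rng(2)
I, T, K, M = 50, 5, 3, 2       # individuals, occasions, choice attributes, time-varying covariates

beta = rng.normal(size=K)                        # population-level mean preferences
Gamma = rng.normal(scale=0.5, size=(K, M))       # how time-varying covariates move preferences
sigma, sigma_2 = 0.3, 0.5

beta_i = beta + sigma_2 * rng.normal(size=(I, K))   # beta_i ~ normal(beta, sigma_2)
Z = rng.normal(size=(I, T, M))                      # Z_it: time-varying individual covariates
# beta_it ~ normal(beta_i + Gamma * Z_it, sigma)
beta_it = beta_i[:, None, :] + Z @ Gamma.T + sigma * rng.normal(size=(I, T, K))
```

Each individual-occasion pair gets its own preference vector, shrunk toward an individual-level mean that is itself shrunk toward the population mean.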

In their 1989 book, McCullagh and Nelder note the equivalence of conditional multinomial models and Poisson models. Before seeing your post, I was looking into the possibility of using a Poisson parameterization. The downside seems to be the need for binary dummy variables corresponding to the choices (one per person making the choices, that is, which becomes unwieldy with large sample sizes).
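The equivalence itself is easy to verify numerically (a small check of my own, using scipy): a multinomial likelihood equals a product of independent Poisson likelihoods conditioned on their total.

```python
import numpy as np
from scipy.stats import multinomial, poisson

lam = np.array([2.0, 5.0, 3.0])   # Poisson rates, one per choice
counts = np.array([1, 4, 2])      # observed counts; N = 7 choices in total
N = counts.sum()

# Multinomial with probabilities lam / sum(lam)
mult = multinomial.pmf(counts, n=N, p=lam / lam.sum())
# Independent Poissons, conditioned on the total being N
cond = poisson.pmf(counts, lam).prod() / poisson.pmf(N, lam.sum())
print(np.allclose(mult, cond))
```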

Your comment about the challenges of adding time-varying individual-level predictors makes me wonder if the Poisson might have some advantages . . . even though my hunch is that your approach remains preferable. I was also looking into multilevel conditional logit models using gllamm in Stata.

Meanwhile, my initial read on the individual-time effect is that it has the advantage of conservatism, almost like a random coefficient (or random slopes) model. But different disciplines might vary in terms of its necessity.

Hi Jeremy,

It really depends what you are modeling. In Stan at least, I’ve found the Poisson implementation to be a fair bit slower than the multinomial likelihood, but that might just be my coding of it. But speed isn’t really the point; the model is one of individual choice, and you want to choose the model that makes the best sense of your data. In most discrete choice problems this will be a conditional logit with parameters varying at the level of the individual and the possibility of unobserved demand shocks that correlate with your choice attributes. If you only have aggregate count data, you can still fit the model, as illustrated here:


Cool. And the translation of terminology across disciplines is helpful.



Hi Tom,
Once the X matrix is specified correctly, estimating models with alternative- and individual-specific variables is easy. One simple solution that is often applied in the econometrics literature is to use one alternative as a reference (e.g., j = 1). That means you build, for each individual-specific variable, J - 1 alternative-specific versions (e.g., income_2, …, income_J) that are zero for every alternative other than j.
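A tiny sketch of that construction in code (my own example; the column names are made up, and I’m using Python/pandas rather than R just to show the mechanics):

```python
import numpy as np
import pandas as pd

# Long format: one row per (individual, alternative), with J = 3 alternatives.
df = pd.DataFrame({
    "id":     [1, 1, 1, 2, 2, 2],
    "alt":    [1, 2, 3, 1, 2, 3],
    "cost":   [10.0, 12.0, 8.0, 9.0, 15.0, 7.0],    # alternative-specific variable
    "income": [50.0, 50.0, 50.0, 30.0, 30.0, 30.0]  # individual-specific, constant within id
})

# J - 1 alternative-specific copies of income, zero elsewhere (alternative 1 is the reference).
for j in (2, 3):
    df[f"income_{j}"] = np.where(df["alt"] == j, df["income"], 0.0)

X = df[["cost", "income_2", "income_3"]].to_numpy()  # design matrix with both variable kinds
```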

I’ve added an example using R and the TravelMode data from the AER package. As a comparison, I used mlogit. mFormula() makes it very easy to specify a formula with both kinds of variables, and then you can use model.matrix() to create X. I hope using R for this example is OK for you. The Stan program is included in the R file and works for multinomial, conditional, and “mixed” logit models, as long as you specify X correctly. Please also have a look at the mlogit vignette on setting up X.


PS: I do not like the term “mixed logit” for this type of model, because it is also (maybe more often) used for multinomial logit models with individual-specific parameters. However, my econometrics professor also always used it ambiguously… very confusing.

PPS: I’m not saying my prior choice is the best, that the Stan program is perfect, or that assuming homogeneous preferences is a good idea… I just tried to keep the example as simple as possible!



Hi Daniel

Thanks for sharing this. If I understood you correctly, it is indeed very easy to estimate such a model. I guess your approach has the same effect as what @James_Savage does by setting “outside options” to zero (I’m still working on implementing his approach, though). I adapted your example to my data and used R as well (using mlogit.data, mFormula, and model.matrix). Additionally, I’ve added a vector that sets alternatives that are not viable for individual i to zero.
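For what it’s worth, one common way to express such a viability restriction on the utility scale (a sketch of my own, not Daniel’s or James’s code) is to give non-viable alternatives a utility of minus infinity before the softmax, so they receive exactly zero probability:

```python
import numpy as np

def softmax(u):
    e = np.exp(u - np.max(u))
    return e / e.sum()

utility = np.array([0.4, 1.2, -0.3, 0.8])
viable = np.array([True, True, False, True])   # alternative 3 is not viable for this individual

masked = np.where(viable, utility, -np.inf)    # exp(-inf) = 0 inside the softmax
p = softmax(masked)
print(p[2])   # 0.0
```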

PS: The terminology is indeed confusing.