Ditching Sawtooth + better understanding of Bayesian models

Newbie alert

Firstly, I appreciate all the thoughtful posts and replies on this forum. Though I have 5 years of industry experience as a Data Scientist for a market research firm, my exposure to Bayesian modeling has been minimal in practice and even more minimal through undergrad and grad school. In short, I’m very much new to this…

The company I work for does a lot of Discrete Choice Modeling (as is typical in market research), and uses Sawtooth to fit these models. We run several Conjoint and MaxDiff (best-worst scaling) studies each year. It’s typical for us to report out the respondent-level utilities, or create simulators to estimate preference shares… again, very typical for MR.

I’m generally not involved in these studies but that is likely to change in the near-term. I have several reasons for wanting to ditch Sawtooth, but the main ones are:

  • Sawtooth is expensive.
  • I can easily run a model in Sawtooth w/out knowing what’s actually going on. I would like to force myself to learn more about these models by using something that takes more thought and effort.
  • I am an R programmer and would love to not have to break my workflow by stepping outside of R.

So what am I asking?

  1. Are there any resources for people with no prior (pun intended) knowledge of Bayesian modeling, who don’t have PhD level mathematical chops, to gain a sufficient understanding of the subject in order to thoughtfully apply it in practice?
  2. Practically speaking, where should I start in terms of learning to use Stan/RStanArm/brms in place of Sawtooth for what they’d call “Hierarchical Bayes” for estimating utilities for Conjoint and MaxDiff studies? A lot of my confusion on this may stem from the wide variety of terminology that exists.

Thanks in advance!


Both rstanarm and brms do hierarchical Bayesian models, although I don’t know which hierarchical models Sawtooth does (it is the first time I have heard of it).

As far as resources, if you are learning on your own, I would start with a recent video by @richard_mcelreath

and then work through the second edition of his textbook that comes out March 15.

For discrete choice modeling in particular, I would look at @James_Savage’s examples at


and for marketing applications possibly @eleafeit’s paper from the last StanCon


Fun project!

Sawtooth estimates a hierarchical multinomial logit model from individual-level observations of each choice (from the survey). For the modelers on the thread, this model is a multinomial logit with consumer-level parameters drawn from a multivariate normal distribution. I believe Sawtooth implements a Gibbs sampler for Bayes estimation.
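
For readers who prefer symbols, the model described above can be sketched as follows. The notation is assumed here for illustration and is not taken from Sawtooth's documentation: i indexes respondents, t choice tasks, j the J alternatives in a task, and x_{itj} is the K-vector of attributes of alternative j.

```latex
\beta_i \sim \mathcal{N}_K(\mu, \Sigma), \qquad
\Pr(y_{it} = j \mid \beta_i) =
  \frac{\exp\left(x_{itj}^\top \beta_i\right)}
       {\sum_{k=1}^{J} \exp\left(x_{itk}^\top \beta_i\right)}
```

The priors on \mu and \Sigma are what differ between implementations, so they are left out of this sketch.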

You can actually fit this model by maximum simulated likelihood using the mlogit package. You can estimate the same model using Bayesian methods with ChoiceModelR or bayesm. ChoiceModelR has the advantage that it works well with Sawtooth’s data formats. (ChoiceModelR may be a wrapper for bayesm; I’m not sure.) The ideal place for you to come up to speed on alternatives to Sawtooth from a marketing perspective would be Ch 13 of Chapman and Feit. You are pretty much exactly the user I had in mind when I wrote it. There is also a DataCamp version of this same material, but it isn’t quite as in-depth and it has more coding practice, which you probably don’t need.

AFAIK, neither brms nor rstanarm implements this model. If you are okay with writing/using the Stan modeling language directly, then @Kevin_Van_Horn and I created a Stan tutorial that was targeted at folks doing conjoint surveys: https://github.com/ksvanhorn/ART-Forum-2017-Stan-Tutorial . I’d read that AFTER you read Ch 13 from Chapman and Feit, as it presumes a good working knowledge of conjoint and some basic understanding of Bayesian methods. It was written for someone who is new to Stan and wants to use Stan to replicate Sawtooth.

The nice thing about using Stan’s modeling language is that you can tweak the model directly yourself. I’m currently using Stan to estimate integrated models of consumer choice that incorporate other observables like response time and neural response into the model. Stan makes it super-easy to write a model that fits my data rather than trying to cram my data into someone else’s model.

I believe @James_Savage’s examples are for the same model, but when the data is observed in aggregate for multiple markets. I may be mis-remembering that, so perhaps he will jump in and clarify.

And to say that there is variation in terminology across disciplines is a gross understatement. Economists would probably call this a “mixed logit”, but marketers tend to say “HB MNL”. Also, when marketers say multinomial logit, others say conditional logit. Basically I have to see the equations or Stan code to understand what any individual package does.


Also, the Test & Roll paper is probably not relevant to you, but it is very cool ;) Thanks to @bgoodri for the shout-out.


Thanks for this! @James_Savage 's posts look like they are going to be extremely helpful. And I’ll keep my eye open for the release of @richard_mcelreath 's book.

Wow, thanks for all of this! I actually do already own your book with Chapman, and it was the inspiration for this whole project in the first place, so thanks x 2!

I think the most serious of the critiques I’ve seen levied against Sawtooth from the folks in the Stan community is the almost ubiquitous non-consideration of priors. Sawtooth software makes it pretty easy to ignore this. One thing I want to be mindful of is not making this same error with ChoiceModelR. I actually ran one of my company’s previous studies through ChoiceModelR (with guidance from your book) and got very close results, which is great if one wishes to replicate Sawtooth.

Now, from your perspective, is merely replicating Sawtooth with default priors sufficient?

One other thing I’m not sure of with ChoiceModelR is how to set up the data in order to estimate utilities for best-worst/MaxDiff scaling, and I haven’t come across any resources explaining that.

Regarding Stan, I’m not comfortable using the modeling language yet but I certainly wish to be. The git repository you shared looks extremely helpful, though I’m sure I will have many more questions after I start digging into it.

The priors in Sawtooth are actually pretty good. I’m pretty sure they have implemented the prior proposed in Lenk and Orme 2009. The parameters of that prior can also be user-adjusted in Sawtooth, even though the UI doesn’t really lead users to think about doing that. Of course, it doesn’t have all the flexibility to change the parametric form of the prior that you would have in Stan, but unless you are dealing with really insufficient data, then the parameter estimates shouldn’t be too different with different priors.

I don’t know of a good resource on setting up the data for a max-diff. You may actually have to adjust the code for the “worst” observation. Or maybe you can fake it by negating all the attributes? It isn’t something I do regularly.
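
To make the negation idea concrete, here is a minimal Python sketch of one common way to write a best-worst task as two logit choices: the "best" pick uses the item utilities as they are, and the "worst" pick uses the same utilities with the sign flipped. The function names and the example utilities are made up for illustration, and this is not necessarily what Sawtooth or ChoiceModelR does internally.

```python
import math

def softmax(values):
    """Convert a list of utilities into choice probabilities."""
    top = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def best_worst_probs(utilities):
    """Return (P(best), P(worst)) for the items shown in one task.

    Assumption: "worst" is modeled as a logit choice over negated utilities.
    """
    p_best = softmax(utilities)
    p_worst = softmax([-u for u in utilities])
    return p_best, p_worst

# Hypothetical utilities for four items shown in one MaxDiff task.
utilities = [1.2, 0.3, -0.5, -1.0]
p_best, p_worst = best_worst_probs(utilities)
```

In data terms, this would correspond to stacking each task twice: once with the original item indicators and the "best" pick as the choice, and once with the indicators multiplied by -1 and the "worst" pick as the choice.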

I encourage you to work through the code in the tutorial with @Kevin_Van_Horn. It will bring you right up to the model in ChoiceModelR and Sawtooth (with slightly different priors, of course), teaching you the Stan syntax along the way. Seriously, we had someone just like you in mind when we wrote it.

Also, you should consider coming to the Advanced Research Techniques forum at Rochester in June. There will be a handful of choice modelers there including me. @Kevin_Van_Horn and I wrote the tutorial to present at that conference.


Okay, good to know about the priors.

On max-diff, “faking” the attributes via negation was what I had in mind as well, will have to play around with that for a while. Will post a solution somewhere once I land on one.

I’ve forked that repository and will be working through it over the next few days, I’m really excited to start!

That conference looks great, I will put a bug in my boss’s ear and see if that’s something they’d be willing to pay for ;)

Great! Don’t hesitate to email me or message here, if you get stuck.


Glad you found the book helpful! A couple of notes:

(1) on priors, there have been various papers over the years at the Sawtooth Software Conference looking at the effect of various priors related to CBC models. As my own 0.02 summary of that, I’d say (A) they have largely incorporated best practices to the extent that is feasible for defaults; (B) in typical CBC data sets, the data tend to swamp the priors within some large range of possible priors; and (C) as eleafeit notes, there are various customizations. Sometimes folks do go outside the models that Sawtooth implements, which requires custom code as noted. A general reference would be the various “Proceedings of the Sawtooth Software Conference”, which are freely available as PDFs; search them for “priors”.

(2) for estimating MaxDiff using ChoiceModelR, I have experimental code available for that in the github “cnchapman/choicetools” package, in particular the “maxdiff*” functions here: https://github.com/cnchapman/choicetools/tree/master/R . There is a function that can import from Sawtooth (in “CHO” format) and another to estimate the models. I’ll repeat the “experimental” part :)


(1) Thank you for the clarification on this, much appreciated.

(2) Funny you should mention this! My colleague and I had spent a good chunk of yesterday perusing your source code for MaxDiff related functions. The data we receive varies depending on the survey programmer who hands us the data, so we can’t reliably know what shape it will take.

I’ll noodle around in your experimental code some more and see if I can’t put some pieces together.

The choicetools package looks very cool. Don’t hesitate to reach out if you need a guinea pig for testing things out; you can message me on this site or reach out on GitHub, my profile is here.

Thanks again!


Another fan of Statistical Rethinking and Richard McElreath here. I’ve also read through Doing Bayesian Data Analysis by John Kruschke. Both are quite understandable with basic knowledge of probability theory and a bit of calculus. Bayesian Data Analysis by Gelman et al. is a solid general reference.


Hi all,

I’m trying to implement a HB MNL for maxdiff analysis as discussed here but I’m having difficulty understanding exactly how the model is formulated and how the data is structured for the modelling.

I’m using python, and unfortunately, not familiar with R. I’d like to code this up in Stan, and consequently, I’d prefer to understand the model structure directly rather than wading through R packages.

Some questions that I have are:

  1. Is a standard version of this model written out as formulae somewhere?

  2. @eleafeit mentions that the respondent utilities are parameterized via a multivariate normal. What is the dimension of the MVN? n_respondents? n_statements?

  3. I’m having difficulty understanding how to set up the training data and the vector on which the model will be conditioned. The only reference I’ve found is this:


Is the method described here valid? Is there another, better method?

Thanks much!

Following up to my questions above, after some search I found a wonderful introductory tutorial from @eleafeit 's blog:

many thanks to @eleafeit!!


Nice to meet you @aranboldt!

In addition to that blog, you might also like the longer tutorial that Kevin Van Horn and I wrote. It’s linked above in this thread, but here it is again: https://github.com/ksvanhorn/ART-Forum-2017-Stan-Tutorial (materials from the tutorial “Using Stan to Estimate Hierarchical Bayes Models,” ART Forum 2017). The tutorial was written for an audience that knows conjoint pretty well already, so it might not be ideal for a beginner, but all the code is there for the models that are popular in the conjoint world.

To answer your specific questions:

  1. A good resource on these models is Train’s book “Discrete Choice Methods with Simulation”. He has everything written out in math notation.
  2. The MVN has dimension equal to the number of features of the choice alternatives
  3. Data setup is always a hassle as there is no standardized format for the data. For Stan, the best resource is the tutorial that Kevin and I wrote. For mlogit (an MLE package for these models in R), I’d recommend Ch 13 of my book with Chris Chapman.
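
As a rough illustration of answer 2, the Python sketch below draws one K-dimensional utility vector per respondent and turns it into choice probabilities for a single task. To stay within the standard library it assumes a diagonal covariance (independent normals), whereas the full model uses a general multivariate normal. All names and numbers are invented for the example.

```python
import math
import random

def draw_respondent_betas(n_respondents, mu, sigma, seed=1):
    """Draw one beta vector of length K = len(mu) per respondent.

    Assumption: independent normals (diagonal covariance) for simplicity.
    """
    rng = random.Random(seed)
    return [[rng.gauss(m, s) for m, s in zip(mu, sigma)]
            for _ in range(n_respondents)]

def choice_probs(beta, alternatives):
    """Multinomial logit probabilities for one task.

    `alternatives` is a list of feature vectors, each of length K.
    """
    utilities = [sum(b * x for b, x in zip(beta, alt)) for alt in alternatives]
    top = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical setup: K = 3 features, 5 respondents, 3 alternatives in the task.
mu = [-1.0, 0.5, 0.2]
sigma = [0.3, 0.4, 0.2]
betas = draw_respondent_betas(5, mu, sigma)
task = [[1.0, 1, 0], [1.5, 0, 1], [2.0, 0, 0]]
probs = [choice_probs(beta, task) for beta in betas]
```

The point is only that the respondent-level distribution lives in feature space: each respondent gets a vector of length K, so the normal has dimension K rather than the number of respondents.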

Hello. I’d like to offer my 2 cents on this issue.

It is great to use Stan to model the conjoint outcome data after the data have been collected. I follow the tutorial from Prof. Feit and modify the code for my needs.

However, it is my understanding that Sawtooth uses an adaptive method in data collection with a web-based survey interface. That is also very powerful, and you may or may not wish to keep Sawtooth for that purpose. Good luck~

Hello @eleafeit,

I have been using your tutorial from the 2017 ART Forum and reading your book with Chris Chapman. They are both great! Thank you! But I’m stuck on one little parameter in the Stan model for a hierarchical multinomial logit model (hmnl.stan on GitHub). It might just be that I am confusing terms, but I haven’t been able to figure it out:

int<lower=1> K; // # of covariates of alternatives 

What does “covariates of alternatives” mean? In your example with the chocolate data it is K=9. Are these the 9 covariates?

  1. Price
  2. BrandDove
  3. BrandGhirardelli
  4. BrandHersheys
  5. BrandGodiva
  6. TypeDark
  7. TypeDarkNuts
  8. TypeMilk
  9. TypeMilkNuts

Thanks again!


Sorry for the slow response. I took a few days off last week and am just now catching up with my inbox.

Yes, “covariates of alternatives” means the features of the alternatives that people choose from. You are right that the covariates are price and the dummies for brand and type. Let me know if you have other questions!
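
For anyone following along in Python, here is a small sketch of how one alternative could be turned into a length-K covariate vector using the nine columns listed above. The helper name is hypothetical, and the reference levels used in the example ("Lindt" and "White") are an assumption; the sketch simply treats any brand or type without its own column as the baseline.

```python
# Column order follows the nine covariates listed earlier in the thread.
COLUMNS = [
    "Price",
    "BrandDove", "BrandGhirardelli", "BrandHersheys", "BrandGodiva",
    "TypeDark", "TypeDarkNuts", "TypeMilk", "TypeMilkNuts",
]

def encode_alternative(price, brand, choc_type):
    """Return the length-K covariate vector for one alternative.

    A brand or type without its own column is treated as the reference
    level and gets all zeros for that block of dummies.
    """
    row = []
    for col in COLUMNS:
        if col == "Price":
            row.append(float(price))
        elif col.startswith("Brand"):
            row.append(1.0 if col == "Brand" + brand else 0.0)
        else:
            row.append(1.0 if col == "Type" + choc_type else 0.0)
    return row

K = len(COLUMNS)
x_dove = encode_alternative(1.5, "Dove", "MilkNuts")
# "Lindt" and "White" are assumed reference levels here.
x_ref = encode_alternative(2.0, "Lindt", "White")
```

Stacking one such row per alternative, per task, per respondent gives the array of covariates that the K in the Stan data block refers to.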