Categorical factor coding in stan

Canaan_Breiss · May 27, 2020, 12:42am

Hi all,

I’m building a Stan model by hand and would like to incorporate a four-level unordered factor as a predictor in a regression component. Right now my instinct is to just use three distinct binary predictors (with the intercept representing the fourth level), but I wasn’t sure if there was a more efficient / supported way to do this.

Best,
Canaan Breiss

mike-lawrence · May 27, 2020, 12:08pm

Turning a categorical variable into a set of numeric predictors is called “contrast coding”, and the topic you’re asking about might be searchable via “choosing contrast coding” or “contrast coding choice”. Absent any priors, all contrast choices are mathematically equivalent, but in the context of Bayesian computing, some contrast coding choices are easier to think about than others (and some may change the geometry of the parameter space for easier sampling, but that’s somewhat hard to predict). For example, with a two-level variable, I like to use half-sum contrasts, where the first contrast is an intercept (the mean across conditions, something I usually have decent prior information on but don’t really care much about inferentially) and the second contrast is the difference between conditions (something I usually have less prior information on but do care quite a bit about inferentially). With more than two conditions there are a variety of contrast options. Take a look here and really just go with whatever helps you create the most meaningful-to-you priors.
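To make the two-level half-sum example concrete, here is a minimal sketch in plain Python (not R or Stan, purely for illustration). The cell means 10.0 and 14.0 and all function names are made up for this example. It shows that under -0.5/+0.5 coding the intercept is the mean across conditions and the slope is the difference between conditions, while treatment (dummy) coding reproduces the same cell means with a differently interpreted intercept.

```python
# Hypothetical cell means for a two-level factor (illustrative values only).
mean_a, mean_b = 10.0, 14.0

def half_sum_coefficients(mean_a, mean_b):
    """Coefficients when level A is coded -0.5 and level B is coded +0.5."""
    intercept = (mean_a + mean_b) / 2   # mean across conditions
    difference = mean_b - mean_a        # B minus A
    return intercept, difference

def treatment_coefficients(mean_a, mean_b):
    """Coefficients when level A is coded 0 (reference) and level B is coded 1."""
    intercept = mean_a                  # mean of the reference level
    difference = mean_b - mean_a        # B minus A
    return intercept, difference

def predict(intercept, slope, code):
    """Linear predictor for one observation with the given numeric code."""
    return intercept + slope * code

hs_int, hs_diff = half_sum_coefficients(mean_a, mean_b)
tr_int, tr_diff = treatment_coefficients(mean_a, mean_b)

# Both codings recover the same cell means for levels A and B.
half_sum_fit = [predict(hs_int, hs_diff, c) for c in (-0.5, 0.5)]
treatment_fit = [predict(tr_int, tr_diff, c) for c in (0.0, 1.0)]
print(half_sum_fit, treatment_fit)
```

The fitted values agree, but a prior on the intercept means something different in each case: a prior on the mean across conditions under half-sum coding, and a prior on the mean of level A under treatment coding.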

Stephen_Martin · May 27, 2020, 9:31pm

That’s how you do it. If you’re using rstan (or just using R in general to prep the data), you can specify which contrasts your factor variable should use (sum, treatment, helmert, and others are available; sum and ‘treatment’ [aka dummy coding] are the most common). E.g., contrasts(mydata$myfactor) <- contr.sum(4); then you can use model.matrix(outcome ~ myfactor, mydata) to create a design matrix. You can then feed that design matrix into Stan as data. I find this to be the easiest approach to constructing what you need.
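For readers who want to see what such a design matrix looks like, here is a rough sketch in plain Python (the thread's own workflow is R's contr.sum and model.matrix; this is only an illustration, not a replacement for them). The function names and the five example observations are hypothetical. It builds sum and treatment contrasts for a four-level factor and prepends an intercept column, which is the shape of matrix you would pass to Stan as data.

```python
def sum_contrasts(n_levels):
    """Rows are levels, columns are contrasts; the last level is coded -1 throughout."""
    rows = [[1 if j == i else 0 for j in range(n_levels - 1)]
            for i in range(n_levels - 1)]
    rows.append([-1] * (n_levels - 1))
    return rows

def treatment_contrasts(n_levels):
    """Dummy coding with the first level as the reference (all zeros)."""
    return [[1 if j == i - 1 else 0 for j in range(n_levels - 1)]
            for i in range(n_levels)]

def design_matrix(level_indices, contrasts):
    """Prepend an intercept column of 1s to each observation's contrast row."""
    return [[1] + contrasts[k] for k in level_indices]

# Hypothetical data: five observations of a four-level factor, levels indexed 0..3.
observed_levels = [0, 1, 2, 3, 1]
X_sum = design_matrix(observed_levels, sum_contrasts(4))
X_trt = design_matrix(observed_levels, treatment_contrasts(4))

for row in X_sum:
    print(row)
```

Either matrix has four columns (an intercept plus three contrasts), which matches the "three binary predictors plus intercept" idea from the original question when treatment coding is used.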

Canaan_Breiss · May 28, 2020, 5:05pm

I see - thanks a lot for the help, folks.

skan · October 7, 2022, 12:16pm

Hello.

Don’t we need to marginalize the categorical variables?
(and use target += … in the model block)

skan · April 23, 2023, 12:19am

Do we need to marginalize the dummy variables?

mike-lawrence · April 23, 2023, 12:23am

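No. As posed above, the categorical variable is an observed predictor and the model seeks inference on its influence, which means the parameters we want are continuous. Marginalization is only needed for discrete unknown parameters, not for discrete data.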
