Trouble fitting an unpooled logistic regression - How to handle reference categories?

jphill01 · September 11, 2023, 4:36pm

Here: Representing categorical variables in Stan, I asked about coding categorical variables in Stan.

However, how should I go about coding reference levels for my factors?

Specifically, when using R’s glm(), assigning reference levels is handled automatically.

For Stan, do I first need to coerce categories to factors via as.factor() then convert the factors to integers using as.integer()? Doing this however keeps all levels, instead of using one as the reference.

Any ideas?

mpdylan · September 11, 2023, 7:01pm

The index variable approach you’re using in the original thread is inherently agnostic about which group is the reference group, so it doesn’t lend itself as naturally to having a reference level as the dummy variable/one-hot encoded approach also described in the linked StackOverflow thread.

However, if it’s important to you to have, say, the first level in each grouping factor be a reference level, and you want to keep the index variable structure, perhaps you could code something in the generated quantities block to compute alpha[i] - alpha[1] for each i > 1 so that your model would output differences from the first level in addition to the raw intercepts.

jphill01 · September 13, 2023, 4:16am

Thanks @mpdylan! This sounds like a good approach.

jsocolar · September 13, 2023, 1:03pm

Just a note. You say:

but the model that you are working with contains random effects and cannot be fit by glm. This might be a source of some of your confusion, because the need for a reference level is a feature of the fixed effects specification. In a model such as yours, where a factor variable enters only as a grouping factor for random effects, the intercept corresponds to the grand mean (i.e. the average city) rather than to any particular reference city.

Topic		Replies	Views
Representing categorical variables in Stan Modeling	4	1024	March 8, 2023
Reference level of Y for multinomial regression General	1	380	May 5, 2022
Categorical factor coding in stan Modeling techniques , specification	6	5555	April 23, 2023
How to specify the coefficient for reference group as zero in stan Modeling	1	372	August 4, 2023
Using Category Embeddings in Linear Regression Model Modeling	1	841	December 21, 2019

Trouble fitting an unpooled logistic regression - How to handle reference categories?

Related topics