Suppose I have a model with multiple categorical predictors, e.g. response ~ ethnicity + religion
What’s the recommendation for setting up a weakly informative prior in this situation?
It wouldn’t make sense to use independent priors (e.g. Normal(0,1)) on all coefficients here, because then the prior predictive is asymmetric (the left-out category has much less variance!)… So instead I’m usually inclined to just rewrite the model as something like: response ~ (1 | ethnicity) + (1 | religion)
…and set the σ prior to a constant (see the brms sketch after this list). However:
(a) This feels inefficient, since it’s forcing Stan to infer something I don’t care about – the μ for a population of unobserved ethnicities (/religions).
(b) The coefficients are no longer interpretable as I’d like them to be – for example, the posterior on the intercept (and the random effects) will continue to have uncertainty even in the infinite data limit.
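For reference, a minimal brms sketch of the version described above, assuming brms’s constant() prior to pin the group-level sd (the data frame d and its columns are stand-ins):

```r
library(brms)

# d is a hypothetical data frame with columns response, ethnicity, religion.
# constant(1) fixes the group-level sd, so only the group effects are inferred.
fit <- brm(
  response ~ (1 | ethnicity) + (1 | religion),
  data  = d,
  prior = prior(constant(1), class = sd)
)
```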
It feels like it should be possible to write the model without random effects at all, using e.g. mean-centered predictors, and then just add some negative correlation to the prior so that the implied prior-predictive distribution is the same for the left-out ethnicity (/religion) as for the others. Does anybody have any good tricks for this situation?
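For what it’s worth, one concrete version of that negative-correlation trick (a sketch, not an established recipe): draw K iid normals and center each draw. This induces pairwise correlation −1/(K−1) among the effects and gives every level, including the left-out one, the same prior-predictive variance:

```r
# Sum-to-zero effects with a symmetric prior predictive (simulation check).
K <- 4; sigma <- 1; n_sims <- 1e5
raw   <- matrix(rnorm(n_sims * K, 0, sigma), ncol = K)
alpha <- raw - rowMeans(raw)    # center each draw: effects sum to zero
round(cor(alpha), 2)            # off-diagonals ~ -1/(K-1) = -0.33
round(apply(alpha, 2, var), 2)  # equal variance across all K levels
```

Equivalently, drop the K-th effect and put a multivariate normal prior with covariance σ²(I − J/K) on the remaining K−1 coefficients, where J is the all-ones matrix.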
One option is to use effects coding rather than dummy coding for the categorical predictors. For a binary predictor, effects coding means using -1/1 instead of 0/1 (which is dummy coding). For a categorical predictor, dummy coding expands to a column of zeros and ones for every category except the reference category, whose rows are all zeros. Replace the reference category’s rows of zeros with -1’s and you have effects coding.
I’m reasonably certain that Agresti mentions this solution somewhere in his book “Categorical data analysis”.
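In R this is the built-in sum-to-zero contrast, contr.sum (a quick illustration):

```r
g <- factor(1:4)
contrasts(g) <- contr.sum(4)  # sum-to-zero ("effects") coding
model.matrix(~ g)
#   (Intercept) g1 g2 g3
# 1           1  1  0  0
# 2           1  0  1  0
# 3           1  0  0  1
# 4           1 -1 -1 -1
```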
Cool thanks, haven’t come across that before – I’ll check out the book!
But just to be clear, using independent & identical priors for a variable coded this way would still imply an asymmetric prior predictive. E.g. if the coding were:
g    e1  e2  e3
1:    1   0   0
2:    0   1   0
3:    0   0   1
4:   -1  -1  -1
with a N(0,1) prior on the regression coefficients for e1, e2, e3, then the prior predictive when g=4 will have 3x the variance of g=1,2,3, right?
(Whereas if the code for g=4 were (-1/√3, -1/√3, -1/√3) then I guess things would be balanced, but the parameters would have a weird interpretation…)
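A quick simulation bears out both points (a sketch):

```r
beta <- matrix(rnorm(3 * 1e5), nrow = 3)  # iid N(0,1) coefficients
X <- rbind(diag(3), c(-1, -1, -1))        # effects coding, g = 1..4
round(apply(X %*% beta, 1, var), 2)       # ~ 1, 1, 1, 3
X2 <- rbind(diag(3), rep(-1/sqrt(3), 3))  # rescaled row for g = 4
round(apply(X2 %*% beta, 1, var), 2)      # ~ 1, 1, 1, 1
```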