How to do empirical Bayes on factor variables?

In particular, given a factor variable such as gender, is the fit supposed to be set up so that the model becomes

`intercept + genderMale`

with `genderFemale` baked into the intercept?

Or should I do something else?

Further, how are priors defined on such variables?

For fully Bayesian inference you can treat this in a completely standard way. For example:

`bf(y ~ intercept + genderMale)`

Then define whatever priors you want on the intercept and the coefficient for genderMale.

One little trick that can be useful in this context is to code gender as -1/1 (instead of 0/1). This ensures that the prior pushforward distributions for males and females will be identical (assuming that you use a zero-centered prior on the coefficient for gender). If you bake `genderFemale` into the intercept, then you’ll have a wider prior pushforward for `genderMale` than for `genderFemale`, because the male prediction is the sum of two uncertain parameters (intercept plus coefficient) while the female prediction is the intercept alone.
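To make the difference between the two codings concrete, here is a minimal simulation sketch in plain Python (not brms or Stan). It assumes independent zero-centered normal priors with standard deviation 1 on both the intercept and the gender coefficient. Those prior scales, and the helper name `pushforward_sd`, are made up for illustration.

```python
import random
import statistics

# Assumed priors, chosen only for illustration: independent zero-centered normals.
INTERCEPT_SD = 1.0  # prior SD of the intercept
COEF_SD = 1.0       # prior SD of the gender coefficient


def pushforward_sd(coding, n_draws=50_000, seed=1):
    """Simulate the prior SD of the linear predictor for each gender.

    `coding` maps a gender label to the numeric value of the covariate.
    """
    rng = random.Random(seed)
    draws = {label: [] for label in coding}
    for _ in range(n_draws):
        # One joint draw from the prior on (intercept, coefficient).
        intercept = rng.gauss(0.0, INTERCEPT_SD)
        coef = rng.gauss(0.0, COEF_SD)
        for label, x in coding.items():
            draws[label].append(intercept + coef * x)
    return {label: statistics.pstdev(values) for label, values in draws.items()}


# 0/1 coding: the female level is baked into the intercept.
treatment = pushforward_sd({"female": 0, "male": 1})
# -1/1 coding: both levels are symmetric around the intercept.
effect = pushforward_sd({"female": -1, "male": 1})

print("0/1 coding: ", treatment)
print("-1/1 coding:", effect)
```

Under these assumed priors, the 0/1 coding gives a prior SD of about 1 for females and about 1.41 (the square root of 2) for males, while the -1/1 coding gives about 1.41 for both. Note that with -1/1 coding the coefficient is half the male/female difference, so its prior scale should be chosen with that in mind.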

If you really want to do empirical Bayes, then do the above, but use your empirical Bayes priors.

I just want to double check (and I hope I’m not being presumptuous here):

You have recently asked a lot of questions about setting discrete priors (e.g. negative binomial, Poisson, discrete factor variables in general). I just want to make sure that you aren’t confusing discrete *parameters* (which do not work in Stan and need to be marginalized out) with discrete *covariates* (which work just fine in Stan). In the `intercept + gender_male` model, for example, `gender_male` is a discrete *covariate*. In a regression context, `gender_male` then gets multiplied by a coefficient that gets estimated from the data. This *parameter* (i.e. the coefficient) is not discrete; it is continuous. It can take any value. You would encode prior information about this coefficient using some appropriate continuous distribution like a Gaussian or a Student t.
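The covariate/parameter distinction can be seen in a small plain-Python sketch. It uses ordinary least squares rather than Bayesian inference, purely to keep the example self-contained, and the "true" values and sample size are invented for illustration.

```python
import random
import statistics

rng = random.Random(0)

# Discrete covariate: each observation is either 0 (female) or 1 (male).
gender_male = [rng.randint(0, 1) for _ in range(400)]

# Hypothetical data-generating values, assumed only for this sketch.
true_intercept, true_coef = 2.0, 0.7
y = [true_intercept + true_coef * g + rng.gauss(0.0, 1.0) for g in gender_male]

# With a single 0/1 covariate, the least-squares intercept is the mean of the
# baseline group and the coefficient is the difference in group means.
mean_female = statistics.fmean(v for v, g in zip(y, gender_male) if g == 0)
mean_male = statistics.fmean(v for v, g in zip(y, gender_male) if g == 1)
intercept_hat = mean_female
coef_hat = mean_male - mean_female

print("covariate values:", sorted(set(gender_male)))  # only 0 and 1 occur
print("estimated coefficient:", coef_hat)             # an arbitrary real number
```

The covariate only ever takes the values 0 and 1, but the estimated coefficient is a real number, which is why a continuous prior on it is appropriate.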
