I have a question about how to choose a prior for a given variable. I have no problem fitting models with `rstanarm`, and I understand the reasoning behind Bayesian analysis, but I want to understand better how to choose a specific prior when I have a prior belief. For example, I'm working on an experiment where some schools received money and parental participation in school affairs as a treatment. I want to look at whether this had an effect on test scores. Test scores lie between 1 and 10; let's assume the outcome is a dummy that is 0 for scores in `1:7` and 1 for scores in `8:10`. In the literature, there is mixed evidence on whether this kind of intervention works; in some situations it had a positive effect and in others little to no effect.

My specific question is: which prior should I use? I know this is subjective, but what is the reasoning behind picking specific families? Sometimes I see the coefficients on binary predictors (treatment betas) given priors such as Student-t or normal distributions, but that doesn't make sense to me because the variable is binary. Should the prior instead be something like a binomial or geometric distribution, i.e. a distribution for counts rather than for continuous variables? I believe that this specific treatment is very unlikely to have a negative effect and very likely to be close to zero (allowing for small effects), but I also want to leave some room for a sizeable effect (so a long right tail). Of course, all of this makes sense for a continuous variable, but I'm not sure whether I can use these distributions for binary variables.

For my specific treatment beta, I was thinking of a prior distribution like a log-normal, where the bulk of the distribution is slightly above zero, with a long tail toward the positive end.

Thanks for any explanations.

I understand lots of people do this, but it does not make sense. If you have access to the raw test scores, model them as they are. But if they have to be categorized, it is better to make three categories.

In a Bernoulli GLM with \eta = \alpha + \beta \times t, you can put a continuous prior on \beta (and \alpha) because \eta = \log \frac{\Pr \left(y = 1\right)}{\Pr \left(y = 0\right)} is the continuous log-odds of success, rather than the binary success-or-failure outcome itself.

You should not use a discrete distribution as a prior (especially in Stan) because the parameters must be continuous in order for Hamiltonian Monte Carlo to work.
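A minimal `rstanarm` sketch of this idea, assuming a data frame `d` with a binary outcome `pass` and a treatment indicator `t` (the variable names are illustrative): the prior goes on the continuous log-odds coefficient, even though the outcome is binary.

```r
library(rstanarm)

# Bernoulli GLM: the normal prior is on the continuous log-odds
# coefficient beta, not on the binary outcome itself.
fit <- stan_glm(
  pass ~ t,
  data            = d,
  family          = binomial(link = "logit"),
  prior           = normal(location = 0.2, scale = 0.5),
  prior_intercept = normal(0, 2.5)
)
```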


Thanks @bgoodri. I understand your concern about the dummy variable. I usually keep the raw scores as they are to preserve variability; I only dichotomized here for simplicity. I have two follow-up questions, if that's alright.

Does the scale of the prior distribution matter relative to the scale of the beta? That is, if I think my beta is between 0.1 and 0.4, does it matter whether I set a normal prior with mean 0.2 and sd 0.5, or one with mean 1 and sd 0.5? In other words, does Stan consider the shape alone, or the shape together with its location and scale?

And as a follow-up, I understand your explanation of the Bernoulli GLM for a predictor with only two categories: we get one beta, and we provide a prior distribution over its possible values, so that beta is pulled towards the bulk of whatever distribution we specify. But how do I specify priors when I have three categories, i.e. two betas and one reference category? Do I specify a prior for each beta separately? I'm currently only using `rstanarm`, but I think I know the answer to this: in a Stan program I could put priors on specific betas rather than on variables from a data frame, which would solve the problem, but I'm not sure whether this is possible in `rstanarm`.
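For what it's worth, `rstanarm` does let you give different coefficients different priors by passing vectors to the prior's `location` and `scale` arguments, matched to the coefficients in the order they enter the design matrix. A hedged sketch, assuming a three-level factor `treat` whose two non-reference dummies are the only predictors (names are illustrative):

```r
library(rstanarm)

# With a 3-level factor, the model has two dummy coefficients (the
# reference category is omitted). Vector-valued location/scale assign
# one prior per beta, in design-matrix order.
fit <- stan_glm(
  pass ~ treat,
  data   = d,
  family = binomial(link = "logit"),
  prior  = normal(location  = c(0.2, 0.0),
                  scale     = c(0.5, 1.0),
                  autoscale = FALSE)
)
```

Setting `autoscale = FALSE` keeps the priors exactly as specified rather than letting `rstanarm` rescale them based on the predictors.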

Thanks again for your answers, they're really clearing up my doubts!

See https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html for some discussion of studying the consequences of your priors, which can help you build intuition for what information you want to encode in them in a given application, and https://betanalpha.github.io/assets/case_studies/weakly_informative_shapes.html for a particular discussion of “weakly informative priors”, which show up in models as those ubiquitous `normal(0, scale)` distributions.


@betanalpha this is exactly what I was looking for. Thanks a lot!