Specifying a uniform(0, 1) vs. a normal(0.5, 0.5) prior

In the Stan Prior Recommendations document, within the General Principles section, it is stated verbatim:

“You think a parameter could be anywhere from 0 to 1, so you set the prior to uniform(0,1). Try normal(.5,.5) instead.”

In my scenario, I’m trying to estimate tail probabilities for highly skewed distributions where probabilities can be very close to 0 or very close to 1.

My question is: What are the advantages of using a normal(0.5, 0.5) prior here?

It sounds like you might want to use a beta distribution for a prior here instead.

Neither a uniform(0,1) nor normal(0.5,0.5) prior represents a prior belief that the parameter(s) have mode(s) at 0 and 1. For the uniform(0,1) prior, there is no mode, and for the normal(0.5,0.5) prior, the mode is at 0.5.

You can try to find a suitable beta distribution hierarchically by using beta(a, b), where a and b are parameters drawn from uniform(0,1) hyperpriors. A beta distribution is bimodal (U-shaped, with modes at 0 and 1) when both its a and b parameters are strictly between zero and one, and it is symmetrically bimodal when a = b. Alternatively, you can directly model the distribution as symmetrically bimodal by fixing beta(0.5, 0.5), unless you have some reason to believe a and b take values other than 0.5.

Here is the implementation:

data {
  int<lower=1> N;
}

parameters {
  real<lower=0,upper=1> a;
  real<lower=0,upper=1> b;
  vector<lower=0,upper=1>[N] bet;
}

model {
  a ~ uniform(0,1);
  b ~ uniform(0,1);
  bet ~ beta(a, b);
}

Or, if you want to enforce symmetry:

data {
  int<lower=1> N;
}

parameters {
  real<lower=0,upper=1> a;
  vector<lower=0,upper=1>[N] bet;
}

model {
  a ~ uniform(0,1);
  bet ~ beta(a, a);
}

Or to implement it non-hierarchically:

data {
  int<lower=1> N;
}

parameters {
  vector<lower=0,upper=1>[N] bet;
}

model {
  bet ~ beta(0.5, 0.5);
}

These assume you have multiple parameters with the same distribution. If not, you would just use real<lower=0,upper=1> bet instead of vector<lower=0,upper=1>[N] bet; the rest remains the same.
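As a quick sanity check on the U-shape claim above (not Stan, just a plain-Python Monte Carlo sketch using the standard library's beta sampler), draws from beta(0.5, 0.5) concentrate near 0 and 1 rather than in the middle:

```python
import random

random.seed(42)
N = 100_000
draws = [random.betavariate(0.5, 0.5) for _ in range(N)]

# Compare the mass in the outer tenths of [0, 1] with the mass
# in the middle tenth; a U-shaped density piles up at the edges.
near_edges = sum(1 for x in draws if x < 0.1 or x > 0.9) / N
middle = sum(1 for x in draws if 0.45 < x < 0.55) / N

print(f"mass in [0,0.1) or (0.9,1]: {near_edges:.3f}")
print(f"mass in (0.45,0.55):        {middle:.3f}")
```

Analytically, the beta(0.5, 0.5) CDF puts about 41% of its mass in the outer tenths but only about 6% in the middle tenth, which the simulation reproduces.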

@Corey.Plate

On further consideration, your suggestion sounds like a good approach, but I'll also note (for others who may read and learn from this) that uniform(0, 1) is equivalent to beta(1, 1).

The only thing I need to ponder is hierarchical or not, and symmetric or not…

Placing a logistic or normal prior on the logit-transformed probabilities can also give you these bathtub shapes. For instance, try rlogis(1e4, 0, 2) |> plogis() |> hist() in R.
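The same experiment in Python, for anyone without R at hand (a stdlib-only sketch; since the random module has no logistic sampler, it uses inverse-CDF sampling):

```python
import math
import random

random.seed(1)

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

# Logistic(0, 2) draws pushed through the inverse logit,
# analogous to rlogis(1e4, 0, 2) |> plogis() in R.
# If U ~ Uniform(0,1), then mu + s*log(U/(1-U)) ~ Logistic(mu, s).
draws = []
for _ in range(10_000):
    u = random.random()
    x = 0.0 + 2.0 * math.log(u / (1.0 - u))
    draws.append(inv_logit(x))

# With scale 2, half the induced prior mass sits below 0.1 or above 0.9:
near_edges = sum(1 for p in draws if p < 0.1 or p > 0.9) / len(draws)
print(f"mass near 0 or 1: {near_edges:.3f}")
```

Exactly: Pr(p < 0.1) = Pr(logistic draw < log(1/9)) = 0.25 per tail with scale 2, so the bathtub holds half its mass in the outer tenths.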

First, it’s important to point out that the documentation is referring to unbounded parameters that you think are likely to lie in some closed interval such as [0, 1]. This is a lot different from a parameter that is actually bounded over that interval (e.g., Bernoulli probabilities). If the parameter could take any value on the real line (e.g., \mu in a N(\mu, 1) model), you should not put a prior \mu \sim U(0, 1), even if you believe \mu is all but certain to obey these bounds. An alternative in this latter case is the generalized normal distribution, which imposes a soft constraint rather than a hard one. Strictly speaking, your prior should depend on your a priori beliefs about the parameters, i.e., your beliefs about the parameters before looking at the data (hence the term prior).

Second, it is not clear to me which of these situations you are dealing with. Based on the other responses, I’m going to assume that your model is Y \sim \text{Bernoulli}(\gamma).

Suppose you believe, a priori, that \Pr(a \le \gamma \le b) \ge \phi for some \phi (e.g., \phi = 0.95) (i.e., you are all but certain that \gamma \in [a,b]). Some suitable priors would be

  1. \gamma \sim \text{Truncated Normal}(m, s, 0, 1), where m = (a + b) / 2 and s is chosen to satisfy the probability inequality.
  2. \gamma \sim \text{Beta}(sm, s(1 - m)), where m = (a + b) / 2 and s is chosen to satisfy the probability inequality.

In both cases, m is the prior mean and s controls the spread of the prior.
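As a rough stdlib-only sketch of how s could be chosen numerically for the beta version (the interval [0.25, 0.75] and \phi = 0.95 below are made-up example numbers, and the coverage is estimated by Monte Carlo rather than an exact CDF):

```python
import random

random.seed(0)

# Hypothetical prior belief: Pr(0.25 <= gamma <= 0.75) >= 0.95,
# so m = (a + b) / 2 = 0.5. Grid-search for the smallest integer
# concentration s such that Beta(s*m, s*(1-m)) meets the coverage.
a, b, m, phi = 0.25, 0.75, 0.5, 0.95
N = 20_000

def coverage(s):
    hits = sum(1 for _ in range(N)
               if a <= random.betavariate(s * m, s * (1 - m)) <= b)
    return hits / N

s_star = None
for s in range(2, 200):
    c = coverage(s)
    if c >= phi:
        s_star = s
        print(f"s = {s}: estimated coverage {c:.3f}")
        break
```

A normal approximation (the beta standard deviation is roughly 1 / (2 * sqrt(s + 1)) when m = 0.5) suggests s around 14–15 here, which the search recovers.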

If you believe a priori \Pr( \{ \gamma < a \} \cup \{ \gamma > b \} ) \approx 1 for some 0 < a < 0.5 < b < 1 (which is unlikely), you can do

  1. A Beta(\alpha, \beta) prior with \alpha, \beta < 1 (e.g., \alpha = \beta = 1/2), where \alpha and \beta are the beta shape parameters, not the interval bounds a and b above
  2. A mixture of truncated normal priors
  3. A Truncated Normal(m, s) prior on [0, 1].

The first prior is a U-shaped prior similar to what @mhollanders suggested. These priors have asymptotes (infinite modes) at 0 and 1.

The second prior allows you to have finite prior modes (often helpful for MCMC sampling, especially with sparse data) and can approximate any prior belief that you wish to posit.

The third prior performs shrinkage (i.e., moves your parameters towards m). The lower the value of s, the higher the degree of shrinkage. Oftentimes this is desirable if you believe your data are so sparse that they are incapable of actually representing the truth.

@tarheel Actually, my model is Y \sim \mathrm{Binomial}(n, \theta), but your recommendations should still hold since my model is just a sum of Bernoullis.
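For a Binomial likelihood, any beta prior from the lists above is conjugate, so the posterior is available in closed form. A small sketch (the data y = 3 successes in n = 20 trials are made-up numbers for illustration):

```python
# Beta-Binomial conjugacy: with prior Beta(alpha, beta) and data
# y successes out of n trials, the posterior is
# Beta(alpha + y, beta + n - y).
alpha, beta = 0.5, 0.5   # the U-shaped prior discussed above
n, y = 20, 3             # hypothetical data

post_a = alpha + y
post_b = beta + n - y
post_mean = post_a / (post_a + post_b)

print(f"posterior: Beta({post_a}, {post_b}), mean {post_mean:.3f}")
```

This makes it easy to check how quickly the U-shaped prior's mass at the endpoints is washed out once even modest data arrive.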