Specifying a uniform(0, 1) vs. a normal(0.5, 0.5) prior

In the Stan Prior Recommendations document, within the General Principles section, it is stated verbatim:

“You think a parameter could be anywhere from 0 to 1, so you set the prior to uniform(0,1). Try normal(.5,.5) instead.”

In my scenario, I’m trying to estimate tail probabilities for highly skewed distributions where probabilities can be very close to 0 or very close to 1.

My question is: What are the advantages of using a normal(0.5, 0.5) prior here?

It sounds like you might want to use a beta distribution for a prior here instead.

Neither a uniform(0,1) nor normal(0.5,0.5) prior represents a prior belief that the parameter(s) have mode(s) at 0 and 1. For the uniform(0,1) prior, there is no mode, and for the normal(0.5,0.5) prior, the mode is at 0.5.

You can try to find a suitable beta distribution hierarchically by using beta(a, b), where a and b are parameters drawn from uniform(0,1) hyperpriors. A beta distribution is bimodal (U-shaped, with modes at 0 and 1) when both its a and b parameters are strictly between zero and one, and it is symmetrically bimodal when a = b. Alternatively, you can directly model the distribution as symmetrically bimodal by fixing beta(0.5, 0.5), unless you have some reason to believe a and b take values other than 0.5.

Here is the implementation:

data {
  int<lower=1> N;
}

parameters {
  real<lower=0,upper=1> a;
  real<lower=0,upper=1> b;
  vector<lower=0,upper=1>[N] bet;
}

model {
  a ~ uniform(0,1);
  b ~ uniform(0,1);
  bet ~ beta(a, b);
}

Or, if you want to enforce symmetry:

data {
  int<lower=1> N;
}

parameters {
  real<lower=0,upper=1> a;
  vector<lower=0,upper=1>[N] bet;
}

model {
  a ~ uniform(0,1);
  bet ~ beta(a, a);
}

Or to implement it non-hierarchically:

data {
  int<lower=1> N;
}

parameters {
  vector<lower=0,upper=1>[N] bet;
}

model {
  bet ~ beta(0.5, 0.5);
}

These assume you have multiple parameters with the same distribution. If not, you would just use real<lower=0,upper=1> bet instead of vector<lower=0,upper=1>[N] bet; the rest remains the same.
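As a quick sanity check on the U-shape claim above (not Stan, just a plain-Python Monte Carlo sketch using the standard library's beta sampler), draws from beta(0.5, 0.5) concentrate near 0 and 1 rather than in the middle:

```python
import random

random.seed(42)
N = 100_000
draws = [random.betavariate(0.5, 0.5) for _ in range(N)]

# Compare the mass in the outer tenths of [0, 1] with the mass
# in the middle tenth; a U-shaped density piles up at the edges.
near_edges = sum(1 for x in draws if x < 0.1 or x > 0.9) / N
middle = sum(1 for x in draws if 0.45 < x < 0.55) / N

print(f"mass in [0,0.1) or (0.9,1]: {near_edges:.3f}")
print(f"mass in (0.45,0.55):        {middle:.3f}")
```

Analytically, the beta(0.5, 0.5) CDF puts about 41% of its mass in the outer tenths but only about 6% in the middle tenth, which the simulation reproduces.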

@Corey.Plate

On further consideration, your suggestion sounds like a good approach, but I'll also note (for others who may read and learn from this) that uniform(0, 1) is equivalent to beta(1, 1).

The only thing I need to ponder is hierarchical or not, and symmetric or not…

Placing a logistic or normal prior on the logit-transformed probabilities can also give you these bathtub shapes. For instance, try rlogis(1e4, 0, 2) |> plogis() |> hist() in R.
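The same experiment in Python, for anyone without R at hand (a stdlib-only sketch; since the random module has no logistic sampler, it uses inverse-CDF sampling):

```python
import math
import random

random.seed(1)

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

# Logistic(0, 2) draws pushed through the inverse logit,
# analogous to rlogis(1e4, 0, 2) |> plogis() in R.
# If U ~ Uniform(0,1), then mu + s*log(U/(1-U)) ~ Logistic(mu, s).
draws = []
for _ in range(10_000):
    u = random.random()
    x = 0.0 + 2.0 * math.log(u / (1.0 - u))
    draws.append(inv_logit(x))

# With scale 2, half the induced prior mass sits below 0.1 or above 0.9:
near_edges = sum(1 for p in draws if p < 0.1 or p > 0.9) / len(draws)
print(f"mass near 0 or 1: {near_edges:.3f}")
```

Exactly: Pr(p < 0.1) = Pr(logistic draw < log(1/9)) = 0.25 per tail with scale 2, so the bathtub holds half its mass in the outer tenths.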

First, it’s important to point out that the documentation is referring to unbounded parameters that you think are likely to lie in some closed interval such as [0, 1]. This is a lot different from a parameter that is actually bounded over that interval (e.g., Bernoulli probabilities). If the parameter could take any value on the real line (e.g., \mu in a N(\mu, 1) model), you should not put a prior \mu \sim U(0, 1), even if you believe \mu is all but certain to obey these bounds. An alternative in this latter case is the generalized normal distribution, which imposes a soft constraint rather than a hard one. Strictly speaking, your prior should depend on your a priori beliefs about the parameters, i.e., your beliefs about the parameters before looking at the data (hence the term prior).

Second, it is not clear to me which of these situations you are dealing with. Based on the other responses, I’m going to assume that your model is Y \sim \text{Bernoulli}(\gamma).

Suppose you believe, a priori, that \Pr(a \le \gamma \le b) \ge \phi for some \phi (e.g., \phi = 0.95) (i.e., you are all but certain that \gamma \in [a,b]). Some suitable priors would be

  1. \gamma \sim \text{Truncated Normal}(m, s, 0, 1), where m = (a + b) / 2 and s is chosen to satisfy the probability inequality.
  2. \gamma \sim \text{Beta}(sm, s(1 - m)), where m = (a + b) / 2 and s is chosen to satisfy the probability inequality.

In both cases, m is the prior mean and s controls the spread of the prior.
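As a rough stdlib-only sketch of how s could be chosen numerically for the beta version (the interval [0.25, 0.75] and \phi = 0.95 below are made-up example numbers, and the coverage is estimated by Monte Carlo rather than an exact CDF):

```python
import random

random.seed(0)

# Hypothetical prior belief: Pr(0.25 <= gamma <= 0.75) >= 0.95,
# so m = (a + b) / 2 = 0.5. Grid-search for the smallest integer
# concentration s such that Beta(s*m, s*(1-m)) meets the coverage.
a, b, m, phi = 0.25, 0.75, 0.5, 0.95
N = 20_000

def coverage(s):
    hits = sum(1 for _ in range(N)
               if a <= random.betavariate(s * m, s * (1 - m)) <= b)
    return hits / N

s_star = None
for s in range(2, 200):
    c = coverage(s)
    if c >= phi:
        s_star = s
        print(f"s = {s}: estimated coverage {c:.3f}")
        break
```

A normal approximation (the beta standard deviation is roughly 1 / (2 * sqrt(s + 1)) when m = 0.5) suggests s around 14–15 here, which the search recovers.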

If you believe a priori \Pr( \{ \gamma < a \} \cup \{ \gamma > b \} ) \approx 1 for some 0 < a < 0.5 < b < 1 (which is unlikely), you can do

  1. A Beta(\alpha, \beta) prior with \alpha, \beta < 1 (e.g., \alpha = \beta = 1/2), where \alpha and \beta are the beta shape parameters, not the interval bounds a and b above
  2. A mixture of truncated normal priors
  3. A Truncated Normal(m, s) prior on [0, 1].

The first prior is a U-shaped prior similar to what @mhollanders suggested. These priors have asymptotes (infinite modes) at 0 and 1.

The second prior allows you to have finite prior modes (often helpful for MCMC sampling, especially with sparse data) and can approximate any prior belief that you wish to posit.

The third prior performs shrinkage (i.e., moves your parameters towards m). The lower the value of s, the higher the degree of shrinkage. Oftentimes this is desirable if you believe your data are so sparse that they are incapable of actually representing the truth.

@tarheel Actually, my model is Y \sim \mathrm{Binomial}(n, \theta), but your recommendations should still hold since my model is just a sum of Bernoullis.
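For a Binomial likelihood, any beta prior from the lists above is conjugate, so the posterior is available in closed form. A small sketch (the data y = 3 successes in n = 20 trials are made-up numbers for illustration):

```python
# Beta-Binomial conjugacy: with prior Beta(alpha, beta) and data
# y successes out of n trials, the posterior is
# Beta(alpha + y, beta + n - y).
alpha, beta = 0.5, 0.5   # the U-shaped prior discussed above
n, y = 20, 3             # hypothetical data

post_a = alpha + y
post_b = beta + n - y
post_mean = post_a / (post_a + post_b)

print(f"posterior: Beta({post_a}, {post_b}), mean {post_mean:.3f}")
```

This makes it easy to check how quickly the U-shaped prior's mass at the endpoints is washed out once even modest data arrive.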