Stan's default uniform priors

When we say that, in Stan, parameters are given uniform priors when otherwise unspecified, is this equivalent to saying that each of the discrete possible floating point numbers is given equal probability? This could be weird, as apparently half of single-precision[0] floating point numbers are in the [-1, 1] interval.

[0] I realize we use double-precision but I assume there are similarly weird things happening there.
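
(Back-of-the-envelope for doubles, just to check that assumption: in IEEE 754 binary64, the finite positive values use exponent fields 0 through 2046, and the ones with magnitude below 1 use fields 0 through 1022, each field holding 2^{52} values. So the fraction of finite doubles lying in [-1, 1] is roughly 1023 \cdot 2^{52} / (2047 \cdot 2^{52}) = 1023/2047 \approx 0.4998, i.e. about half, the same picture as single precision.)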

It is not equivalent. The priors work independently of the particular floating point implementation (otherwise the results of Stan and similar software would probably be biased). By “all values” we mean, mathematically, all real values. Because a prior with equal weight on every real value cannot yield a distribution that integrates to 1, it is an improper prior.
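
In symbols: a flat prior over all of \mathbb{R} would need some constant c > 0 with \int_{-\infty}^{\infty} c \, \mathrm{d}\theta = 1, but \int_{-\infty}^{\infty} c \, \mathrm{d}\theta = \infty for every c > 0, so no normalizing constant exists.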

Do you know how that’s implemented? I’m new to this area and having trouble imagining how it would work. From a naive outside perspective, it seems like the default is “uniform” because a parameter without a distributional statement attached doesn’t modify the target lpdf at all, whatever value it takes. But that would seem equivalent to saying all floating point numbers have equal probability, which runs into some of the issues in the link above (though presumably it’s not quite as bad as that, since we likely use only one FP representation of any given number, avoiding the issue they describe where there are, e.g., 257 ways to represent the number 0.5).
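
To make that mental model concrete, here’s a toy sketch (purely illustrative, not how the internals are actually written): a parameter with no distributional statement leaves target alone, so only the likelihood contributes.

data {
  int<lower = 0> N;
  vector[N] y;
}
parameters {
  real mu;              // no sampling statement: the default (flat, improper) prior
}
model {
  y ~ normal(mu, 1);    // target only accumulates the log likelihood;
                        // the flat prior on mu adds nothing (a constant)
}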

Just keep in mind that in HMC, random variables are sampled by integrating a system of ODEs through parameter space, not by assigning probabilities to individual outcomes and drawing among them. So “uniform” is not the same as saying all floating point values have equal probability. Same sort of thing for Metropolis, but the jumps are generated differently.
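
Roughly, the ODEs in question are Hamilton’s equations for the parameters \theta and an auxiliary momentum \rho, with potential energy set to the negative log density (M is the mass matrix): \frac{\mathrm{d}\theta}{\mathrm{d}t} = M^{-1} \rho and \frac{\mathrm{d}\rho}{\mathrm{d}t} = -\nabla_\theta U(\theta), where U(\theta) = -\log p(\theta \mid y).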

I don’t think there is any (working) implementation of probabilistic algorithms that assigns probabilities to individual floating point numbers. If that were true, all continuous distributions would be biased towards regions with a high density of floating point numbers.

This is a great point - I think all parameter values are given equal probability, but because these candidate parameter values are generated by a process that isn’t affected by the density issues of floating point numbers, everything comes out as if it were continuous uniform.

@seantalts yeah I think that’s right. It definitely does not imply a discrete uniform distribution over the floating point numbers representable on a computer. But I can totally see why it would seem to imply that. And I’m surprised that question hasn’t come up more. I guess it’s probably because you need a background in computers for it to even occur to you that it’s something worth wondering about!

Thinking about what continuous density that would approximate is a good way to start thinking about Jacobians :-)

No. They’re all given equal density. You need to integrate density to get back to probability. What happens is that every interval of the same width gets the same probability, whether the distribution is lower-bounded, upper-bounded, or unconstrained. But if you try that over a non-compact subset of \mathbb{R}^N, things blow up, so probability and density aren’t properly defined. We just act as if they were and carry on (hoping the posterior will be proper after seeing some data).

So if you do something like

real<lower = 0, upper = 1> sigma;

that’s fine, because \int_0^1 c \, \mathrm{d}\sigma = c is finite (taking c = 1 gives a proper uniform density). But if we do this,

real<lower = 0> sigma;

we run into problems because \int_0^{\infty} c \, \mathrm{d}\sigma = \infty. So that’s why it’s “improper”.

Correct: it’s not affected by the density of floating point numbers, other than in terms of precision. There’s much higher precision in a small neighborhood around 0 than in an equally sized neighborhood around 1, for example.
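
(To put a number on it: for a normal double x, the gap to the next representable value is 2^{\lfloor \log_2 |x| \rfloor - 52}, so the spacing near 1 is 2^{-52} \approx 2.2 \times 10^{-16}, while near 0.001 it is about 2^{-62}, roughly a thousand times finer.)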
