When we say that, in Stan, parameters are given uniform priors when otherwise unspecified, is this equivalent to saying that each of the discrete possible floating point numbers is given equal probability? This could be weird, as apparently half of single-precision[0] floating point numbers are in the [-1, 1] interval.
[0] I realize we use double-precision but I assume there are similarly weird things happening there.
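For what it's worth, the "half of them are in [-1, 1]" claim seems easy to sanity-check by counting single-precision bit patterns; a rough Python sketch, nothing Stan-specific:

```python
import struct

def f32_bits(x: float) -> int:
    """Bit pattern of x rounded to single precision, as an unsigned 32-bit int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

# For non-negative floats, the ordering of values matches the ordering of bit
# patterns, so counting bit patterns counts representable numbers.
n_in_0_to_1 = f32_bits(1.0) + 1            # single-precision values in [0.0, 1.0]
n_finite_nonneg = f32_bits(float("inf"))   # single-precision values in [0.0, largest finite]

print(n_in_0_to_1 / n_finite_nonneg)       # ~0.498
```

By symmetry of the sign bit the same ratio holds on the negative side, so roughly half of all finite single-precision values really do land in [-1, 1].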
It is not equivalent. The priors work independently of the particular floating point implementation (otherwise all of Stan’s results, and those of similar software, would probably be biased). By “all values” we mean, mathematically, all real values. Because a prior with equal weight for all real values cannot yield a distribution that integrates to 1, this is an improper prior.
Do you know how that’s implemented? I am new to this area and am having trouble imagining how that would work; from a naive outside perspective it seems like the default is “uniform” because, for a parameter without a distributional statement attached, the target lpdf isn’t modified at all for any given parameter value. But that would seem equivalent to saying all floating point numbers have equal probability, which runs into some of the issues in the link above (though presumably it’s not quite as bad as that, since we likely don’t use more than one FP representation of any given number, avoiding the issue they talk about where there are e.g. 257 ways to represent the number 0.5).
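To make that concrete, here’s a toy sketch of what I’m picturing (made-up data and a made-up one-parameter model, not anything from Stan’s actual internals):

```python
import numpy as np

y = np.array([1.2, 0.7, 2.1])  # made-up data

def target_lpdf(mu: float) -> float:
    lp = 0.0
    # likelihood: y ~ normal(mu, 1), dropping additive constants
    lp += -0.5 * np.sum((y - mu) ** 2)
    # no distributional statement for mu: nothing is added here, which is the
    # same as adding log(c) for some constant c, i.e. a flat density on mu
    return lp
```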
Just keep in mind that random variables in HMC are sampled by integrating a bunch of ODEs around in parameter space, not by actually assigning probabilities to individual outcomes and sampling them. So uniform is not the same as saying all floating point values have equal probability. Same sorta thing for Metropolis but the jumps are different.
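If it helps, here’s a stripped-down sketch of the “integrating ODEs around parameter space” part: plain Hamiltonian Monte Carlo with a hand-picked step size on a toy target, nothing like Stan’s actual adaptive NUTS sampler:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target: a standard normal log density (up to an additive constant).
def logp(q):
    return -0.5 * q @ q

def grad_logp(q):
    return -q

def hmc_step(q, eps=0.1, n_leapfrog=20):
    """One HMC transition: resample a momentum, integrate the Hamiltonian
    dynamics with the leapfrog integrator, then accept or reject. The trajectory
    moves continuously through parameter space; nothing ever enumerates or
    weights individual floating-point values."""
    p = rng.standard_normal(q.shape)
    q_new, p_new = q.copy(), p.copy()
    p_new += 0.5 * eps * grad_logp(q_new)        # half step for momentum
    for _ in range(n_leapfrog - 1):
        q_new += eps * p_new                     # full step for position
        p_new += eps * grad_logp(q_new)          # full step for momentum
    q_new += eps * p_new
    p_new += 0.5 * eps * grad_logp(q_new)        # final half step for momentum
    h_old = -logp(q) + 0.5 * p @ p
    h_new = -logp(q_new) + 0.5 * p_new @ p_new
    return q_new if np.log(rng.uniform()) < h_old - h_new else q

q = np.zeros(2)
draws = []
for _ in range(1000):
    q = hmc_step(q)
    draws.append(q)
draws = np.asarray(draws)
print(draws.mean(axis=0), draws.std(axis=0))     # roughly 0 and 1
```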
I don’t think there is any (working) implementation of probabilistic algorithms that assigns probabilities to individual floating point numbers. If that were true, all continuous distributions would be biased towards areas with a high density of floating point numbers.
This is a great point - I think all parameter values are given equal probability, but because these candidate parameter values are generated by a process that isn’t affected by the density issues of floating point numbers, everything comes out as if it were continuous uniform.
@seantalts yeah I think that’s right. It definitely does not imply a discrete uniform distribution over the floating point numbers representable on a computer. But I can totally see why it would seem to imply that. And I’m surprised that question hasn’t come up more. I guess it’s probably because you need a background in computers for it to even occur to you that it’s something worth wondering about!
Thinking about what continuous density that would approximate is a good way to start thinking about Jacobians :-)
No. They’re all given equal density. You need to integrate density to get back to probability. What happens is that every interval of the same width has the same probability in a simple upper-bounded, lower-bounded, or unconstrained distribution. But if you do that over a non-compact subset of \mathbb{R}^N, things blow up, so probability and density aren’t properly defined. But we just act as if they were and continue (hoping the posterior will be proper after seeing some data).
So if you do something like
real<lower = 0, upper = 1> sigma;
that’s fine, because \int_0^1 c \, \mathrm{d}\sigma = c. But if we do this,
real<lower = 0> sigma;
we run into problems because \int_0^{\infty} c \, \mathrm{d}\sigma = \infty. So that’s why it’s “improper”.
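Here’s the same point as a little numerical sketch (c is just some hypothetical constant density value):

```python
# A flat prior assigns the same density c to every value of sigma, so
# probability comes from integrating: equal-width intervals get equal mass.
c = 1.0

def flat_mass(a, b):
    """Unnormalized mass of the interval (a, b) under a constant density c."""
    return c * (b - a)

print(flat_mass(0.2, 0.3), flat_mass(0.7, 0.8))  # same width, same mass

# With <lower = 0, upper = 1> the total mass is c * 1: finite, so the flat
# prior normalizes to uniform(0, 1). With only <lower = 0> the mass up to T is
# c * T, which grows without bound, so no normalizing constant exists.
for T in (1e2, 1e4, 1e8):
    print(T, flat_mass(0.0, T))
```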
Correct, it’s not affected by the density of floating point numbers other than in terms of precision. There’s much higher precision in a small neighborhood around 0 than in a same-sized neighborhood around 1, for example.
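As a quick illustration (math.ulp needs Python 3.9+), the gap between adjacent double-precision values near 0 is astronomically smaller than near 1:

```python
import math  # math.ulp requires Python 3.9+

print(math.ulp(0.0))  # gap to the next double above 0.0: ~4.9e-324
print(math.ulp(1.0))  # gap to the next double above 1.0: ~2.2e-16
```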