Prior recommendation for scale parameters in hierarchical models too strong?

Andrew keeps telling me the same thing as Daniel every time I try to drop in a lognormal prior for a scale.

This is a problem in Stan when we log transform positive values: it puts the boundary of the parameter space at -infinity on the unconstrained scale, which is prone to cause numerical difficulties with basic quantities and especially with derivatives.
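A quick numerical sketch of what that boundary placement looks like (assuming the standard exp/log transform Stan uses for lower-bounded parameters): small scales get pushed arbitrarily far out on the unconstrained scale.

```python
import numpy as np

# Stan samples a positive scale sigma via the unconstrained variable
# y = log(sigma), so sigma = exp(y). The boundary sigma = 0 maps to
# y = -infinity: tiny scales sit far out in the unconstrained tail.
for sigma in [1.0, 1e-3, 1e-9, 1e-300]:
    y = np.log(sigma)
    print(f"sigma = {sigma:g}  ->  y = log(sigma) = {y:.1f}")
```

So a posterior that puts appreciable mass on very small scales is asking the sampler to explore a region stretched out toward minus infinity.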

Rescaling is probably the solution here, not boundary-avoiding priors; the latter will affect your solution.

Yeah - this is a problem with bounded parameter spaces. In some sense, this is why bouncy-type samplers may be better than a transformation that prolongs the space near the boundary, but we can do what we can do :p. I’ve not often seen a numerical problem here with the log transform. Or maybe more accurately, any numerical problem is squashed when you back-transform.

The problem is that you can’t find a step size that works well out in the tails, so the samplers become super slow. And if you get really close to zero, there’s a tendency to overshoot, because we only use gradients to approximate curvature.

The Jacobian of the log transform actually prevents the tails from stretching too far towards negative infinity in most cases. For example, you should be able to get high accuracy without too much cost if you just try to recover the quantiles of a half-normal in Stan.
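To see the Jacobian at work, here’s a small numpy sketch (not Stan itself, just the math): for a standard half-normal scale, the unconstrained density picks up a factor of exp(y) from the Jacobian, which makes the tail toward the boundary decay exponentially.

```python
import numpy as np

def halfnormal_pdf(s):
    # standard half-normal density on s >= 0
    return np.sqrt(2.0 / np.pi) * np.exp(-0.5 * s**2)

def unconstrained_pdf(y):
    # density of y = log(s) when s ~ HalfNormal(0, 1):
    # multiply by the Jacobian |ds/dy| = exp(y)
    s = np.exp(y)
    return halfnormal_pdf(s) * s

for y in [-1.0, -5.0, -10.0]:
    print(f"y = {y:5.1f}  unconstrained density = {unconstrained_pdf(y):.3e}")
```

Far below zero the half-normal factor is essentially constant, so the unconstrained density falls off like exp(y): each unit step toward the boundary multiplies the density by roughly 1/e.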

The problem that you want to look out for is probability mass concentrating towards the boundary, not density. For any density that reaches a finite value at the boundary, the mass near the boundary will shrink as we consider smaller and smaller neighborhoods around it. It’s densities that spike at the boundary that can cause too much mass near the boundary and potentially heavy tails in the unconstrained space.
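The mass-versus-density distinction can be made concrete with a hypothetical comparison (the specific densities are my choice, not from the thread): a half-normal, which is finite at zero, versus a density proportional to s^(-1/2), which spikes at zero.

```python
import numpy as np

eps = np.array([1e-1, 1e-2, 1e-4, 1e-8])

# Half-normal: finite density at 0, so the mass in [0, eps] shrinks
# linearly, roughly f(0) * eps.
mass_flat = np.sqrt(2.0 / np.pi) * eps

# A density that spikes at the boundary, p(s) = 0.5 * s^(-1/2) on (0, 1]:
# the mass in [0, eps] is sqrt(eps), which shrinks far more slowly.
mass_spike = np.sqrt(eps)

for e, mf, ms in zip(eps, mass_flat, mass_spike):
    print(f"eps = {e:.0e}   finite-density mass: {mf:.1e}   spiking-density mass: {ms:.1e}")
```

Both masses go to zero, but the spiking density keeps orders of magnitude more mass pinned near the boundary at every scale, which is exactly what translates into a heavy tail on the unconstrained space.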

At the same time remember that the dynamic HMC within Stan can tackle heavy tails pretty well (see, for example, https://betanalpha.github.io/assets/case_studies/fitting_the_cauchy.html). The cost is higher but not so much that it would be completely infeasible in complex models.

Way back in the day I did a lot of work with HMC modified to bounce off of boundaries to satisfy constraints, but ultimately the performance wasn’t great. The problem is that when there’s appreciable mass near the boundary (or density spiking towards the boundary) the sampler needs extremely small step sizes to explore the neighborhood around the boundary before bouncing. The same pathological behavior just manifests in different ways.

Yeah - it’s a shame, because this is an important region to be able to place mass in: the regime where an effect you think should be there is not clearly present in the data.

Statistics is nothing if not inconvenient :p


Definitely worth keeping in mind. I tried to illustrate this in the change of variables case study by looking at a uniform[0, 1] variable transformed to log odds, as our constraint transform does. You can see the tails thinning.
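That thinning is easy to check numerically (a small sketch, not the case study’s own code): the log odds of a uniform[0, 1] draw follows a standard logistic distribution, whose tails decay like exp(-|x|).

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.uniform(0.0, 1.0, size=1_000_000)
x = np.log(u / (1.0 - u))  # log odds: the unconstrained scale for a [0, 1] parameter

# The implied density on x is standard logistic, whose tails decay like
# exp(-|x|), so essentially no draws land far from zero.
prop = np.mean(np.abs(x) > 10.0)
print(f"fraction of draws with |log odds| > 10: {prop:.2e}")
```

Analytically the tail probability is 2 / (1 + e^10), on the order of 1e-4, so even a hard boundary in the original space leaves only exponentially thin tails after the transform.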

Interesting. That’s exactly what I’d been speculating would happen, but I couldn’t quite visualize it.