Specifying (non-)hierarchical prior parameters with "half" distributions

Question

What is the correct way to specify “half” distributions (e.g. half-Cauchy) as priors, including when parameters are assigned hierarchical priors?

Context

I ask because it’s not clear from these posts whether the explicit truncation is required (or only omitted because of a particular use case), yet I often see others using only the parameter constraints, without the truncation, and then using e.g. cross-validation. I’m not sure whether they know something I don’t or whether it’s user error.

My current understanding is that one should (1) constrain the parameters block and (2) explicitly truncate the distribution in the model block, as shown below:

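data{
int<lower=0> N; // number of observations
vector[N] y; // observed outcomes
}

parameters{
real<lower=0> sigma; // constrained to be positive
}

model{
y ~ normal(0, sigma);
sigma ~ cauchy(0, 1) T[0, ]; // truncated below at zero
}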

However, I’m not sure when it comes to hierarchical priors.

For example, is it correct to say the prior below on \sigma is HalfCauchy(0,\phi) where \phi \sim HalfCauchy(0, 1)? Or should I remove some of the truncation/constraints? Not clear if any transforms are required.

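data{
  int<lower=0> N; // number of observations
  vector[N] y; // observed outcomes
}

parameters{
  real<lower=0> sigma; // constrained to be positive
  real<lower=0> phi; // constrained to be positive
}

model{
  y ~ normal(4, sigma);
  sigma ~ cauchy(0, phi) T[0, ]; // truncated below at zero
  phi ~ cauchy(0, 1) T[0, ]; // truncated below at zero
}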

To my understanding, you may need the T[0, ] for sigma, but you could remove the T[0, ] for phi.

It’s not wrong to have it for phi, but it’s unnecessary in this case because the normalizing term it adds is a constant.

I’m not even absolutely sure whether you need it for sigma, because the Cauchy is centered at zero, and you’d be dividing by 0.5 no matter what phi is, right?
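
One way to check the 0.5 claim is to write out the Cauchy CDF with location \mu and scale \phi and evaluate it at the truncation point. This is only a short worked derivation using the standard closed form of the CDF; the symbols follow the ones used in this thread.

```latex
\begin{aligned}
F_{\text{Cauchy}}(x \mid \mu, \phi)
  &= \frac{1}{2} + \frac{1}{\pi}\arctan\!\left(\frac{x - \mu}{\phi}\right), \\
1 - F_{\text{Cauchy}}(0 \mid 0, \phi)
  &= 1 - \left(\frac{1}{2} + \frac{1}{\pi}\arctan(0)\right)
   = \frac{1}{2}
  \quad \text{for every } \phi > 0 .
\end{aligned}
```

So with the location fixed at zero and the lower bound at zero, the normalizer does not depend on the scale, which is why it would be a constant for sigma as well.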

The explicit truncation is needed when (1) the truncation points vary, or (2) the adjustment made by the truncation is not constant, i.e. it depends on parameters.
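
For contrast, here is a minimal sketch of a case where the adjustment is not constant. It is not from the original question: the parameter mu and its normal(0, 1) prior are hypothetical, chosen only so that the prior on sigma has a location that is itself a parameter. The normalizer is then 1 - Phi((0 - mu) / 1), which changes with mu, so dropping the truncation would presumably target a different posterior.

```stan
data {
  int<lower=0> N;          // number of observations (assumed name)
  vector[N] y;             // observed outcomes (assumed name)
}
parameters {
  real mu;                 // hypothetical: location of the prior on sigma
  real<lower=0> sigma;     // constrained to be positive
}
model {
  mu ~ normal(0, 1);       // hypothetical prior, for illustration only
  // The truncation normalizer depends on the parameter mu,
  // so the truncation has to be stated explicitly here.
  sigma ~ normal(mu, 1) T[0, ];
  y ~ normal(0, sigma);
}
```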

All the lower truncation does, conceptually, is renormalize:

p(\phi) = \frac{\text{Cauchy}(\phi \mid 0, 1)}{1 - F_{\text{Cauchy}}(0 \mid 0, 1)}, \quad \phi \in \mathbb{R}^+

so that p(\phi) would integrate to 1.

In Stan-speak, that’d be:

target += cauchy_lpdf(phi | 0, 1) - cauchy_lccdf(0 | 0, 1);

That latter term is a constant, and can be dropped, because the log-posterior need only be defined up to a normalizing constant.
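
Applied to the hierarchical example from the question, this suggests the following sketch, which relies only on the lower=0 constraints. The data block (N, y) is an assumption added to make the program complete. Because the location and the lower bound are both zero, the dropped term is log(0.5) for phi and for sigma alike, so this should target the same posterior as the explicitly truncated version.

```stan
data {
  int<lower=0> N;          // number of observations (assumed name)
  vector[N] y;             // observed outcomes (assumed name)
}
parameters {
  real<lower=0> sigma;     // constraint restricts sigma to (0, inf)
  real<lower=0> phi;       // constraint restricts phi to (0, inf)
}
model {
  // Half-Cauchy(0, 1): the dropped normalizer is the constant log(0.5).
  phi ~ cauchy(0, 1);
  // Half-Cauchy(0, phi): the normalizer is again log(0.5) for any phi > 0.
  sigma ~ cauchy(0, phi);
  y ~ normal(4, sigma);
}
```

Keeping the T[0, ] statements would not be wrong; it would only add constant terms to the log density.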

Someone can correct me if I’m wrong.

Edit

To clarify:

When you say real<lower=0> phi, you’re telling Stan that phi should be a positive real value. Under the hood, Stan tracks an unconstrained ‘phi_unconstrained’ value, which is transformed so that phi > 0; Stan also handles the Jacobian of this transformation.
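
As a rough illustration of what the constraint does, the sketch below writes the transform out by hand. It assumes the log/exp transform that Stan documents for lower-bounded scalars; the name phi_unc is hypothetical, and in practice you would simply declare real<lower=0> phi and let Stan do this for you.

```stan
parameters {
  real phi_unc;                        // hypothetical unconstrained value
}
transformed parameters {
  real<lower=0> phi = exp(phi_unc);    // maps the real line to (0, inf)
}
model {
  // Log Jacobian of phi = exp(phi_unc): log(exp(phi_unc)) = phi_unc.
  target += phi_unc;
  // Prior stated on the constrained scale, without explicit truncation.
  phi ~ cauchy(0, 1);
}
```

Note that nothing here renormalizes the density; the transform only keeps phi positive, which is a separate job from the truncation.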

When you say T[a, b] in the model block, you’re telling Stan that the distribution is truncated to the interval [a, b] (in the examples here, to values >= 0). Under the hood, Stan does the renormalization, effectively by dividing the density by CDF(b) - CDF(a): p(x)/(F(b) - F(a)) (but on the log scale, since a Stan model defines a log posterior).
This contribution is not necessary if the renormalization is constant.

@Stephen_Martin’s answer is correct. For Stan demonstrations of when the normalization can be ignored and when it cannot, see also An Introduction to Stan.
