The parameterization of the scaled inverse chi-squared distribution seems off to me. The sigma parameter is squared in the pdf:
\text{ScaledInvChiSquare}(y|\nu,\sigma) = \frac{(\nu / 2)^{\nu/2}} {\Gamma(\nu / 2)} \, \sigma^{\nu} \, y^{-(\nu/2 + 1)} \, \exp \! \left( \! - \, \frac{1}{2} \, \nu \, \sigma^2 \, \frac{1}{y} \right) .
Needless to say, this doesn’t match the textbook version. I know Stan does many things differently. Perhaps there’s a justification for this?
If the goal here is to be aligned with how the normal distribution uses the scale instead of the variance, there’s still a problem. You still need to model the variance, sigma2 ~ scaled_inv_chi_squared(df, s)
. So if your parameter is the variance one might as well use the variance in the pdf.
1 Like
Note the \sigma on the left-hand side and the \sigma^2 on the right-hand side.
For example, you have to write the following to get things to work using the Wikipedia definition: X ~ scaled_inv_chi_squared(df, sqrt(tau2))
.
No? It’s the same thing.
Wiki page:
\frac{(\sigma^2 \frac{\nu}{2})^{\frac{\nu}{2}}}{\Gamma(\frac{\nu}{2})} \frac{\exp (\frac{-1}{2}\frac{\nu\sigma^2}{x})}{x^{1 + \nu / 2}}
If you just move the sigma^2 out of the parentheses:
\frac{( \frac{\nu}{2})^{\frac{\nu}{2}}\sigma^\nu}{\Gamma(\frac{\nu}{2})} \frac{\exp (\frac{-1}{2}\frac{\nu\sigma^2}{x})}{x^{1 + \nu / 2}}
It’s the same thing. One effectively has (\sigma^2)^{1/2}, the other has \sigma. Different ways of writing it, but it’s not a different parameterization, any more than using 1/2 is a different parameterization than 2/4.
I see what you are saying now; you’re just saying instead of feeding it sigma^2 as an argument, you have to feed it sigma. (They are the same function, it just matters whether the parameter is pre-squared or not).
I’m not sure how much an issue this really is. Is there a particular reason you prefer estimating sigma^2 directly, rather than sigma? Or transforming one to another? I find the stan arguments to be clearer, personally. Otherwise, you have to make it really clear that the input is indeed sigma^2, then change the formula to assume the argument is indeed pre-squared. That gets confusing for people.
The scaled inverse chi-squared distribution is typically used as a prior distribution for the variance in a normal distribution. The second parameter is typically the “prior variance”. See Section 2.6 in BDA3 or, say, Wikipedia. Using the variance parameterization is nearly universal. I’m curious why Stan is an outlier. In fact, I’m wondering if this isn’t simply a bug.
1 Like
They should still return the same value though.
One takes sigma^2 as an argument; it replaces all instances of sigma^2 with your sigma^2.
The other takes sigma as an argument; it replaces all instances of sigma with your sigma.
The function is the same, and it will return the same value. It’s just that you should sample sigma instead of sigma^2. It’s not a bug, they just set the arguments in terms of sigma, rather than sigma^2.
Like, if you had f(y) = y^2 + y, and another g(y^2) = y^2 + y, you’d get the same output. All that matters is in the programming of it: f(a) = a^2 + a; g(b) = b + sqrt(b). You’d still get the same answer, no? It just matters whether you think in terms of ‘y’ or ‘y^2’. The output winds up being the same. So if you sampled just sigma, and used the scaled inv. chi-squared using the arguments that stan uses, you’d get the same answer that you would if you sampled sigma^2, and used a version that assumes the input is a sigma^2.
With that said - you don’t have to worry about conjugacy with Stan anyway; I wasn’t aware that people still use scaled inverse chi-squared over, say, a half-t or half-normal or anything else that is easier to reason about.
I haven’t found the replies to this post particularly helpful. To those who find themselves in a similar situation, I suggest using the inverse gamma distribution.
Digging this up for future people to ask/tell whether/that this is the solution if I have an unknown scale population_sigma
as a parameter and want to give the variance pow(population_sigma,2)
an inverse-chi-squared prior centered at some estimate population_sigma_estimate
with degrees of freedom population_sigma_nu
:
pow(population_sigma, 2) ~ scaled_inv_chi_square(population_sigma_nu, population_sigma_estimate);
// Jacobian adjustment
target += log(2*population_sigma);
The code looks weird, but it appears like this does the right thing. Is this correct? Should this (or the correct code) be added to the documentation?