Parameterization of scaled inverse chi-squared

ariddell · May 20, 2020, 7:01pm

The parameterization of the scaled inverse chi-squared distribution seems off to me. The sigma parameter is squared in the pdf:

\text{ScaledInvChiSquare}(y|\nu,\sigma) = \frac{(\nu / 2)^{\nu/2}} {\Gamma(\nu / 2)} \, \sigma^{\nu} \, y^{-(\nu/2 + 1)} \, \exp \! \left( \! - \, \frac{1}{2} \, \nu \, \sigma^2 \, \frac{1}{y} \right) .

Needless to say, this doesn’t match the textbook version. I know Stan does many things differently. Perhaps there’s a justification for this?

If the goal here is to be aligned with how the normal distribution uses the scale instead of the variance, there’s still a problem. You still need to model the variance, sigma2 ~ scaled_inv_chi_squared(df, s). So if your parameter is the variance one might as well use the variance in the pdf.

Stephen_Martin · May 20, 2020, 8:43pm

The definition matches https://en.wikipedia.org/wiki/Scaled_inverse_chi-squared_distribution , no?

ariddell · May 20, 2020, 11:42pm

Note the \sigma on the left-hand side and the \sigma^2 on the right-hand side.

For example, you have to write the following to get things to work using the Wikipedia definition: X ~ scaled_inv_chi_squared(df, sqrt(tau2)).

Stephen_Martin · May 21, 2020, 12:24am

No? It’s the same thing.

Wiki page:

\frac{(\sigma^2 \frac{\nu}{2})^{\frac{\nu}{2}}}{\Gamma(\frac{\nu}{2})} \frac{\exp (\frac{-1}{2}\frac{\nu\sigma^2}{x})}{x^{1 + \nu / 2}}

If you just move the sigma^2 out of the parentheses:

\frac{( \frac{\nu}{2})^{\frac{\nu}{2}}\sigma^\nu}{\Gamma(\frac{\nu}{2})} \frac{\exp (\frac{-1}{2}\frac{\nu\sigma^2}{x})}{x^{1 + \nu / 2}}

It’s the same thing. One effectively has (\sigma^2)^{1/2}, the other has \sigma. Different ways of writing it, but it’s not a different parameterization, any more than using 1/2 is a different parameterization than 2/4.

Stephen_Martin · May 21, 2020, 12:44am

I see what you are saying now; you’re just saying instead of feeding it sigma^2 as an argument, you have to feed it sigma. (They are the same function, it just matters whether the parameter is pre-squared or not).

I’m not sure how much an issue this really is. Is there a particular reason you prefer estimating sigma^2 directly, rather than sigma? Or transforming one to another? I find the stan arguments to be clearer, personally. Otherwise, you have to make it really clear that the input is indeed sigma^2, then change the formula to assume the argument is indeed pre-squared. That gets confusing for people.

ariddell · May 21, 2020, 1:38am

The scaled inverse chi-squared distribution is typically used as a prior distribution for the variance in a normal distribution. The second parameter is typically the “prior variance”. See Section 2.6 in BDA3 or, say, Wikipedia. Using the variance parameterization is nearly universal. I’m curious why Stan is an outlier. In fact, I’m wondering if this isn’t simply a bug.

Stephen_Martin · May 21, 2020, 2:50am

They should still return the same value though.

One takes sigma^2 as an argument; it replaces all instances of sigma^2 with your sigma^2.

The other takes sigma as an argument; it replaces all instances of sigma with your sigma.

The function is the same, and it will return the same value. It’s just that you should sample sigma instead of sigma^2. It’s not a bug, they just set the arguments in terms of sigma, rather than sigma^2.

Like, if you had f(y) = y^2 + y, and another g(y^2) = y^2 + y, you’d get the same output. All that matters is in the programming of it: f(a) = a^2 + a; g(b) = b + sqrt(b). You’d still get the same answer, no? It just matters whether you think in terms of ‘y’ or ‘y^2’. The output winds up being the same. So if you sampled just sigma, and used the scaled inv. chi-squared using the arguments that stan uses, you’d get the same answer that you would if you sampled sigma^2, and used a version that assumes the input is a sigma^2.

With that said - you don’t have to worry about conjugacy with Stan anyway; I wasn’t aware that people still use scaled inverse chi-squared over, say, a half-t or half-normal or anything else that is easier to reason about.

ariddell · May 29, 2020, 12:35pm

I haven’t found the replies to this post particularly helpful. To those who find themselves in a similar situation, I suggest using the inverse gamma distribution.

Funko_Unko · May 26, 2021, 10:41am

Digging this up for future people to ask/tell whether/that this is the solution if I have an unknown scale population_sigma as a parameter and want to give the variance pow(population_sigma,2) an inverse-chi-squared prior centered at some estimate population_sigma_estimate with degrees of freedom population_sigma_nu:

pow(population_sigma, 2) ~ scaled_inv_chi_square(population_sigma_nu, population_sigma_estimate);
// Jacobian adjustment
target += log(2*population_sigma);

The code looks weird, but it appears like this does the right thing. Is this correct? Should this (or the correct code) be added to the documentation?

Topic		Replies	Views
Using inverse gamma as a prior of sigma in the regression model Modeling specification	2	467	June 13, 2021
Gamma distribution - parameterization -- gamma distribution General	1	2801	December 21, 2017
Where does the scale of the priors recommended in Stan Users' Guide for Gaussian Processes come from? General	2	657	August 27, 2018
Inverse Wishart Distribution in Stan General	5	3882	April 11, 2022
Student T distribution: A question on scale General	3	3596	July 29, 2020

Parameterization of scaled inverse chi-squared

Related topics