# Posterior estimates of rate and shape of gamma distribution are dependent

I have two concerns:

1. When reporting the estimates of rate and shape, both show a similar (highly correlated) pattern, which is not very informative. I would prefer some other parametrization that provides orthogonal estimates.

2. If I recall correctly, the Stan prior recommendation list recommends independent priors. Sure, I can set up a gamma model with independent rate and shape priors, but the data then introduce the dependence pattern. Is that not an issue for the sampler? Or does the sampler use some different implicit parametrization under the hood?

In any case, all of the popular parameter transformations (log rate/shape, scale, mean, mode, std) do not manage to remove the dependence.

What works rather well is a parametrization in terms of log scale and log of the geometric mean, but it’s difficult to use this to construct priors because I need to compute the inverse digamma function.

For reference here is the model code I am primarily using:

```stan
data {
  int<lower=0> N;
  vector<lower=0>[N] y;
}
parameters {
  real b;
  real k;
}
model {
  y ~ gamma(exp(k), exp(b));
}
```
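The shape–rate dependence can be reproduced outside Stan as well. Here is a minimal Python sketch (using numpy and scipy, with illustrative parameter values of my own choosing) that bootstraps maximum-likelihood estimates of shape and rate from simulated gamma data and checks how strongly the two estimates co-vary:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha_true, beta_true = 2.0, 3.0  # illustrative shape and rate
y = rng.gamma(shape=alpha_true, scale=1.0 / beta_true, size=500)

# Bootstrap the MLE of (shape, rate) to see how the estimates co-vary.
shapes, rates = [], []
for _ in range(200):
    yb = rng.choice(y, size=y.size, replace=True)
    a_hat, _, scale_hat = stats.gamma.fit(yb, floc=0)  # location fixed at 0
    shapes.append(a_hat)
    rates.append(1.0 / scale_hat)

corr = np.corrcoef(shapes, rates)[0, 1]
print(f"bootstrap correlation of shape and rate estimates: {corr:.2f}")
```

The bootstrap correlation comes out strongly positive, which is the same pattern the posterior shows.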


I’ve tried different priors and I’ve also run a fake data simulation with identical results.


I don’t think that is possible with the Gamma distribution. The Fisher information matrix seems to be

\mathcal{I}(\alpha, \beta) = N \begin{pmatrix} \psi'(\alpha) & -1/\beta \\ -1/\beta & \alpha/\beta^2 \end{pmatrix}

where \psi' is the trigamma function,
and I don’t see a substitution that would make that diagonal.
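The off-diagonal term above translates into a high asymptotic correlation between the shape and rate estimates: inverting the 2x2 Fisher matrix gives a correlation of 1/\sqrt{\alpha\,\psi'(\alpha)}. A short Python sketch (illustrative \alpha values) shows how large this is in practice:

```python
import numpy as np
from scipy.special import polygamma

def shape_rate_corr(alpha):
    """Asymptotic correlation of the shape and rate MLEs implied by the
    Fisher information matrix: 1 / sqrt(alpha * trigamma(alpha))."""
    return 1.0 / np.sqrt(alpha * polygamma(1, alpha))  # polygamma(1, .) = trigamma

for a in [0.5, 1.0, 2.0, 10.0]:
    print(f"alpha = {a:5.1f}: asymptotic correlation = {shape_rate_corr(a):.3f}")
```

The correlation grows toward 1 as the shape increases, so with moderate or large shape values the shape and rate estimates are almost perfectly dependent.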

I think it would be better to say that there are a bunch of independent priors on the list. That does not mean that, if what you believe about the rate depends on what you believe about the shape, you shouldn’t express your prior as f(shape) * f(rate | shape). But it is often difficult for people to express forms of dependence explicitly, so they usually just specify their priors marginally and accept that the posterior distribution has a little less dependence than it ideally should. If you have a decent amount of data, the distortion is pretty negligible.


Just for information: I computed the off-diagonal element of the Fisher information matrix for the gamma distribution parametrized by log geometric mean and log scale, and it is indeed equal to zero, suggesting that log geometric mean and log scale provide an orthogonal parametrization.

For those interested, here is my Stan implementation of the inverse digamma function:

```stan
functions {
  vector invdigamma(vector x) {
    vector[num_elements(x)] y;
    vector[num_elements(x)] L;
    // starting values
    for (i in 1:num_elements(x)) {
      if (x[i] == digamma(1)) {
        y[i] = 1;
      } else if (x[i] >= -2.22) {
        y[i] = exp(x[i]) + 0.5;
      } else {
        y[i] = 1 / (x[i] - digamma(1));
      }
    }
    // Newton iterations: stop once every element has converged
    L = digamma(y) - x;
    while (max(fabs(L)) > 1e-12) {
      y = y - L ./ trigamma(y);
      L = digamma(y) - x;
    }
    return y;
  }
  real invdigammaR(real x) {
    real y;
    real L;
    if (x == digamma(1)) {
      y = 1;
    } else if (x >= -2.22) {
      y = exp(x) + 0.5;
    } else {
      y = 1 / (x - digamma(1));
    }
    L = digamma(y) - x;
    while (fabs(L) > 1e-12) {
      y = y - L / trigamma(y);
      L = digamma(y) - x;
    }
    return y;
  }
}
```

Good to know


Hey, I was wondering if you kept investigating this topic.

I’ve got a manuscript ready that investigates the orthogonal parametrization of the gamma, Weibull, and inverse Gaussian distributions and of some of their extensions. It got rejected by “Communications in Statistics – Theory and Methods” because it lacked an application (i.e. an analysis of empirical data). All other suitable journals that I’m aware of require applications as well. I will have to add one to the manuscript. This takes time.

So far I have run parameter recovery experiments with a hierarchical gamma distribution using the orthogonal parameterization. It worked well. You can find the (currently undocumented) code here: GitHub - simkovic/OrthoPar

I’m currently using the following code to compute the inverse digamma function.

```stan
real invdigamma(real x) {
  real y;
  real L;
  // starting values from Batir (2018)
  if (x >= -5.24) {
    y = 1 / log(1 + exp(-x));
  } else {
    y = 1 / (digamma(1) - x);
  }
  // Newton iterations
  L = digamma(y) - x;
  while (fabs(L) > 1e-12) {
    y = y - L / trigamma(y);
    L = digamma(y) - x;
  }
  return y;
}
```
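For readers who want to sanity-check the algorithm outside Stan, here is a Python port of the same Newton iteration (the two-branch starting rule mirrors the Stan code above; scipy supplies digamma and trigamma):

```python
import numpy as np
from scipy.special import digamma, polygamma

def invdigamma(x, tol=1e-12):
    """Invert psi(y) = x by Newton's method, mirroring the Stan code above."""
    # starting values: same two-branch rule as the Stan function
    y = 1.0 / np.log1p(np.exp(-x)) if x >= -5.24 else 1.0 / (digamma(1.0) - x)
    err = digamma(y) - x
    while abs(err) > tol:
        y -= err / polygamma(1, y)  # polygamma(1, .) is trigamma
        err = digamma(y) - x
    return y

# round-trip check: invdigamma(digamma(rho)) should recover rho
for rho in [0.1, 1.0, 7.5]:
    assert abs(invdigamma(digamma(rho)) - rho) < 1e-8
```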


The code I posted previously had mistakes in the formulas for computing the starting values. The latest version uses the formulas from Batir (2018) for the starting values; these are pretty accurate.

Batir, N. (2018). Inequalities for the inverses of the polygamma functions. Archiv der Mathematik, 110(6):581–589.


I don’t see how the off-diagonal goes to 0 with this parameterization. Do you have the derivation?

Thanks for coming back so quickly. I’ll have a look at the resources. This might be very useful in the context of degradation modelling and reliability, yet I need to understand better what you have done.
Thanks, and keep it up.

Define the gamma distribution as:
f(Y=y;\rho,\theta)=\frac{1}{\Gamma(\rho) \theta^\rho} y^{\rho - 1} e^{-\frac{y}{\theta}}

and the new parametrization as \gamma=E[\log Y]=\psi(\rho)+\log \theta and \tau=\log(\theta)

Then the off-diagonal element is

\begin{align} \operatorname{E}_Y \left[ \frac{\partial^2 \log f(Y;\gamma,\tau)}{\partial\tau\,\partial\gamma} \right] &= \operatorname{E}\left[\omega''(\gamma-\tau)(\gamma-\log Y)\right] \\ &= \omega''(\gamma-\tau)(\gamma-\operatorname{E}[\log Y]) \\ &= \omega''(\gamma-\tau)(\gamma-\gamma) \\ &= 0 \end{align}

where \omega(\cdot) is the inverse digamma function, which is a monotonic map \mathbb{R} \rightarrow \mathbb{R}^+.

An intermediate result is \frac{\partial \log f(Y;\gamma,\tau)}{\partial\gamma} = (\log Y - \gamma)\,\omega'(\gamma -\tau).
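The key fact the derivation relies on, \operatorname{E}[\log Y] = \psi(\rho) + \log\theta = \gamma, is easy to verify numerically. A Monte Carlo sketch in Python (with illustrative values of \rho and \theta):

```python
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(1)
rho, theta = 2.5, 0.8                 # illustrative shape and scale
g = digamma(rho) + np.log(theta)      # gamma = E[log Y] under the claim

# draw a large gamma sample and compare the empirical mean of log Y
y = rng.gamma(shape=rho, scale=theta, size=1_000_000)
print(np.mean(np.log(y)), g)  # the two values should agree closely
```

Since the expectation of (\gamma - \log Y) vanishes, the cross term in the Fisher information is zero regardless of the value of \omega''(\gamma-\tau).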

We use a different derivation strategy in the manuscript and I copied these formulas from my private notes. Let me know if there are any formatting errors.


That is cool, thanks for sharing! I’m going to open an issue to add this to the user manual. You could make a Stan case study from your manuscript to show the application.
