When reporting the estimates of rate and shape, both show a similar pattern, which is not very informative. I would prefer some other parametrization that provides orthogonal estimates.

If I recall correctly, the Stan prior recommendation list recommends independent priors. Sure, I can set up a gamma model with independent priors on rate and shape, but the data then introduce the dependence pattern. Isn't that an issue for the sampler? Or does the sampler use some different implicit parametrization under the hood?
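Just to quantify that dependence (a quick check of my own, not anything about Stan internals): for the model below, the large-sample posterior correlation between log shape and log rate is roughly the correlation implied by the inverse Fisher information of the gamma likelihood, and it is always positive and approaches 1 as the shape grows:

```python
import numpy as np
from scipy.special import polygamma

def log_params_corr(alpha):
    """Large-sample correlation of (log shape, log rate) from the inverse
    Fisher information of a Gamma(alpha, beta) likelihood; beta drops out."""
    I = np.array([[alpha**2 * polygamma(1, alpha), -alpha],
                  [-alpha, alpha]])
    cov = np.linalg.inv(I)
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

for a in [0.5, 2.0, 10.0]:
    print(a, log_params_corr(a))  # correlation grows toward 1 as alpha grows
```

(In closed form this works out to 1 / sqrt(alpha * trigamma(alpha)), which explains why no fixed linear transformation of the logs can remove the dependence for all alpha.)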

In any case, none of the popular parameter transformations (log rate/shape, scale, mean, mode, standard deviation) manages to remove the dependence.

What works rather well is a parametrization in terms of log scale and the log of the geometric mean, but it's difficult to use this to construct priors because I need to compute the inverse digamma function.
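To illustrate (outside Stan) what the inverse digamma buys you, here is a quick Python prototype of the back-transform; the function names are mine, and it leans on the fact that the log geometric mean of a gamma distribution is E[log Y] = digamma(shape) + log(scale):

```python
import numpy as np
from scipy.special import digamma, polygamma

def inv_digamma(x, tol=1e-12):
    """Solve digamma(y) = x by Newton iteration from a crude initial guess."""
    y = np.exp(x) + 0.5 if x >= -2.22 else -1.0 / (x - digamma(1.0))
    while abs(digamma(y) - x) > tol:
        y -= (digamma(y) - x) / polygamma(1, y)  # polygamma(1, .) is trigamma
    return float(y)

def shape_rate_from(log_geo_mean, log_scale):
    """Map (log geometric mean, log scale) back to (shape, rate)."""
    shape = inv_digamma(log_geo_mean - log_scale)  # digamma(shape) = m - s
    rate = np.exp(-log_scale)                      # rate = 1 / scale
    return shape, rate

# Round trip from shape = 3, scale = 2:
m = digamma(3.0) + np.log(2.0)
print(shape_rate_from(m, np.log(2.0)))  # ≈ (3.0, 0.5)
```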

For reference here is the model code I am primarily using:

data {
  int<lower=0> N;
  vector<lower=0>[N] y;
}
parameters {
  real b;
  real k;
}
model {
  y ~ gamma(exp(k), exp(b));
}

I’ve tried different priors, and I’ve also run a fake-data simulation with identical results.

and I don’t see a substitution that would make that diagonal.

I think it would be better to say there are a bunch of independent priors on the list. That does not mean that, if what you believe about the rate depends on what you believe about the shape, you shouldn’t express your prior as f(shape) * f(rate | shape). But it is often difficult for people to express forms of dependence explicitly, so they usually just specify their priors marginally and accept that the posterior distribution has a little less dependence than it ideally should. If you have a decent amount of data, the distortion is pretty negligible.
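As a toy illustration of that factorization (my own sketch; the particular distributions are placeholders, not a recommendation), here is how dependence between shape and rate can be encoded by conditioning one prior on the other:

```python
# Draw from a joint prior factored as f(shape) * f(rate | shape).
import numpy as np

rng = np.random.default_rng(0)

def sample_joint_prior(n):
    shape = rng.lognormal(mean=0.0, sigma=1.0, size=n)  # f(shape)
    # f(rate | shape): prior mean of rate equals shape, so E[y] = shape/rate ~ 1
    rate = rng.gamma(2.0, scale=shape / 2.0, size=n)
    return shape, rate

shape, rate = sample_joint_prior(10_000)
print(np.corrcoef(np.log(shape), np.log(rate))[0, 1])  # clearly positive
```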

Just for information: I computed the off-diagonal element of the Fisher information matrix for the gamma distribution parametrized by log geometric mean and log scale, and it is indeed equal to zero, confirming that log geometric mean and log scale provide an orthogonal parametrization.
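The same conclusion can be checked numerically (my sketch, not the original derivation): with m = digamma(k) + log(theta) and s = log(theta), the Jacobian entries follow from k = invdigamma(m − s) and theta = exp(s), and the transformed Fisher information comes out diagonal:

```python
import numpy as np
from scipy.special import polygamma

def offdiag_in_orthogonal_coords(k, theta):
    """Off-diagonal Fisher information of Gamma(shape k, scale theta) after
    the change of variables m = digamma(k) + log(theta), s = log(theta)."""
    t = polygamma(1, k)                        # trigamma(k)
    I = np.array([[t, 1.0 / theta],            # Fisher info in (k, theta)
                  [1.0 / theta, k / theta**2]])
    J = np.array([[1.0 / t, -1.0 / t],         # d(k, theta) / d(m, s)
                  [0.0, theta]])
    return (J.T @ I @ J)[0, 1]

for k, theta in [(0.5, 2.0), (3.0, 0.7), (10.0, 5.0)]:
    print(k, theta, offdiag_in_orthogonal_coords(k, theta))  # all ~0
```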

For those interested, here is my Stan implementation of the inverse digamma function:

functions {
  vector invdigamma(vector x) {
    vector[num_elements(x)] y;
    vector[num_elements(x)] L;
    for (i in 1:num_elements(x)) {
      if (x[i] == digamma(1)) {
        y[i] = 1;
      } else if (x[i] >= -2.22) {
        y[i] = exp(x[i]) + 0.5;
      } else {
        // reciprocal initial guess; the minus sign keeps it positive,
        // since x[i] - digamma(1) < 0 whenever x[i] < -2.22
        y[i] = -1 / (x[i] - digamma(1));
      }
    }
    // Newton refinement of digamma(y) = x
    L = digamma(y) - x;
    while (max(fabs(L)) > 1e-12) {  // iterate until every element converges
      y = y - L ./ trigamma(y);
      L = digamma(y) - x;
    }
    return y;
  }

  real invdigammaR(real x) {
    real y;
    real L;
    if (x == digamma(1)) {
      y = 1;
    } else if (x >= -2.22) {
      y = exp(x) + 0.5;
    } else {
      y = -1 / (x - digamma(1));
    }
    L = digamma(y) - x;
    while (fabs(L) > 1e-12) {
      y = y - L / trigamma(y);
      L = digamma(y) - x;
    }
    return y;
  }
}