Gamma regression in Stan vs Frequentist approach

Hi all,

I am trying to fit a GLM in Stan to Gamma-distributed data, but the model is not converging.
The data are simulated (so I know that they follow a Gamma distribution), and indeed, fitting the same model as a frequentist GLM recovers the right parameter estimates.

I’ve already looked at this post but couldn’t find any solution.

Thanks!

Data:

y.csv (3.2 KB)
X.csv (13.2 KB)

Python code:

import pystan

# Put our data in a dictionary
data = {'N': X.shape[0], 'K': X.shape[1], 'X': X, 'y': y}

# Compile the model
stanm = pystan.StanModel(file='./stan/gamma_model.stan')

# Train the model and generate samples
stan_fit = stanm.sampling(data=data, iter=1000, chains=4, warmup=500, thin=1, seed=101)

Output:

COMPILING THE C++ CODE FOR MODEL anon_model_a1534e6d883628ba3a5700d4305b8abc NOW.
n_eff / iter below 0.001 indicates that the effective sample size has likely been overestimated
Rhat above 1.1 or below 0.9 indicates that the chains very likely have not mixed
5 of 2000 iterations ended with a divergence (0.25 %).
Try running with adapt_delta larger than 0.8 to remove the divergences.
32 of 2000 iterations saturated the maximum tree depth of 10 (1.6 %)
Run again with max_treedepth larger than 10 to avoid saturation
Chain 1: E-BFMI = 0.0119
Chain 2: E-BFMI = 0.00533
Chain 3: E-BFMI = 0.000414
Chain 4: E-BFMI = 0.0019
E-BFMI below 0.2 indicates you may need to reparameterize your model

import statsmodels.api as sm

sm_fit = sm.GLM(y, X, family=sm.families.Gamma(link=sm.families.links.log)).fit()
print(sm_fit.summary())

Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:                      y   No. Observations:                  177
Model:                            GLM   Df Residuals:                      172
Model Family:                   Gamma   Df Model:                            4
Link Function:                    log   Scale:                        0.052743
Method:                          IRLS   Log-Likelihood:                -2443.8
Date:                Mon, 29 Jun 2020   Deviance:                       7.6958
Time:                        09:08:28   Pearson chi2:                     9.07
No. Iterations:                    11                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.1812      0.005     33.899      0.000       0.171       0.192
x2             0.0485      0.005      9.632      0.000       0.039       0.058
x3            -0.0600      0.006     -9.760      0.000      -0.072      -0.048
x4            -0.0218      0.008     -2.753      0.006      -0.037      -0.006
const         13.7931      0.019    719.882      0.000      13.756      13.831

Stan model:

data {
  int N; //the number of observations
  int K; //the number of columns in the model matrix
  real y[N]; //the response
  matrix[N,K] X; //the model matrix
}
parameters {
  vector[K] betas; //the regression parameters
  real phi; //the variance parameter
}
transformed parameters {
  vector[N] mu; //the expected values (linear predictor)
  vector[N] alpha; //shape parameter for the gamma distribution
  vector[N] beta; //rate parameter for the gamma distribution
  
  mu = exp(X*betas); //using the log link 
  alpha = mu .* mu / phi; 
  beta = mu / phi;
}
model {  
  betas[1] ~ cauchy(0,10); //prior for the intercept following Gelman 2008

  for(i in 2:K)
   betas[i] ~ cauchy(0,2.5);//prior for the slopes following Gelman 2008
  
  y ~ gamma(alpha,beta);
}
generated quantities {
  vector[N] y_rep;
  for (n in 1:N) {
    y_rep[n] = gamma_rng(alpha[n], beta[n]); //posterior draws to get posterior predictive checks
  }
}

Hi!

As in the other thread, I’d suggest you change the parameterization to

\begin{align}
y &\sim \text{Gamma}(\alpha, \beta) \\
\alpha &= \phi^{-1} \\
\beta &= \frac{\phi^{-1}}{\mu}
\end{align}

and you also need a <lower=0> on the declaration of phi, i.e. real<lower=0> phi;,

and a prior for phi (or inverse_phi if you follow my proposed parameterization).
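
For reference, this parameterization keeps \mu as the mean: with the shape/rate convention, \mathbb{E}[y] = \alpha/\beta and \operatorname{Var}[y] = \alpha/\beta^2, so

\begin{align}
\mathbb{E}[y] &= \frac{\phi^{-1}}{\phi^{-1}/\mu} = \mu \\
\operatorname{Var}[y] &= \frac{\phi^{-1}}{(\phi^{-1}/\mu)^2} = \phi \mu^2
\end{align}

and \phi acts as a pure dispersion parameter, matching the mean-variance relationship of the Gamma GLM.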

The priors on the regression coefficients are fairly wide. Something like N(0,5) would probably be fine for the intercept, and I’d actually go for something like N(0,1) for the other coefficients, since they act on the log scale.

Also, it sometimes helps to center and scale your covariates (if you have not already done this). And if the covariates are correlated, a QR decomposition might be helpful; a sketch is below.
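
On the QR point, here is a minimal sketch adapted from the QR reparameterization in the Stan User's Guide (the Q_ast, R_ast, and theta names follow the manual; the exponential(1) prior on inverse_phi is the one used above, and theta is left with an implicit flat prior as in the manual's example):

data {
  int N; //the number of observations
  int K; //the number of columns in the model matrix
  vector[N] y; //the response
  matrix[N,K] X; //the model matrix
}
transformed data {
  // thin QR decomposition of X, scaled as in the Stan User's Guide
  matrix[N, K] Q_ast = qr_thin_Q(X) * sqrt(N - 1);
  matrix[K, K] R_ast = qr_thin_R(X) / sqrt(N - 1);
  matrix[K, K] R_ast_inverse = inverse(R_ast);
}
parameters {
  vector[K] theta; // coefficients on the Q_ast scale (implicit flat prior)
  real<lower=0> inverse_phi; // inverse dispersion parameter
}
model {
  vector[N] mu = exp(Q_ast * theta); // log link
  inverse_phi ~ exponential(1);
  y ~ gamma(inverse_phi, rep_vector(inverse_phi, N) ./ mu);
}
generated quantities {
  // recover the coefficients on the original scale of X
  vector[K] betas = R_ast_inverse * theta;
}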

Cheers!
Max

Side note: the for loop over the slope priors can be simplified to the one-liner betas[2:] ~ cauchy(0,2.5); or betas[2:] ~ std_normal(); if you want to try the prior I proposed above.

Cheers!
Max


Thanks Max!

With this new parameterisation it works!

New model code:

data {
  int N; //the number of observations
  int K; //the number of columns in the model matrix
  real y[N]; //the response
  matrix[N,K] X; //the model matrix
}
parameters {
  vector[K] betas; //the regression parameters
  real<lower=0> inverse_phi; //the variance parameter
}
transformed parameters {
  vector[N] mu; //the expected values (linear predictor)
  vector[N] beta; //rate parameter for the gamma distribution
  
  mu = exp(X*betas); //using the log link 
  beta = rep_vector(inverse_phi, N) ./ mu;
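  // with shape = inverse_phi and rate = inverse_phi / mu, E[y] = mu and Var[y] = mu^2 / inverse_phi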
}
model {  
  betas[1] ~ cauchy(0,10); //prior for the intercept following Gelman 2008

  betas[2:] ~ cauchy(0,2.5);//prior for the slopes following Gelman 2008
  
  inverse_phi ~ exponential(1); // prior on inverse dispersion parameter
  y ~ gamma(inverse_phi,beta);
}
generated quantities {
  vector[N] y_rep;
  for (n in 1:N) {
    y_rep[n] = gamma_rng(inverse_phi, beta[n]); //posterior draws to get posterior predictive checks
  }
}

Great! :)

@Max_Mantei Just out of curiosity (I never tried to tame such a beast…): do we need a Jacobian adjustment for the transformed parameters here?

Hey! I’m not quite sure I understand correctly (I guess you mean the regression coefficients?), but no, a Jacobian adjustment is not necessary here.

Cheers,
Max


I mean section 21.3, Changes of Variables, of the Stan user’s manual. But if it’s not necessary, it’s super-ok for me!


Rule of thumb: If you put your prior on the parameters as you have defined them in the parameters block, then you are fine (in the example above betas and inverse_phi are assigned priors directly, so that’s fine).
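
To make the rule concrete, here is a minimal sketch (my own example, not code from this thread): if you instead state the prior on a nonlinear transform such as log(phi), you need to add the log absolute Jacobian determinant yourself:

parameters {
  real<lower=0> phi;
}
model {
  // prior stated on log(phi), a nonlinear transform of the parameter,
  // so a Jacobian adjustment is required:
  log(phi) ~ normal(0, 1);
  target += -log(phi); // log |d log(phi) / d phi| = -log(phi)
}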


I’m really curious why this reparameterization works while the original parameterization fails so badly. (I had tried the same thing as gcarto and, apparently, others, before finding this page.)


Hi, have you got any idea about this? I know it’s two years later, but there still isn’t much written about it.


First, the original post didn’t constrain phi to be > 0, so there are lots of problems just from that; add real<lower=0> phi. The Cauchy priors probably don’t help either, and Gelman doesn’t recommend those anymore; try something like normal(0, 4), which is still reasonably diffuse.

A further issue is that in the shape/rate parameterization of the gamma distribution the parameters are correlated, which shows up in the off-diagonal entries of the Fisher information matrix. See Posterior estimates of rate and shape of gamma distribution are dependent.

The parameterization given by @Max_Mantei and the one in the post I linked to above give zeros for the off-diagonal part of the Fisher information matrix, which helps sampling.
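
To sketch why (my own derivation): writing the gamma density in terms of the mean \mu and shape \alpha (i.e. rate \beta = \alpha/\mu), the cross term of the Fisher information vanishes:

\begin{align}
\log f(y \mid \mu, \alpha) &= \alpha \log \alpha - \alpha \log \mu - \log \Gamma(\alpha) + (\alpha - 1) \log y - \frac{\alpha y}{\mu} \\
\frac{\partial^2 \log f}{\partial \mu \, \partial \alpha} &= -\frac{1}{\mu} + \frac{y}{\mu^2}, \qquad
\mathbb{E}\!\left[-\frac{\partial^2 \log f}{\partial \mu \, \partial \alpha}\right] = \frac{1}{\mu} - \frac{\mu}{\mu^2} = 0
\end{align}

so \mu and \alpha are information-orthogonal, whereas in the shape/rate parameterization the corresponding entry is -1/\beta \neq 0 and the posteriors of \alpha and \beta are correlated.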

Btw here’s what I get with the OP’s data and this slightly updated model with no additional reparameterization of the gamma distribution:

  variable    mean  median      sd     mad      q5     q95  rhat ess_bulk ess_tail
  <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>    <dbl>    <dbl>
1 betas[1]  0.178   0.178  0.00419 0.00411  0.170   0.184   1.00    1818.    2392.
2 betas[2]  0.0471  0.0471 0.00378 0.00379  0.0410  0.0534  1.00    4319.    2902.
3 betas[3] -0.0693 -0.0693 0.00465 0.00472 -0.0769 -0.0615  1.00    3408.    2427.
4 betas[4] -0.0228 -0.0228 0.00454 0.00457 -0.0304 -0.0155  1.00    3950.    3139.
5 betas[5] 13.8    13.8    0.0203  0.0200  13.8    13.8     1.00    1673.    2336.
The model:

data {
  int N; //the number of observations
  int K; //the number of columns in the model matrix
  vector[N] y; //the response
  matrix[N,K] X; //the model matrix
}
parameters {
  vector[K] betas; //the regression parameters
  real<lower=0> phi; //the variance parameter
}
transformed parameters {
  vector[N] mu = exp(X * betas); //the expected values (linear predictor)
  vector[N] alpha = mu .* mu / phi; //shape parameter for the gamma distribution
  vector[N] beta = mu / phi; //rate parameter for the gamma distribution
}
model {  
  betas ~ normal(0, 4);

  y ~ gamma(alpha,beta);
}

FYI all: I opened an issue to add an alternative parameterization.
