I am trying to run a GLM in Stan for Gamma-distributed data, but the model is not converging.
The data are simulated (so I know they follow a Gamma distribution), and in fact by fitting the model as a GLM with statsmodels I can recover the right estimates for the model parameters.
I’ve already looked at this post but couldn’t find any solution.
import pystan

# Put our data in a dictionary
data = {'N': X.shape[0], 'K': X.shape[1], 'X': X, 'y': y}
# Compile the model
stanm = pystan.StanModel(file='./stan/gamma_model.stan')
# Train the model and generate samples
stan_fit = stanm.sampling(data=data, iter=1000, chains=4, warmup=500, thin=1, seed=101)
COMPILING THE C++ CODE FOR MODEL anon_model_a1534e6d883628ba3a5700d4305b8abc NOW.
n_eff / iter below 0.001 indicates that the effective sample size has likely been overestimated
Rhat above 1.1 or below 0.9 indicates that the chains very likely have not mixed
5 of 2000 iterations ended with a divergence (0.25 %).
Try running with adapt_delta larger than 0.8 to remove the divergences.
32 of 2000 iterations saturated the maximum tree depth of 10 (1.6 %)
Run again with max_treedepth larger than 10 to avoid saturation
Chain 1: E-BFMI = 0.0119
Chain 2: E-BFMI = 0.00533
Chain 3: E-BFMI = 0.000414
Chain 4: E-BFMI = 0.0019
E-BFMI below 0.2 indicates you may need to reparameterize your model
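For reference, in PyStan 2 both of those suggestions are passed through the control argument of sampling; a minimal sketch, with illustrative (not tuned) values:

# Re-run with a higher acceptance target and a deeper NUTS tree.
stan_fit = stanm.sampling(
    data=data, iter=1000, chains=4, warmup=500, seed=101,
    control={'adapt_delta': 0.95,    # default is 0.8
             'max_treedepth': 12})   # default is 10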
import statsmodels.api as sm

sm_fit = sm.GLM(y, X, family=sm.families.Gamma(link=sm.families.links.log())).fit()
print(sm_fit.summary())
Generalized Linear Model Regression Results
==============================================================================
Dep. Variable: y No. Observations: 177
Model: GLM Df Residuals: 172
Model Family: Gamma Df Model: 4
Link Function: log Scale: 0.052743
Method: IRLS Log-Likelihood: -2443.8
Date: Mon, 29 Jun 2020 Deviance: 7.6958
Time: 09:08:28 Pearson chi2: 9.07
No. Iterations: 11
Covariance Type: nonrobust
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
x1 0.1812 0.005 33.899 0.000 0.171 0.192
x2 0.0485 0.005 9.632 0.000 0.039 0.058
x3 -0.0600 0.006 -9.760 0.000 -0.072 -0.048
x4 -0.0218 0.008 -2.753 0.006 -0.037 -0.006
const 13.7931 0.019 719.882 0.000 13.756 13.831
Stan model:
data {
  int N; //the number of observations
  int K; //the number of columns in the model matrix
  real y[N]; //the response
  matrix[N,K] X; //the model matrix
}
parameters {
  vector[K] betas; //the regression parameters
  real phi; //the variance parameter
}
transformed parameters {
  vector[N] mu; //the expected values (inverse link of the linear predictor)
  vector[N] alpha; //shape parameter for the gamma distribution
  vector[N] beta; //rate parameter for the gamma distribution
  mu = exp(X * betas); //using the log link
  alpha = mu .* mu / phi;
  beta = mu / phi;
}
model {
  betas[1] ~ cauchy(0, 10); //prior for the intercept following Gelman 2008
  for (i in 2:K)
    betas[i] ~ cauchy(0, 2.5); //prior for the slopes following Gelman 2008
  y ~ gamma(alpha, beta);
}
generated quantities {
  vector[N] y_rep;
  for (n in 1:N) {
    y_rep[n] = gamma_rng(alpha[n], beta[n]); //posterior draws for posterior predictive checks
  }
}
Hey! The first thing I'd fix is that phi needs to be declared with a lower bound of zero, i.e. real<lower=0> phi;, and a prior for phi (or inverse_phi if you follow my proposed parameterization).
The priors on the regression coefficients are fairly wide. Something like N(0,5) would probably be fine for the intercept, and I'd actually go for something like N(0,1) for the other coefficients, since they are on the log scale.
Also, it sometimes helps to center and scale your covariates (if you have not already done this). And if the covariates are correlated, a QR decomposition might be helpful; see the sketch just below.
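For instance, a minimal NumPy sketch of both ideas (X_std, Q, and R are illustrative names; this assumes X holds the covariate columns, with the intercept handled separately):

import numpy as np

# Center and scale each covariate column: zero mean, unit variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Thin QR decomposition: the columns of Q are orthogonal, which
# decorrelates the design; coefficients fitted on the Q scale map
# back to the original scale via R.
Q, R = np.linalg.qr(X_std)  # 'reduced' mode by default
# betas_X = np.linalg.solve(R, betas_Q)  # back-transform after fitting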
Cheers!
Max
Side note: This

for(i in 2:K)
  betas[i] ~ cauchy(0, 2.5);

can be simplified to the one-liner betas[2:] ~ cauchy(0, 2.5); or betas[2:] ~ std_normal(); if you want to try the prior I proposed above.
data {
  int N; //the number of observations
  int K; //the number of columns in the model matrix
  real y[N]; //the response
  matrix[N,K] X; //the model matrix
}
parameters {
  vector[K] betas; //the regression parameters
  real<lower=0> inverse_phi; //the inverse dispersion (shape) parameter
}
transformed parameters {
  vector[N] mu; //the expected values (inverse link of the linear predictor)
  vector[N] beta; //rate parameter for the gamma distribution
  mu = exp(X * betas); //using the log link
  beta = rep_vector(inverse_phi, N) ./ mu;
}
model {
  betas[1] ~ cauchy(0, 10); //prior for the intercept following Gelman 2008
  betas[2:] ~ cauchy(0, 2.5); //prior for the slopes following Gelman 2008
  inverse_phi ~ exponential(1); //prior on inverse dispersion parameter
  y ~ gamma(inverse_phi, beta);
}
generated quantities {
  vector[N] y_rep;
  for (n in 1:N) {
    y_rep[n] = gamma_rng(inverse_phi, beta[n]); //posterior draws for posterior predictive checks
  }
}
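Once this samples, the y_rep draws can be pulled out for a quick posterior predictive check; a hypothetical PyStan 2 sketch (the mean is just one illustrative test statistic):

import numpy as np

# Posterior predictive draws, shape (n_draws, N).
y_rep = stan_fit.extract()['y_rep']

# Compare a test statistic on replicated data with the observed value.
t_rep = y_rep.mean(axis=1)
t_obs = np.mean(y)
ppp = np.mean(t_rep >= t_obs)  # posterior predictive p-value
print('PP p-value for the mean:', ppp)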
Hey! I'm not quite sure I understand correctly (I guess you mean the regression coefficients?), but a Jacobian adjustment for the parameters is not necessary here.
Rule of thumb: if you put your priors on the parameters exactly as they are declared in the parameters block, you are fine (in the example above, betas and inverse_phi are assigned priors directly, so that's fine).
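For completeness, the rule in math form: if $\theta$ is the declared parameter but the prior is stated for a transform $f(\theta)$, the implied density on $\theta$ picks up a Jacobian factor,

$$
p_\theta(\theta) = p_f\big(f(\theta)\big)\,\left|\frac{df}{d\theta}\right|,
$$

so you would have to add $\log\left|\frac{df}{d\theta}\right|$ to target yourself. A prior placed directly on a declared parameter, like betas or inverse_phi above, needs no such adjustment.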
I'm really curious why this reparameterization works when the original parameterization fails so badly. (I had tried the same thing as gcarto and, apparently, others before finding this page.)
First, the original post didn't constrain phi to be > 0, so there are lots of problems just from that. Add real<lower=0> phi. The Cauchy priors probably don't help either, and Gelman doesn't recommend those anymore. Try something like normal(0, 4), which is reasonably diffuse.
The parameterization from @Max_Mantei and the one in the post I linked to above both give zeros for the off-diagonal part of the Fisher information matrix, which helps sampling (a quick check follows below).
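A quick check of that claim under the shape/mean parameterization used above (shape $a$, mean $\mu$, rate $b = a/\mu$): the per-observation log likelihood is

$$
\ell(a,\mu) = a\log a - a\log\mu - \log\Gamma(a) + (a-1)\log y - \frac{a\,y}{\mu},
$$

and the mixed second derivative,

$$
\frac{\partial^2 \ell}{\partial \mu\,\partial a} = -\frac{1}{\mu} + \frac{y}{\mu^2},
$$

has expectation zero because $\mathbb{E}[y] = \mu$. So the $(a,\mu)$ cross term of the Fisher information vanishes: the shape and the mean are orthogonal parameters, which makes the posterior geometry much friendlier for HMC.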
Btw here’s what I get with the OP’s data and this slightly updated model with no additional reparameterization of the gamma distribution:
data {
  int N; //the number of observations
  int K; //the number of columns in the model matrix
  vector[N] y; //the response
  matrix[N,K] X; //the model matrix
}
parameters {
  vector[K] betas; //the regression parameters
  real<lower=0> phi; //the variance parameter
}
transformed parameters {
  vector[N] mu = exp(X * betas); //the expected values (inverse link of the linear predictor)
  vector[N] alpha = mu .* mu / phi; //shape parameter for the gamma distribution
  vector[N] beta = mu / phi; //rate parameter for the gamma distribution
}
model {
  betas ~ normal(0, 4);
  y ~ gamma(alpha, beta);
}
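To reproduce that comparison end to end, a hypothetical PyStan 2 driver (the file name is illustrative):

import pystan

# Compile the updated model and sample with the same data dictionary.
stanm = pystan.StanModel(file='./stan/gamma_model_v2.stan')
stan_fit = stanm.sampling(data=data, iter=1000, chains=4, warmup=500, seed=101)

# Posterior means of the coefficients, to compare against the
# statsmodels IRLS estimates shown earlier in the thread.
print(stan_fit.extract()['betas'].mean(axis=0))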