I have a simple gamma regression model.
We are modeling score, as a function of group membership. I’ll add more covariates later, once this basic model is stable. Plotting the density of the score shows nine clear peaks.
If we summarize each group, it almost exactly matches the corresponding density peak.
Using lme4 in R with the following formula:
m <- glmer(score ~ 1 + (1|group), data=df, family=Gamma(link="log"))
(Intercept) 4.88003
$group
(Intercept)
1 -0.82205851
2 -0.62208084
3 -0.45549832
4 -0.31192307
5 -0.26049639
6 -0.17649159
7 -0.06063024
8 0.05163697
This look very good compared to the actual data. Generating random Gamma values with these coefficients, and then plotting on top of the original density plot is a VERY close fit.
However, when I attempt to replicate the same model in Stan, to better capture uncertainty, the posterior samples are vastly different. What’s really odd is that regardless of the priors I choose, the posterior is VERY VERY far away. I used values from the Frequentist and/or lme4 models as informative priors. But Stan seems to just ignore them.
data {
int<lower=0> N;
real<lower=0> score[N];
int<lower=0> N_group;
int<lower=0> group_f[N];
}
parameters {
real alpha0; // baseline intercept
vector[N_group] alpha_group;
vector[N_group] phi;
}
model {
alpha0 ~ normal(log(80), log(0.5));
alpha_group ~ normal(0,0.5);
phi ~ normal(0,1);
for(i in 1:N){
real mu_t = exp(alpha0 + alpha_group[group_f[i]]);
real alpha = square(mu_t) / phi[group_f[i]];
real beta = mu_t / phi[group_f[i]];
score[i] ~ gamma(alpha, beta);
}
}
I would have expected Stan’s posterior samples, for the coefficients, to be somewhat near the Frequentist samples.
Am I:
- Doing something wrong with my logic, or model concepts?
- Implementing this incorrectly in Stan?
Happy to share some data if anybody wants to try.
(Not sure how to attach the CSV here.)
Any help or suggestions would be greatly appreciated!