I have the following simple hierarchical model specified in Stan, and I have been testing it by varying K, the number of groups in my data. For K = 1 (group 1) and K = 2 (groups 1 and 2), fitting works fine. However, for K = 3 (using groups 1, 2, and 3 from my data), I encounter a variety of issues, such as divergent transitions and high Rhat. What is really interesting is that at K = 3, the posterior distributions for group 1 and group 2 look very different from the posterior distribution for group 3. This is especially remarkable given that the posterior for group 1 looks the same at K = 1 and K = 2.
I’m unsure what might be causing this. Given the model I have specified below, adding groups should not change the estimates for earlier groups. If there is any other information that would be helpful for me to provide, please let me know!
data {
  int<lower=0> N;
  vector[N] alpha_hat_jdy;
  vector<lower = 0>[N] sigma_alpha;
  int g[N]; // group assignment
  int<lower=0> K; // number of groups
}
//
parameters {
  vector<lower = 0>[N] alpha_jdy;
  vector<lower = 0>[K] k_my;
  vector<lower = 0>[K] theta_my;
}
//
model {
  alpha_hat_jdy ~ normal(-alpha_jdy, sigma_alpha);
  alpha_jdy ~ gamma(k_my[g], theta_my[g]);
}
Hi, a few points stand out that might contribute to your trouble:
- The group-level parameters of your gamma distribution (k_my and theta_my) currently have improper priors (flat over all positive real values). I strongly recommend you specify at least a weakly informative prior for each.
- Because the group-level parameters don’t have a prior, no pooling across groups is happening, so I’m unsure what’s hierarchical about this model. For a hierarchical model, I would expect a parameter for the overall population mean and a variance-like parameter for the variation between groups. From these two, construct the shape and rate parameters of your gamma distribution for the group-level means.
- The group-level means seem to be restricted to be strictly negative (negated gamma-distributed variates), whereas the data alpha_hat_jdy do not seem to be limited to negative values. For further help, it would be good to understand what you are trying to model and why you are modelling it this way.
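One way the points above could fit together is sketched here. This is only a rough illustration under assumptions, not a drop-in fix: the names log_mu_pop, tau, log_mu_raw, mu_my and sd_my are made up for this example, the prior scales assume the slopes are roughly on a unit scale, and the array[N] int declaration assumes a recent Stan version. Note that Stan's gamma takes a shape and a rate (inverse scale), so a mean m and standard deviation s map to shape = (m / s)^2 and rate = m / s^2.

```stan
data {
  int<lower=0> N;
  int<lower=1> K;                        // number of groups
  vector[N] alpha_hat_jdy;               // estimated slopes
  vector<lower=0>[N] sigma_alpha;        // their standard errors
  array[N] int<lower=1, upper=K> g;      // group assignment
}
parameters {
  vector<lower=0>[N] alpha_jdy;          // magnitude of each slope
  real log_mu_pop;                       // population-level log mean (hypothetical name)
  real<lower=0> tau;                     // between-group sd on the log scale (hypothetical name)
  vector[K] log_mu_raw;                  // standardized group offsets (non-centered)
  vector<lower=0>[K] sd_my;              // within-group sd of alpha_jdy (hypothetical name)
}
transformed parameters {
  // Group means are partially pooled toward exp(log_mu_pop).
  vector<lower=0>[K] mu_my = exp(log_mu_pop + tau * log_mu_raw);
  // Convert (mean, sd) to the (shape, rate) that gamma() expects.
  vector<lower=0>[K] k_my = square(mu_my ./ sd_my);
  vector<lower=0>[K] theta_my = mu_my ./ square(sd_my);
}
model {
  // Placeholder weakly informative priors; rescale to your data.
  log_mu_pop ~ normal(0, 1);
  tau ~ normal(0, 1);                    // half-normal because of the lower bound
  log_mu_raw ~ std_normal();
  sd_my ~ normal(0, 1);                  // half-normal because of the lower bound

  alpha_jdy ~ gamma(k_my[g], theta_my[g]);
  alpha_hat_jdy ~ normal(-alpha_jdy, sigma_alpha);
}
```

With this structure, a group with little data borrows strength from the others through log_mu_pop and tau, which is what makes the model hierarchical across groups.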
Thank you for your suggestions! On your last point: I want to restrict the slopes (-alpha_jdy in the model) to be negative because I’m trying to apply shrinkage to the slopes of some demand curves, and economic theory says that demand curves must slope downward.
On your note about priors: how would I go about specifying a weakly informative prior for the parameters of my gamma distribution?
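For illustration, the smallest change to the original program would be to put proper priors directly on k_my and theta_my. The sketch below uses lognormal(0, 1) for both, which is only a placeholder: it assumes shapes and rates of order one are plausible and should be adjusted to the scale of the slopes. The array[N] int declaration assumes a recent Stan version.

```stan
data {
  int<lower=0> N;
  int<lower=1> K;                        // number of groups
  vector[N] alpha_hat_jdy;
  vector<lower=0>[N] sigma_alpha;
  array[N] int<lower=1, upper=K> g;      // group assignment
}
parameters {
  vector<lower=0>[N] alpha_jdy;
  vector<lower=0>[K] k_my;               // gamma shape per group
  vector<lower=0>[K] theta_my;           // gamma rate per group
}
model {
  // Placeholder priors (an assumption, not a recommendation for this data):
  // most prior mass falls roughly between 0.1 and 10.
  k_my ~ lognormal(0, 1);
  theta_my ~ lognormal(0, 1);

  alpha_hat_jdy ~ normal(-alpha_jdy, sigma_alpha);
  alpha_jdy ~ gamma(k_my[g], theta_my[g]);
}
```

Drawing from these priors alone and checking whether the implied slopes look economically plausible is a quick way to judge whether the scales are reasonable.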