Divergent Transitions when Scaling Hierarchical Model

I have the following simple hierarchical model specified in Stan, and I have been testing it by varying K, the number of groups in my data. For K = 1 (group 1) and K = 2 (groups 1 and 2), fitting works fine. However, for K = 3 (groups 1, 2, and 3 from my data), I encounter a variety of issues, such as divergent transitions and high Rhat. What is really interesting is that at K = 3, the posterior distributions for groups 1 and 2 look very different from the posterior distribution for group 3. This is especially remarkable given that the posterior for group 1 looks the same at K = 1 and K = 2.

I’m unsure of what the cause for this might be. Adding additional groups should not change the model estimation process for earlier groups, given the model that I have specified below. If there is any other information that would be helpful for me to provide, please let me know!

data {
  int<lower=0> N;
  vector[N] alpha_hat_jdy;
  vector<lower = 0>[N] sigma_alpha;
  int g[N]; //group assignment
  int<lower=0> K; //number of groups
}

parameters {
  vector<lower = 0>[N] alpha_jdy;
  vector<lower = 0>[K] k_my;
  vector<lower = 0>[K] theta_my;
}

model {
  alpha_hat_jdy ~ normal(-alpha_jdy, sigma_alpha);
  alpha_jdy ~ gamma(k_my[g], theta_my[g]);
}

Hi, a few points that stand out that might contribute to your trouble:

  • the group-level parameters of your gamma distribution currently have improper priors (a flat density over all positive real values); I strongly recommend you specify at least a weakly informative prior for each
  • because the group-level parameters don’t have a prior, no pooling is happening, so I’m unsure what’s hierarchical about this model; for a hierarchical model, I would expect a parameter for the overall population mean and a variance-like parameter for the variation between groups; from these two, you would construct the shape and rate parameters of the gamma distribution for the group-level means
  • the group-level means seem to be restricted to be strictly negative (negated gamma-distributed variates), whereas the data alpha_hat_jdy does not seem to be limited to negative values; for further help, it would be good to understand what you are trying to model and why this way
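One way the first two points could be combined is sketched here. This is only an illustration under stated assumptions, not a tested fix for the divergences: the names log_mu_pop, tau, z and mu_my are hypothetical, the prior scales are placeholders to be adjusted to the scale of the slopes, and the gamma distribution is reparameterized by its mean (Stan's gamma takes shape and rate, with mean = shape / rate). The data block uses the newer array syntax; older Stan versions write the group index as int g[N].

```stan
data {
  int<lower=0> N;
  int<lower=1> K;                      // number of groups
  vector[N] alpha_hat_jdy;             // noisy slope estimates
  vector<lower=0>[N] sigma_alpha;      // their standard errors
  array[N] int<lower=1, upper=K> g;    // group assignment
}
parameters {
  vector<lower=0>[N] alpha_jdy;        // magnitude of each (negative) slope
  real log_mu_pop;                     // population-level log mean magnitude (assumed name)
  real<lower=0> tau;                   // between-group sd on the log scale (assumed name)
  vector[K] z;                         // standardized group offsets (assumed name)
  vector<lower=0>[K] k_my;             // gamma shape per group
}
transformed parameters {
  // Group mean magnitudes, partially pooled toward exp(log_mu_pop)
  vector[K] mu_my = exp(log_mu_pop + tau * z);
}
model {
  // Placeholder weakly informative priors; the scales are assumptions
  log_mu_pop ~ normal(0, 1);
  tau ~ normal(0, 1);                  // half-normal because of the lower bound
  z ~ std_normal();
  k_my ~ gamma(2, 0.5);

  // gamma(shape, rate) has mean shape / rate, so rate = shape / mean
  alpha_jdy ~ gamma(k_my[g], k_my[g] ./ mu_my[g]);
  alpha_hat_jdy ~ normal(-alpha_jdy, sigma_alpha);
}
```

With this structure the group means share log_mu_pop and tau, so adding a group does change the estimates for the other groups, which is what pooling means.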

Hi Luc,

Thank you for your suggestions! On your last point: I want to restrict alpha_jdy to be negative because I’m trying to impose shrinkage methods on the slopes of some demand curves, and economic theory says that the slopes of demand curves must be negative.

On your note about priors: how would I go about specifying a weakly informative prior for the parameters of my gamma distribution?

OK, if you expect your data (“slope curves”) to be negative, it is good practice to declare your data as such.

As for specifying a prior for your gamma distribution parameters: simulate and see what seems reasonable to you (prior predictive check).
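A minimal sketch of such a simulation, written as a Stan program with only a generated quantities block, might look like the following. The candidate priors, and the names N_sim, sigma_sim, k, mu, alpha_sim and alpha_hat_sim, are all assumptions for illustration; the gamma distribution is parameterized by shape and mean here (rate = shape / mean), which is a choice of this sketch rather than something taken from the model above.

```stan
data {
  int<lower=1> N_sim;        // hypothetical: number of slopes simulated per draw
  real<lower=0> sigma_sim;   // hypothetical: a typical standard error, must be > 0
}
generated quantities {
  // Draw the gamma parameters from the candidate priors (placeholders)
  real k = gamma_rng(2, 0.5);        // shape
  real mu = lognormal_rng(0, 1);     // mean slope magnitude
  vector[N_sim] alpha_sim;           // simulated slope magnitudes
  vector[N_sim] alpha_hat_sim;       // simulated noisy slope estimates

  for (n in 1:N_sim) {
    // gamma_rng takes shape and rate; rate = shape / mean
    alpha_sim[n] = gamma_rng(k, k / mu);
    alpha_hat_sim[n] = normal_rng(-alpha_sim[n], sigma_sim);
  }
}
```

Since there is no parameters block, this would be run with the fixed_param sampler. If histograms of alpha_hat_sim cover slopes far steeper or flatter than is economically plausible, tighten or shift the candidate priors and simulate again.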

But more importantly, your model does not seem to be hierarchical. See my previous comment about what that would look like.
