Divergent transitions with hierarchical model

I am fitting the following hierarchical model with Stan:

Y_{ij} \sim N(\mu_{ij}, \sigma^2) \\
\mu_{ij} = \gamma_{i1} + \beta_{2} * Age_{ij} + \gamma_{i2} * Age_{ij} + \beta_{3} * Sex_{i} \\
\gamma_{i} \overset{iid}{\sim} N_{2}(\mu_{g}, \Sigma_{g}) \\
\mu_{g} = \begin{pmatrix} \beta_{1} \\ 0 \end{pmatrix} \\
\Sigma_{g} = \begin{pmatrix} \sigma_{1g}^2 & \sigma_{12g} \\ \sigma_{12g} & \sigma_{2g}^2 \end{pmatrix}

The priors are the ones you will see in my Stan code. Note that I model \gamma_{i1} and \gamma_{i2} via their marginal and conditional distributions, which means putting a prior on the correlation. The code is as follows:

data {
    int ID[108];
    vector[108] Age;
    vector[108] dist;
    vector[108] Male;
}
parameters {
    vector[3] beta;
    matrix[27, 2] gam;
    vector<lower=0, upper = 10>[2] sigg;
    real<lower=0,upper=1> rhog;
    real<lower=0, upper = 10> sig;
}
transformed parameters {
    vector[108] mu;
    vector[27] cmg;
    vector<lower=0>[27] csd;
    for(i in 1:27) {
      cmg[i] = rhog * (sigg[1] / sigg[2]) * (gam[i,1] - beta[1]);  
      csd[i] = sqrt((sigg[2] ^ 2) * (1 - rhog^2));
    }
    for(i in 1:108) {
        mu[i] = gam[ID[i],1] + beta[2]*Age[i] + gam[ID[i],2]*Age[i] + beta[3]*Male[i];
    }
}
model {
    for(i in 1:08) {
      dist[i] ~ normal(mu[i], sig);  
    }
    for(i in 1:27) {
        gam[i, 1] ~ normal(beta[1], sigg[1]);
        gam[i, 2] ~ normal(cmg[i], csd[i]);
    }
    for(i in 1:3) {
        beta[i] ~ normal(0, 10);
    }
    for(j in 1:2) {
        sigg[j] ~ uniform(0, 10);
    }
    sig ~ uniform(0, 10);
    rhog ~ uniform(0, 1);
}

To obtain the data and fit the model in R, I use the following code:

library(rstan)
library(coda)
library(ggmcmc)

df <- read.table('http://people.oregonstate.edu/~calverta/BIDA/Chapter10/DentalData.txt',
                 header = TRUE)

data <- list(dist = df$Distance,
             Age = df$Age,
             ID = df$ID1,
             Male = df$Male)

stan_fit <- stan(file = 'Dental.stan', data = data, chains = 3,
                 control=list(adapt_delta=.99, max_treedepth = 15),
                 iter = 5000)
### Show that trace plots for sigg do not look good
### Need ggmcmc package to do this
# install.packages('ggmcmc')
stan_df <- ggs(stan_fit, family = 'sigg')
ggs_traceplot(stan_df)

Fitting this model in Stan results in divergent transition warnings. When I fit it using JAGS, I don’t have any issues. Does anyone have suggestions to help fit this model?

The for(i in 1:08) in the model block should probably be 1:108.

If you’re getting divergences, you should probably non-center these:

gam[i, 1] ~ normal(beta[1], sigg[1]);
gam[i, 2] ~ normal(cmg[i], csd[i]);

The whats and whys of divergences and non-centering are described here: Diagnosing Biased Inference with Divergences

Stolen from another thread, non-centering means replacing:

parameters {
  real a;
  real mu;
  real<lower=0> sigma;
}

model {
  a ~ normal(mu, sigma);
}

with

parameters {
  real a_z;
  real mu;
  real<lower=0> sigma;
}

transformed parameters {
  real a = a_z * sigma + mu;
}

model {
  a_z ~ normal(0, 1);
}
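
For the model above, a minimal sketch of what the non-centered version might look like (keeping the original cmg/csd expressions written out inline, using hypothetical names gam1_z and gam2_z for the raw parameters, and omitting the likelihood for dist, which stays exactly as before):

parameters {
    vector[3] beta;
    vector[27] gam1_z;                  // standard-normal "raw" versions of gam[, 1]
    vector[27] gam2_z;                  // standard-normal "raw" versions of gam[, 2]
    vector<lower=0, upper=10>[2] sigg;
    real<lower=0, upper=1> rhog;
    real<lower=0, upper=10> sig;
}
transformed parameters {
    matrix[27, 2] gam;
    for (i in 1:27) {
        // non-centered marginal: implies gam[i, 1] ~ normal(beta[1], sigg[1])
        gam[i, 1] = beta[1] + sigg[1] * gam1_z[i];
        // non-centered conditional: implies gam[i, 2] ~ normal(cmg[i], csd[i]),
        // with cmg and csd written out exactly as in the original code
        gam[i, 2] = rhog * (sigg[1] / sigg[2]) * (gam[i, 1] - beta[1])
                    + sqrt(square(sigg[2]) * (1 - square(rhog))) * gam2_z[i];
    }
}
model {
    gam1_z ~ normal(0, 1);              // these two lines replace the two
    gam2_z ~ normal(0, 1);              // gam[i, ] ~ normal(...) statements
    beta ~ normal(0, 10);
    // sig, sigg, and rhog keep their uniform priors implicitly through their declared bounds
}

Because gam1_z and gam2_z are standard normal a priori, the strong dependence between gam and sigg is moved into deterministic transforms rather than the density the sampler has to explore, which is usually what makes the divergences go away.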

JAGS uses different algorithms. Divergences are a product of Stan sampling with HMC. Think about divergences as helpful diagnostics rather than annoying errors, though it’s not always fun to get them :D.

Just because JAGS doesn’t give errors, doesn’t mean it isn’t having trouble sampling things.

Generally we encourage normal priors over uniform ones.
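
For example, here is a sketch of one way to do that for the scale parameters in this model (the scale of 5 is just a placeholder, not a recommendation tuned to this data set):

parameters {
    vector<lower=0>[2] sigg;   // the upper=10 bound is no longer needed once a proper prior does the work
    real<lower=0> sig;
}
model {
    sigg ~ normal(0, 5);       // half-normal, because of the lower=0 constraint
    sig ~ normal(0, 5);
}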

Have you looked at using a package like brms or rstanarm to do this modeling? This might be easier to code that way (https://cran.r-project.org/web/packages/brms/vignettes/brms_multilevel.pdf)


@bbbales2 changing 08 to 108 did the trick, great catch! I completely agree with you on your other points. The reason I want to use the same model I used with JAGS is that I'm translating models that were run in JAGS/BUGS into Stan for a textbook, so I'm trying to keep the priors and everything else the same so that the output in the textbook doesn't have to change (I'm not involved with the textbook other than this).

Oh, okay. To avoid those errors, pass in all the shapes and sizes as variables themselves. For example, change:

data {
    int ID[108];
    vector[108] Age;
    vector[108] dist;
    vector[108] Male;
}

To:

data {
    int N;
    int ID[N];
    vector[N] Age;
    vector[N] dist;
    vector[N] Male;
}
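
and then use those sizes everywhere 108 and 27 are currently hard-coded. A rough sketch (J is a hypothetical name for the number of subjects, 27 in this data set, and the prior on gam is only a placeholder to keep the sketch self-contained):

data {
    int N;                               // number of observations (108 here)
    int J;                               // number of subjects (27 here)
    int ID[N];
    vector[N] Age;
    vector[N] dist;
    vector[N] Male;
}
parameters {
    vector[3] beta;
    matrix[J, 2] gam;
    real<lower=0, upper=10> sig;
}
model {
    for (i in 1:N) {                     // no hard-coded 108 (or the 08 typo) left to get wrong
        dist[i] ~ normal(gam[ID[i], 1] + beta[2] * Age[i]
                         + gam[ID[i], 2] * Age[i] + beta[3] * Male[i], sig);
    }
    to_vector(gam) ~ normal(0, 10);      // placeholder prior just to keep this sketch self-contained
    beta ~ normal(0, 10);
}

On the R side, the data list then just gains the matching entries, e.g. N = 108 and J = 27 (or values computed from the data frame).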

Etc. Good luck!

Just be aware that the original models might not have been fit accurately with JAGS/BUGS in the first place! One of the great benefits of Stan is that its more powerful diagnostics can identify poor fits that go unnoticed with other algorithms; Stan isn’t worse, it’s just more observant. You might have to drop models that don’t fit if you don’t want to consider modifying priors and the like.


@betanalpha I completely agree! Unfortunately I’m only getting paid (and only have time) to translate the models into Stan, not to write the textbook. I’ve been careful to compare the posterior densities for the parameters of interest for each model from JAGS and Stan, and so far the results have been very similar.