Divergent transitions with hierarchical model

I am fitting the following hierarchical model with Stan:

Y_{ij} \sim N(\mu_{ij}, \sigma^2) \\
\mu_{ij} = \gamma_{i1} + \beta_{2} * Age_{ij} + \gamma_{i2} * Age_{ij} + \beta_{3} * Sex_{i} \\
\gamma_{i} \overset{iid}{\sim} N_{2}(\mu_{g}, \Sigma_{g}) \\
\mu_{g} = \begin{pmatrix} \beta_{1} \\ 0 \end{pmatrix} \\
\Sigma_{g} = \begin{pmatrix} \sigma_{1g}^2 & \sigma_{12g} \\ \sigma_{12g} & \sigma_{2g}^2 \end{pmatrix}

The priors are the ones you will see in my Stan code. Note that I model \gamma_{i1} and \gamma_{i2} via their marginal and conditional distributions, which means putting a prior on the correlation. The code is as follows:

data {
    int ID[108];
    vector[108] Age;
    vector[108] dist;
    vector[108] Male;
}
parameters {
    vector[3] beta;
    matrix[27, 2] gam;
    vector<lower=0, upper = 10>[2] sigg;
    real<lower=0,upper=1> rhog;
    real<lower=0, upper = 10> sig;
}
transformed parameters {
    vector[108] mu;
    vector[27] cmg;
    vector<lower=0>[27] csd;
    for(i in 1:27) {
      cmg[i] = rhog * (sigg[1] / sigg[2]) * (gam[i,1] - beta[1]);  
      csd[i] = sqrt((sigg[2] ^ 2) * (1 - rhog^2));
    }
    for(i in 1:108) {
        mu[i] = gam[ID[i],1] + beta[2]*Age[i] + gam[ID[i],2]*Age[i] + beta[3]*Male[i];
    }
}
model {
    for(i in 1:08) {
      dist[i] ~ normal(mu[i], sig);  
    }
    for(i in 1:27) {
        gam[i, 1] ~ normal(beta[1], sigg[1]);
        gam[i, 2] ~ normal(cmg[i], csd[i]);
    }
    for(i in 1:3) {
        beta[i] ~ normal(0, 10);
    }
    for(j in 1:2) {
        sigg[j] ~ uniform(0, 10);
    }
    sig ~ uniform(0, 10);
    rhog ~ uniform(0, 1);
}

To obtain the data and fit the model in R, I use the following code:

library(rstan)
library(coda)
library(ggmcmc)

df <- read.table('http://people.oregonstate.edu/~calverta/BIDA/Chapter10/DentalData.txt',
                 header = TRUE)

data <- list(dist = df$Distance,
             Age = df$Age,
             ID = df$ID1,
             Male = df$Male)

stan_fit <- stan(file = 'Dental.stan', data = data, chains = 3,
                 control=list(adapt_delta=.99, max_treedepth = 15),
                 iter = 5000)
### Show that trace plots for sigg do not look good
### Need ggmcmc package to do this
# install.packages('ggmcmc')
stan_df <- ggs(stan_fit, family = 'sigg')
ggs_traceplot(stan_df)

Fitting this model in Stan results in divergent transition warnings. When I fit it using JAGS, I don’t have any issues. Does anyone have suggestions to help fit this model?

The for(i in 1:08) in the model block should probably be 1:108.

If you’re getting divergences, you should probably non-center these:

gam[i, 1] ~ normal(beta[1], sigg[1]);
gam[i, 2] ~ normal(cmg[i], csd[i]);

The whats and whys of divergences and non-centering are described here: Diagnosing Biased Inference with Divergences

Stolen from another thread, non-centering means replacing:

parameters {
  real a;
  real mu;
  real<lower=0> sigma;
}

model {
  a ~ normal(mu, sigma);
}

with

parameters {
  real a_z;
  real mu;
  real<lower=0> sigma;
}

transformed parameters {
  real a = a_z * sigma + mu;
}

model {
  a_z ~ normal(0, 1);
}
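
For the model above, a minimal sketch of what the non-centered version might look like (keeping the original cmg/csd expressions written out inline, using hypothetical names gam1_z and gam2_z for the raw parameters, and omitting the likelihood for dist, which stays exactly as before):

parameters {
    vector[3] beta;
    vector[27] gam1_z;                  // standard-normal "raw" versions of gam[, 1]
    vector[27] gam2_z;                  // standard-normal "raw" versions of gam[, 2]
    vector<lower=0, upper=10>[2] sigg;
    real<lower=0, upper=1> rhog;
    real<lower=0, upper=10> sig;
}
transformed parameters {
    matrix[27, 2] gam;
    for (i in 1:27) {
        // non-centered marginal: implies gam[i, 1] ~ normal(beta[1], sigg[1])
        gam[i, 1] = beta[1] + sigg[1] * gam1_z[i];
        // non-centered conditional: implies gam[i, 2] ~ normal(cmg[i], csd[i]),
        // with cmg and csd written out exactly as in the original code
        gam[i, 2] = rhog * (sigg[1] / sigg[2]) * (gam[i, 1] - beta[1])
                    + sqrt(square(sigg[2]) * (1 - square(rhog))) * gam2_z[i];
    }
}
model {
    gam1_z ~ normal(0, 1);              // these two lines replace the two
    gam2_z ~ normal(0, 1);              // gam[i, ] ~ normal(...) statements
    beta ~ normal(0, 10);
    // sig, sigg, and rhog keep their uniform priors implicitly through their declared bounds
}

Because gam1_z and gam2_z are standard normal a priori, the strong dependence between gam and sigg is moved into deterministic transforms rather than the density the sampler has to explore, which is usually what makes the divergences go away.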

JAGS uses different algorithms. Divergences are a product of Stan sampling with HMC. Think about divergences as helpful diagnostics rather than annoying errors, though it’s not always fun to get them :D.

Just because JAGS doesn’t give errors, doesn’t mean it isn’t having trouble sampling things.

Generally we encourage normal priors over uniform ones.
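
For example, here is a sketch of one way to do that for the scale parameters in this model (the scale of 5 is just a placeholder, not a recommendation tuned to this data set):

parameters {
    vector<lower=0>[2] sigg;   // the upper=10 bound is no longer needed once a proper prior does the work
    real<lower=0> sig;
}
model {
    sigg ~ normal(0, 5);       // half-normal, because of the lower=0 constraint
    sig ~ normal(0, 5);
}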

Have you looked at using a package like brms or rstanarm to do this modeling? This might be easier to code that way (https://cran.r-project.org/web/packages/brms/vignettes/brms_multilevel.pdf)


@bbbales2 changing 08 to 108 did the trick, great catch! I completely agree with you on your other points. The reason I want to use the same model I used with JAGS is that I'm translating models that were run in JAGS/BUGS into Stan for a textbook, so I'm trying to keep the priors and everything else the same so that the output in the textbook doesn't have to change (I'm not involved with the textbook other than this).

Oh, okay. To avoid those errors, pass in all the shapes and sizes as variables themselves. For example, change:

data {
    int ID[108];
    vector[108] Age;
    vector[108] dist;
    vector[108] Male;
}

To:

data {
    int N;
    int ID[N];
    vector[N] Age;
    vector[N] dist;
    vector[N] Male;
}
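
and then use those sizes everywhere 108 and 27 are currently hard-coded. A rough sketch (J is a hypothetical name for the number of subjects, 27 in this data set, and the prior on gam is only a placeholder to keep the sketch self-contained):

data {
    int N;                               // number of observations (108 here)
    int J;                               // number of subjects (27 here)
    int ID[N];
    vector[N] Age;
    vector[N] dist;
    vector[N] Male;
}
parameters {
    vector[3] beta;
    matrix[J, 2] gam;
    real<lower=0, upper=10> sig;
}
model {
    for (i in 1:N) {                     // no hard-coded 108 (or the 08 typo) left to get wrong
        dist[i] ~ normal(gam[ID[i], 1] + beta[2] * Age[i]
                         + gam[ID[i], 2] * Age[i] + beta[3] * Male[i], sig);
    }
    to_vector(gam) ~ normal(0, 10);      // placeholder prior just to keep this sketch self-contained
    beta ~ normal(0, 10);
}

On the R side, the data list then just gains the matching entries, e.g. N = 108 and J = 27 (or values computed from the data frame).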

Etc. Good luck!

Just be aware that the original models might not have been fit accurately with JAGS/BUGS in the first place! One of the great benefits of Stan is that its more powerful diagnostics can identify poor fits that go unnoticed with other algorithms; Stan isn’t worse, it’s just more observant. You might have to drop models that don’t fit if you don’t want to consider modifying priors and the like.


@betanalpha I completely agree! Unfortunately I’m only getting paid (and only have time) to translate the models into Stan, not to write the textbook. I’ve been careful to compare the posterior densities for the parameters of interest for each model from JAGS and Stan, and so far the results have been very similar.