Divergent transitions in BYM2 model

Hello Stan Community! I am new to Stan and I am trying to replicate Mitzi Morris's work on the BYM2 model from https://mc-stan.org/users/documentation/case-studies/icar_stan.html for a different study area. However, I am getting the following two warnings even when using adapt_delta = 0.97, max_treedepth = 15, warmup = 10000, and iter = 15000:
1: There were 5000 divergent transitions after warmup. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup to find out why this is a problem and how to eliminate them.
2: Examine the pairs() plot to diagnose sampling problems

I tried using a non-centered parameterization for ‘betas’; however, that version throws seven of the common Stan warnings.

I am using 10 covariates and an offset. It would be great if anyone could help me understand the reason for these warnings and ways to fix them.

Here is the Stan program for the BYM2 model, retrieved from Mitzi Morris's GitHub repository:


data {
  int<lower=0> N;
  int<lower=0> N_edges;
  int<lower=1, upper=N> node1[N_edges];  // node1[i] adjacent to node2[i]
  int<lower=1, upper=N> node2[N_edges];  // and node1[i] < node2[i]

  int<lower=0> y[N];              // count outcomes
  vector<lower=0>[N] E;           // exposure
  int<lower=1> K;                 // num covariates
  matrix[N, K] x;                 // design matrix

  real<lower=0> scaling_factor; // scales the variance of the spatial effects
}
transformed data {
  vector[N] log_E = log(E);
}
parameters {
  real beta0;            // intercept
  vector[K] betas;       // covariates

  real<lower=0> sigma;        // overall standard deviation
  real<lower=0, upper=1> rho; // proportion unstructured vs. spatially structured variance

  vector[N] theta;       // heterogeneous effects
  vector[N] phi;         // spatial effects
}
transformed parameters {
  vector[N] convolved_re;
  // variance of each component should be approximately equal to 1
  convolved_re =  sqrt(1 - rho) * theta + sqrt(rho / scaling_factor) * phi;
}
model {
  y ~ poisson_log(log_E + beta0 + x * betas + convolved_re * sigma);  // covariates

  // This is the prior for phi! (up to proportionality)
  target += -0.5 * dot_self(phi[node1] - phi[node2]);

  beta0 ~ normal(0.0, 0.05);
  betas ~ normal(0.0, 0.05);
  theta ~ normal(0.0, 1.0);
  sigma ~ normal(0, 1.0);
  rho ~ beta(0.5, 0.5);
  // soft sum-to-zero constraint on phi
  sum(phi) ~ normal(0, 0.001 * N);  // equivalent to mean(phi) ~ normal(0,0.001)
}
generated quantities {
  real logit_rho = log(rho / (1.0 - rho));
  vector[N] eta = log_E + beta0 + x * betas + convolved_re * sigma; // covariates
  vector[N] mu = exp(eta);
}

Here is the same model with a non-centered parameterization:


data {
  int<lower=0> N;
  int<lower=0> N_edges;
  int<lower=1, upper=N> node1[N_edges];  // node1[i] adjacent to node2[i]
  int<lower=1, upper=N> node2[N_edges];  // and node1[i] < node2[i]

  int<lower=0> y[N];              // count outcomes
  vector<lower=0>[N] E;           // exposure
  int<lower=1> K;                 // num covariates
  matrix[N, K] x;                 // design matrix

  real<lower=0> scaling_factor; // scales the variance of the spatial effects
}
transformed data {
  vector[N] log_E = log(E);
}
parameters {
  real beta0;            // intercept
  vector[K] betas_raw;       // covariates

  real<lower=0> sigma;        // overall standard deviation
  real<lower=0, upper=1> rho; // proportion unstructured vs. spatially structured variance

  vector[N] theta;       // heterogeneous effects
  vector[N] phi;         // spatial effects
  vector[K] gamma;
  vector<lower=0>[K] tau;
}
transformed parameters {
  vector[N] convolved_re;
  vector[K] betas;
  // variance of each component should be approximately equal to 1
  convolved_re =  sqrt(1 - rho) * theta + sqrt(rho / scaling_factor) * phi;
  betas = gamma + tau .* betas_raw;
}
model {
  y ~ poisson_log(log_E + beta0 + x * betas + convolved_re * sigma);  // covariates

  // This is the prior for phi! (up to proportionality)
  target += -0.5 * dot_self(phi[node1] - phi[node2]);

  gamma ~ normal(0, 0.05);
  tau ~ cauchy(0, 2.5);

  beta0 ~ normal(0.0, 0.05);
  betas_raw ~ normal(0.0, 1.0);  // raw scale is standard normal in a non-centered parameterization
  theta ~ normal(0.0, 1.0);
  sigma ~ normal(0, 1.0);
  rho ~ beta(0.5, 0.5);
  // soft sum-to-zero constraint on phi
  sum(phi) ~ normal(0, 0.001 * N);  // equivalent to mean(phi) ~ normal(0,0.001)
}
generated quantities {
  real logit_rho = log(rho / (1.0 - rho));
  vector[N] eta = log_E + beta0 + x * betas + convolved_re * sigma; // covariates
  vector[N] mu = exp(eta);
}


The scale parameters on the priors in the model block might have something to do with that; have you set them to be reasonable relative to the scale of your covariates and the outcome (rates)? I think it's possible that priors that strongly conflict with the data could cause sampling problems, so just using wider priors might address the problem.

beta0 ~ normal(0.0, 0.05);
betas ~ normal(0.0, 0.05);
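
For example, compared to the normal(0.0, 0.05) priors above, something like this might be a more workable starting point if you standardize the covariates (the exact scales here are just a guess, not values tuned to your data):

beta0 ~ normal(0.0, 1.0);
betas ~ normal(0.0, 1.0);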


Thanks @cmcd. I did prior predictive checks, and the wider priors give warnings with a lot of “NA’s”. I used the following priors:
beta0 ~ normal(0.0, 0.01);
betas ~ normal(0.0, 0.01);
The Stan program now works without any warnings.
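
For reference, this is roughly the kind of prior predictive simulation I mean — a minimal sketch that draws beta0 and betas from their priors and simulates counts, ignoring the spatial and heterogeneous effects for simplicity (run it with algorithm = "Fixed_param" in rstan; I suspect the NA's with wider priors come from poisson_log_rng failing once the simulated rate exp(eta) gets astronomically large):

data {
  int<lower=0> N;
  int<lower=1> K;
  matrix[N, K] x;               // design matrix
  vector<lower=0>[N] E;         // exposure
}
generated quantities {
  // draw the regression parameters directly from their priors
  real beta0_sim = normal_rng(0, 0.01);
  vector[K] betas_sim;
  int y_sim[N];                 // simulated counts
  for (k in 1:K)
    betas_sim[k] = normal_rng(0, 0.01);
  // simulate counts; this can fail (NA) when the implied rate is huge
  for (n in 1:N)
    y_sim[n] = poisson_log_rng(log(E[n]) + beta0_sim + x[n] * betas_sim);
}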
However, I noticed something strange in the results: only one of the 10 covariates seems to have a significant effect on the outcome. Is this normal, or am I interpreting the results in the wrong way?

Here is the result of the fit:
Inference for Stan model: OHbym2.
3 chains, each with iter=10000; warmup=7000; thin=1;
post-warmup draws per chain=3000, total post-warmup draws=9000.

           mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
beta0      0.00    0.00 0.01 -0.02 -0.01  0.00  0.00  0.02 11710    1
betas[1]  -0.02    0.00 0.01 -0.04 -0.02 -0.02 -0.01  0.00  8899    1
betas[2]  -0.02    0.00 0.01 -0.04 -0.03 -0.02 -0.02 -0.01  2670    1
betas[3]  -0.01    0.00 0.01 -0.03 -0.02 -0.01 -0.01  0.00  6477    1
betas[4]  -0.01    0.00 0.01 -0.03 -0.02 -0.01 -0.01  0.01  9762    1
betas[5]  -0.02    0.00 0.01 -0.04 -0.03 -0.02 -0.01  0.00  6665    1
betas[6]  -0.01    0.00 0.01 -0.03 -0.02 -0.01  0.00  0.01  6581    1
betas[7]  -0.01    0.00 0.01 -0.03 -0.02 -0.01  0.00  0.01  8784    1
betas[8]   0.00    0.00 0.01 -0.02 -0.01  0.00  0.00  0.02 12525    1
betas[9]   0.00    0.00 0.01 -0.02 -0.01  0.00  0.00  0.01 11874    1
betas[10]  0.00    0.00 0.01 -0.02 -0.01  0.00  0.01  0.02 11866    1
sigma      3.07    0.01 0.25  2.63  2.90  3.06  3.23  3.60   638    1

Samples were drawn using NUTS(diag_e) at Wed Nov 25 12:29:18 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

@rabin Not knowing anything about your data, I think that’s normal!

You have ten covariates (that’s a lot), and presumably they’re not uncorrelated, which makes the inference more challenging. The spatial autocorrelation in your outcome might also share some spatial patterns with some of the covariates, and that can further increase your uncertainty about betas. Plus, areal data usually has observational error (if it’s from a survey, there’s sampling error), and that can impact your data in all sorts of ways. So even if you have background information suggesting that each of these covariates might be meaningful, it’s not too surprising that you’d get results like this.

Stan samples faster around the standard normal scale than around really small values, so if you rescale x so that a good prior is more like normal(0, 1), sampling should speed up for you. The results will also be a little easier to read (more digits printed).
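
For example, one way to do the rescaling inside the Stan program itself (just a sketch — x_std is a made-up name here, and you could equally well standardize in R before passing the data in):

transformed data {
  vector[N] log_E = log(E);
  matrix[N, K] x_std;           // standardized copy of the design matrix
  for (k in 1:K) {
    real m_k = mean(col(x, k));
    real s_k = sd(col(x, k));
    for (n in 1:N)
      x_std[n, k] = (x[n, k] - m_k) / s_k;
  }
}

Then use x_std * betas in the model block, with priors more like betas ~ normal(0, 1).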

Also, if you’re not already doing so, when you interpret results for this model you’ll want to print out rho along with sigma.


Thank you very much, @cmcd.
I was just surprised to see insignificant effects for some of the covariates, because when I compute a simple correlation coefficient, I get a strong correlation with the outcome variable.
Here is my data:
Ohio_Counties.csv (9.8 KB)
The column Fatal (Crashes) is the outcome variable, DVMT is the exposure, and the last 10 columns are the covariates. I rescaled all covariates to small values before feeding them into Stan.