# Divergent error in BYM2 Model

Hello Stan Community! I am new to Stan and I am trying to replicate the work of Mitzi Morris on BYM2 from https://mc-stan.org/users/documentation/case-studies/icar_stan.html for a different area. However, I am getting these two warnings even with adapt_delta=0.97, max_treedepth=15, warmup=10000, iterations=15000:

1. There were 5000 divergent transitions after warmup. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup to find out why this is a problem and how to eliminate them.
2. Examine the pairs() plot to diagnose sampling problems.

I tried using a non-centered parameterization for `betas`; however, it still throws seven of Stan's common warnings.

I am using 10 covariates and an offset. It would be great if anyone could help me understand the reason for these warnings and ways to fix them.

Here is the Stan program for the BYM2 model, retrieved from Mitzi Morris's GitHub repository:

```stan
data {
int<lower=0> N;
int<lower=0> N_edges;
int<lower=1, upper=N> node1[N_edges];  // node1[i] adjacent to node2[i]
int<lower=1, upper=N> node2[N_edges];  // and node1[i] < node2[i]

int<lower=0> y[N];              // count outcomes
vector<lower=0>[N] E;           // exposure
int<lower=1> K;                 // num covariates
matrix[N, K] x;                 // design matrix

real<lower=0> scaling_factor; // scales the variance of the spatial effects
}
transformed data {
vector[N] log_E = log(E);
}
parameters {
real beta0;            // intercept
vector[K] betas;       // covariates

real<lower=0> sigma;        // overall standard deviation
real<lower=0, upper=1> rho; // proportion unstructured vs. spatially structured variance

vector[N] theta;       // heterogeneous effects
vector[N] phi;         // spatial effects
}
transformed parameters {
vector[N] convolved_re;
// variance of each component should be approximately equal to 1
convolved_re =  sqrt(1 - rho) * theta + sqrt(rho / scaling_factor) * phi;
}
model {
y ~ poisson_log(log_E + beta0 + x * betas + convolved_re * sigma);  // co-variates

// This is the prior for phi! (up to proportionality)
target += -0.5 * dot_self(phi[node1] - phi[node2]);

beta0 ~ normal(0.0, 0.05);
betas ~ normal(0.0, 0.05);
theta ~ normal(0.0, 1.0);
sigma ~ normal(0, 1.0);
rho ~ beta(0.5, 0.5);
// soft sum-to-zero constraint on phi
sum(phi) ~ normal(0, 0.001 * N);  // equivalent to mean(phi) ~ normal(0,0.001)
}
generated quantities {
real logit_rho = log(rho / (1.0 - rho));
vector[N] eta = log_E + beta0 + x * betas + convolved_re * sigma; // co-variates
vector[N] mu = exp(eta);
}

```
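As an aside, the `convolved_re` line is built so that the combined effect has roughly unit variance for any `rho`. A minimal numpy sketch of that scaling, using an iid normal stand-in for the ICAR draw (scaled so its marginal variance equals `scaling_factor`, which is what the real scaling factor is computed to describe; the value 0.7 here is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
rho = 0.4
scaling_factor = 0.7  # hypothetical; in practice computed from the adjacency graph

theta = rng.standard_normal(N)  # unstructured effect, unit variance
# stand-in for an ICAR draw whose marginal variance is scaling_factor
phi = rng.standard_normal(N) * np.sqrt(scaling_factor)

convolved_re = np.sqrt(1 - rho) * theta + np.sqrt(rho / scaling_factor) * phi

# (1 - rho) * 1 + (rho / scaling_factor) * scaling_factor = 1, so the
# combined effect has approximately unit variance regardless of rho
print(convolved_re.var())
```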

Here is the same model with a non-centered parameterization:

```stan
data {
int<lower=0> N;
int<lower=0> N_edges;
int<lower=1, upper=N> node1[N_edges];  // node1[i] adjacent to node2[i]
int<lower=1, upper=N> node2[N_edges];  // and node1[i] < node2[i]

int<lower=0> y[N];              // count outcomes
vector<lower=0>[N] E;           // exposure
int<lower=1> K;                 // num covariates
matrix[N, K] x;                 // design matrix

real<lower=0> scaling_factor; // scales the variance of the spatial effects
}
transformed data {
vector[N] log_E = log(E);
}
parameters {
real beta0;            // intercept
vector[K] betas_raw;       // covariates

real<lower=0> sigma;        // overall standard deviation
real<lower=0, upper=1> rho; // proportion unstructured vs. spatially structured variance

vector[N] theta;       // heterogeneous effects
vector[N] phi;         // spatial effects
vector[K] gamma;
vector<lower=0>[K] tau;
}
transformed parameters {
vector[N] convolved_re;
vector[K] betas;
// variance of each component should be approximately equal to 1
convolved_re =  sqrt(1 - rho) * theta + sqrt(rho / scaling_factor) * phi;
betas = gamma + tau .* betas_raw;
}
model {
y ~ poisson_log(log_E + beta0 + x * betas + convolved_re * sigma);  // co-variates

// This is the prior for phi! (up to proportionality)
target += -0.5 * dot_self(phi[node1] - phi[node2]);

gamma ~ normal(0,0.05);
tau ~ cauchy(0,2.5);

beta0 ~ normal(0.0, 0.05);
betas_raw ~ normal(0.0, 0.05);
theta ~ normal(0.0, 1.0);
sigma ~ normal(0, 1.0);
rho ~ beta(0.5, 0.5);
// soft sum-to-zero constraint on phi
sum(phi) ~ normal(0, 0.001 * N);  // equivalent to mean(phi) ~ normal(0,0.001)
}
generated quantities {
real logit_rho = log(rho / (1.0 - rho));
vector[N] eta = log_E + beta0 + x * betas + convolved_re * sigma; // co-variates
vector[N] mu = exp(eta);
}

```
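For what it's worth, in the usual non-centered scheme the raw vector gets a `std_normal()` prior, so that `gamma + tau .* betas_raw` reproduces a `normal(gamma, tau)` distribution exactly (in the model above, `betas_raw ~ normal(0.0, 0.05)` changes the implied prior on `betas`). A numpy sketch of the equivalence the trick relies on, with a hypothetical location and scale:

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws = 50_000
mu, tau = 0.0, 0.05  # hypothetical location and scale

# centered: beta ~ normal(mu, tau)
beta_centered = rng.normal(mu, tau, n_draws)

# non-centered: beta_raw ~ normal(0, 1), then a deterministic shift and scale
beta_raw = rng.standard_normal(n_draws)
beta_noncentered = mu + tau * beta_raw

# both sets of draws have the same distribution
print(beta_centered.std(), beta_noncentered.std())
```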

The scale parameters on the priors in the model block might have something to do with it; have you set them to be reasonable relative to the scale of your covariates and the outcome (rates)? I think it's possible that priors which strongly conflict with the data could cause sampling problems, so simply using wider priors might address the issue.

`beta0 ~ normal(0.0, 0.05);`
`betas ~ normal(0.0, 0.05);`
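One quick way to check whether the prior scales fit the data's scale is a prior predictive simulation: draw parameters from the priors, simulate counts, and see whether they look plausible. A minimal Python sketch with made-up exposures and covariates (the real data aren't in hand). With much wider priors, `exp(eta)` can overflow, which is one way overly wide priors show up (in R's `rpois` that appears as NAs):

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 100, 10
E = rng.uniform(1e3, 1e5, size=N)   # hypothetical exposures
x = rng.standard_normal((N, K))     # covariates assumed roughly unit scale

# draw one parameter set from the priors
beta0 = rng.normal(0, 0.05)
betas = rng.normal(0, 0.05, size=K)

# prior predictive draw of the counts
eta = np.log(E) + beta0 + x @ betas
y_sim = rng.poisson(np.exp(eta))
print(y_sim.min(), y_sim.max())
```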


Thanks @cmcd. I did prior predictive tests, and the wider priors give warnings with a lot of "NA"s. I used the following priors:
`beta0 ~ normal(0.0, 0.01);`
`betas ~ normal(0.0, 0.01);`
The Stan program now runs without errors.
However, I noticed something strange in the results: only one of the 10 covariates seems to have a significant effect on the outcome. Is that normal, or am I interpreting the results in the wrong way?

Here is the result:
Inference for Stan model: OHbym2.
3 chains, each with iter=10000; warmup=7000; thin=1;
post-warmup draws per chain=3000, total post-warmup draws=9000.

```
           mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
beta0      0.00    0.00 0.01 -0.02 -0.01  0.00  0.00  0.02 11710    1
betas[1]  -0.02    0.00 0.01 -0.04 -0.02 -0.02 -0.01  0.00  8899    1
betas[2]  -0.02    0.00 0.01 -0.04 -0.03 -0.02 -0.02 -0.01  2670    1
betas[3]  -0.01    0.00 0.01 -0.03 -0.02 -0.01 -0.01  0.00  6477    1
betas[4]  -0.01    0.00 0.01 -0.03 -0.02 -0.01 -0.01  0.01  9762    1
betas[5]  -0.02    0.00 0.01 -0.04 -0.03 -0.02 -0.01  0.00  6665    1
betas[6]  -0.01    0.00 0.01 -0.03 -0.02 -0.01  0.00  0.01  6581    1
betas[7]  -0.01    0.00 0.01 -0.03 -0.02 -0.01  0.00  0.01  8784    1
betas[8]   0.00    0.00 0.01 -0.02 -0.01  0.00  0.00  0.02 12525    1
betas[9]   0.00    0.00 0.01 -0.02 -0.01  0.00  0.00  0.01 11874    1
betas[10]  0.00    0.00 0.01 -0.02 -0.01  0.00  0.01  0.02 11866    1
sigma      3.07    0.01 0.25  2.63  2.90  3.06  3.23  3.60   638    1

Samples were drawn using NUTS(diag_e) at Wed Nov 25 12:29:18 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
```
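One way to make the "significant effect" reading precise is to check which 95% intervals exclude zero. A small Python sketch using the 2.5% and 97.5% columns copied from the summary above:

```python
# 95% interval endpoints (2.5%, 97.5%) copied from the summary table
intervals = {
    "betas[1]": (-0.04, 0.00),
    "betas[2]": (-0.04, -0.01),
    "betas[3]": (-0.03, 0.00),
    "betas[4]": (-0.03, 0.01),
    "betas[5]": (-0.04, 0.00),
    "betas[6]": (-0.03, 0.01),
    "betas[7]": (-0.03, 0.01),
    "betas[8]": (-0.02, 0.02),
    "betas[9]": (-0.02, 0.01),
    "betas[10]": (-0.02, 0.02),
}

# a coefficient "excludes zero" if its whole interval is on one side of 0
excludes_zero = [name for name, (lo, hi) in intervals.items() if lo > 0 or hi < 0]
print(excludes_zero)  # only betas[2] has an interval entirely below zero
```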

@rabin Not knowing anything about your data, I think that’s normal!

You have ten covariates (that's a lot); presumably they're not uncorrelated, and that makes the inference more challenging. The spatial autocorrelation in your outcome might also share some spatial patterns with some of the covariates, which can further increase your uncertainty about `betas`. Plus, areal data usually has observational error (if it's from a survey, there's sampling error), and that can impact your data in all sorts of ways. So even if you have some background information suggesting that each of these might be meaningful, it's not too surprising that you'd get results like this.

Stan samples faster around the standard normal than around really small values; so if you rescale `x` so that a good prior is more like `normal(0, 1)`, sampling might speed up for you. The results will also be a little easier to read (more digits printed).
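A sketch of that rescaling, assuming the design matrix is a numpy array (the raw scale here is hypothetical):

```python
import numpy as np

def standardize(x):
    """Center and scale each column to mean 0, standard deviation 1."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

rng = np.random.default_rng(3)
x = rng.normal(50, 10, size=(200, 10))  # hypothetical raw covariates
x_std = standardize(x)

print(np.allclose(x_std.mean(axis=0), 0), np.allclose(x_std.std(axis=0), 1))
```

With standardized columns, a `normal(0, 1)` prior on `betas` is weakly informative on the coefficient scale, and each coefficient is read as the effect of a one-standard-deviation change in that covariate.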

Also, if you're not already doing so, you'll want to print out `rho` along with `sigma` when you interpret results for this model.


Thank you very much, @cmcd.
I was just surprised to see an insignificant effect for some of the covariates, because when I compute a simple correlation coefficient, I get a strong correlation with the outcome variable.
Here is my data:
Ohio_Counties.csv (9.8 KB)
The column Fatal (Crashes) is the outcome variable, DVMT is the exposure, and the last 10 columns are covariates. I rescaled all covariates to small values before feeding them into Stan.
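A strong simple correlation with the outcome doesn't guarantee a large coefficient in a multiple regression. A minimal numpy sketch (synthetic data, not the crash data) of one common mechanism, two highly correlated covariates where the outcome only depends on one of them:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5_000
x1 = rng.standard_normal(n)
x2 = 0.95 * x1 + 0.3 * rng.standard_normal(n)  # x2 nearly duplicates x1
y = x1 + rng.standard_normal(n)                # y depends only on x1

# strong marginal correlation between x2 and y ...
r = np.corrcoef(x2, y)[0, 1]

# ... yet the multiple-regression coefficient on x2 is near zero,
# because x1 already explains the shared variation
X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]
print(r, b[2])
```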