# Help with picking CAR model type for hierarchical model - divergent transitions, high rhats / issues with sigma and sdcar

I’m new to posting on these forums, so let me know if I’m formatting things wrong or you need additional information. TLDR: After sifting through a lot of the literature on ICAR and BYM2 I am still a bit confused on which one I need.

From what I understand, ICAR is suitable when you have an aggregated measure per areal unit, but is improper in that it assumes complete spatial autocorrelation (\alpha=1), while more complex models like the BYM2 estimate both spatial smoothing and non-spatial heterogeneity.

I first tested whether there was evidence of any spatial autocorrelation using moran’s I:

data:  d.car\$SHDI
weights: lw
number of simulations + 1: 10000

statistic = 0.099224, observed rank = 9749, p-value = 0.0251
alternative hypothesis: greater


If I’m reading this properly, the p-value suggests there is a very low chance that the data are consistent with a random process, indicating need for a car model term. However, the statistic itself suggests that the actual level of spatial autocorrelation is low, which would contradict the fundamental assumption made by ICAR, right?

I’ve tried ICAR and gotten the model to a point where it is converging properly with no warnings / errors. My model looks something like this:

brm(data = d.car, data2 = list(W=W), family = gaussian,
SHDI ~ 1 + ED_Z + I(ED_Z^2) + PP_Z + (ED_Z + I(ED_Z^2) | REGION ) + car(W,gr=PID,type='icar'),
prior = c(
prior(normal(10,4),class=Intercept),
prior(normal(0, 2), class = b, coef=ED_Z),
prior(normal(0, 1.5), class = b,coef=IED_ZE2),
prior(normal(-1,1),class=b,coef=PP_Z),
prior(lkj(2),class=cor),
prior(exponential(3),class=sd),
prior(exponential(1),class=sdcar)
# (+ default student_t for sigma)
),
iter = 6000, warmup = 3000, chains = 4, cores = 4,
seed = 42)


Note that SHDI is a continuous variable and REGION is a categorical grouping variable with 3 levels. ED_Z and PP_Z are also continuous. The sampling grid itself is ~100 adjacent hexagonal polygons with one row of data each.

Given the assumptions of ICAR and my Moran’s I results, I thought I would try fitting a BYM2 CAR model instead but immediately encountered issues. After reading this post it was unclear whether BYM2 is even appropriate for my model given the gaussian family and continuous data.

Following Moran’s I, I set a prior on rhocar of beta(1,4) which improved things, but I am still having issues with divergent transitions and tuning priors on sigma and sdcar, which consistently have rhats >1.05, low bulk & tail ESS, and are extremely sensitive. Here is the pairs plot which shows some obvious issues:

Currently the exponential(1) is the best prior I have for sigma and sdcar. I’ve tried many alternatives but this seems to be an issue beyond just setting priors. Also worth noting that I’ve tried increasing adapt_delta and iterations but this does not help. All other priors are the same as in the ICAR model for now.

As per Visual MCMC diagnostics using the bayesplot package (r-project.org) I understand the “banana-like” shapes might suggest the presence of “multiplicative non-identifiabilities,” but I haven’t looked much further into this.

Apologies for the long-winded post. I suspect there is something simple that I am missing here. Thank you for your help!

2 Likes

First @mwalks, this is a very well written question.

### The model

I suggest you try the SAR model instead of the ICAR model: sar: Spatial simultaneous autoregressive (SAR) structures in brms: Bayesian Regression Models using 'Stan'

I’d also suggest the proper CAR model over the other options you’ve tried, but I’m not aware if it’s properly implemented in brms for your purpose.

### The ICAR model

The ICAR model will create a smooth spatial trend. It’s possible that in your case it has produced a very smooth spatial trend that has a small variance (scale), such that it accounts just for that low level of spatial autocorrelation (I think! See if that might be right by examining residuals.)

In disease mapping (and similar applications) where each observation is a collection (like deaths per population), the ICAR trend is usually not sufficient—because you want to model a rate for each area. So if you were to plot raw rates against fitted values after using the ICAR model for disease mapping, you’d find that the fitted values are often conspicuously far from the raw rates. There will be lots of variance ‘unaccounted for.’ That’s what the BYM model tries to account for. In your case, its only going to cause problems.