I’m new to posting on these forums, so let me know if I’m formatting things wrong or you need additional information. TLDR: After sifting through a lot of the literature on ICAR and BYM2 I am still a bit confused on which one I need.
From what I understand, ICAR is suitable when you have an aggregated measure per areal unit, but is improper in that it assumes complete spatial autocorrelation (\alpha=1), while more complex models like the BYM2 estimate both spatial smoothing and non-spatial heterogeneity.
I first tested whether there was evidence of any spatial autocorrelation using moran’s I:
data: d.car$SHDI
weights: lw
number of simulations + 1: 10000
statistic = 0.099224, observed rank = 9749, p-value = 0.0251
alternative hypothesis: greater
If I’m reading this properly, the p-value suggests there is a very low chance that the data are consistent with a random process, indicating need for a car model term. However, the statistic itself suggests that the actual level of spatial autocorrelation is low, which would contradict the fundamental assumption made by ICAR, right?
I’ve tried ICAR and gotten the model to a point where it is converging properly with no warnings / errors. My model looks something like this:
brm(data = d.car, data2 = list(W=W), family = gaussian,
SHDI ~ 1 + ED_Z + I(ED_Z^2) + PP_Z + (ED_Z + I(ED_Z^2) | REGION ) + car(W,gr=PID,type='icar'),
prior = c(
prior(normal(10,4),class=Intercept),
prior(normal(0, 2), class = b, coef=ED_Z),
prior(normal(0, 1.5), class = b,coef=IED_ZE2),
prior(normal(-1,1),class=b,coef=PP_Z),
prior(lkj(2),class=cor),
prior(exponential(3),class=sd),
prior(exponential(1),class=sdcar)
# (+ default student_t for sigma)
),
iter = 6000, warmup = 3000, chains = 4, cores = 4,
control = list(adapt_delta=0.99,max_treedepth = 25),
seed = 42)
Note that SHDI is a continuous variable and REGION is a categorical grouping variable with 3 levels. ED_Z and PP_Z are also continuous. The sampling grid itself is ~100 adjacent hexagonal polygons with one row of data each.
Given the assumptions of ICAR and my Moran’s I results, I thought I would try fitting a BYM2 CAR model instead but immediately encountered issues. After reading this post it was unclear whether BYM2 is even appropriate for my model given the gaussian family and continuous data.
Following Moran’s I, I set a prior on rhocar of beta(1,4) which improved things, but I am still having issues with divergent transitions and tuning priors on sigma and sdcar, which consistently have rhats >1.05, low bulk & tail ESS, and are extremely sensitive. Here is the pairs plot which shows some obvious issues:
Currently the exponential(1) is the best prior I have for sigma and sdcar. I’ve tried many alternatives but this seems to be an issue beyond just setting priors. Also worth noting that I’ve tried increasing adapt_delta and iterations but this does not help. All other priors are the same as in the ICAR model for now.
As per Visual MCMC diagnostics using the bayesplot package (r-project.org) I understand the “banana-like” shapes might suggest the presence of “multiplicative non-identifiabilities,” but I haven’t looked much further into this.
Apologies for the long-winded post. I suspect there is something simple that I am missing here. Thank you for your help!