Correlated parameters in CAR model for areal-unit spatial data


Recently, I implemented the exact sparse CAR model described by Max Joseph on a different data set. The scatterplots of the spatial effects show strong correlation among them, as seen in the figure.


I am wondering what causes the strong correlation between the spatial effects in the CAR model, and whether it is common for conditional autoregressive models. If possible, is there a reparameterization of the spatial effects that would get rid of the correlation?


If you have your parameter vector err_spatial \sim CAR(\mu, \rho, \tau^2), then you’re assigning a multivariate normal distribution to those terms, so it makes sense that they will be correlated. But the scatter plots look suspicious to me too.

The model is:

y \sim Gau(\mu, (I - \rho C)^{-1} M)

where C is a connectivity matrix. It’s typically very sparse, meaning most entries are zero. But (I - \rho C) gets inverted to form the covariance matrix, and the result is a dense matrix. So neighboring areas aren’t the only ones that will be correlated.
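A small numerical sketch of that point, not taken from the case study: it assumes six areas arranged in a line, a row-standardized C = D^{-1} W, and M = \tau^2 D^{-1}, where W is a binary adjacency matrix and D holds the neighbor counts. The function name `car_covariance` and the values of \rho and \tau^2 are made up for illustration.

```python
import numpy as np

def car_covariance(W, rho, tau2):
    """Return C and the CAR covariance (I - rho*C)^{-1} M.

    Assumes C = D^{-1} W (row-standardized) and M = tau2 * D^{-1},
    with D the diagonal matrix of neighbor counts.
    """
    d = W.sum(axis=1)                 # number of neighbors per area
    C = W / d[:, None]                # row-standardized connectivity
    M = np.diag(tau2 / d)             # conditional variances
    n = W.shape[0]
    Sigma = np.linalg.solve(np.eye(n) - rho * C, M)
    return C, Sigma

# Hypothetical map: six areas in a row, each touching only the next one.
n = 6
W = np.zeros((n, n))
idx = np.arange(n - 1)
W[idx, idx + 1] = 1.0
W[idx + 1, idx] = 1.0

C, Sigma = car_covariance(W, rho=0.9, tau2=1.0)

# Convert the covariance to a correlation matrix.
sd = np.sqrt(np.diag(Sigma))
corr = Sigma / np.outer(sd, sd)

zeros_in_C = int((C == 0).sum())                  # C is mostly zeros
zeros_in_Sigma = int(np.isclose(Sigma, 0).sum())  # Sigma has none

print("zero entries in C:    ", zeros_in_C)
print("zero entries in Sigma:", zeros_in_Sigma)
print("correlation between the two end areas:", corr[0, n - 1])
```

Even the two areas at opposite ends of the line, which share no border, end up with a positive prior correlation, so some correlation across the whole set of spatial terms is expected.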

If there’s a covariate that has a similar spatial trend across the map as your outcome, then the covariate will be competing for influence with the spatial autocorrelation (SA) term (as intended) and I suppose that could induce additional correlation into the entire set of SA terms, as they expand/contract in response to the influence of the covariate.

If you’re using a model similar to what’s in that case study, where you’re modeling a rare disease or some other small rate in a log-linear Poisson model, then I might wonder why those err_spatial terms are taking on such large values. It looks like the visible patterns in the scatter plots are dominated by the extreme values (long tails), and we can’t see what’s happening in the bulk of the distribution (where b1_base and the spatial terms are both near zero). I don’t know if you have any other issues or concerns with the model, but some of the unusual aspects of these scatter plots appear to be a result of that long tail on b1_base.


Thanks for the explanation.

Yes, the data are long-tailed: case counts differ by up to ten-fold between regions, and zeros are common in some areas. So among the full set of spatial effects, some are estimated to be extremely small and some extremely large (compared to 0). The covariate used here, for simplicity, is just the log of the population in each region, which does differ too much on a log scale. But the posterior of b1_base, which is the coefficient for log population, is centered around zero.

So for the long-tailed distribution, taking a log transformation would be one way to get rid of the tail. I am wondering if the spatial CAR smoothing described in Hierarchical Modeling and Analysis for Spatial Data would effectively remove some of the extreme values.

Thanks again for sharing your idea!

Ohh I see. Is there a reason you aren’t using log population as an offset? That way the CAR model would become your model for the log incidence rates, which is usually what we’re looking for.


Actually, I am working with spatio-temporal surveillance data, and I am trying to incorporate the spatial effect and a lag-one temporal effect into one large connectivity matrix. There aren’t other predictors available, so I just chose log population as the only predictor. But I could try using population as an offset and then maybe use the time trend as a predictor.

Alright, I hope that helps.

Note that using log-population as the offset is the same as this:
y_i \sim Poi(P_i \cdot e^{\phi_i})
\phi \sim Gau(\mu, \Sigma)
where P_i is the population at risk and e^{\phi_i} is the disease risk (or ‘rate’; \phi_i is the log-risk). This way the variance of the likelihood is related to the size of the population at risk, but we’re still modeling risk (or rates, as opposed to raw counts) with the CAR model. Because

P_i \cdot e^{\phi_i} = e^{\log P_i} \cdot e^{\phi_i} = e^{\log P_i + \phi_i}.

That’s why removing the offset from the model will cause all sorts of problems. Substantively too, including the population at risk (the denominator) is needed to prevent us from confusing raw counts with risk.
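Here is a quick numerical check of that identity, as a sketch only. The populations and the log-risks \phi_i below are invented, and \phi is drawn from independent normals rather than from a CAR prior, just to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical populations at risk for four areas.
pop = np.array([500.0, 2_000.0, 10_000.0, 50_000.0])

# Made-up log-risks; in the real model these would come from the CAR prior.
phi = rng.normal(loc=-6.0, scale=0.5, size=pop.size)

# Poisson mean written as population times risk ...
mean_product = pop * np.exp(phi)

# ... and written with log-population as an offset in the linear predictor.
mean_offset = np.exp(np.log(pop) + phi)

# Simulated counts and the crude rates they imply.
y = rng.poisson(mean_offset)
crude_rate = y / pop

print("same Poisson mean:", np.allclose(mean_product, mean_offset))
print("counts:     ", y)
print("crude rates:", crude_rate)
print("true risks: ", np.exp(phi))
```

The counts span orders of magnitude mostly because the populations do, while the risks e^{\phi_i} stay on a similar scale. Without the offset, the spatial terms would have to absorb the population differences themselves.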

There may be better ways to build the time series part into that kind of model. An issue you may encounter with what you’re describing is that time series autocorrelation is usually much stronger than spatial autocorrelation. Practically, this means that you might learn much more about some value by observing its own past and subsequent values than you would learn by observing values that are spatially adjacent. You’ll have one space-time autocorrelation parameter, whereas you may want one for space and another for time (as in, e.g., CAR-VAR models).
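To make the single-parameter issue concrete, here is one way such a combined space-time connectivity matrix could be assembled. This is an assumption about the construction, not the poster’s actual code: it stacks T copies of a spatial adjacency matrix and links each area to itself at the adjacent time points using Kronecker products. The helper `path_adjacency` and the sizes S and T are hypothetical.

```python
import numpy as np

def path_adjacency(n):
    """Binary adjacency for n units in a row (each linked to the next)."""
    W = np.zeros((n, n))
    i = np.arange(n - 1)
    W[i, i + 1] = 1.0
    W[i + 1, i] = 1.0
    return W

S, T = 4, 3                      # hypothetical: 4 areas, 3 time points
W_space = path_adjacency(S)      # which areas border each other
W_time = path_adjacency(T)       # lag-one links between time points

# Rows and columns are ordered as (t = 0: areas 0..S-1), (t = 1: ...), etc.
# First term: spatial neighbors within each time point.
# Second term: the same area at the previous and next time point.
W_st = np.kron(np.eye(T), W_space) + np.kron(W_time, np.eye(S))

n_space_links = int(np.kron(np.eye(T), W_space).sum() // 2)
n_time_links = int(np.kron(W_time, np.eye(S)).sum() // 2)

print("combined matrix shape:", W_st.shape)
print("spatial links: ", n_space_links)
print("temporal links:", n_time_links)
```

Once the two kinds of links are summed into one matrix, a CAR prior built on it has a single \rho that scales both, so it cannot express temporal dependence that is stronger than spatial dependence.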


Thanks for the detailed explanation.

I used the population as the offset term and the model converged. Previously, I thought the convergence issue arose from including too many random effects, such as the spatial and temporal effects, so I combined them into one connectivity matrix without considering the missing offset. Now that the model with the offset converges, I still want to try including the different random effects separately for better model interpretation. Again, I appreciate the practical experience with spatio-temporal modeling that you shared; it broadens my perspective.
