Cannot get simple residual autocorrelation tests to work

Dear forum,

I’m struggeling with what seems to be a simple issue: I have a dataset in which there is spatial autocorrelation (SAC). I want to test for this SAC, account for this SAC, and then test again to see the model has adequately accounted for the SAC. My question is perhaps similar to: Extracting residual covariance structure in brms

I have read around and tried many methods to make this work in brms, but cannot seem to figure it out. If I understand correctly, the gp() and sar() components affect the conditional response, but not the residuals directly. So either I am not extracting conditional residuals (that include the effects of the gp() or sar()), or the covariance structures I include do not account for the spatial autocorrelation. However, similar cov structures do easily account for the same autocorrelation in packages glmmTMB and spdep.

I have read that posterior_epred() and posterior_predict() should both show conditional residuals, but perhaps they dont. I have tried different ways of extracting residuals from a brm object to see if there is a way that would incorporate the effect of an SAC term.

There is also the comment of @paul.buerkner that one can use loo() but I fail to see how I would be able to explicitely test (visually or with some test statistic) that the model adequately accounted for the spatial autocorrelation in the data. Spatial autocorrelation - #2 by paul.buerkner

I basically want a plot that shows that there is no residual spatial autocorrelation, see example code below.

Thank you for your help!

Kind regards,

Louis

Code to replicate problem:

library(sf) # for spatial operations
library(gstat) # for variogram
library(spdep) # for neighbors and Moran's I
library(brms)


## Create data
n <- 100
set.seed(1)
# spatial effect
coords <- data.frame(x = rnorm(n),
                     y = rnorm(n))
spatial_effect <- with(coords, 3*sin(x) + cos(y))
# covariates
x1 <- rnorm(n, 2000, 200)
x2 <- rpois(n, 5)
group <- sample(1:6, n, replace = T)
# Response
response <- spatial_effect + .6*scale(x1) - .4*scale(x2) + rnorm(6)[group] + rnorm(n, 0, 0.5)
# Assemble data frame
data <- data.frame(response = response, x1 = x1, x2 = x2, group = group, coords)
data_sf <- st_as_sf(data, coords = c("x", "y"))


## Run models
m_no_autocor <- brm(response ~ x1 + x2 + (1|group), data = data)

data_sf$resid <- residuals(m_no_autocor)[, "Estimate"] # get residuals

plot(data_sf["resid"], pch = 16, cex = 2) # plot residuals
plot(variogram(resid ~ 1, data = data_sf)) # semivariogram
moran.mc(data_sf$resid, listw, nsim = 100) # moran's I


# SAC correction with sar()
m_sar <- brm(response ~ x1 + x2 + sar(M, type = "lag"), data = data, data2 = list(M = nb))

data_sf$resid <- residuals(m_sar, method = "fitted")[, "Estimate"] 

plot(data_sf["resid"], pch = 16, cex = 2) # plot residuals
plot(variogram(resid ~ 1, data = data_sf)) # semivariogram
moran.mc(data_sf$resid, listw, nsim = 100) # moran's I


# SAC correction with gp()
m_gp <- brm(response ~ x1 + x2 + gp(x + y, k = 5) + (1|group), data = data)

data_sf$resid  <- residuals(m_gp, method = "posterior_epred")[, "Estimate"] # get residuals

plot(variogram(resid ~ 1, data = data_sf)) # semivariogram
moran.mc(data_sf$resid, listw, nsim = 100) # moran's I



## Other ways of getting residuals
DHARMa.helpers::dh_check_brms()
data_sf$resid  <- residuals(m_gp, method = "posterior_epred")[, "Estimate"]



## Other packages

# glmmTMB - no SAC correction
library(glmmTMB
m_glmmtmb <- glmmTMB(response ~ x1 + x2 + (1|group), data = data)
summary(m_glmmtmb)

data_sf$resid <- resid(m_glmmtmb) # get residuals

plot(data_sf["resid"], pch = 16, cex = 2) # plot residuals
plot(variogram(resid ~ 1, data = data_sf)) # semivariogram
moran.mc(resid(m_glmmtmb), listw, nsim = 100) # moran's I


# SAC correction with exponential term: exp()
data$pos <- numFactor(data$x, data$y) # make one factor level per unique combination of x-y coordinates
# dat$group <- factor(rep(1, nrow(dat))) # for unique coordinates
m_exp <- glmmTMB(response ~ x1 + x2 + exp(pos + 0 | group), data = data)
summary(m_exp)

data_sf$resid <- resid(m_exp) # get residuals

plot(data_sf["resid"], pch = 16, cex = 2) # plot residuals
plot(variogram(resid ~ 1, data = data_sf)) # semivariogram
moran.mc(resid(m_exp), listw, nsim = 100) # moran's I


## with spdep/spatialreg
library(spatialreg)
# Spatial lag model (lagged dependent variable)
m_lag <- lagsarlm(response ~ x1 + x2, data = data, listw = listw) # no random effect possible 
summary(m_lag)

data_sf$resid <- residuals(m_lag) # plot residuals
plot(data_sf["resid"], pch = 16, cex = 2)
  • Operating System: Windows 10
  • brms Version: 2.23.0

If I follow you correctly then this one may answer part of your question (though without code):

depending on what your project requires geostan may offer something for you

Thank you for the reply, @cmcd!

Exactly, while residuals(model, method = “posterior_epred”) gives you residuals without accounting for the spatial autocorrelation, I’m interested in what you call the ‘detrended’ residuals: y - mu - SA. It makes sense then why bubble plots or test-statistics like Moran’s I on residuals from brms models wouldn’t show a difference between models that account for spatial autocorrelation and those that don’t.

My aim is still pretty straightforward: show (visually) that the sar() or gp() term effectively deals with the spatial autocorrelation in the model. I don’t know of another way of showing that, than with the model residuals.

So now my question is now how to get to that SA term to calculate these detrended residuals. I’m not a statistician, so looking at these functions, I can’t really figure out how to extract this SA component. Would it be possible to only have one function that calculates these detrended residuals, regardless of whether you use a sar lag or sar error, or gaussian process term?

Thank you for your help!

Louis

It will differ based on the model; For CAR models and for SAR models of type SEM (‘error model’), the implicit spatial trend is Trend = rho * C %*% (Y - Mu) (see stan_car or stan_sar). For SAR models of type SLM (lag model), the spatial trend is rho * C * y. (this is copied from the geostan documentation) C is your spatial weights matrix and rho is your SA parameter.

If you’re comfortable working with MCMC samples to calculate quantities of interest then you may be able to use that info to create the spatial rend. Or geostan will do it for you (residuals method) if you use it to fit your models; though it doesn’t have GP/geostatistical models (its not set up for doing geostatistics, focused on areal data) but it has the others.