Spatial autocorrelation when spatial units are at level-2 (rather than traditional level-1)

marcus-waldman · May 22, 2024, 9:03pm

Very new to thinking about spatial error structures, so looking for some pathways forward.

Say you have individual-level data with individuals (indexed i=1,…,N) nested within geographic regions (indexed j=1,…,J), and you fit a random-intercept model:

y_{i(j)} = \gamma_{0} + u_j + \epsilon_{i(j)}, u_j \sim N(0,\tau)

Suppose we can’t assume that neighboring regional random effects are independent, and we want to account for the fact that these regional mean effects are correlated, i.e.,

cor(u_{j},u_{j'}) \in (-1,1)

and this correlation is modeled in some way using an adjacency matrix that specifies if j and j’ are neighboring regions.

Is there a way to specify such an error structure on the random effects? From what I’ve read on CAR error structures, these spatial autocorrelations apply to level-1, and not level-2.

Sorry this is such a beginner’s question, but any guidance/leads/examples would be most appreciated!

mhollanders · May 22, 2024, 11:31pm

Hi,

I haven’t used brms that much, but I believe you can add something like + gp(u) to your brms model formula, where u is a column in your data frame indicating the region, and gp() specifies a Gaussian process for the random effect.

scott.claessens · May 23, 2024, 9:08am

I’m not sure about adjacency matrices, but for distance/proximity matrices, you can use either a pre-set spatial covariance matrix or a Gaussian process to estimate the spatial covariance matrix under the hood. This works even for level-2 spatial units.

# approach 1 uses a manually specified covariance
# matrix for the random intercepts
brm(y ~ x + (1 | gr(group, cov = covMat)),
    data = d, data2 = list(covMat = covMat))

# approach 2 uses latitude and longitude to estimate
# the covariance matrix under the hood (gr = TRUE
# ensures that observations at the same location are
# grouped together)
brm(y ~ x + gp(lat, lon, gr = TRUE), data = d)
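For the adjacency-matrix case raised in the original question, brms also provides a car() term whose gr argument maps each observation to a spatial location, so several individuals can share one region. The following is only a minimal sketch under assumptions: the toy data, the chain-shaped adjacency matrix W, and the run_fit flag are all made up for illustration, and the default "escar" type is used without any claim that it is the right choice for a given data set.

```r
# Hypothetical toy data: 10 regions on a line, 10 individuals per region
set.seed(1)
J <- 10
region <- rep(seq_len(J), each = 10)
d <- data.frame(region = factor(region), y = rnorm(length(region)))

# Hypothetical adjacency matrix: regions j and j + 1 are neighbours.
# Row names are set to the levels of the grouping factor, which is what
# car() expects when gr is specified.
W <- matrix(0, J, J, dimnames = list(levels(d$region), levels(d$region)))
for (j in seq_len(J - 1)) {
  W[j, j + 1] <- 1
  W[j + 1, j] <- 1
}

# Fitting needs brms and a working Stan toolchain, so it is off by default;
# set run_fit <- TRUE to try it.
run_fit <- FALSE
if (run_fit) {
  library(brms)
  m_car <- brm(y ~ 1 + car(W, gr = region),
               data = d, data2 = list(W = W))
}
```

Here the spatial effect is indexed by region rather than by individual, while the residual sigma still captures individual variation around the region-level effect.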

marcus-waldman · May 23, 2024, 3:49pm

Thank you, @mhollanders and @scott.claessens

I’ve specified my model with gp(lat,lng,gr=TRUE). I just want to make sure of one thing concerning model identification. If I had ignored spatial autocorrelation, I would be imposing a level-1 error structure of

\epsilon_{i(j)} | Region=j \sim N(0,\sigma^2 I)

When specifying the Gaussian process exp_quad kernel that is implemented in gp, I get parameter estimates for the following level-1 error structure parameters: sigma, sdgp, and lscale.

Based on how the exp_quad kernel is defined here: Set up Gaussian process terms in brms — gp • brms, wouldn’t specifying both an sdgp parameter and a sigma parameter be redundant? In other words, if individuals are in the same region, then

||x_{i(j)} -x_{i'(j)}||_{2} =0

and the corresponding within-region covariance is

Cov[y_{i(j)}, y_{i'(j)} | Region=j] = k(x_{i(j)} ,x_{i'(j)})=sdgp^2.

Wouldn’t we want it to be the case that

sdgp^2 = \sigma^2?

Thus making it redundant to specify both parameters?

Thanks again so much!

scott.claessens · May 23, 2024, 4:19pm

I’m not 100% on the math and I’m definitely not an expert on this, but I don’t think there’s redundancy. My understanding is that it’s like variance-partitioning: a portion of the variance is due to spatial autocorrelation between regions, and what’s left is the residual variance.
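The variance-partitioning idea can be written out as a short derivation. This is a sketch under assumptions: with gr = TRUE the Gaussian process f is taken to contribute one value per region location x_j, f is assumed independent of the residuals, and k is the exp_quad kernel with parameters sdgp and lscale as described in the brms gp documentation.

```latex
y_{i(j)} = \gamma_{0} + f(x_j) + \epsilon_{i(j)}, \quad f \sim GP(0, k), \quad \epsilon_{i(j)} \sim N(0, \sigma^2)

k(x_j, x_{j'}) = sdgp^2 \exp\left( -\frac{||x_j - x_{j'}||_{2}^{2}}{2 \, lscale^2} \right)

Cov[y_{i(j)}, y_{i'(j)}] = k(x_j, x_j) = sdgp^2 \quad (i \neq i')

Var[y_{i(j)}] = k(x_j, x_j) + \sigma^2 = sdgp^2 + \sigma^2
```

Under these assumptions sdgp^2 plays the role that \tau plays in the random-intercept model (variance shared by individuals in the same region), while \sigma^2 is the individual-level variance around the region effect, so the two parameters describe different quantities.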

I just simulated this in R to check myself, and the model seems to fit without issue:

library(brms)
library(MASS)

# simulate scaled proximity matrix for 10 regions
positions <- data.frame(region = 1:10, lat = rnorm(10), lon = rnorm(10))
# use only the coordinates, so the region index does not enter the distances
distMat <- as.matrix(dist(positions[, c("lat", "lon")]))
proxMat <- 1 - (distMat / max(distMat))

# simulate outcome variable with 10 observations per region (n = 100)
# autocorrelation among regions and individual-variation around region means
region <- rep(1:10, each = 10)
y <- as.numeric(scale(mvrnorm(1, rep(0, 10), proxMat)))[region] + rnorm(100)

# put together data
d <- data.frame(
  region = region,
  y = y,
  lat = positions$lat[region],
  lon = positions$lon[region]
)

# gaussian process model fits without issue
m <- brm(y ~ gp(lat, lon, gr = TRUE), data = d)
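The split between the spatial and residual variance can also be checked numerically without fitting anything. Below is a rough base-R sketch that builds the covariance matrix implied by an exp_quad kernel plus independent residual noise. The helper exp_quad(), the parameter values, and the three region coordinates are all hypothetical and chosen only for illustration.

```r
# Exponentiated-quadratic kernel on a matrix of coordinates (one row per region)
exp_quad <- function(X, sdgp, lscale) {
  D2 <- as.matrix(dist(X))^2          # squared Euclidean distances
  sdgp^2 * exp(-D2 / (2 * lscale^2))
}

# Hypothetical parameter values and region coordinates
sdgp <- 1.5
lscale <- 0.8
sigma <- 0.5
coords <- cbind(lat = c(0, 1, 3), lon = c(0, 0.5, 1))

# Two individuals per region; individuals inherit their region's location
region <- rep(1:3, each = 2)

# Implied covariance of y: region-level GP covariance expanded to
# individuals, plus independent residual variance on the diagonal
K <- exp_quad(coords, sdgp, lscale)
V <- K[region, region] + diag(sigma^2, length(region))

# Share of total variance attributable to the region-level GP
icc <- sdgp^2 / (sdgp^2 + sigma^2)
```

In V, two different individuals in the same region have covariance sdgp^2, each individual has variance sdgp^2 + sigma^2, and individuals in different regions have covariance below sdgp^2 that shrinks with distance.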
