Very new to thinking about spatial error structures, so looking for some pathways forward.
Say you have individual-level data with individuals (indexed i=1,…,N ) nested within a geographic regions (indexed j=1,…,J) and you were to fit a random-intercept model.
y_{i(j)} = \gamma_{0} + u_j + \epsilon_{i(j)}, u_j \sim N(0,\tau)
Suppose we can’t assume that neighboring regional random effects are independent, and we want to account for the fact that these regional mean effects are correlated, i.e.,
cor(u_{j},u_{j'}) \in (-1,1)
and this correlation is modeled in some way using an adjacency matrix that specifies if j and j’ are neighboring regions.
Is there a way to specify such an error structure structure on the random effects? From what I’ve read on car
error structures, these spatial auto correlations apply to level-1, and not level-2.
Sorry this is such a beginners question, but any guidance/leads/examples would certainly be most appreciated!
Hi,
Not having used brms that much, I do believe you can do something do something like + gp(u)
to your brms model formula, where u
is a column in your dataframe indicating the region, and gp()
specifies a Gaussian process for the random effect.
I’m not sure about adjacency matrices, but for distance/proximity matrices, you can use either a pre-set spatial covariance matrix or a Gaussian process to estimate the spatial covariance matrix under the hood. This works even for level-2 spatial units.
# approach 1 uses a manually specified covariance
# matrix for the random intercepts
brm(y ~ x + (1 | gr(group, cov = covMat))
data = d, data2 = list(covMat = covMat))
# approach 2 uses latitude and longitude to estimate
# the covariance matrix under the hood (gr = TRUE
# ensures that observations at the same location are
# grouped together)
brm(y ~ x + gp(lat, lon, gr = TRUE), data = d)
Thank you, @mhollanders and @scott.claessens
I’ve specified my model with gp(lat,lng,gr=TRUE).
I just want to make sure of one thing concerning model identification. If I had ignored spatial autocorrlation, then I would be imposing a level-1 error structure as
\epsilon_{i(j)} | Region=j \sim N(0,\sigma^2 I)
When specifiying Gaussian process exp_quad
kernel that is implemented in gp
, I get parameter estimates for the following level-1 error structure parameters sigma
, sgdp
, and lscale
.
Based on how the exp_quad
kernel is defined here: Set up Gaussian process terms in brms — gp • brms, wouldn’t specifying both an sgdp
parameter and a sigma
parameter be redundant? In other words, if individuals are in the the same region, then
||x_{i(j)} -x_{i'(j)}||_{2} =0
and the corresponding within-region variance is
Var[y_{i(j)}, y_{i'(j)} | Region=j] = k(x_{i(j)} ,x_{i'(j)})=sgdp^2.
Wouldn’t we want it to be the case that
sgdp^2 = \sigma^2?
Thus making it redundant to specify both parameters?
Thanks again so much!
I’m not 100% on the math and I’m definitely not an expert on this, but I don’t think there’s redundancy. My understanding is that it’s like variance-partitioning: a portion of the variance is due to spatial autocorrelation between regions, and what’s left is the residual variance.
I just simulated this in R to check myself, and the model seems to fit without issue:
library(brms)
library(MASS)
# simulate scaled proximity matrix for 10 regions
positions <- data.frame(region = 1:10, lat = rnorm(10), lon = rnorm(10))
distMat <- as.matrix(dist(positions))
proxMat <- 1 - (distMat / max(distMat))
# simulate outcome variable with 10 observations per region (n = 100)
# autocorrelation among regions and individual-variation around region means
region <- rep(1:10, each = 10)
y <- as.numeric(scale(mvrnorm(1, rep(0, 10), proxMat)))[region] + rnorm(100)
# put together data
d <- data.frame(
region = region,
y = y,
lat = positions$lat[region],
lon = positions$lon[region]
)
# gaussian process model fits without issue
m <- brm(y ~ gp(lat, lon, gr = TRUE), data = d)