BYM2-style model for continuous data

I’m showing my ignorance here but, try as I might, I haven’t found the answer in the literature or in the forum. How do I specify a BYM2-style model (Riebler et al., 2016) to account for spatial autocorrelation when the outcome is a continuous variable?

I love how easy it is to estimate spatial conditional autoregressive structures in brms and have done so for various generalized linear models. I now want to analyze some continuous spatial data. I believe that the BYM2 is not identified for continuous data as the residual variance is estimated both by \sigma, the residual standard deviation in the normal likelihood, and by \sigma_\text{CAR} and \rho_\text{CAR}, the total standard deviation across locations and the proportion of variance from the spatially structured effect. (I provide a simulated example below which indeed shows poor convergence.) Is there a way to model spatial continuous data in a way that has the same straight-forward interpretation as the \sigma_\text{CAR} and \rho_\text{CAR} parameters in the BYM2 model?

# Load packages
library(brms)

# Set seed
set.seed(8701309)

# Generate spatial data
east <- north <- 1:10
Grid <- expand.grid(east, north)
K <- nrow(Grid)

# Set up distance and neighbourhood matrices
distance <- as.matrix(dist(Grid))
W <- array(0, c(K, K))
W[distance == 1] <- 1 	
rownames(W) <- 1:K

# Generate covariates and response data
x1 <- rnorm(K)
x2 <- rnorm(K)
phi <- rmulti_normal(
  1, mu = rep(0, K), Sigma = 0.4 * exp(-0.1 * distance)
)
mu <- x1 + x2 + phi
y <- rnorm(K, mu)
dat <- data.frame(k = 1:K, y, x1, x2)

# Fit a CAR model
fit <- brm(
  y ~ x1 + x2 + car(W, gr = k, type = "bym2"),
  data = dat, 
  data2 = list(W = W),
  family = gaussian()
)

Thank you!

1 Like

I think that maybe car(... type = "icar") might accomplish what you need?

The key issue is not that your response is continuous, but rather that with the gaussian family the residual is confounded with a random effect of observation (if you had multiple observations per cell, then there wouldn’t be an issue with bym2 even though the response is continuous). I think that the icar type in brms is just the conditional autoregressive component, with no additional variance. Thus, with one observation per cell the combination of the icar and the gaussian() family yields the BYM model. If you really need the bym2 parameterization of this model, you might be able to include a bym2 component while constraining the residual sd to be very close to zero via the prior.

1 Like

Hi @nkreimer This sounds like a good application for the CAR model.

I’m not sure its available in brms (as a likelihood for your observations). But if you’re interested in exploring this option you should be able to get results for simple models pretty quickly suing geostan.

You’d get posterior distributions for \rho (the spatial autocorrelation term) and \tau (a scale parameter:

y \sim Gau( \mu, (I - \rho C)^{-1}M
where M is a diagonal matrix with diagonal elements equal to \tau \cdot m_{i,i} (some constant terms that depend on the specification you choose).

The model has an implicit spatial trend that you can calculate (if using geostan it can be returned using the spatial method with the fitted model).

This will also allow you to use the distance-based specification if that’s what you need.

4 Likes

Thank you, @cmcd, I hadn’t come across your package. I will give it a spin!

Thank you for your thoughts! I tried that option and, while it doesn’t have the same problems, it has other issues. It performs fine in the toy example I provided but, when applied to the real data, posterior samples for the sdcar and sigma parameters are highly correlated. Both parameters have only small effective sample sizes and large rhat values. I wonder if there’s a way to reparameterize the model or find a better prior for the variances. (I am using fairly weak Half-Student-t priors right now.)

@nkreimer did you ever end up reaching a solution on this? I referenced your post here as I am encountering similar issues.