Spatial conditional autoregressive (CAR) term in brms with missing observations

I have identified strong spatial autocorrelation in my data using a mantel test, and am trying to implement a CAR term in the brm() call of brms() package in R to account for this. My data acquisition came from 58 sites monitored weekly for 79 weeks, resulting in 4582 possible “site-weeks” of monitoring. However, some of these sites were not accessible year round, so we only actually acquired 3533 “site-weeks” of data (3533 observations).

I began by constructing my model as such:

brm1 <- brm(y ~ x1 + x2 + car(site.dists),
               control=list(max_treedepth=15, adapt_delta=0.95),

where “site.dists” was a 58x58 matrix of the euclidean distances of each site from one another.

The following resulted:

Converting all non-zero values in 'M' to 1.
Error: Dimensions of 'M' for CAR terms must be equal to the number of observations.
In addition: Warning message:
Using CAR terms without a grouping factor is deprecated. Please use argument 'gr' even if each observation represents its own location. 

I have a few questions regarding these outputs:

  1. Why would all non-zero values in M be converted to 1? This seems like the exact opposite of what should happen, given I am interested in accounting for non-zero distances between sites.
  2. Since the number of observations was 3533, I tried creating a 3533x3533 matrix of each site’s specific distance to the others in order to correct the dimensions error, but ended up with a 97Mb matrix of over 12.4 million elements, which made processing time ridiculous for this model. Is the proper way to correct this dimension error by utilizing the grouping factor on the original 58x58 matrix?
  3. How does one implement the grouping factor? I assume this ultimately tells the model “when at site A, use this corresponding value from the site A slot of the distance matrix to account for spatial autocorrelation”. However, it’s less clear how I might actually indicate the grouping factor.

I apologize in advance if these seem to be somewhat rudimentary questions, but I’m pretty new to stan/Bayesian modeling. Any help would be greatly appreciated!

1 Like

I am no expert on this type of models, but since nobody else answered, I will give it a try.

This might be just a limitation of the current implementation, but it works only with an adjacency matrix. In fact the uses I’ve seen for CAR-type models generally seem to require having only adjacency structure while ignoring the distances. So it is likely you can generalize CAR models to take some sort of distances into account, but there was so far no reason to have brms handle this case.

However, I think that if you have actual distances in an actual 2D space, it might make more sense to use a 2D spline / 2D Gaussian process over the actual coordinates of the sites as that would take the distances into accoutn automatically (although those model have their own challenges).

Those two are related - you haven’t told brms which column in your data indexes the sites. So it assumes each row is a distinct site (which it is not). If the column site contains indices into the site.dists matrix, you should use car(site.dists, gr = site).

Also note there are multiple types of CAR models in stan (as given by the type argument to CAR) and you may want to check which of those is the most appropriate.

You should almost never need such extreme values of the parameters unless your model is deeply problematic. So if you find yourself increassing all of those parameters to avoid problems, it is usually good idea to stop and try to debug what is wrong with the model (as increasing those parameters rarely resolves the issues completely).

Best of luck with your model!

1 Like

@martinmodrak This might be just a limitation of the current implementation,

I think that’s correct. It’s not actually a limitation of the model its aiming for, just the code. There are distance-based CAR specifications.

@xprockox . …58 sites monitored weekly for 79 weeks, resulting in 4582 possible “site-weeks” of monitoring. However, some of these sites were not accessible year round, so we only actually acquired 3533 “site-weeks” of data

this is a challenge for Stan, I think. Conceptually you have a good design here, but Stan can’t (very easily) model your missing observations unless you have a continuous outcome. If you had observations at every site-time (or you could model the missing observations), you would want to use a spatio-temporal connectivity structure where, for example, each observation is connected to its neighbors as well as its own past/future (1 away) values. You’d end up with a big space-time matrix, which you would create using kronecker products. Then you would not use any ‘grouping’ variable with brms. I’m developing code for a similar model in Stan and my example uses spatio-temporal data with a few missing observations; I decided to drop those sites altogether (its just a demo so it doesn’t matter to me). That’s not what you want to do.

I think you could make it work without dropping sites (just dropping observations) by creating that space-time matrix without relying entirely on Kronecker products…I’m not sure if that would work I’d need to give it more thought and write it down, and I’m not sure if you could do it with brms, and you may want to build it in sparse matrix form (using spdep)