Correct way to incorporate spatial autocorrelation

Hi,

I tried to account for the spatial autocorrelation in my data, but got error message. Basically, the correspond variable is species richness in each grid and predictor variables are bioclimatic variables, soil fertility and speciation rate of species in each grid. All the predictor variables have been z-transformed. I searched relating posts in this discussion web and tried few ways to incorporate the spatial weight matrix. The example data set has been uploaded, which I sampled 100 rows out from the whole data set. And here is the code I used.

    library(dplyr)
    example_dt = read.csv("100rows_example_20200829.csv", sep = ",", header = T)
    example_dt$grid_no = as.factor(example_dt$grid_no)
    # calculate a distance matrix using the coordinates
    library(geosphere)
    dist = distm(example_dt[,14:15], example_dt[,14:15], fun = distVincentyEllipsoid)
    head(dist)
    # to name the columns and rows
    rownames(dist) = example_dt$grid_no
    colnames(dist) = example_dt$grid_no
    head(dist)
    # using z-transformation to scale the predictors
    example_dt$bio1_scaled = (example_dt$bio1-mean(example_dt$bio1))/sd(example_dt$bio1)
    example_dt$bio4_scaled = (example_dt$bio4-mean(example_dt$bio4))/sd(example_dt$bio4)
    example_dt$bio11_scaled = (example_dt$bio11-mean(example_dt$bio11))/sd(example_dt$bio11)
    example_dt$bio12_scaled = (example_dt$bio12-mean(example_dt$bio12))/sd(example_dt$bio12)
    example_dt$bio15_scaled = (example_dt$bio15-mean(example_dt$bio15))/sd(example_dt$bio15)
    example_dt$cec_scaled = (example_dt$cec-mean(example_dt$cec))/sd(example_dt$cec)
    example_dt$pet_scaled = (example_dt$pet-mean(example_dt$pet))/sd(example_dt$pet)
    example_dt$spra_scaled = (example_dt$speci_rate-mean(example_dt$speci_rate))/sd(example_dt$speci_rate)
    head(example_dt)

    library(brms)
    brm0 = brm(richness ~ bio1_scaled + bio4_scaled + bio11_scaled + bio12_scaled + bio15_scaled +
                 cec_scaled + pet_scaled + spra_scaled + (1|grid_no), 
               data = example_dt, family = poisson(),
               cov_ranef = list(grid_no = dist),
               chains = 4, iter = 2000, warmup = 500, cores = 8)

##### Error: Within-group covariance matrices must be positive definite.

    brm1 = brm(richness ~ bio1_scaled + bio4_scaled + bio11_scaled + bio12_scaled + bio15_scaled +
                 cec_scaled + pet_scaled + spra_scaled + (1|grid_no), 
               data = example_dt, family = poisson(),
               autocor = cor_car(dist),
               chains = 4, iter = 2000, warmup = 500, cores = 8)

##### it ran but warning message said "Converting all non-zero values in ‘M’ to 1."

    brm2 = brm(richness ~ bio1_scaled + bio4_scaled + bio11_scaled + bio12_scaled + bio15_scaled +
                 cec_scaled + pet_scaled + spra_scaled + (1|gr(grid_no, cov = dist)), 
               data = example_dt, family = poisson(),
               data2 = list(dist = dist),
               chains = 4, iter = 2000, warmup = 500, cores = 8)

##### Error: Within-group covariance matrices must be positive definite.

    brm3 = brm(richness ~ bio1_scaled + bio4_scaled + bio11_scaled + bio12_scaled + bio15_scaled +
                 cec_scaled + pet_scaled + spra_scaled + (1|grid_no), 
               data = example_dt, family = poisson(),
               autocor = cor_car(dist),
               chains = 4, iter = 2000, warmup = 500, cores = 8)

##### it ran but warning message said "Converting all non-zero values in ‘M’ to 1."

  • Operating System: Windows 10 and the R version is 4.0.1
  • brms Version: 2.13.0

I checked that no negative value exists in the “dist” matrix. The minimum value was 0. And I tried to add an extremely small number (0.00000001) to the “dist” matrix, but the error message was still the same.
So, how should I correctly produce the spatial weight matrix and incorporate the spatial autocorrelation?

100rows_example_20200829.csv (13.8 KB)

1 Like

@cmcd

Hey @daniel_yao

I can’t tell from the sample data what the grid layout is like. How you proceed will depend on the layout, whether its a regularly spaced grid (Like a square tessellation) or you have irregularly placed plots, all with potentially different distances between neighbors.

Of the options you’re looking at here, stick with the cor_car specification for the autocorrelation and I suggest staying away from (1|gr(grid_no, cov = dist) with spatial data. It would probably get stopped at peer review (assuming someone with a spatial stats background reviews it).

Here’s a note from the brms documentation for the cor_car function (hopefully not out of date):

W Adjacency matrix of locations. All non-zero entries are treated as if the two locations are adjacent.

The model is considering these locations as neighbors (1s) or not-neighbors (0s). That may or may not be good enough for your data.

Doesn’t this mean the entire matrix is getting converted to 1s, given you’ve passed in distances? If you want to use this model you’ll have to create the connectivity matrix some other way. E.g. use the lon-lat columns to create an sf or spatialpoints class r - How to Convert data frame to spatial coordinates - Stack Overflow and then use spdep to make a spatial weights matrix based on nearest neighbors maybe.

CAR, ICAR, and eigenvector spatial filtering models can incorporate non-binary weights like distances but there hasn’t been a lot of work on that in Stan yet to my knowledge (but this looks promising Sampling and identifiability of a spatial model with second-order IGMRF prior )

3 Likes

Thanks very much for you reply.
Sorry that I didn’t explain how the grid was. The grid was in 5 arc-min of resolution of the world. And the centerLong and centerLat columns were the the longitude and latitude of center of the grid. The grids in my dataset were where these species distributed, and I randomly sampled 100 rows out from the whole dataset. Because these species didn’t distribute in the whole world, the grids in the dataset didn’t have even distance. So, it means that some grids were next to each other and clustered in the same location because these species were common in this location.
I can understand your suggestion of creating the spatial distance matrix, but not very sure about the suggestion of Sampling and identifiability of a spatial model with second-order IGMRF prior in the Stan. Does the brm() has parameters with similar functions that I could used? Because building the model in the brm() is much easier to me.

Looking at the brms documentation, it looks like the default specification of cor_car (type = escar) is expecting W to be an adjacency matrix with entries 0 and 1 if they are neighbours. I can’t see that there is a brms implementation that allows for different types of spatial weighting matrices. brms also implements an ICAR model following this case study. I would think you’d have to program your own Stan program to implement non-binary weights. You can probably adapt the code from the case study above, or the IMGRF example may be of use. You also have to be careful with specifications of weights and conditional variances, but there’s lots of documentation for these sorts of models.

I see, I thought that might have been the case. It sounds like you may have a lot of areas with zero counts, right? You may end up using a zero-inflated Poisson model or something like that.

Anyways it sounds like you can use the ICAR or CAR model once you create a connectivity matrix that’s based on adjacency (zeroes and ones), for neighboring cells, using brms.

Then be careful about you grid—if in the end you have a single grid (all connected areas) you’re good to go. If your model in then has any patches that are disconnected you’ll have issues (possibly without any warning) using the ICAR models and should only use the CAR specification. (At least I think that’s true, that option hasn’t been implemented for ICAR models @paul.buerkner )

on that difference at least for implementation see this with references: https://paul-buerkner.github.io/brms/reference/car.html

Thanks for your reply. If I cannot find solution by using the brm(), then I still have to program my own Stan program. I have to say that will be difficult to me.

Thank you for your reply.

The data set only included grid that at least had one species occurrences. So, I don’t have grid with zero counts, but lots of grid with 1 occurrence. And can I still create a connectivity matrix based on adjacency?

Besides, all the grids in the data set were not all connected because these species distribute in all tropical regions in the world and tropical regions in different continents are not connected. So, I should use the CAR specification, right?

hey @daniel_yao sorry for the delay.

How you build that spatial weights matrix should depend on your understanding of your data as well as just how patchy the observations are. You’ll want to make sure that observations that are ‘nearby’ are connected (either directly or indirectly through the observations that sit between them). Whether or not something should be considered ‘nearby’ depends on what you’re studying; its less about getting some kind of statistical detail right than about building something that incorporates the information you have. I would hesitate before ever using a distance-based specification for a grid but I wouldn’t want to try to give advice or suggestions without a lot more information. So my advice for anyone for what its worth would be to think through the implications of different specifications; if more than one really makes sense then you might want to try more than one model to see how sensitive results are.

I can’t provide any advice on this (even with more details—I don’t do ecology) but I do wonder if it would make sense to delineate a contiguous study area (or areas) for which you have covariate data and assign zero values to those that have been stripped out due to no observations. That might not make sense for your project, or it might be important to consider depending on all the details. Its just about not excluding pertinent information, and zero values can be just as relevant as non-zero values.

This book has a chapter (ch 4) on specifying spatial weights matrices,

with pdfs available online through some university libraries if you’re signed in.

For the second question, yes—definitely want to use the full CAR specification in that case (if you were building your own Stan model you could consider the ICAR.)

On that point, if you’re interested in any details you can see Max Joseph’s case study on CAR models in Stan:
https://mc-stan.org/users/documentation/case-studies/mbjoseph-CARStan.html

[update: or see here for an advance on the CAR models in Stan, the method used in geostan R package: Building spatial conditional autoregressive (CAR) models in the Stan programming language

2 Likes

Thanks for your help. And sorry for my delay. I will think about my data again and read the book chapter you recommended.