Very slow brm multilevel mixed model with spatial auto-correlation in R

Hi All,
I am trying to fit a multilevel model with spatial auto-correlation in R with the brms package. I use the code below, but it has been running for more than 10 days!! I am planning to kill the process.

Do you have any suggestions on how to speed this up? The code, system, and session info are below.
Any help is appreciated.

library(brms) # for the analysis
library(tidyverse) # needed for data manipulation.
library(RColorBrewer) # needed for some extra colours in one of the graphs

#### Load data
train.df <- read.csv("YEAR_2014_both.csv", header = TRUE)
valid.df <- read.csv("YEAR_2015_both.csv", header = TRUE)
test.df  <- read.csv("YEAR_2016_both.csv", header = TRUE)
str(train.df)
'data.frame':	3108 obs. of  11 variables:
 $ FIPS               : int  1001 1003 1005 1007 1009 1011 1013 1015 1017 1019 ...
 $ State              : chr  "Alabama" "Alabama" "Alabama" "Alabama" ...
 $ County             : chr  "Autauga County" "Baldwin County" "Barbour County" "Bibb County" ...
 $ Year               : int  2014 2014 2014 2014 2014 2014 2014 2014 2014 2014 ...
 $ X                  : num  872679 789778 997211 822862 863725 ...
 $ Y                  : num  1094433 884557 1033024 1141554 1255735 ...
 $ Diabetes           : num  11.4 9.3 16.5 13.5 12.5 18 15.1 14.6 13.8 10.3 ...
 $ Obesity            : num  36.3 29.4 44.5 38.5 36.1 40.1 36.2 36.3 38.3 36 ...
 $ Physical_inactivity: num  30.3 23.5 29.9 36.7 28 29.3 32.8 31.7 31.4 34.2 ...
 $ Poverty            : num  13.1 13 25.4 18.1 17.5 35.1 25 20.5 21.3 18.6 ...
 $ Education          : num  54.7 61.8 41.4 44.2 46.3 36 46.2 50.8 47.1 53.5 ...

### Get Weight Matrix

library(spdep) # needed for poly2nb() and nb2listw()

xx <- poly2nb(as(county, "Spatial"))
weighted_neighbors <- nb2listw(xx, zero.policy = TRUE)

prior_02 <- c(
  prior(normal(0, 10), class = Intercept),
  prior(normal(0, 10), class = b),
  prior(cauchy(0, 10), class = sd),
  prior(cauchy(0, 10), class = sigma)
)

model_02 <- brm(Diabetes ~ Obesity + Physical_inactivity + Poverty + Education + (1 | FIPS),
                family = gaussian(),
                autocor = cor_errorsar(weighted_neighbors),
                data = train.df,
                prior = prior_02,
                control = list(adapt_delta = 0.99),
                warmup = 2000,
                iter = 5000,
                thin = 3)

R version 4.0.0 (2020-04-24)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

brms version:  2.13.0
rstan version: 2.19.3

Total memory = 192 GB
Number of cores = 40


Sorry, it seems your question fell through the cracks a bit. Maybe @MegBallard or @torkar have time to look into this?


@MegBallard might know spatial AC better. However, it would simplify things if you could provide a reproducible example (with data). A few things strike me as odd:

  1. Very wide priors (have you done prior predictive checks?).
  2. adapt_delta = 0.99 indicates you have already stumbled into some problems.
  3. iter = 5000 could be another indication of a problem.
  4. Using thin = 3 is generally not needed unless you have too little RAM.
  5. You are running this on one core, but with the default four chains. So if it ran for, let’s say, 16 days using four cores, the actual running time would have been four days. (Set chains = 1 if you want to run only one chain on cores = 1.)
  6. Your weighted_neighbors is a matrix of the distances between all counties, right? That will be a large matrix, so it will take time to compute, I guess.
  7. Have you run this with only a subset of the data?
  8. Not sure if brms scales the predictors automatically(?), but if not, you should scale them, e.g., train.df$Diabetes_s <- scale(train.df$Diabetes), and then use Diabetes_s as input to the model.
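Points 5 and 8 together might look something like this (a sketch only; the _s column names are my own convention, and the variable names are taken from the str() output above):

```r
library(brms)

# Standardize outcome and predictors. scale() returns a one-column
# matrix, so wrap it in as.numeric() to keep plain numeric vectors.
train.df$Diabetes_s <- as.numeric(scale(train.df$Diabetes))
train.df$Obesity_s  <- as.numeric(scale(train.df$Obesity))
train.df$Poverty_s  <- as.numeric(scale(train.df$Poverty))

# Run the four default chains in parallel, one core per chain
model_s <- brm(Diabetes_s ~ Obesity_s + Poverty_s,
               family = gaussian(),
               data = train.df,
               chains = 4,
               cores = 4)
```

With 40 cores available, running the chains in parallel is essentially free and cuts wall-clock time by roughly a factor of four.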

As I said, perhaps @MegBallard could assist you further (if she’s not on vacation :)).


Hi @zia207,

I have a guess as to what you may be encountering here.

First is that Stan is not very efficient for simultaneous auto-regressive (SAR) models. See this Geographical Analysis review paper if interested. So you’ll probably want to find a different choice of spatial model, or else, if you’re really set on the SAR approach, use a different sampler than Stan. There’s probably some custom software out there for Bayesian SAR models (I saw something in spatialreg based on LeSage and Pace’s work, which they say is really fast, but it wasn’t yet implemented).

It will help to drop the term (1|FIPS), since counties (FIPS) are the observational level and you’re using a Gaussian likelihood; I’m not sure if it’s possible to fit that model (similarly, you can’t add an ICAR random-effects term at the observational level with a Gaussian likelihood), but in any case I did that by mistake a couple of times and everything got stuck. (Unless you have multiple observations per county, across years? In that case you have spatio-temporal data and may want to use a space-time model or aggregate the data; it looks like that’s not the case.)

However, I’m guessing from what you’ve shown us that the outcome variable Diabetes is a rate per 10,000 or something like that? Do you have the underlying case counts as well? If so, have you considered a Poisson model with population (or ‘expected case counts’) as the offset term?
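In brms that would look roughly like this (a sketch only; the Cases and Population columns are hypothetical, assuming you can recover the raw counts behind the rate):

```r
library(brms)

# Hypothetical columns: Cases = raw diabetes case counts,
# Population = county population (the denominator of the rate).
# offset(log(Population)) makes the model one of rates, not counts.
model_pois <- brm(Cases ~ Obesity + Physical_inactivity + Poverty + Education +
                    offset(log(Population)),
                  family = poisson(),
                  data = train.df,
                  chains = 4, cores = 4)
```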

Then you could use the BYM2 model from this case study; maybe it’s available in brms already? It’s also in the geostan package (which is available and functioning but still a work in progress). I tried the BYM2 model the other day with geostan on the data from that case study (n > 2,000) and found it finished in a couple of minutes. If you have reason to stick with the Gaussian model and want to use Stan, then a spatial filtering model would be an option that could work for you (also in geostan).
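If I remember the geostan interface correctly, the call would be something like the following (a rough sketch; check the package documentation, since the API is still evolving, and Cases/Population are the same hypothetical count columns as above):

```r
library(geostan)

# Binary spatial connectivity matrix built from the county polygons
C <- shape2mat(county, style = "B")

# BYM2 convolution model with a Poisson likelihood and an offset
fit_bym2 <- stan_icar(Cases ~ offset(log(Population)) + Obesity + Poverty,
                      data = train.df,
                      C = C,
                      type = "bym2",
                      family = poisson(),
                      chains = 4, cores = 4)
```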

If you use the BYM2 model you’d just have to make sure that your spatial connectivity matrix forms a fully connected graph; if that’s not familiar, feel free to ask about it.
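You can check connectivity on the neighbours list you already built, using spdep (n.comp.nb() reports the number of disconnected components; islands or isolated counties show up as extra components):

```r
library(spdep)

# xx is the poly2nb() neighbours list from earlier
comp <- n.comp.nb(xx)
comp$nc              # number of connected components; 1 means fully connected
table(comp$comp.id)  # size of each component, to spot isolated counties
```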

Hope some of that helps. Sounds like you’ve been very patient with that model.