Bym and besag models in brms

That’s great news, thanks for working on this. I just gave it a try with a simple model and my R session aborts - any ideas?

Do you have any more information?

There isn’t really any more information … It starts compiling the model normally, then the R session aborted thing pops up a minute later.

What’s the model. What’s the size. What’s the version of R and brms. What’s the operating system. How much memory do you have.

Most importantly can you send a full example where it breaks so we can look at it.

We can’t do much without that.

Gotcha - R 3.6.0, brms 2.10.0, running on Windows with 16 GB RAM (and I’ve been able to handle much more complicated models than this on this system).

This is as far as the model goes before R aborts. It’s a zero inflated dataset so this example is using family=“negbinomial”, but it happens with Poisson as well. The model I actually wish to use this for is a much more complicated zero inflated neg binomial model with a number of predictors and a random intercept term - I’ve done this successfully using a GAM of the polygon centroids, but I’d like to compare that to the CAR model.

The script and the data are attached.

(Attachment data.shx is missing)

(Attachment data.prj is missing)

(Attachment data.dbf is missing)

(Attachment data.shp is missing)

bym_brms.R (439 Bytes)

Well, attachment was rejected (I’d attached a shapefile) but you can see the syntax in my R file.

I tried to construct my adjacency matrix differently, yet still get R aborted session errors. My data are in a shapefile, which I can’t load to Stan, but here is the model syntax, just a simple intercept only model.

data <- st_read( “data.shp” )

centroids_sf <- st_centroid(st_geometry(data), of_largest_polygon=TRUE)
coords_sf <- st_coordinates(centroids_sf)
neigh <- knn2nb(knearneigh(coords_sf, k=6))
w1 <- nb2mat(make.sym.nb(neigh), style=“B”)
rownames(w1) <- data$bg

model1 <- brm( bf( y ~ 1 ),
family=“negbinomial” ,
data=data ,
autocor = cor_car( W=w1 , ~1 | bg , type=“bym2” ) ,
prior=prior( normal( 0 , 1 ) , class=Intercept ) ,
chains=4 )

Well I’ve made a progress - I simulated a Poisson dataset and the model ran fine. My primary data are very zero-inflated, and I suspect R just crashed without an informative error when trying to fit the CAR model with it.

Since then I’ve constructed the adjacency matrix as above, and it now seems to run correctly:

model2 <- brm( bf( cases~ 1 ,
zi ~ 1 ),
autocor = cor_car( W=w1 , ~1 | bg , type=“bym2” ) ,
prior=prior( normal( 0 , 1 ) , class=Intercept ) ,
data=data,
family = “zero_inflated_negbinomial” ,
cores=4)

I do get this error that I’m hoping to correct just by increasing iterations, treedepth, and adapt delta:

Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
[http://mc-stan.org/misc/warnings.html#bulk-ess](http://mc-stan.org/misc/warnings.html#bulk-ess)

Thanks for the help, and for implementing these structures.

I have a question - I’m doing a multi-level model where I have individual observations aggregated by spatial units. I don’t understand how the model knows where the individuals are within the adjacency matrix, though:

model1 <- brm( cmvresult ~ race + z_mage + infantsex + z_sdi ,
data = pts ,
family = “Bernoulli” ,
prior=c( prior( normal( -5.5 , 0.5 ) , class=Intercept ) ,
prior( normal( 0 , 0.5 ) , class=b , coef = infantsexMale ) ,
prior( normal( 0 , 0.5 ) , class=b , coef = raceAsian ) ,
prior( normal( 0 , 0.5 ) , class=b , coef = raceHispanic ) ,
prior( normal( 0 , 0.5 ) , class=b , coef = raceMultiple ) ,
prior( normal( 0 , 0.5 ) , class=b , coef = raceNativeAmerican ) ,
prior( normal( 0 , 0.5 ) , class=b , coef = z_mage ) ,
prior( normal( 0 , 0.5 ) , class=b , coef = z_sdi ) ) ,
autocor = cor_car( W=w_zips , ~1|zipcode , type=“bym2” ) ,
chains = 4 ,
iter = 2000 ,
cores = 4 )

All the fixed effects are at the individual level - there are 96,000 individuals, aggregated into 701 zip code polygons.

The zip code adjacencies are specified in the autocor function, but how does the model know where the individuals are? Is it that the ~1|zipcode is directed to the individual data?

Thanks

I am not sure I fully understand the question I am afraid. Since each individual only occurs in a specific zipcode the nesting is implictely part of the model structure as for any other multilevel model.

Let me rephrase - the zip code adjacency structure is present in the matrix, but the matrix does not have names of the zips. So when I put ~1|zipcode in the autocor formula, that grouping identifier exists only in the individual level data but not in the adjacency matrix. So how does the model connect the two?

But the matrix is of the same size as the number of zipcodes, right?

brms checks if the (row) names I believe match, but if you see a case where
an unname W gets through with a grouping factor, please provide a reproducible
example, because this should not happen.

Yes, the matrix is 701 x 701, which is the number of zipcodes.

Now, the matrix was made from a shapefile of just zipcodes.

When I put in the ~1|zipcode term in the autocor specification, I assume that looks for the zipcode column in my data source (which is 96,000 observations), right? So that would group all the observations by zipcode, but this doesn’t really correspond to the row numbers in the shapefile of the zipcodes.

Yes, this is how the zipcode variable is looked up.

At the same time the (row) names ot the W matrix are mapped to the unique values of zipcode
so that each of the 96k observations gets its right zipcode parameter. In addition, the zipcode parameters are spatially correlated according to W.

Great, thanks - that makes sense, thanks.

Paul