- Operating System: iOS 10.13.5
- brms Version: 2.8.0
I’ve got a data set (nrow
= 4887
) with five scaled continuous variables, one binomial response variable (0
/1
ratio of 1:15 and weighted as such with data$res_weights
), and a single random effect with seven levels (ID
s of each tracked animal). I’m trying to fit a mixed-effect logistic regression model with brms
, with each animal as a random effect. Each row is a point in space that’s roughly 5 km from each other, so spatial autocorrelation is an issue. To account for this, I attempted to include a conditional autoregressive structure with the cor_car
function in the autocor
argument, albeit the first time I did it, I had done it incorrectly.
dist.mat <- as.matrix(dist(cbind(data$x_coords, data$y_coords)))
#A distance matrix, as opposed to an adjacency matrix
rownames(dist.mat) <- data$ID
test_prior <- get_prior(y | weights(res_weights) ~ var1 + var2 + var3 + var4 + var5 + (1|ID),
family = bernoulli(link = 'logit'), data = data)
#I set the family to 'bernoulli' as initially running it as 'binomial' activated a
#prompt that bernoulli was more appropriate.
test_brms.auto <- brm(y | weights(res_weights) ~ var1 + var2 + var3 + var4 + var5 + (1|ID),
family = bernoulli(link = 'logit'), data = data, prior = test_prior,
autocor = cor_car(dist.mat, formula = ~ 1|ID), cores = 4,
control = list(adapt_delta = 0.99, max_treedepth = 100))
Converting all non-zero values in 'W' to 1
#The first red flat that I didn't notice!
Funnily enough, when I ran that, the effect of the correlation was provided in the output.
When I did it the second round, I made sure to do the adjacency matrix correctly:
dist.mat <- as.matrix(dist(cbind(data$x_coords, data$y_coords)))
W <- matrix(0, nrow = nrow(dist.mat), ncol = ncol(dist.mat))
W[W <= 5000] <- 1
#Any distances less than or equal to 5000 m is considered 'adjacent'
rownames(W) <- data$ID
test_prior <- get_prior(y | weights(res_weights) ~ var1 + var2 + var3 + var4 + var5 + (1|ID),
family = bernoulli(link = 'logit'), data = data
test_brms.auto <- brm(y | weights(res_weights) ~ var1 + var2 + var3 + var4 + var5 + (1|ID),
family = bernoulli(link = 'logit'), data = data, prior = test_prior,
autocor = cor_car(W, formula = ~ 1|ID, type = 'icar'), cores = 4,
control = list(adapt_delta = 0.99, max_treedepth = 100))
#The 'type = ' argument in the 'cor_car()' function was switched to 'icar'
#because not all points had an adjacent neighbor (i.e. there were some 0s)
#as prompted by brms. I thought was odd because I wasn't prompted to change
#this with the example from ?cor_car.
Once the model ran, though, I lost the correlation effect in the summary output. I compared this to a control model I ran without any autoregressive structures and the estimates between the two models were exactly the same, all the way down to the correlogram I made for the both of them. Running the code test_brms.auto$autocor
returns car(W, ~1|ID, ‘icar’)
, so I’m not sure why the correlation effects are not included.
So my question is, what’s happening? Does this mean the model was treated as if it didn’t have any autoregressive structure? What was it about the first time I did it (using dist.mat
where everything was a 1
) that made the correlation effects section appear, as opposed to the second time I ran it (W
where only some cells were 1
)? Does it have something to do with my adjacency matrix, or the fact I used icar
instead? There wasn’t this issue when I ran the example from the cor_car
documentation (nor did that example prompt me to use ‘icar’
instead), so surely it must be something about my data or the model syntax I used.
Here’s a summary of the relevant data set, if that might be of any help:
summary(data)
coords.x1 coords.x2 ID y var1.V1
Min. : 7442747 Min. :-1274976 animal1: 646 1: 329 Min. :-2.038741
1st Qu.: 8356859 1st Qu.: 173477 animal2: 741 2:4558 1st Qu.:-0.779301
Median : 8750482 Median : 456784 animal3:1169 Median :-0.209312
Mean : 8972824 Mean : 383244 animal4: 437 Mean : 0.000000
3rd Qu.: 9640674 3rd Qu.: 654646 animal5: 552 3rd Qu.: 0.758677
Max. :10833105 Max. : 1253649 animal6: 486 Max. : 3.307332
animal7: 856
var2.V1 var3.V1 var4.V1 var5.V1
Min. :-1.224376 Min. :-1.9800283 Min. :-0.156237 Min. :-3.274098
1st Qu.:-0.727544 1st Qu.:-0.7907740 1st Qu.:-0.111117 1st Qu.:-0.658041
Median :-0.269009 Median :-0.0735681 Median :-0.094682 Median :-0.016927
Mean : 0.000000 Mean : 0.0000000 Mean : 0.000000 Mean : 0.000000
3rd Qu.: 0.386599 3rd Qu.: 0.7250871 3rd Qu.:-0.060488 3rd Qu.: 0.642606
Max. : 4.092347 Max. : 2.6985385 Max. :23.168148 Max. : 2.608927
Update
I ran four models: A GLMM (glmer
), a brms
model without accounting for spatial autocorrelation, another brms
model with 5 km adjacency, and another brms
model with 250 km adjacency and plotted the predicted probabilities against the scaled variables:
Not much changes between the models except for the distribution of the observed predicted variables (points, though not very obvious) and the size of the confidence (GLMM, bottom)/credible (BRMS, top three) intervals.
The sizes for each of the models are also very different:
- GLMM: 0.3 mb
- BRMS (spatial autocorrelation): 1.7 mb
- BRMS (5 km adjacency): 2.5 mb
- BRMS (250 km adjacency: 49 mb
So something is definitely happening, I just don’t know what. Any ideas or thoughts?
Thanks,
-
R version:
platform x86_64-apple-darwin15.6.0 arch x86_64 os darwin15.6.0 system x86_64, darwin15.6.0 status major 3 minor 5.2 year 2018 month 12 day 20 svn rev 75870 language R version.string R version 3.5.2 (2018-12-20) nickname Eggshell Igloo