I am struggling with a binomial model.
I am fitting the model with brms as follows:
mb <- brm(
  bf(y | trials(total) ~ 1
     + category + category:prop_category
     + (1 | item)),
  data = data,
  family = binomial,
  prior = c(
    prior(normal(0, 0.1), class = sd),
    prior(normal(0, 2), class = b),
    prior(normal(0, 0.01), class = Intercept)
  ),
  cores = 4,
  chains = 4,
  warmup = 1000,
  iter = 4000,
  control = list(adapt_delta = 0.8)
)
The data look as follows:
      y total item   category prop_category
  <dbl> <dbl> <chr>  <chr>            <dbl>
1    29    55 item_1 c1            0.0157
2     2    47 item_1 c2            0.0134
3     0    26 item_1 c3            0.00742
4     0     3 item_1 c4            0.000857
5     0  3371 item_1 c5            0.963
6   519 13097 item_2 c1            0.978
Each item has at least 2 of the 5 possible categories, and prop_category is the proportion with which that category appears with that item (computed from the trials, total, and not from the outcome y).
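To make that definition concrete, here is a minimal sketch in base R of how prop_category can be reproduced from total, assuming it is simply each row's share of the item's trials. Only the five item_1 rows shown above are used, since the remaining item_2 rows are not listed. The data frame name `d` is made up for illustration.

```r
# The five item_1 rows from the printout above (item_2 is left out because
# only one of its rows is shown).
d <- data.frame(
  y        = c(29, 2, 0, 0, 0),
  total    = c(55, 47, 26, 3, 3371),
  item     = "item_1",
  category = c("c1", "c2", "c3", "c4", "c5")
)

# Assumption: prop_category is the share of an item's trials that fall in
# each category, i.e. total divided by the per-item sum of total.
d$prop_category <- ave(d$total, d$item, FUN = function(x) x / sum(x))

# Rounded to three significant digits, as in the tibble printout.
print(signif(d$prop_category, 3))
```

Under that assumption the values sum to 1 within each item.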
The prior prior(normal(0, 0.1), class = sd) is there for theoretical reasons. However, the strong prior on the intercept is there purely for fitting reasons. If I set a weaker prior, the chains do not mix well for the intercept, and I get very low ESS and high Rhat (again, only for the intercept).
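To show how tight the intercept prior is, the sketch below maps it from the logit scale to the probability scale using base R only. It assumes the default logit link of the binomial family. The helper `prior_interval()` and the weaker normal(0, 2) comparison are illustrative and not part of the fitted model.

```r
# Central interval of a normal(0, sd) prior on the logit scale, mapped to
# the probability scale with the inverse logit (plogis).
prior_interval <- function(sd, level = 0.95) {
  tail <- (1 - level) / 2
  plogis(qnorm(c(tail, 1 - tail), mean = 0, sd = sd))
}

strong <- prior_interval(0.01)  # the intercept prior used in the model above
weak   <- prior_interval(2)     # hypothetical weaker prior, for contrast only

print(round(strong, 4))
print(round(weak, 4))
```

As far as I understand, brms by default applies priors of class Intercept to the intercept of the model with mean-centered population-level predictors, so these intervals refer to that centered intercept rather than to the baseline category at prop_category = 0.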
My guess is that this is caused by some collinearity in the predictors. However, I cannot really fit a smaller model, as it would not make much sense theoretically.
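One way to probe the collinearity guess without refitting is to inspect the population-level design matrix directly. The sketch below uses base R only. The helper `design_check()` is a made-up name, and the data are simulated stand-ins with the same column layout, not the data from this post.

```r
# Summarise how close the population-level design matrix is to collinear.
# `d` needs the columns `category` and `prop_category`.
design_check <- function(d) {
  X <- model.matrix(~ 1 + category + category:prop_category, data = d)
  r <- cor(X[, -1])  # correlations among the non-intercept columns
  diag(r) <- 0
  list(
    n_col     = ncol(X),
    rank      = qr(X)$rank,              # rank < n_col means exact collinearity
    condition = kappa(X, exact = TRUE),  # large values suggest near collinearity
    max_cor   = max(abs(r), na.rm = TRUE)
  )
}

# Simulated stand-in data (NOT the data from this post): 40 items, each
# with 2 to 5 of the categories c1..c5 and random trial counts.
set.seed(1)
cats <- paste0("c", 1:5)
sim <- do.call(rbind, lapply(1:40, function(i) {
  k <- sample(2:5, 1)
  total <- rpois(k, lambda = 50) + 1
  data.frame(
    item          = paste0("item_", i),
    category      = factor(sample(cats, k), levels = cats),
    total         = total,
    prop_category = total / sum(total)
  )
}))

res <- design_check(sim)
str(res)
```

Calling `design_check(data)` on the real data would give the same summary. Since brms centers the predictors before sampling, this is only a rough check of the uncentered matrix.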
Here is the pairs plot:
Weirdly, the estimates of the model with a weak or strong prior on the Intercept are essentially identical (except for a bit of variation on the Intercept due to the chain mixing issue).
I also tried using the QR decomposition and horseshoe priors, as Bürkner suggested in another thread, but this didn't help.
Is there any way around this issue?
Alternatively, given that the estimates seem to be stable independently of the intercept prior, is it justifiable to use a very strong prior on the grounds that otherwise the chains fail to converge?