I am struggling with a binomial model

I am fitting the model with brms as follows:

```
library(brms)

mb <- brm(
bf(y | trials(total) ~ 1
+ category + category : prop_category
+ (1 | item))
, data = data
, family = binomial
, prior = c(
prior(normal(0, 0.1), class = sd)
, prior(normal(0, 2), class = b)
, prior(normal(0, 0.01), class = Intercept)
)
, cores = 4
, chains = 4
, warmup = 1000
, iter = 4000
, control = list(adapt_delta = 0.8)
)
```

The data look as follows:

```
      y total item   category prop_category
  <dbl> <dbl> <chr>  <chr>            <dbl>
1    29    55 item_1 c1           0.0157
2     2    47 item_1 c2           0.0134
3     0    26 item_1 c3           0.00742
4     0     3 item_1 c4           0.000857
5     0  3371 item_1 c5           0.963
6   519 13097 item_2 c1           0.978
```

For each item there are at least 2 of the 5 possible categories, and `prop_category` is the proportion of that item's trials (the `total` column, not the `y` outcome) that fall into that category.
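
Concretely, `prop_category` is just the within-item share of trials, which can be reconstructed like this (a small dplyr sketch):

```
library(dplyr)

# prop_category = share of an item's trials that fall into each category
data %>%
  group_by(item) %>%
  mutate(prop_category = total / sum(total)) %>%
  ungroup()
```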

The prior `prior(normal(0, 0.1), class = sd)` is there for theoretical reasons. The strong prior on the intercept, however, is there purely for fitting reasons: if I set a weaker prior, the chains do not mix well for the intercept, and I get very low ESS and high Rhat (again, only for the intercept).
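
For concreteness, this is roughly how I am judging convergence (a sketch; `mb_weak` is my name for a fit with a weaker intercept prior such as `normal(0, 2)`):

```
library(posterior)

# Per-parameter diagnostics; only b_Intercept shows high Rhat / low ESS
summarise_draws(as_draws_df(mb_weak), "rhat", "ess_bulk", "ess_tail")
```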

My guess is that this is caused by some collinearity in the predictors. However, I cannot really fit a smaller model; it wouldn't make much sense theoretically.
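
One way to inspect this directly (a sketch, using the same population-level formula as the model):

```
# Correlations among the fixed-effects design-matrix columns
mm <- model.matrix(~ 1 + category + category:prop_category, data = data)
round(cor(mm[, -1]), 2)  # drop the constant intercept column
```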

Here is the pairs plot:
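
(Generated with something like the following; the `b_category*` names are what brms creates for my factor levels:)

```
library(bayesplot)

# Pairs plot of the intercept against the other population-level effects
mcmc_pairs(
  as_draws_df(mb),
  pars = c("b_Intercept", "b_categoryc2", "b_categoryc3")
)
```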

Weirdly, the estimates from the model with a weak versus a strong prior on the intercept are essentially identical (apart from some variation in the intercept itself due to the mixing issue).
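
(I am comparing the fits with `fixef()`; `mb_weak` and `mb_strong` are my names for the two versions:)

```
# Population-level estimates: essentially identical across the two fits
fixef(mb_weak)
fixef(mb_strong)
```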

I also tried using a QR decomposition and horseshoe priors, as Bürkner suggested in another thread, but this didn't help.
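
Roughly what I tried (a sketch; priors and sampler settings as in the model above):

```
# QR decomposition of the population-level design matrix
mb_qr <- brm(
bf(y | trials(total) ~ 1
+ category + category : prop_category
+ (1 | item)
, decomp = "QR")
, data = data
, family = binomial
# ... same priors, cores, chains, etc. as above
)

# and, separately, a horseshoe prior on the population-level effects:
# prior(horseshoe(1), class = b)
```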

Is there any way around this issue?

Alternatively, given that the estimates seem stable regardless of the intercept prior, is it justifiable to use a very strong prior on the grounds that the chains otherwise fail to converge?
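
(One check I could run to support that choice: a prior-only fit, to see what `normal(0, 0.01)` on the intercept implies on the outcome scale:)

```
# Prior predictive check for the strong intercept prior (sketch)
mb_prior <- update(mb, sample_prior = "only")
pp_check(mb_prior)
```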