I need to perform exact bootstrap validation of my brms categorical model, with 100 bootstrap iterations. The idea is to re-fit the model to 100 bootstrap samples of the data and calculate discrimination accuracy (weighted average of submodel-specific ROC AUCs) for each bootstrap model.
The reason for doing this rather than using something like elpd_loo is that elpd_loo, while quicker to compute, lacks the clear interpretation of ROC AUC. Anyway, I want to be able to reject bootstrap models with divergent transitions and to refit them with a higher adapt_delta until there are no divergences.
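For concreteness, here is a minimal sketch of the discrimination metric described above: a one-vs-rest ROC AUC per category, weighted by category frequency. The names `probs` and `y` are assumptions for illustration (`probs` would be an n × K matrix of predicted class probabilities, e.g. from averaging `posterior_epred()` draws on held-out data; `y` the observed factor), and the AUC uses the Mann–Whitney rank formulation so no extra packages are needed:

```r
# Sketch only -- `probs` and `y` are hypothetical names, not brms output.
auc_binary <- function(score, pos) {
  # Mann-Whitney formulation of the ROC AUC
  r <- rank(score)
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

weighted_auc <- function(probs, y) {
  lev <- levels(y)
  # one-vs-rest AUC for each category
  aucs <- sapply(seq_along(lev), function(k) auc_binary(probs[, k], y == lev[k]))
  # weight by observed category frequency
  w <- prop.table(table(y))
  sum(aucs * as.numeric(w))
}
```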
To do this, I need to be able to extract the following information from each brms model:
- Whether the number of divergent transitions exceeded zero, and
- What the adapt_delta setting was.
Can this be done?
Well, for the first question I guess it’s easy:
`foo <- get_sampler_params(M$fit)` stores the sampler parameters in a list with one matrix per chain. There you’ll find the column `divergent__`, which flags the divergent transitions.
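Putting that together, a sketch for counting post-warmup divergences in a fitted brms model `M`:

```r
# Count divergent transitions across all chains (post-warmup only).
sp <- rstan::get_sampler_params(M$fit, inc_warmup = FALSE)
n_div <- sum(sapply(sp, function(chain) sum(chain[, "divergent__"])))
any_divergent <- n_div > 0
```

An alternative is `nuts_params(M)` from bayesplot, which returns the same diagnostics in long format.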
But for #2 I honestly don’t know. You could set it to 0.8 and, if there’s a problem, increase to 0.95. But I would strongly advise you not to do this, since divergences are an indication that the sampler is struggling, and simply increasing adapt_delta is not a solution in itself.
Thanks for the first answer.
As for the second, I don’t see why increasing adapt_delta to get rid of divergences would be wrong. That is precisely what the warning message about divergent transitions says to do. If it cannot be done, the implication would be that the model shouldn’t be fit at all, which is not helpful.
It’s a point of debate: see the thread “Text for warning message”.
I was talking to someone in a similar situation (they were wondering about iteratively increasing adapt_delta). For them, it turned out that just running with a higher adapt_delta wasn’t much slower than the base adapt_delta.
Following up on what @torkar said, there are a few things you can do to try to figure out what is causing the divergences:
1. Fit with a smaller amount of data so you can iterate faster. It sounds like you’re doing that already, though, and you’re getting divergences. Did you get divergences with the full dataset?
2. Figure out a set of data that gives you divergences. Try tightening your priors with this model, or simplifying bits of the model, until the divergences go away. If you can find the part of the model that is causing the divergences, maybe there’s a way to fix it.
3. Simulate small datasets from your model and see if you get divergences fitting them. When you simulate data from your model, just use estimated parameters from a previous fit so you’re in the ballpark of where you think you need to be.
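The last suggestion (simulating from a previous fit) can be sketched as follows. This assumes a fitted brms model `fit`; the assignment to the first column of `fit$data` is an assumption about where the response lives, and for a categorical model the simulated response may need converting back to a factor:

```r
# Sketch: simulate one dataset from a previous fit, refit, check divergences.
sim_y <- posterior_predict(fit, ndraws = 1)[1, ]  # one simulated response vector
sim_dat <- fit$data
sim_dat[[1]] <- sim_y  # ASSUMPTION: response is the first column of fit$data
refit <- update(fit, newdata = sim_dat, recompile = FALSE)
np <- nuts_params(refit)
sum(subset(np, Parameter == "divergent__")$Value)  # total divergences
```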
Just to be clear: I’m not morally opposed to increasing adapt_delta. I just think that divergences can be a signal that we could include stronger prior information or use better inits. I think it might lead to trouble if people just set adapt_delta=0.99 out of a feeling that this is the safe option. If the model has problems, the safe option is to improve it (which could involve using more realistic priors).
The model is weakly identifiable because my analysis is exploratory rather than confirmatory, involving a large number of covariates with substantial multicollinearity. Using strong priors might obscure potentially interesting effects whose estimates are uncertain (e.g. due to multicollinearity) but still worth pointing out for future studies.
Also, Bayesian modeling is alien to most of my audience, so I want to keep the analysis as similar as possible to an equivalent frequentist one. This entails a minimal role for priors. Their only function is to prevent double-digit logits caused by complete separation and to keep the group-level SDs in the same ballpark as we would get with lme4 – hence normal(0,4) and exponential(2), respectively.
Fortunately, preliminary testing suggests that iteratively increasing adapt_delta in cases of divergent transitions is successful at eliminating the divergences.
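The iterative strategy described in this thread can be sketched as a small wrapper. This is a hedged sketch, not the exact code used here: `fit0` is assumed to be the already-compiled brms model, `boot_dat` one bootstrap sample, and the `deltas` ladder is an illustrative choice:

```r
# Refit one bootstrap sample, bumping adapt_delta until no divergences remain.
refit_until_clean <- function(fit0, boot_dat,
                              deltas = c(0.8, 0.9, 0.95, 0.99)) {
  for (ad in deltas) {
    fit <- update(fit0, newdata = boot_dat, recompile = FALSE,
                  control = list(adapt_delta = ad))
    sp <- rstan::get_sampler_params(fit$fit, inc_warmup = FALSE)
    n_div <- sum(sapply(sp, function(ch) sum(ch[, "divergent__"])))
    if (n_div == 0) return(fit)
  }
  warning("Divergences remained even at adapt_delta = ", max(deltas))
  fit
}
```

Applied over the 100 bootstrap samples, each returned fit would then feed into the weighted-AUC calculation.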