Wow, that’s an interesting system!
Based on this block:
for (k in 1:K) {
aj[k] ~ normal(a_mean, a_sd);
}
a_mean ~ normal(30, 15);
a_sd ~ normal(0, 10);
I think this corresponds to centered parameterization which notoriously causes divergences. Case study here is a very accessible description of why (Diagnosing Biased Inference with Divergences), and shows that you could recode the hierarchical component here using a non-centered parameterization.
To follow @andre.pfeuffer comments regarding the beta-binomial, I believe that a clever implementation of hierarchical terms could be an alternate approach to add observation-level variability, see here Beta-Binomial: Why not a default family in the package? - #2 by paul.buerkner.