I have yet to obtain a working model, but I want to give a final summary on this topic because my questions are starting to shift towards being model-specific rather than generic questions about divergences/priors.
I estimated several models last week that helped me better understand the effect of different inputs. I also read some additional suggestions on zero-avoiding priors (ZAP). As recommended in a few places, I tried a gamma prior rather than a lognormal. I switched to a Student-t prior on my scale parameter (I realize it’s not technically a standard deviation with the gamma, but I kept the naming convention from my previous model iterations), which seems to help a bit. I also tried combining two parameters that theoretically make sense to test as a single parameter, which didn’t help.
```stan
mu_b1tt ~ normal(3, 0.25);         // location hyperparameter
sd1_b1tt ~ student_t(3, 0, 1);     // scale hyperparameter (heavier tails than a normal)
b_b1tt ~ gamma(mu_b1tt, sd1_b1tt); // note: Stan's gamma takes shape and rate
```
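Note that Stan’s gamma distribution is parameterized by shape and rate rather than mean and sd, so the hyperparameter names above are just labels carried over from earlier iterations. If an interpretable mean/sd were the goal, the conversion could be made explicit; a minimal sketch of that reparameterization (illustrative only, not the model I actually ran):

```stan
parameters {
  real<lower=0> mu_b1tt;   // intended prior mean of b_b1tt
  real<lower=0> sd1_b1tt;  // intended prior sd of b_b1tt
  real<lower=0> b_b1tt;
}
model {
  mu_b1tt ~ normal(3, 0.25);
  sd1_b1tt ~ student_t(3, 0, 1);
  // a gamma with mean m and sd s has shape m^2/s^2 and rate m/s^2
  b_b1tt ~ gamma(square(mu_b1tt / sd1_b1tt), mu_b1tt / square(sd1_b1tt));
}
```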
I tried some tighter priors, additional iterations, a higher max_treedepth, and a higher adapt_delta. See below for how these changes affected my posterior plots; a sketch of the sampler call follows the plot summaries.
- Normal sd priors: 25% divergent transitions
- Student-t sd priors with adapt_delta = 0.99 (up from 0.95): 11% divergent transitions
- Gamma priors: ~0% divergent transitions (3 in total)
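For reference, this is roughly how those sampler settings are passed, a minimal sketch assuming CmdStanPy as the interface (the file names and iteration counts below are placeholders, not the values I actually used):

```python
from cmdstanpy import CmdStanModel

# Hypothetical file names; placeholders rather than the actual inputs.
model = CmdStanModel(stan_file="model.stan")
fit = model.sample(
    data="data.json",
    chains=4,
    iter_warmup=1000,     # additional iterations beyond the defaults
    iter_sampling=2000,
    adapt_delta=0.99,     # raised from 0.95; forces smaller step sizes
    max_treedepth=12,     # raised from the default of 10
)
print(fit.diagnose())     # reports divergent transitions and treedepth hits
```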