I have found that using a non-centered parameterization on the global and variable regression coefficients (beta and lambda, respectively) drastically improves the performance of the model, cutting sampling time by more than half compared with the centered parameterization. Moreover, the E-BFMI, rhat and ess_bulk diagnostics are now in the acceptable range for all models, i.e. with or without monotonic transforms on the predictors. However, some divergent transitions still remain.
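For readers unfamiliar with the reparameterization, here is a minimal sketch of what a non-centered coefficient looks like in Stan. It is not my actual model: the likelihood is a plain linear regression, and every name in it (N, K, X, y, mu_beta, sigma_beta, beta_raw, sigma) as well as the normal(0, 1) priors are placeholders chosen only for illustration. The point is that the sampler works on beta_raw, which has a fixed unit scale, and beta is recovered by shifting and scaling.

```stan
// Illustrative sketch only: hypothetical names and priors, not the model discussed here.
data {
  int<lower=1> N;          // number of observations
  int<lower=1> K;          // number of predictors
  matrix[N, K] X;          // predictor matrix
  vector[N] y;             // outcome
}
parameters {
  real mu_beta;                 // hypothetical prior location of the coefficients
  real<lower=0> sigma_beta;     // hypothetical prior scale of the coefficients
  vector[K] beta_raw;           // standardized coefficients that are actually sampled
  real<lower=0> sigma;          // residual scale
}
transformed parameters {
  // Non-centered: beta is a deterministic shift and scale of beta_raw
  vector[K] beta = mu_beta + sigma_beta * beta_raw;
}
model {
  mu_beta ~ normal(0, 1);
  sigma_beta ~ normal(0, 1);    // half-normal because of the lower bound
  beta_raw ~ std_normal();      // implies beta ~ normal(mu_beta, sigma_beta)
  sigma ~ normal(0, 1);
  y ~ normal(X * beta, sigma);
}
```

The centered version would instead declare beta directly in the parameters block and write beta ~ normal(mu_beta, sigma_beta), which ties the geometry of beta to sigma_beta and tends to sample poorly when sigma_beta is small.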
Divergent transitions seem to arise specifically for very small values of tau_p (e.g. tau_p < 0.01). This makes sense, since low values of this scaling parameter constrain the prior density of the cutpoints kappa[j]. I could restrict tau_p to values above a specific limit (for example by declaring real<lower=0.01> tau_p;), but that would bias inference. Am I missing something, or is this a normal consequence of the Dirichlet population model?