How to most efficiently reduce_sum in a hierarchical logistic model

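Not sure if this helps here, but if your design matrix contains repeated rows (observations that share the same predictor values), it could be worth checking whether you can exploit sufficient statistics and fit a binomial model to the aggregated counts instead of a Bernoulli model to every single observation.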
See my last response in the thread "Weighted logistic regression" for some more explanation.
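
To make the idea concrete, here is a minimal sketch in plain Python (not Stan) of what the aggregation could look like. The toy data and the function names (`aggregate_bernoulli`, `bernoulli_loglik`, `binomial_loglik_kernel`) are made up for illustration and are not from the linked thread. It collapses identical rows into success/trial counts and checks that the binomial log-likelihood, leaving out the binomial coefficient (which does not depend on the parameters), matches the sum of the Bernoulli terms. In a hierarchical model the group identifier would have to be part of the row key, so only observations within the same group get merged.

```python
import math
from collections import defaultdict


def aggregate_bernoulli(X, y):
    """Collapse rows with identical predictors into (rows, successes, trials)."""
    counts = defaultdict(lambda: [0, 0])  # row -> [successes, trials]
    for row, yi in zip(X, y):
        key = tuple(row)
        counts[key][0] += yi
        counts[key][1] += 1
    rows = list(counts.keys())  # dicts keep first-seen order
    successes = [counts[r][0] for r in rows]
    trials = [counts[r][1] for r in rows]
    return rows, successes, trials


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def bernoulli_loglik(X, y, beta):
    """Sum of Bernoulli log-likelihood terms, one per observation."""
    total = 0.0
    for row, yi in zip(X, y):
        p = inv_logit(sum(b * x for b, x in zip(beta, row)))
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total


def binomial_loglik_kernel(rows, successes, trials, beta):
    """Binomial log-likelihood without the log binomial coefficient,
    which is constant in beta and so irrelevant for sampling."""
    total = 0.0
    for row, s, n in zip(rows, successes, trials):
        p = inv_logit(sum(b * x for b, x in zip(beta, row)))
        total += s * math.log(p) + (n - s) * math.log(1 - p)
    return total


# Hypothetical toy data: intercept plus one binary predictor.
X = [[1, 0], [1, 0], [1, 1], [1, 1], [1, 1], [1, 0]]
y = [1, 0, 1, 1, 0, 0]
beta = [-0.3, 0.8]  # arbitrary coefficients, just for the check

rows, successes, trials = aggregate_bernoulli(X, y)
ll_bernoulli = bernoulli_loglik(X, y, beta)
ll_binomial = binomial_loglik_kernel(rows, successes, trials, beta)
```

Here six Bernoulli observations reduce to two binomial rows, so the likelihood needs two evaluations instead of six. How much this saves in practice depends on how many duplicated rows the real design matrix has; with continuous predictors there may be few or none.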