Try to find a better way to speed up a hierarchical logistic model and reduce_sum only helped a little

Presumably this is something you need to run many times for many datasets?

Take a look at the tricks I used here; you’ll have to add the group structure but otherwise I think the reduced-redundant-computation and sufficient-stats tricks should give you substantial speed ups.