Sparse NUTS: preconditioning with sparse matrix operations

That looks really cool! Making use of more structural information about the model (like conditional independence) in the sampler sounds like a great way to improve performance long term.

I usually work with pymc instead of stan, where we have a lot of that information around in some way, but I don’t think we really make use of that possibility.

Out of curiosity I ran your benchmark script, but also included the low rank mass matrix adaptation from nutpie:

I had trouble with the multithreading in sparseNUTS though, so I disabled it. I guess that means that sparseNUTS looks about 4 times less efficient in this as it actually should be?

Looks like nutpie ends up faster when the model is smaller than a few thousand parameters, and then sparseNUTS starts to take over slightly? I think that makes sense, because while the runtime doesn’t grow that much with the low rank mass matrix, the space of covariance matrices really does grow, so the structure enforced in sparseNUTS starts to really help with the estimation? I’ll have to try a sparse version of the nutpie mass matrix adaptation at some point as well… :-)

As a small aside: You said that the zero-sum-normal parametrization changes the model. This is true if you do it by just replacing the prior. It is really not difficult to do it in a way that we keep the exact same model. ( Proper use of sum_to_zero_vector in nested multilevel models - #17 by aseyboldt ). This also generalizes to models with more levels, interaction effects and correlated coefficients.

script.R (9.0 KB)