In order to evaluate a particular log density I'm interested in, I have to construct Kronecker product matrices. I know the tricks that allow one to avoid actually forming these, but mine is an unusual situation and they don't apply, so I've written my own Kronecker product function in Stan. If I look only at the computational complexity of the operations required to evaluate the log density, the Kronecker products are not the most expensive part: I take a Cholesky decomposition, which is O(p^3), whereas the Kronecker products are at most O(p^2), where p is the dimension with the greatest influence on the computational burden. However, I know that Stan has to evaluate gradients as well. For the Cholesky decomposition, the gradients are built in and thus computed relatively efficiently, whereas for the Kronecker product function I've written they aren't, and may well be computed very inefficiently. How reasonable is it to think that computing gradients through my homemade Kronecker function might be slowing me down a lot, relative to how things would be if there were a built-in Kronecker function?
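(For concreteness, here is a sketch, in Python rather than Stan, of what a hand-rolled Kronecker product function typically amounts to; the names and dimensions are illustrative, not the asker's actual code. Each entry of A triggers one scalar-times-matrix block assignment, and an autodiff system without a built-in gradient for the whole operation has to propagate through every one of them.)

```python
import numpy as np

def kron(A, B):
    """Naive Kronecker product, assembled block by block."""
    m, n = B.shape
    C = np.zeros((A.shape[0] * m, A.shape[1] * n))
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            # One block assignment per entry of A; a reverse-mode autodiff
            # tape records each of these rows(A)*cols(A) operations.
            C[i * m:(i + 1) * m, j * n:(j + 1) * n] = A[i, j] * B
    return C

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.eye(2)
assert np.allclose(kron(A, B), np.kron(A, B))
```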

I know that I should profile my code to check whether it is in fact the Kronecker products that are taking up a lot of the time. I'm working on figuring out how to use the profiler.

There is always a trick.
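(A note for readers: one of the standard tricks alluded to here is the identity (A ⊗ B) vec(X) = vec(B X Aᵀ), which lets you apply a Kronecker product to a vector without ever forming it. A quick NumPy check, purely for illustration:)

```python
import numpy as np

# vec() below means column-stacking, i.e. flatten in Fortran (column-major) order.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((5, 2))
X = rng.standard_normal((2, 4))  # shape: (cols of B, cols of A)

direct = np.kron(A, B) @ X.flatten(order="F")   # forms the full Kronecker product
tricky = (B @ X @ A.T).flatten(order="F")       # never forms it
assert np.allclose(direct, tricky)
```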

Reasonable, but it is still probably negligible relative to the leapfrogging. It sounds as if you have a complicated model, so it may be doing 2^9 or so gradient evaluations per MCMC iteration, which is a bigger deal than whether evaluating the likelihood is O(p^2) vs. O(p^3) for reasonable values of p.
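(Back-of-envelope arithmetic for the point above, with purely illustrative numbers: the per-iteration cost is roughly the number of leapfrog steps times the cost of one gradient evaluation, and within each evaluation the O(p^3) Cholesky term dwarfs the O(p^2) Kronecker term by a factor of about p, whatever the step count.)

```python
p = 100
leapfrog_steps = 2 ** 9          # hypothetical; NUTS may use far fewer
cholesky_work = leapfrog_steps * p ** 3    # ~ steps x O(p^3) per evaluation
kronecker_work = leapfrog_steps * p ** 2   # ~ steps x O(p^2) per evaluation
ratio = cholesky_work / kronecker_work
print(ratio)  # 100.0, i.e. a factor of p: the step count cancels
```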

In this case, computing relatively small Kronecker products IS the trick, as far as I can tell. But you’ll have to take my word on that.

I checked the number of leapfrog steps and NUTS seems to have settled on 15 when p=100.

We’ll have to because there wasn’t any code included with your question!

You might want to read the chapter on efficiency in the Stan manual.