Simple Gaussian model: three variants with dramatically different sampling times

The performance of any Stan program is limited by the size of its AD tree (the expression graph used to compute the gradient). In the last example you declare “d”, which has the size of the data (n entries). Try to avoid declaring “d” and instead store only reductions of it in variables within the model block: save sum(y-mu) and sum((y-mu)^2), then use these to compute the final quantities you need. This avoids storing a full vector of size “n”. It is also worth using the profiling facility when making changes like these.
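To make this concrete, here is a minimal sketch of what I mean. It is not your actual model: I am assuming a plain normal likelihood with a scalar mean, and the names N, ss and the profile label "likelihood" are made up for illustration, as are the priors. For this likelihood only the sum of squares is needed, so I leave out sum(y-mu).

```stan
data {
  int<lower=1> N;   // number of observations (hypothetical name)
  vector[N] y;      // observed data
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  // Priors are placeholders for this sketch only.
  mu ~ normal(0, 10);
  sigma ~ normal(0, 5);

  // The profile block makes this part show up as its own row
  // in the profiling output.
  profile("likelihood") {
    // Keep only a scalar reduction instead of declaring a
    // vector of size N: ss = sum((y - mu)^2).
    real ss = dot_self(y - mu);

    // Normal log density up to an additive constant.
    target += -N * log(sigma) - 0.5 * ss / square(sigma);
  }
}
```

Run this next to your current version with profiling switched on and compare the rows for the likelihood. Whether it is really faster is something you have to measure.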

The “?” is there because I am not sure the above will really help you. It is based on many years of Stan experience, but with performance work the results can be surprising. The rule “avoid declared parameters” is usually a good one for gaining speed (which is why the size of the AD tape is reported in the profiling output).

Does that make more sense?
