My attempt is attached. This is a couple of orders of magnitude faster than my previous attempts, but I still have some concerns.
In my previous attempts, I used the second specification, which introduces both a matrix inversion and autocorrelation in the error structure. These are the main reasons why my previous versions were too slow to be useful.
The current attempt essentially uses the first specification, defining y_hat, the non-stochastic part, in terms of y. So no matrix inversion necessary. However, I’m not 100% confident that I’m handling the error structure correctly. The \varepsilon are supposed to be i.i.d. But the Stan implementations of an AR§ time series model use autocorrelated errors, even though the standard specification of AR§ treats them as i.i.d.
The other trick was to use sparse matrices. I’m 95% I’ve coded this correctly, but if anyone can confirm that would be helpful.
Currently this takes ~5 seconds for a single chain of 100 transitions. That’s fast enough to be useable, but I’d appreciate any advice on optimizing further, since eventually I have to run this six times on much larger datasets.