I've been busy lately trying to get a parallel ODE version of Stan up and running on a real project, to see what it really buys me on a real-world problem (yes, those PKPD problems are just perfect as a "real-world", aka complicated, benchmark).
I took a model I care quite a bit about and sub-sampled 160 units from the data set. The problem was run on 1 to 16 cores on our cluster. While the scaling with the number of cores is no longer linear, the practical difference is huge: 3.3h of running time on one core compared to just 20 minutes on 16 cores.
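To put those two timings in perspective, here is a small sketch of the arithmetic in Python. It only uses the two numbers quoted above (3.3h on 1 core, 20 minutes on 16 cores); the function names are made up for illustration, and the serial-fraction estimate uses the standard Karp-Flatt metric, which is my addition and not something taken from the benchmark itself. Since the 20 minutes is a rounded figure, treat the results as ballpark values.

```python
def speedup(t_serial, t_parallel):
    """Ratio of single-core to multi-core wall time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, cores):
    """Speedup divided by core count (1.0 would be perfectly linear scaling)."""
    return speedup(t_serial, t_parallel) / cores


def karp_flatt(t_serial, t_parallel, cores):
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    s = speedup(t_serial, t_parallel)
    return (1.0 / s - 1.0 / cores) / (1.0 - 1.0 / cores)


# Timings quoted in the post, converted to minutes (20 min is approximate).
T_1_CORE = 3.3 * 60.0
T_16_CORES = 20.0
CORES = 16

if __name__ == "__main__":
    print(f"speedup:         {speedup(T_1_CORE, T_16_CORES):.1f}x")
    print(f"efficiency:      {efficiency(T_1_CORE, T_16_CORES, CORES):.0%}")
    print(f"serial fraction: {karp_flatt(T_1_CORE, T_16_CORES, CORES):.3f}")
```

With these inputs that works out to roughly a 9.9x speedup, i.e. about 62% parallel efficiency on 16 cores, and a serial fraction of around 4%.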
OK, now it's on me to start that design wiki page, but I thought this news was worth sharing. Ah, and yes, I checked the results for correctness, of course; all good.
ode_parallel_benchmark.pdf (5.8 KB)