Update: Another paper came out recently that does some pretty neat stuff with the Stan compiler so it can build NNs! (cc’ing @s.maskell, who sent it during the meeting today). Table 3 has some benchmarks comparing NumPyro against the Stan examples, and tl;dr we really need to update the examples in posteriordb. I filed an issue about it here, but it mostly comes down to updating these models to use some of the newer, more performant code. Pretty much all the models where they beat Stan do a bunch of `real * vector` multiplies that could pretty easily be packed into matrices and use `normal_id_glm()` (rough sketch below). Though I’d be curious whether they’d still beat us after that change, because I’m pretty sure NumPyro already uses the new style of matrix we are currently building out in Stan math.
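To make that concrete, here’s a hypothetical toy regression in the two styles (made up for illustration, not one of the actual posteriordb models; priors omitted to keep it short):

```stan
data {
  int<lower=0> N;
  vector[N] x1;
  vector[N] x2;
  vector[N] y;
}
parameters {
  real alpha;
  real beta1;
  real beta2;
  real<lower=0> sigma;
}
model {
  // each term here is a separate real * vector multiply
  // on the autodiff stack
  y ~ normal(alpha + beta1 * x1 + beta2 * x2, sigma);
}
```

versus packing the predictors into a matrix and using the fused GLM signature:

```stan
data {
  int<lower=0> N;
  matrix[N, 2] X;  // x1 and x2 packed as columns
  vector[N] y;
}
parameters {
  real alpha;
  vector[2] beta;
  real<lower=0> sigma;
}
model {
  // fused GLM likelihood: one matrix-vector multiply
  // with analytic gradients
  y ~ normal_id_glm(X, alpha, beta, sigma);
}
```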
Another plus: I think fixing up some of those models to avoid so many loops would let them benchmark against a wider set of models. They don’t seem to be able to handle multiple inner loops?
I think there’s also a miscalculation in Table 3’s performance difference: it should be (stan_time / numpyro_time) - 1. For example, on eight_schools_noncentered Stan runs in 0.02 seconds and NumPyro runs in 0.07 seconds, but they report a speedup of 0.29 when it’s actually a 71% slowdown. Also, they have a few places with an accuracy mismatch, but sometimes they report the speedup anyway and other times they don’t?
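Spelled out for that row:

$$
\frac{t_{\text{Stan}}}{t_{\text{NumPyro}}} - 1 = \frac{0.02}{0.07} - 1 \approx -0.71
$$

so NumPyro is about 71% slower on that model, not 29% faster.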
I’m not sure where they are getting the criterion for RQ2: Accuracy, where they check that the MCSE of the sample mean is within ±30% of the standard deviation?* For Stan math we compare the means down to 0.0001? I also think that when you’re comparing two separate inference algorithms you want to actually check that things like ESS and tail ESS are coming out as you would expect. I’d refer to Bob’s comment.
* Actually, looking at the cmdstan performance tests, we do check that the MCSE is within ±30% of the gold tests, and we also check that the means differ by no more than 1e-4. Why did they only do the one? Also, at the meta level, who dives this deep into someone’s code to write benchmarks and doesn’t hit them up about it???
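In symbols, my reading of those two checks (against the gold reference runs) is roughly

$$
\frac{\lvert \mathrm{MCSE} - \mathrm{MCSE}_{\mathrm{gold}} \rvert}{\mathrm{MCSE}_{\mathrm{gold}}} \le 0.3
\qquad \text{and} \qquad
\lvert \hat{\mu} - \mu_{\mathrm{gold}} \rvert \le 10^{-4}
$$

where $\hat{\mu}$ is the sampled mean.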
It’s not directly relevant to their paper, but I wish they had posted code, because it would actually be really neat to take the time spent calculating gradients, subtract that from the overall runtime, and see how NumPyro’s NUTS implementation compares to Stan’s. They use a non-recursive version of NUTS, and I’ve always been curious how that performs across a set of problems.
I also don’t really understand Figure 10. Is this Stan with the compiler they wrote?
Overall though, the paper is neat, and if they wanted to post a design doc for integrating the NumPyro stuff into Stan’s compiler, I think that would be cool. It would be nice to break it into two separate pieces: the NumPyro backend and then the NN-specific blocks.
Also, when they’re commenting on the example models:

> Since these are official and long-standing examples, we assume that they use the non-generative features on purpose. Comments in the source code further corroborate that the programmer knowingly used the features. While some features only occur in a minority of models, their prevalence is too high to ignore.
Oooof. I’ll refer to @bob_carpenter’s comment.
Those models are super old and were just made to show how you could do it. Do we need to put a disclaimer on those examples like, “Hi, some of these examples are outdated but show what is possible in terms of modeling within the Stan language; for any questions please contact [SGB email to refer to someone]”? Has anyone on the Stan dev team been talking to these folks?