CmdStan performance evolution 2.18 - 2.23

Continuing the evaluations done in "Compilation time evolution in cmdstan", I ran some performance tests on our benchmark models and a few other models on CmdStan 2.18 through 2.23.

For fast models (those that finish in under 2 seconds), I ran the tests with an exaggerated number of iterations to reduce noise and minimize the effect of I/O, which was not the point here.
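For anyone who wants to try something similar, here is a minimal sketch of how such a timing run could be scripted. It is not the script used for the results in this post. The model path, data file, iteration counts and the helper names `build_sample_command` and `time_command` are all hypothetical; only the `sample`, `num_samples`, `num_warmup`, `data file`, `random seed` and `output file` arguments are standard CmdStan command-line arguments.

```python
import statistics
import subprocess
import time


def build_sample_command(model_exe, data_file, num_warmup, num_samples, seed=1):
    """Build the argument list for one CmdStan sampling run.

    model_exe and data_file are placeholders for a compiled model
    binary and its input data.
    """
    return [
        model_exe,
        "sample",
        f"num_samples={num_samples}",
        f"num_warmup={num_warmup}",
        "data", f"file={data_file}",
        "random", f"seed={seed}",  # fixed seed so runs are comparable
        "output", "file=output.csv",
    ]


def time_command(cmd, repeats=3):
    """Run cmd `repeats` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Discard stdout so console printing does not add to the measured time.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


# Example usage (assumes a compiled model at ./my_model and data in data.json):
# cmd = build_sample_command("./my_model", "data.json",
#                            num_warmup=10000, num_samples=10000)
# print(statistics.median(time_command(cmd, repeats=5)))
```

Taking the median over a few repeats with a large iteration count is one way to keep a sub-2-second model from being dominated by startup and output costs; the counts above are only illustrative.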

Some results are awesome; others might be worth a closer look. Versions labeled -sc2 are compiled with stanc2.

SIR model

base.stan model from @bbbales2

logistic regression (no GLM)

logistic regression with GLM (same input as above)

irt_2pl model