Performance Graphs for New GLM Primitives


Hi all,

The new GLM primitives have been merged into the math library. To give you an idea of their performance, I’ve done some testing on my laptop (4 GB memory, 2 cores). These graphs show the performance increase in the mean time of 30 gradient computations (in C++). In each case, the new GLM primitive is compared to the current fastest way of writing the same GLM using existing primitives. All inputs are given as parameters, except for the matrix of covariates, which is specified as data. (The primitives allow the covariates to be a parameter as well, in which case the performance increase should be much larger still.)

Observe that the speedup does not depend much on the size of the data set, but grows pretty rapidly with the number of parameters in the model. (At the high end of the parameter sizes I tested, my laptop started to run out of memory.)
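For concreteness, here is roughly what the comparison looks like in Stan syntax, using the bernoulli-logit case as an example. This is a sketch, not the benchmark code itself (the benchmarks were run directly against the C++ math library), and the exact signatures may differ in the released interfaces:

```stan
// Existing primitives: currently the fastest way to write a logistic
// GLM. X is the N x K covariate matrix (data); alpha and beta are
// parameters. The product alpha + X * beta is built up in the autodiff
// expression graph before the likelihood is evaluated.
y ~ bernoulli_logit(alpha + X * beta);

// New GLM primitive: computes the same log likelihood, but fuses the
// linear predictor into a single function with an analytic gradient,
// avoiding the intermediate autodiff nodes.
y ~ bernoulli_logit_glm(X, alpha, beta);
```

The fusion is why the speedup grows with the number of parameters: the saved intermediate autodiff work scales with the number of entries in beta.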


It would be awesome to have this as a benchmark applied to each release…


Super exciting. Thanks for redoing as line graphs and publishing here.

If we were going to try to publish this, what you’d really want to do is replicate a bunch of data sets and plot averages to smooth things out.

Do you need help getting these into the Stan language? It should just be one simple pull request to stan-dev/stan.

P.S. Base 3 is an unusual choice—I’d have gone with 4 or 10 just so I could understand the data sizes on the X axis (not that I’m asking you to re-do these—the plots are great for what we need, which is a rough idea of speedup).


Hi Bob! Glad you’re happy with them. I did the graphs very quickly, just to get a rough idea. For a case study, I’d do something a bit nicer. I’m sure I can get them into the Stan language this week. If I get completely confused, I’ll ask. :-)