Lendable, the firm I work for, is currently scoping some work that will involve fitting models to much larger datasets than we currently handle. As part of this, I’d like to put together a public white paper listing best practices/model design ideas to take into account when building production-scale models for transaction-level data.
If anyone is keen, a couple of us are going to head to lunch tomorrow after the dev meeting to discuss these things. I’ll take notes and post them here afterwards. Anyone else interested should feel free to join.
Which functions scale poorly/scale well? For instance, which likelihoods have analytical derivatives, etc.? This is probably information that’s available elsewhere.
Modeling techniques to avoid when working with large datasets. Observation-level transformed parameters, for example, eat a lot of memory.
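To make that concrete, here’s a minimal Stan sketch (the regression and the variable names are purely illustrative, not anything we’re actually fitting): declaring the per-observation linear predictor as a transformed parameter means all N values get written out with every draw, so memory grows with N times the number of iterations, whereas computing it as a local variable in the model block avoids storing it at all.

```stan
data {
  int<lower=1> N;      // number of observations (large)
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
// Avoid for large N: every element of mu is saved with each draw,
// so memory scales with N times the number of iterations.
// transformed parameters {
//   vector[N] mu = alpha + beta * x;
// }
model {
  // Cheaper: a local variable is computed on the fly and never
  // stored in the output.
  vector[N] mu = alpha + beta * x;
  y ~ normal(mu, sigma);
}
```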
Wish I could’ve joined! Let me know how it goes. Also happy to chat another time.
re: when VB works. I think the short summary is that it’s open research, and it depends on the specific VI method, in the same way that the set of problems Monte Carlo methods work on depends on the specific method.
re: GMO timeline. Whenever I get around to finishing the experiments and Andrew and Aki greenlight putting the paper on arXiv. (Hard to put a specific time on it.)