Why wait to get data dimensions until runtime?

seantalts · September 10, 2019, 5:52pm

Hey all,

While doing compiler development, we’re always thinking about how useful it would be to know the sizes of data collections and the bounds of for loops at compile time, so that we could optimize and generate better code. The downside is that we would then need to recompile the model every time the data sizes changed. I was talking with someone the other day who asked when it is useful to run the same compiled model on differently sized data, and I responded that people sometimes develop models with a subset of the data. This person then pointed out that in that phase of model development a Stan user is still iterating on the model, so they are typically recompiling on every iteration anyway.

Does anyone have compelling use-cases for compiling models that are agnostic to data sizes?

Thanks,
Sean

Joshua_Pritikin · September 10, 2019, 6:05pm

I love the idea of moving as much preprocessing as possible from run-time to compile time. However, sometimes I run simulations where I fit the same quick model thousands of times. If optimizing for data sizes can be made optional, that seems like it would be the best of both worlds.

bgoodri · September 10, 2019, 6:19pm

All of rstanarm and similar packages.

seantalts · September 10, 2019, 6:51pm

Fair enough. So we need both, then, unless Stan one day becomes interpreted ;)
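
To make the trade-off in this thread concrete, here is a minimal Python sketch. It is not Stan's compiler or any real Stan API: `ModelCache`, `_compile`, and `get` are hypothetical names, and the "model" is just a mean, standing in for an expensive compile step. It only illustrates how often a recompile would be triggered when size specialization is an opt-in option.

```python
from typing import Callable, Dict, Optional, Sequence

Model = Callable[[Sequence[float]], float]


class ModelCache:
    """Toy stand-in for a compiler with optional size specialization (hypothetical)."""

    def __init__(self) -> None:
        self.compiles = 0  # number of simulated compilations so far
        self._cache: Dict[Optional[int], Model] = {}

    def _compile(self, n: Optional[int]) -> Model:
        # Pretend this is the expensive step.
        self.compiles += 1
        if n is None:
            # Size-agnostic build: the length is discovered at run time.
            def generic(y: Sequence[float]) -> float:
                return sum(y) / len(y)

            return generic

        # Size-specialized build: n is baked in, so 1/n is precomputed.
        inv_n = 1.0 / n

        def specialized(y: Sequence[float]) -> float:
            if len(y) != n:
                raise ValueError(f"compiled for size {n}, got {len(y)}")
            return sum(y) * inv_n

        return specialized

    def get(self, n: int, specialize: bool = False) -> Model:
        # A specialized build is keyed by size; the generic build has one shared key.
        key = n if specialize else None
        if key not in self._cache:
            self._cache[key] = self._compile(key)
        return self._cache[key]


if __name__ == "__main__":
    sizes = [10, 20, 10, 30, 20]  # e.g. simulated data sets of varying size

    generic_cache = ModelCache()
    for n in sizes:
        generic_cache.get(n, specialize=False)([1.0] * n)
    print("generic compiles:", generic_cache.compiles)

    special_cache = ModelCache()
    for n in sizes:
        special_cache.get(n, specialize=True)([1.0] * n)
    print("specialized compiles:", special_cache.compiles)
```

In this sketch the generic build is compiled once no matter how the sizes vary, while the specialized build is compiled once per distinct size. That is cheap when a simulation reuses the same size thousands of times, and costly for precompiled packages that must accept whatever data a user supplies.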

