Why wait to get data dimensions until runtime?

Hey all,

While doing compiler development, we often think about how useful it would be to know the bounds of for loops and the sizes of data collections at compile time, so that we could optimize and generate better code. The downside is that we would then need to recompile the model every time the data sizes changed. I was talking with someone the other day who asked when it is useful to run the same compiled model on differently sized data, and I responded that people sometimes develop models on a subset of the data. This person then pointed out that in that phase of model development a Stan user is still iterating on the model itself, so they are typically recompiling on every iteration anyway.
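
To make the trade-off concrete, here is a minimal C++ sketch (Stan models are translated to C++). It is only an illustration, not what the Stan compiler generates today: `sum_runtime` and `sum_fixed` are hypothetical names made up for this post. The first reads its length at run time, so one compiled binary works for any data size. The second bakes the length in as a template parameter, so the loop bound is a constant the compiler could unroll or vectorize against, but every new size needs a new instantiation, i.e. a recompile.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical example: size is read at run time.
// One compiled function serves data of any length.
double sum_runtime(const std::vector<double>& x) {
  double total = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    total += x[i];
  }
  return total;
}

// Hypothetical example: size N is fixed at compile time.
// The loop bound is a constant the optimizer can exploit, but each
// distinct N is a separate instantiation that must be compiled.
template <std::size_t N>
double sum_fixed(const std::array<double, N>& x) {
  double total = 0.0;
  for (std::size_t i = 0; i < N; ++i) {
    total += x[i];
  }
  return total;
}
```

Calling `sum_fixed` on a 3-element array and then on a 4-element array compiles two separate functions, which is the cost in question.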

Does anyone have compelling use-cases for compiling models that are agnostic to data sizes?

Thanks,
Sean

I love the idea of moving as much preprocessing as possible from run time to compile time. However, I sometimes run simulations where I fit the same quick model thousands of times. If optimizing for data sizes can be made optional, that seems like it would be the best of both worlds.

All of rstanarm and similar packages, which ship precompiled models that users then fit to data of whatever size they have.

Fair enough. So we need both, then, unless Stan one day becomes interpreted ;)