PyStan feature request: don’t require `data` at build time

(N.B. this is really a request for an update to stanmagic or jupyterstan so that they work with the latest PyStan…)

I am puzzled by the requirement to associate data with the model at build time in PyStan 3. I think it is confusing from a meta-model point of view, as it doesn’t really align with the way Stan works (e.g., data isn’t required when running stanc). For the most part this is just a conceptual issue without many real-world implications.

However, I was looking into re-implementing something like stanmagic or jupyterstan, which act on a cell full of Stan code, and there isn’t an obvious way to associate data with the model code at that point. It’s possible, but it would be inelegant. An alternative would be a cell magic that only provides syntax highlighting and returns the code as a string for later use by the current stan.build().
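
A minimal sketch of that string-returning cell magic, assuming IPython's standard extension hook (`load_ipython_extension`) and `register_magic_function`. The magic name `%%stan`, the helper `store_stan_code` and the default variable name `stan_code` are made up for illustration, and syntax highlighting is not attempted here:

```python
def store_stan_code(line, cell, namespace):
    """Save the cell body under the variable name given on the magic line."""
    # "stan_code" is a hypothetical default, used when no name follows %%stan
    name = line.strip() or "stan_code"
    if not name.isidentifier():
        raise ValueError(f"not a valid variable name: {name!r}")
    namespace[name] = cell
    return name


def load_ipython_extension(ipython):
    """Called by %load_ext; registers a %%stan cell magic on the running shell."""
    def stan(line, cell):
        name = store_stan_code(line, cell, ipython.user_ns)
        print(f"Stan program stored in variable {name!r}")

    ipython.register_magic_function(stan, magic_kind="cell", magic_name="stan")
```

With this loaded, a cell starting with `%%stan my_model` would leave the program text in `my_model`, ready to pass to stan.build() once the data exists.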

I admit I haven’t looked into the code to see if the build actually uses the data in a detailed way, so perhaps it’s central to the latest refactoring for v3. If not, is this something that is possible/desirable to change?


Compiling a model and instantiating it with data are distinct steps in the httpstan backend, so it’s not like they’re inseparable. I think the reason that stan.build wants data is that the number of output parameters often depends on the data.
The compiled model is cached even if the data is invalid, so you could compile it once with empty data and keep the source around as a string for later use. This is a bit ugly, though.
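
A rough sketch of the "keep the source around as a string" idea, assuming PyStan 3's `stan.build(program_code, data=...)`. The `DeferredStanModel` class and its `builder` argument are hypothetical, not part of PyStan; the builder is injectable only so the wrapper can be exercised without PyStan installed:

```python
class DeferredStanModel:
    """Holds Stan source and postpones the build until data is supplied."""

    def __init__(self, program_code, builder=None):
        self.program_code = program_code
        # Hypothetical hook: any callable with stan.build's calling convention
        self._builder = builder

    def build(self, data, **kwargs):
        builder = self._builder
        if builder is None:
            import stan  # PyStan 3, imported lazily so the class loads without it
            builder = stan.build
        # The source string and the data only meet here, at build time
        return builder(self.program_code, data=data, **kwargs)
```

Because the compiled model is cached, calling `build` again with different data for the same source should not trigger a second compilation.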

Try CmdStanPy: when you instantiate a CmdStanModel object, the model is compiled. Data need only be supplied when you run one of the inference methods.
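
Something along these lines, as a sketch rather than a tested recipe. It assumes CmdStanPy and a working CmdStan installation; the helpers `write_stan_file` and `compile_then_sample` and the file name `example.stan` are made up for illustration:

```python
import os
import tempfile


def write_stan_file(program_code, directory=None, name="example.stan"):
    """Write Stan source to disk, since CmdStanModel takes a file path."""
    directory = directory or tempfile.mkdtemp()
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write(program_code)
    return path


def compile_then_sample(program_code, data):
    """Compile without data, then hand the data to the sampler."""
    from cmdstanpy import CmdStanModel  # imported lazily; needs CmdStan installed

    # Compilation happens on instantiation; no data is involved yet
    model = CmdStanModel(stan_file=write_stan_file(program_code))
    # Data enters only at inference time
    return model.sample(data=data)
```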
