PyStan3: compiling a model and using it with different data

I have experience using Stan in R, but I am now starting to use Stan through Python. I see that there has been a major update in PyStan3, which is the version I installed. Two things annoy me about PyStan because they differ from RStan, so I’m asking whether there are ways around them.

The first is that when I compile a model with the following line, the model always seems to be stored in a cache.

model = stan.build(program_code=model_code, data=stan_data)

This is annoying to me because during development I’ll usually modify and recompile the model several times, but with this default, I have to remember to run a specific line to delete the cache every time before recompiling, i.e.:

httpstan.cache.delete_model_directory(model.model_name)

I couldn’t find an option to prevent PyStan from automatically caching my model. Caching is useful once I’m done tweaking the model, but not before.

The second, related annoyance is that I need to pass the data when compiling the model. In my R workflow, when I was done tweaking the model I’d save a compiled version, and then I would run different datasets using the same compiled model. This seemed an efficient approach, since a single compilation would let me run many different analyses. It’s less clear to me why someone would want to automatically cache a compiled model with a fixed dataset.

In my current RStan workflow, I did this two-step analysis with the following lines:

model <- rstan::stan_model("./path/to/file.stan")
posteriors <- rstan::sampling(model, data = stan_data)

So my two questions are: Is it possible to turn off the automatic caching of compiled models? Is it possible to run a compiled model with a new dataset?

Thanks!

Baking the data and code together in the model build process was, in my opinion, a controversial decision for PyStan3. Personally, I started using CmdStanPy as a result. It has a straightforward model build step, after which you use the built model to sample given some data, and I’ve been pretty happy with it.
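
A minimal sketch of that CmdStanPy workflow, assuming CmdStanPy and CmdStan are installed. `sample_many` is a hypothetical helper name, not part of the library:

```python
from typing import Any, Dict, List


def sample_many(stan_file: str, datasets: List[Dict[str, Any]]) -> list:
    """Compile a Stan program once, then sample it against each dataset."""
    # Imported lazily so this snippet loads even where CmdStanPy is not installed.
    from cmdstanpy import CmdStanModel

    # Compilation happens here, with no data involved.
    model = CmdStanModel(stan_file=stan_file)

    # Each call to sample() reuses the same compiled model.
    return [model.sample(data=d) for d in datasets]
```

This mirrors the RStan pattern of calling stan_model once and sampling many times.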

My bad: I now see that if I modify the model code, it will recompile regardless of the caching. In line with my R workflow, I keep the model in a separate .stan file that I read into Python and pass to the build call, and I must have been forgetting to reload the file before trying to compile the model again. So, if I modify the code, the model will recompile.
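
One way to avoid this stale-file slip is to read the .stan file from disk in the same function that calls stan.build, so the code string can never lag behind the file. A minimal sketch, assuming PyStan 3 is installed. `build_from_file` is a hypothetical helper name, not part of PyStan:

```python
from pathlib import Path
from typing import Any, Dict


def build_from_file(stan_path: str, data: Dict[str, Any]):
    """Re-read the Stan program from disk, then build it with PyStan 3."""
    # Imported lazily so this snippet loads even where PyStan is not installed.
    import stan

    # Reading the file on every call means edits to the .stan file are always
    # picked up before the build.
    program_code = Path(stan_path).read_text()
    return stan.build(program_code, data=data)
```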

I can address the first issue: If you change the model code, the model should recompile. Even a meaningless difference in whitespace should trigger a recompilation.

Depending on your use case, the second issue may not actually be a problem because the build is cached. Let’s say you build the model with dataset1 and then build it again with dataset2:

stan.build(model_code, data=dataset1)  # this will compile the model and then create a model object bound to dataset1

stan.build(model_code, data=dataset2)  # as long as model_code is exactly the same as in the previous call, this will use the cached compiled model and create a new model object bound to dataset2

So, while it seems like you’re compiling the model over and over again, you really aren’t (unless the model code changes).
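
Put together, running one model against several datasets in PyStan 3 might look like the sketch below. `fit_each` is a hypothetical helper, and the `num_chains` and `num_samples` values are arbitrary choices for illustration:

```python
from typing import Any, Dict, List


def fit_each(model_code: str, datasets: List[Dict[str, Any]]) -> list:
    """Build and sample the same Stan program once per dataset."""
    # Imported lazily so this snippet loads even where PyStan is not installed.
    import stan

    fits = []
    for data in datasets:
        # Per the explanation above, only the first build should compile;
        # later builds with identical model_code reuse the cached model.
        posterior = stan.build(model_code, data=data)
        fits.append(posterior.sample(num_chains=4, num_samples=1000))
    return fits
```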

EDIT: In case it isn’t clear from the above, only the Stan model is cached, not the Stan model/data combination.