Save fit model in pystan 2

In reading the pystan documentation, I am a little confused about how I save my fitted model. I originally followed the example here

https://pystan.readthedocs.io/en/latest/avoiding_recompilation.html

but it seems that just saves a compiled model and not fitted data. I am running a model that looks like the following

fit = model.sampling(data=data,iter=10000,warmup=8000, chains=2)

How do I go about saving this model fit for future use? Do I simply use pickle as follows?

with open('fit.pkl', 'wb') as f:
    pickle.dump(fit, f)

From my brief time using PyStan some months back, pickling the fit object seemed to work. However, when I did it, there was a warning message indicating that this was an experimental feature of PyStan, and that one should unpickle the model associated with the pickled fit object before unpickling the fit object itself.

What I ultimately ended up doing (before switching to RStan) was to pickle certain outputs from the fit object, such as the dictionaries returned from running fit.extract() and fit.get_sampler_params().
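For anyone who wants to try that route, here is a minimal sketch of it. The helper names (save_fit_outputs, load_fit_outputs) and the file name are made up for illustration, and the stand-in dictionaries only mimic the rough shape of what fit.extract() and fit.get_sampler_params() return, so that the snippet runs without PyStan.

```python
import os
import pickle
import tempfile


def save_fit_outputs(draws, sampler_params, path):
    """Pickle plain outputs of a fit instead of the fit object itself."""
    with open(path, "wb") as f:
        pickle.dump(
            {"draws": draws, "sampler_params": sampler_params},
            f,
            protocol=pickle.HIGHEST_PROTOCOL,
        )


def load_fit_outputs(path):
    """Load the dictionary written by save_fit_outputs."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in data (hypothetical values). With a real fit you would pass
# fit.extract() and fit.get_sampler_params() instead.
demo_draws = {"mu": [0.1, 0.2, 0.3]}
demo_sampler_params = [{"stepsize__": [0.5, 0.5, 0.5]}]

demo_path = os.path.join(tempfile.mkdtemp(), "fit_outputs.pkl")
save_fit_outputs(demo_draws, demo_sampler_params, demo_path)
restored = load_fit_outputs(demo_path)
```

Because only ordinary containers get pickled, loading this file should not depend on the compiled model being available, at the cost of losing the methods of the fit object.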

ah thanks so much, will try that!

You need to import (unpickle) the model before the fit object. One way to do this is to save the model and the fit together in a dictionary. This works with Python 3.6, where dictionaries keep their insertion order (use an OrderedDict or a list otherwise).

import pickle
with open("model_fit.pkl", "wb") as f:
    pickle.dump({'model' : model, 'fit' : fit}, f, protocol=-1)
    # or with a list
    # pickle.dump([model, fit], f, protocol=-1)

and then later (on the same OS; the pickle is not cross-platform compatible)

import pickle
with open("model_fit.pkl", "rb") as f:
    data_dict = pickle.load(f)
    # or with a list
    # data_list = pickle.load(f)
fit = data_dict['fit']
# fit = data_list[1]

Do you have any recommendation for how to save if I wanted to open in R?

I just discovered ShinyStan and it would be great to load into R and use it.

If you’re talking about saving in Python to open in R, I don’t know if there are any great options, especially if you are aiming to use ShinyStan. You can use Python’s json module to dump dictionaries to JSON files, and then read in those files using the jsonlite R package on CRAN. That may be too limiting, though. There’s also the feather library, but that is for communicating data frames between R and Python, and may also be limiting.
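A rough sketch of the JSON route, in case it helps. The function names and file name are hypothetical, and the stand-in dictionary replaces fit.extract() so that the snippet runs without PyStan. NumPy arrays are not JSON-serializable directly, so anything with a tolist() method is converted to nested lists first.

```python
import json
import os
import tempfile


def to_jsonable(value):
    """Convert array-like values (anything with tolist()) to plain lists."""
    if hasattr(value, "tolist"):
        return value.tolist()
    return value


def dump_draws_to_json(draws, path):
    """Write a dictionary of posterior draws to a JSON file."""
    with open(path, "w") as f:
        json.dump({name: to_jsonable(v) for name, v in draws.items()}, f)


# Stand-in for fit.extract() (hypothetical values).
demo_draws = {"mu": [0.1, 0.2, 0.3], "lp__": [-1.5, -1.2, -1.1]}

json_path = os.path.join(tempfile.mkdtemp(), "draws.json")
dump_draws_to_json(demo_draws, json_path)

# Read the file back to check the round trip.
# On the R side, jsonlite::fromJSON("draws.json") would read the same file.
with open(json_path) as f:
    round_trip = json.load(f)
```

This only transfers the draws themselves. It does not recreate a stanfit object in R, so whether it is enough for ShinyStan is a separate question.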

However, if you’re talking about saving in one R session to open in another session, then you can simply save the fit object to a file with saveRDS() and open it with readRDS(). (There are a few functions that don’t work with saved fit objects, though. See here: https://groups.google.com/forum/#!msg/stan-users/XRqaIyh96Lo/mAb6YC4MBwAJ)

Yeah, I meant the former. Will play around with your suggestion and see if I can get it to work.

I followed this dumping and reading method, but I am running into a strange problem. After saving, I can read the file back in the same Python session (be it the same script, the same IPython session, or the same Jupyter notebook). In this successful case we get:

Stan model: anon_model_c36d319d3ed03c96dd75853df01607bc

But if I try to read the pkl file with the same code (as given by you) in a different session (a different Python script or Jupyter notebook), I get the following error:
ModuleNotFoundError: No module named 'stanfit4anon_model_c36d319d3ed03c96dd75853df01607bc_7926293176758986504'

This is so confusing. On top of that, when I try to follow the link
https://pystan.readthedocs.io/en/latest/unpickling_fit_without_model.html
I get the error:

AttributeError: module 'pystan.experimental' has no attribute 'unpickle_fit'

Again, I fail to understand why there are so many problems in PyStan with these basic steps.


Using pickle to save fits is not supported unless the model is also pickled. I’m not sure the warning at the top of the linked documentation page (“This feature is experimental and nothing is guaranteed.”) is strong enough.

This is a known issue and will be resolved in PyStan 3.

edit: add “unless the model is also pickled”

If so, why has this issue not been mentioned here (in this thread)? Also, do you agree that this problem is peculiar? One can read the pkl file in the same session but not in a different session. It takes some time to digest this.

What pystan version are you using?

Did you save your fit with the model? E.g. in a list where the model is the first item?

If you’re careful to pickle both the model and the fit, things should work. The fit depends on the model.
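To illustrate why reading can succeed in the same session and fail in a new one, here is a small sketch using only the standard library. It relies on general pickle behaviour: an instance is stored as a reference to its class by module name, so that module must be importable at load time. The module and class names are made up. The assumption (suggested by the ModuleNotFoundError above) is that a PyStan 2 fit refers to a per-model compiled module in the same way.

```python
import pickle
import sys
import types

# Build a throwaway module at runtime. It stands in for the compiled
# "stanfit4anon_model_..." module; the name here is hypothetical.
module_name = "dynamic_fit_module_demo"
dynamic_module = types.ModuleType(module_name)


class Fit:
    """Stand-in for a fit class that lives in the dynamic module."""

    def __init__(self, draws):
        self.draws = draws


# Make the class look as if it had been defined inside the dynamic module.
Fit.__module__ = module_name
Fit.__qualname__ = "Fit"
dynamic_module.Fit = Fit
sys.modules[module_name] = dynamic_module

payload = pickle.dumps(Fit([1.0, 2.0, 3.0]))

# Same session: the module is registered, so unpickling works.
same_session = pickle.loads(payload)

# Simulate a fresh session, where the module was never created.
del sys.modules[module_name]
try:
    pickle.loads(payload)
    fresh_session_error = None
except ModuleNotFoundError as exc:
    fresh_session_error = exc
```

In this analogy, unpickling the model first plays the role of registering the module again, which would be why the model has to come before the fit.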

We welcome suggestions for changes to the documentation.

I could not find a command to check the installed PyStan version. However, from conda list I can see that the PyStan version is 2.19.0.0. I followed your code exactly:

with open("model_fit.pkl", "wb") as f:
    pickle.dump({'model' : model, 'fit' : fit}, f, protocol=-1)
    # or with a list
    # pickle.dump([model, fit], f, protocol=-1)

So, I saved the model. As I already mentioned, I can read it perfectly in the same session (be it a Python script or a Jupyter notebook). But in a different session (on the same computer and operating system), reading with the same code throws the error. Also, in the latter case I find that a random number is appended to the correct stanfit4anon_model name while reading.

But it is not working for me.

print(pystan.__version__)

Which Python version do you use? 2.7 and <3.6 don’t always handle order “correctly” for dicts.

That extra string is not a problem.

pystan=2.19.0.0
python=3.6.9

Also checked on another computer with python 3.7 and facing the same issue!

Could you please confirm that on your computer this reading function works without any issue in a different python session? If yes, could you please let me know your pystan, python versions?

These work. (You need to have PyStan installed both when saving and when loading.)

run_pystan.py (339 Bytes) run_pystan2.py (181 Bytes)

Tested with Python 3.7 and PyStan 2.19.1.1


These two scripts work perfectly on my computer! But when I do exactly the same thing (both saving and reading) with my own fitting/sampling, the reading does not work (only when I try to read from a different session). I will try to prepare a minimal working example for this problem.
I am super confused.

Okay, after a detailed look I found my problem. I was dumping the model code instead of the pystan.StanModel() object. The main problem was that my erroneous method worked perfectly when I tested it in the same Jupyter notebook session, so I built the whole pipeline on it. Now I have to change everything. A big thanks to you for all the help.
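In case someone else hits the same mistake, a small guard along these lines might catch it at save time. This is only a sketch: dump_model_and_fit is a hypothetical helper, not part of PyStan, and it only checks that the model argument is not a plain string of Stan code.

```python
import pickle


def dump_model_and_fit(model, fit, path):
    """Pickle [model, fit] in that order, refusing raw Stan code strings."""
    if isinstance(model, (str, bytes)):
        raise TypeError(
            "model looks like Stan source code; pass the compiled "
            "pystan.StanModel object instead"
        )
    with open(path, "wb") as f:
        # The model goes first so that it is unpickled before the fit.
        pickle.dump([model, fit], f, protocol=pickle.HIGHEST_PROTOCOL)
```

Failing early like this avoids a pickle that loads fine in the session that created it but not in a new one.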
