Pickling and unpickling fit objects

A solution to my problem is actually presented here by @ahartikainen, but I wanted to open a discussion because I want to understand what is actually happening, and I would also like to know whether there are easier workarounds. The solution in the link is rather involved, and I’m looking for something quick, if possible.

I write all the code on my local machine (Ubuntu on Windows through the Windows Subsystem for Linux), and I run the fits on a remote server (CentOS) that is much faster. I have stan_model.py, which contains the source code for the Stan model and pickles the compiled model to model.pkl. Then I have fit_model.py, which loads the model from model.pkl, runs fit=stan_model.sampling(...), and pickles the fit object to fit.pkl. In fit_analysis.py, where I want to analyze the results, I load model.pkl and then fit.pkl (I know the model should be unpickled first).

If I do all this on the remote server, where the model is compiled and the fit is run, everything works. But if I scp the exact same files to my local machine and run fit_analysis.py, I get the error:

ModuleNotFoundError: No module named 'stanfit4test_case_splines_5595d841db111ede20e69aef326ccffa_2217134107965400023'

  1. Is this because pickling does not work across platforms?

  2. Is there any way around this besides the one given in the link? Working locally is much nicer, so I want to run my fits elsewhere and analyze them locally.

Update
Alright, so a cheap hack is to extract the info from the fit object and pickle it as plain data structures such as ndarray, instead of pickling the fit object itself. So I have my workaround, but I’d still like to know the reason behind this behavior so I can learn more about Stan.
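
A minimal sketch of that workaround, assuming PyStan 2, where fit.extract(permuted=True) returns a mapping from parameter names to ndarrays. The helper names save_draws and load_draws and the file name are made up for illustration:

```python
import pickle


def save_draws(fit, path):
    """Pickle only the posterior draws, not the fit object itself."""
    # Assumption: PyStan 2, where extract(permuted=True) returns a mapping
    # from parameter name to numpy.ndarray. dict() makes it a plain dict.
    draws = dict(fit.extract(permuted=True))
    with open(path, "wb") as f:
        pickle.dump(draws, f)
    return draws


def load_draws(path):
    """Load the draws back; this does not need the compiled model module."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

On the server this would be called as save_draws(fit, "draws.pkl") after sampling, and locally as draws = load_draws("draws.pkl"). The local machine still needs numpy to unpickle the arrays, but not the model-specific stanfit4 module.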


You need to unpickle the compiled model first and then the fit.

To keep that order, pickle both together in one list, model first:

[model, fit]
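
As for why the order matters: pickle does not store a class's code, only a reference to it (module name plus class name), and it re-imports that module when loading. As far as I understand PyStan 2, the fit's class lives in a module that is compiled per model (the stanfit4... name in the error), so that module has to be available before the fit can be loaded. Because the module is a compiled binary, a model pickled on one system may also fail to load on a different one. The sketch below reproduces the same error with the standard library only; the module name and the stand-in class are hypothetical and are not PyStan code:

```python
import pickle
import sys
import types

# Hypothetical name, standing in for the per-model "stanfit4..." module
MODULE_NAME = "stanfit4demo_0123abcd"


def make_fit_module():
    """Create a module at runtime and register it, like loading a compiled model."""
    module = types.ModuleType(MODULE_NAME)

    class StanFit4Model:  # stand-in class, not the real PyStan one
        def __init__(self, draws):
            self.draws = draws

    # pickle records the class as "<module name>.<qualified name>", not its code
    StanFit4Model.__module__ = MODULE_NAME
    StanFit4Model.__qualname__ = "StanFit4Model"
    module.StanFit4Model = StanFit4Model
    sys.modules[MODULE_NAME] = module
    return module


def unpickle_without_module(payload):
    """Simulate a process in which the module was never made available."""
    saved = sys.modules.pop(MODULE_NAME)
    try:
        return pickle.loads(payload)
    except ModuleNotFoundError as exc:
        return exc
    finally:
        sys.modules[MODULE_NAME] = saved


module = make_fit_module()
payload = pickle.dumps(module.StanFit4Model(draws=[0.1, 0.2, 0.3]))

# With the module registered, the round trip works
restored = pickle.loads(payload)
print(restored.draws)

# Without it, pickle cannot resolve the class reference
error = unpickle_without_module(payload)
print(type(error).__name__, error)
```

The second call fails with a ModuleNotFoundError naming the missing module, which is the same kind of failure as the one in the question.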

Another option is to convert your fit to an arviz.InferenceData or a pandas.DataFrame and save that instead:

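import arviz as az

# Convert the fit to InferenceData and save it as NetCDF
idata = az.from_pystan(fit)
idata.to_netcdf("filename.nc")

# Or save the draws as a DataFrame (to_parquet needs pyarrow or fastparquet)
df = fit.to_dataframe()
df.to_parquet("filename.parquet")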