Combine Fit Objects

Is it possible to combine multiple StanFit4Model objects? I’ve been running out of memory (which then hangs my machine) when I use enough chains and samples to get accurate results. I’d love to be able to run a smaller number of chains at a time, dump each fit to disk with pickle, and later load them all and merge them for analysis.

It is doable, but I don’t recommend it. You would need to touch the internals (fit.sim) and then make sure everything is in order.

I would recommend using ArviZ and doing some array combination with xarray. I actually made an issue today to address this problem (combining multiple InferenceData objects).

Currently the easiest way is probably to convert each fit with az.from_pystan(posterior=fit), stack the arrays with numpy, and then recreate an InferenceData object by passing a dictionary of the stacked arrays to convert_to_inference_data.
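A minimal sketch of that numpy step, assuming each run was already converted with az.from_pystan(posterior=fit) and that all runs share the same variables and number of draws. The helper names posterior_to_dict and stack_chains are hypothetical; the first pulls each posterior variable out as an array of shape (chain, draw, ...), the second concatenates the runs along the chain axis.

```python
import numpy as np


def posterior_to_dict(idata):
    """Hypothetical helper: turn the posterior group of an ArviZ InferenceData
    into {variable name: numpy array of shape (chain, draw, *param_shape)}."""
    return {name: idata.posterior[name].values for name in idata.posterior.data_vars}


def stack_chains(dicts):
    """Hypothetical helper: concatenate several {name: array} dicts along
    axis 0, which is the chain axis in the (chain, draw, ...) layout."""
    names = dicts[0].keys()
    return {name: np.concatenate([d[name] for d in dicts], axis=0) for name in names}
```

With ArviZ installed, usage would look roughly like `parts = [posterior_to_dict(az.from_pystan(posterior=fit)) for fit in fits]` followed by `combined = az.convert_to_inference_data(stack_chains(parts))`, where the dict input becomes the posterior group. Only the posterior is carried over in this sketch; sample_stats and other groups would need the same treatment.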


https://arviz-devs.github.io/arviz/

ArviZ also supports CmdStan outputs and automatically stacks the results.

I meant to ask this on the other thread, but I thought it would be off-topic there. It’s less so here, since it’s basically the same question but for CmdStan, so here goes:

Is it possible to extract the raw trace arrays from an InferenceData object (i.e., the .csv output of CmdStan without the comment lines)?

I know ArviZ can combine chains from separate CmdStan-generated files, but apart from the basic diagnostics I would like to process the results as numpy arrays.
I’m guessing this could also be an option for combining separate PyStan outputs, as in the original question, without moving everything into ArviZ from then on.

Alternatively, my suggestion would be to just extract the two-dimensional sample x parameter arrays from fit.get_sampler_params() and fit.extract(permuted=False, inc_warmup=True) and stack them into a three-dimensional sample x parameter x chain array. That is what I’m doing now, but I guess this is the kind of extra work we’re trying to avoid.
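A minimal numpy sketch of that stacking, under stated assumptions: in PyStan 2, fit.extract(permuted=False, inc_warmup=True) returns an array laid out as (iterations, chains, parameters), and fit.get_sampler_params(inc_warmup=True) returns a list with one dict per chain mapping each sampler quantity to a 1-D array over iterations. The function names are hypothetical. Saving these arrays per run (e.g. with np.save) instead of pickling whole fits is one way to keep memory use down before merging.

```python
import numpy as np


def stack_runs(draw_arrays):
    """Combine extract(permuted=False) arrays from separate runs.

    Assumes each array is (iterations, chains, parameters), as in PyStan 2.
    Runs are joined along the chain axis, then reordered to
    (iterations, parameters, chains) to match the layout described above."""
    combined = np.concatenate(draw_arrays, axis=1)
    return combined.transpose(0, 2, 1)


def stack_sampler_params(sampler_lists):
    """Combine get_sampler_params(inc_warmup=True) outputs from separate runs.

    Each run is assumed to be a list with one dict per chain; columns follow
    the dict's key order, which must be the same for every chain."""
    per_chain = [np.column_stack(list(chain.values()))
                 for run in sampler_lists for chain in run]
    # Result shape: (iterations, sampler quantities, total chains).
    return np.stack(per_chain, axis=2)
```

All runs must use the same number of iterations (and the same inc_warmup setting), otherwise the concatenation fails.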

Raw as a numpy array? Yes, either manually or via .to_dataframe(). But if you need your samples in table format, pd.read_csv works fine, and you can then stack them with numpy.

For R-hat and similar calculations you would still need to convert to InferenceData, unless you implement them externally.

P.S. Currently ArviZ discards warm-up samples.
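For a sense of what implementing R-hat externally involves, here is a sketch of the classic (non-split) Gelman-Rubin statistic for one scalar parameter, computed from a (chains, iterations) numpy array. This is a swapped-in textbook formula, not ArviZ's implementation: recent ArviZ versions default to a rank-normalized split R-hat, so values will differ somewhat. The function name is hypothetical.

```python
import numpy as np


def gelman_rubin_rhat(draws):
    """Classic Gelman-Rubin R-hat for one scalar parameter.

    draws: array of shape (chains, iterations), warm-up already removed."""
    m, n = draws.shape
    chain_means = draws.mean(axis=1)
    # Between-chain variance, scaled by the number of iterations per chain.
    b = n * chain_means.var(ddof=1)
    # Average within-chain variance.
    w = draws.var(axis=1, ddof=1).mean()
    # Pooled estimate of the marginal posterior variance.
    var_hat = (n - 1) / n * w + b / n
    return np.sqrt(var_hat / w)
```

Values close to 1 suggest the chains agree; a chain stuck in a different region pushes the statistic well above 1.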

Ok, that works. I’m already converting them to DataFrames and InferenceData objects, so as long as I can merge the latter, that should work fine.

pandas.read_csv will either include the comment lines or raise an error, depending on the platform. Maybe it works on some specific platform, but it doesn’t seem to work in all cases. I had to either have pandas skip a fixed number of lines (which changes with the number of samples) or read the file line by line and exclude the commented ones.

For me, an array in the format returned by fit.extract(permuted=False, inc_warmup=True) works fine, so combining either separate PyStan runs or individual CmdStan .csv files into an array like that would be close to ideal, but pandas doesn’t seem to work as expected. I couldn’t find how to manually extract that kind of object from ArviZ, but maybe I need to read the documentation more closely.
Thanks again.
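The line-by-line workaround mentioned above can be sketched without pandas. This assumes each CmdStan CSV holds one chain, that every non-data line starts with "#", and that the first remaining line is the column header; the function names are hypothetical.

```python
import numpy as np


def parse_cmdstan_lines(lines):
    """Drop blank and '#' comment lines, take the first remaining line as the
    header, and parse the rest as floats into a (draws, columns) array."""
    rows = [line.strip() for line in lines
            if line.strip() and not line.lstrip().startswith("#")]
    header = rows[0].split(",")
    data = np.array([[float(v) for v in row.split(",")] for row in rows[1:]])
    return header, data


def read_cmdstan_csv(path):
    """File wrapper around parse_cmdstan_lines (one chain per CSV assumed)."""
    with open(path) as f:
        return parse_cmdstan_lines(f)
```

Because comment lines are filtered by content rather than by position, this does not depend on how many lines CmdStan writes before or between the draws.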

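import pandas as pd
# comment="#" makes pandas ignore every line of the CmdStan csv that starts with "#"
with open(path, "r") as f_obj:
    df = pd.read_csv(f_obj, comment="#")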

The with line is optional; you can also pass the path directly: pd.read_csv(path, comment="#").
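Building on the comment="#" approach, a sketch of reading several CmdStan CSVs (assumed to be one chain per file, all with the same columns and number of draws) and stacking them into a single (draws, columns, chains) array. The function name is hypothetical; sources can be file paths or open file objects, since pd.read_csv accepts both.

```python
import numpy as np
import pandas as pd


def stack_cmdstan_csvs(sources):
    """Read CmdStan output CSVs with '#' lines skipped and stack the chains.

    Returns the column names and an array of shape (draws, columns, chains)."""
    frames = [pd.read_csv(src, comment="#") for src in sources]
    columns = list(frames[0].columns)
    # Select columns in the first file's order so every chain lines up.
    arrays = [frame[columns].to_numpy() for frame in frames]
    return columns, np.stack(arrays, axis=2)
```

np.stack raises an error if the files have different numbers of draws, which is a useful check that the runs were configured the same way.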
