I am using pickle to load saved model+fit and then trying to extract the data using the following code:
import pickle

with open('model.pkl', 'rb') as f:
    data = pickle.load(f)  # the with block closes the file; no f.close() needed

fit = data[1]
params = fit.extract(pars='theta', permuted=True)
The line with fit.extract() takes far too long - over two hours. The *.pkl file is ~3 GB, so I don't expect it to finish in 5 minutes, but over two hours seems wrong. I tried permuted=False, but it didn't help.
I am using Python 3.6, PyStan 2.19, and PyCharm 2020.1.
Please help.
It's 4 chains with 1500 iterations each (thinning = 1, warmup = 500), so the shape of the permuted version is 4000x30x2000.
I know it is not small, but when I extract the first model in a fresh Python instance, it takes about 10 minutes. Extracting the second model takes two hours or more, even if I close and delete the previously loaded data.
Just to clarify, "two hours and more" is not a final estimate - it's just when I give up and open another Python instance to load each model manually.
Yes, the first model loads fairly quickly, but then it gets stuck. I have 16 GB of memory, so that should be enough for this data.
I am trying ArviZ now, but judging from the time it takes to extract data from the first model's fit, it looks like it performs worse than fit.extract().
Yes, you are right. It works almost the same for the first model. Second model was extracted in 15 mins! Hope this will persist in a loop. Thanks!
By the way, is there an equivalent of permuted=True when I extract from idata?
I only need to extract the mean value at the end. I cannot use the summary stats because I have NaNs in my posterior samples.
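For what it's worth, the effect of permuted=True (all chains merged into one sample dimension) can be reproduced by collapsing the chain and draw dimensions yourself; with ArviZ/xarray the idiom is idata.posterior["theta"].stack(sample=("chain", "draw")). A minimal NumPy sketch with a toy array standing in for a real fit (permuted=True also shuffles the draws, which this skips):

```python
import numpy as np

# toy stand-in for posterior draws, shaped (chain, draw, param)
draws = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)

# collapse chain and draw into a single sample dimension,
# like permuted=True but without the random reordering
flat = draws.reshape(-1, *draws.shape[2:])
print(flat.shape)  # (6, 4)
```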
I can call it all, but could you be more specific about gc.collect()? I have never used it before.
If there are RAM problems, calling gc.collect() will "collect" the garbage (Python does this automatically, but sometimes it is easier to collect manually).
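A minimal sketch of the pattern (the variable names are just placeholders): drop your references to the big objects first, then force a collection pass.

```python
import gc

big = list(range(1_000_000))  # stand-in for a large fit object
big = None                    # drop the reference; gc can only free unreachable objects
freed = gc.collect()          # force a full collection pass right now
print(freed)                  # number of unreachable objects the collector found
```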
Do you have NaNs in your draws? Or do you have some cells with only NaN values?
Thanks! I have now incorporated garbage collection in my Python code.
I have some cells with NaNs because of how my data is structured (I have a different number of trials for different subjects, so for posterior predictive checks some trials are NaN).
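For cells that are NaN by construction like that, np.nanmean skips them instead of poisoning the result. A toy sketch (rows = draws, columns = trials, missing trials padded with NaN):

```python
import numpy as np

samples = np.array([[1.0, np.nan, 3.0],
                    [2.0, 4.0,    np.nan]])

print(np.mean(samples, axis=0))     # [1.5 nan nan] -- plain mean is poisoned by NaN
print(np.nanmean(samples, axis=0))  # [1.5 4.  3. ] -- NaN cells are ignored
```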
np.nanmean was a great solution so far.
Thank you so much!