Cache problem with a big model

Dear People of Stan,

I’m working with a model that is not that complicated but is giving me some problems. I think the main issue is dimensionality and how to manage it on my machine. The number of parameters (dimensions) of the model can be estimated as follows:

I have three hyperparameters, which are the ones I am really interested in: \mu, \sigma, \lambda.
Then, my data have S \approx 5\times 10^3 features, modeled by S variables \boldsymbol{n}. The model is a graphical one with two layers of latent variables \boldsymbol{z}, \boldsymbol{v}, each of size S, so the model has about 3 + 2S \approx 10^4 parameters. As mentioned before, I may really be interested just in the posterior of \mu, \sigma, \lambda or, at most, the posterior of the first hidden layer \boldsymbol{z}. Everything works well in “low” dimensions (S = 10, 100).
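To make the storage implications concrete, here is a rough back-of-the-envelope estimate of the size of the raw draws for one fit; the 4 chains and 1,000 post-warmup draws per chain are only assumed defaults, and compression and per-draw overhead are ignored:

# Rough estimate of how much the stored draws weigh for one fit.
# Assumed: 4 chains, 1,000 post-warmup draws each, 8-byte doubles;
# ignores compression and any bookkeeping overhead.
S = 5_000                       # number of features
n_params = 3 + 2 * S            # mu, sigma, lambda + the two latent layers z, v
chains = 4
draws_per_chain = 1_000
bytes_per_value = 8

total_bytes = n_params * chains * draws_per_chain * bytes_per_value
print(f"~{total_bytes / 1e6:.0f} MB of raw draws per fit")   # ~320 MB

So every fit in that range adds a few hundred MB, which already matters on a partition with under 1 GB free (see the df output below).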

Now, the problem is the following. When I try to launch my Stan model with PyStan (with S \approx 5\times 10^3), it reaches 0% sampling and gives me the following error:

Sampling: 0%
Traceback (most recent call last):
  File "/mnt/MetaGym/analysis/workflow/bayestan/cricket.py", line 114, in <module>
    fit = sm.sample(num_samples=fit_cnfg["samples"],init=P0,num_chains=fit_cnfg["num_chains"],refresh=0)
  File "/mnt/MetaGym/anaconda3/envs/bayesian/lib/python3.9/site-packages/stan/model.py", line 89, in sample
    return self.hmc_nuts_diag_e_adapt(num_chains=num_chains, **kwargs)
  File "/mnt/MetaGym/anaconda3/envs/bayesian/lib/python3.9/site-packages/stan/model.py", line 108, in hmc_nuts_diag_e_adapt
    return self._create_fit(function=function, num_chains=num_chains, **kwargs)
  File "/mnt/MetaGym/anaconda3/envs/bayesian/lib/python3.9/site-packages/stan/model.py", line 312, in _create_fit
    return asyncio.run(go())
  File "/mnt/MetaGym/anaconda3/envs/bayesian/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/mnt/MetaGym/anaconda3/envs/bayesian/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/mnt/MetaGym/anaconda3/envs/bayesian/lib/python3.9/site-packages/stan/model.py", line 236, in go
    raise RuntimeError(message)
RuntimeError: Exception during call to services function: 'OSError(28, 'No space left on device')', traceback: '['  File "/mnt/MetaGym/anaconda3/envs/bayesian/lib/python3.9/site-packages/httpstan/services_stub.py", line 158, in call\n    httpstan.cache.dump_fit(b"".join(compressed_parts), fit_name)\n', '  File "/mnt/MetaGym/anaconda3/envs/bayesian/lib/python3.9/site-packages/httpstan/cache.py", line 111, in dump_fit\n    fh.write(fit_bytes)\n']'
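The traceback shows the failure happens inside httpstan.cache.dump_fit, i.e. when httpstan writes the compressed fit into its on-disk cache, so what matters is the free space on whatever filesystem holds that cache. For reference, something like the following reports the current cache size and the free space on that filesystem (it assumes the usual Linux default of ~/.cache/httpstan, which may differ on other setups):

import shutil
from pathlib import Path

# Assumed default httpstan cache location on Linux; adjust if yours differs.
cache_dir = Path.home() / ".cache" / "httpstan"

if cache_dir.exists():
    # Total size of everything already in the cache (compiled models, stored fits).
    cache_bytes = sum(p.stat().st_size for p in cache_dir.rglob("*") if p.is_file())
    print(f"httpstan cache: {cache_bytes / 1e9:.1f} GB")

# Free space on the filesystem holding the home directory (where the cache lives).
free = shutil.disk_usage(Path.home()).free
print(f"free space under home: {free / 1e9:.1f} GB")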

The solution I tried is the following: a colleague suggested using the pars argument in the sampling statement to retain only the subset of interesting parameters. However, my PyStan version (3.5.0) does not support this feature anymore. My questions are the following:

1] Will installing an older version of PyStan (one that still has pars) solve my problem?
2] Does my PyStan version have a mechanism that can replace the pars feature, so I can get rid of the “non-interesting” parameters?
3] Should I move to another Stan interface, such as CmdStanPy?

For completeness about my machine: it runs Ubuntu 16.04 and has a 2 TB external disk (/dev/vdb) mounted at /mnt.

df -H gives me (partial output):

Filesystem      Size  Used  Avail  Use%  Mounted on
/dev/vda1        26G   25G   983M   97%  /
/dev/vdb        2.2T  1.7T   314G   85%  /mnt

If I am not wrong, /dev/vda1 is where my home directory is located. Running du -hx --max-depth=1 in the home directory (~), I find (partial output):

19G ./.cache

So could it be that the problem is the cache? If yes (the Stan error suggests something of that kind), how should I deal with it?
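If the cache is indeed the culprit, the two workarounds I can think of (both untested on my side, and both assuming httpstan keeps its cache under ~/.cache/httpstan and honors the XDG_CACHE_HOME convention) would be to clear the old cached fits or to point the cache at the 2 TB disk before importing stan; the /mnt/stan-cache directory below is just a made-up example path:

import os
import shutil
from pathlib import Path

# Option 1: wipe the old cache. Note that this also deletes compiled models,
# so everything gets recompiled on the next run. Assumed default location;
# double-check the path before deleting anything.
cache_dir = Path.home() / ".cache" / "httpstan"
if cache_dir.exists():
    shutil.rmtree(cache_dir)

# Option 2: relocate the cache to the big disk, assuming httpstan follows the
# XDG cache convention. Set the variable before importing/using stan.
os.environ["XDG_CACHE_HOME"] = "/mnt/stan-cache"   # hypothetical directory on /dev/vdb

import stan  # imported only after the environment variable is set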

I’ll definitely appreciate some help in this puzzling situation! Cheers

Jacopo

Yes. Installation instructions here: Installation — CmdStanPy 1.2.0 documentation

  1. no
  2. no
  3. yes (maybe)

I think you should be able to sample with CmdStanPy, but I think it will fail to access the results (make sure to save your result CSVs).
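Something like this is what I mean, with output_dir pointed at the big disk so the CSVs are written there and kept after the run (file names are placeholders):

from cmdstanpy import CmdStanModel

# File names below are placeholders for the actual model and data.
model = CmdStanModel(stan_file="model.stan")

fit = model.sample(
    data="data.json",              # or a Python dict for the data block
    chains=4,
    iter_sampling=1000,
    output_dir="/mnt/stan-output", # write the CSVs straight to the 2 TB disk
)

With output_dir set, CmdStanPy writes the per-chain CSVs into that directory instead of a temporary one, so they survive the Python session.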

But I think there are ways to handle this, and maybe we can finally fix arviz.from_cmdstan to handle these larger-than-RAM situations.

cc @OriolAbril for arviz
cc @WardBrian memory use with cmdstanpy


Yeah, CmdStanPy’s current design isn’t really organized around larger-than-memory samples. It will let you sample them, but to load them you’ll need to use another tool or somehow edit the files first to make sure they only contain the parameters you care about.
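For example (just a sketch, assuming the hyperparameters are literally named mu, sigma and lambda in the Stan program, and using a hypothetical path for one chain's CSV), pandas can pull out only those columns without ever loading the ~10^4 latent ones:

import pandas as pd

# Stan CSV files contain '#'-prefixed comment lines plus one column per parameter.
# 'usecols' keeps only the columns of interest, so the latent columns are never
# loaded into memory. Parameter names are assumptions; adjust to your model.
keep = {"mu", "sigma", "lambda"}

draws = pd.read_csv(
    "/mnt/stan-output/model-1.csv",   # hypothetical path to one chain's CSV
    comment="#",
    usecols=lambda name: name in keep,
)
print(draws.describe())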

This is partly something that would be fixed if we had better IO formats and could use libraries that support out-of-core data.
