Dear People of Stan,
I’m working with a model that is not very complicated but is giving me some problems. I think the main issue is dimensionality and how to manage it on my machine. The number of parameters (dimensions) of the model can be counted as follows:
I have three hyperparameters, \mu, \sigma, \lambda, which are the ones I am really interested in.
Then, my data have S \approx 5 \times 10^3 features, modeled by S variables \boldsymbol{n}. The model is a graphical one with two layers of latent variables, \boldsymbol{z} and \boldsymbol{v}, each of size S, so the model has about 3 + 2S \approx 3 + 10^4 parameters. As mentioned above, I am mainly interested in the posterior of \mu, \sigma, \lambda or, at most, the posterior of the first hidden layer \boldsymbol{z}. Everything works well in “low” dimension (S = 10, 100).
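To make the parameter count concrete, here is a rough back-of-the-envelope sketch of how large the stored draws could get. The values of num_chains and num_samples are placeholders (my actual fit_cnfg settings are not reproduced here), and the estimate assumes one 8-byte float per parameter per draw, ignoring warmup draws, sampler diagnostics, and any compression.

```python
# Rough estimate of the size of the stored posterior draws.
# S comes from the model description; num_chains and num_samples below are
# placeholder values, not the real fit_cnfg settings.

BYTES_PER_VALUE = 8  # assumption: one double-precision float per parameter per draw


def n_parameters(S: int, n_hyper: int = 3, n_latent_layers: int = 2) -> int:
    """Hyperparameters plus one latent variable per feature and per layer."""
    return n_hyper + n_latent_layers * S


def raw_draws_bytes(S: int, num_chains: int, num_samples: int) -> int:
    """Uncompressed size of the draws, ignoring warmup and diagnostics."""
    return n_parameters(S) * num_chains * num_samples * BYTES_PER_VALUE


if __name__ == "__main__":
    S = 5_000
    size = raw_draws_bytes(S, num_chains=4, num_samples=1_000)
    print(f"{n_parameters(S)} parameters, about {size / 1e9:.2f} GB of raw draws")
```

Under these assumed settings a single fit is a few hundred MB before compression, so a handful of stored fits of this size could plausibly add up to several GB.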
Now, the problem is the following. When I try to run my Stan model with PyStan (with S \approx 5 \times 10^3), it stays at 0% sampling and gives me the following error:
Sampling: 0%
Traceback (most recent call last):
File "/mnt/MetaGym/analysis/workflow/bayestan/cricket.py", line 114, in
fit = sm.sample(num_samples=fit_cnfg["samples"],init=P0,num_chains=fit_cnfg["num_chains"],refresh=0)
File "/mnt/MetaGym/anaconda3/envs/bayesian/lib/python3.9/site-packages/stan/model.py", line 89, in sample
return self.hmc_nuts_diag_e_adapt(num_chains=num_chains, **kwargs)
File "/mnt/MetaGym/anaconda3/envs/bayesian/lib/python3.9/site-packages/stan/model.py", line 108, in hmc_nuts_diag_e_adapt
return self._create_fit(function=function, num_chains=num_chains, **kwargs)
File "/mnt/MetaGym/anaconda3/envs/bayesian/lib/python3.9/site-packages/stan/model.py", line 312, in _create_fit
return asyncio.run(go())
File "/mnt/MetaGym/anaconda3/envs/bayesian/lib/python3.9/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/mnt/MetaGym/anaconda3/envs/bayesian/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
return future.result()
File "/mnt/MetaGym/anaconda3/envs/bayesian/lib/python3.9/site-packages/stan/model.py", line 236, in go
raise RuntimeError(message)
RuntimeError: Exception during call to services function: 'OSError(28, 'No space left on device')', traceback: '[' File "/mnt/MetaGym/anaconda3/envs/bayesian/lib/python3.9/site-packages/httpstan/services_stub.py", line 158, in call\n httpstan.cache.dump_fit(b"".join(compressed_parts), fit_name)\n', ' File "/mnt/MetaGym/anaconda3/envs/bayesian/lib/python3.9/site-packages/httpstan/cache.py", line 111, in dump_fit\n fh.write(fit_bytes)\n']'
Here is what I have tried so far. A friend suggested using the pars argument in the sampling call to retain only the subset of parameters of interest. However, my PyStan version (3.5.0) does not support this feature anymore. My questions are the following:
1] Will installing an older version of PyStan (one that still has the pars argument) solve my problem?
2] Does my PyStan version have a mechanism that replaces pars, so that I can drop the “non-interesting” parameters?
3] Should I move to another Stan interface, such as CmdStanPy?
For completeness, my machine runs Ubuntu 16.04 and has a 2TB external disk, /dev/vdb, mounted at /mnt.
df -H gives me (partial output):
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 26G 25G 983M 97% /
/dev/vdb 2.2T 1.7T 314G 85% /mnt
If I am not wrong, /dev/vda1 is where my home directory is located. I then ran du -hx --max-depth=1 in the home directory (~) and found (partial output):
19G ./.cache
4] So might the problem be the cache? If so (the Stan error suggests something of that kind), how should I deal with it?
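To check which tool is actually filling the cache, a small script along these lines could help. It is only a sketch: it lists the largest subdirectories of the user cache directory and the free space on the filesystem that holds the home directory. The guess that the cached fits live in a subdirectory named httpstan is an assumption based on the traceback, not something I have verified. Note that it sums apparent file sizes, so the numbers can differ slightly from what du reports.

```python
import os
import shutil
from pathlib import Path


def dir_size_bytes(path: Path) -> int:
    """Sum the apparent sizes of regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = Path(root) / name
            try:
                if not fp.is_symlink():
                    total += fp.stat().st_size
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def largest_subdirs(cache_dir: Path, top: int = 5) -> list[tuple[str, int]]:
    """Return (name, size in bytes) for the biggest direct subdirectories."""
    sizes = [
        (entry.name, dir_size_bytes(entry))
        for entry in cache_dir.iterdir()
        if entry.is_dir() and not entry.is_symlink()
    ]
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top]


if __name__ == "__main__":
    # Default cache location on Linux unless XDG_CACHE_HOME is set.
    cache_dir = Path(os.environ.get("XDG_CACHE_HOME", str(Path.home() / ".cache")))
    free = shutil.disk_usage(Path.home()).free
    print(f"Free space on the home filesystem: {free / 1e9:.2f} GB")
    for name, size in largest_subdirs(cache_dir):
        print(f"{size / 1e9:8.2f} GB  {name}")
```

If an httpstan subdirectory turns out to dominate the list, and if httpstan resolves its cache location through XDG_CACHE_HOME (again an assumption I have not checked), then clearing that directory or pointing the variable at a directory under /mnt before starting Python might be a workaround.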
I would really appreciate some help with this puzzling situation! Cheers,
Jacopo