I just wanted to know how I can supply multiple inits, as paths to JSON files, to the sampling method. I have tried supplying a list of paths as inits, but when I inspect the CSV files I see that a new JSON file is being created. I am running code on a cluster, and according to the cluster guidelines the /tmp directory should be used sparingly, so I store my inits manually in my home directory and want to use those. The problem is not that I do not know how to provide inits; I want to provide them in such a way that cmdstanpy reads the existing JSON files and does not write to /tmp.
I guess input on how cmdstanpy processes the list of paths as inits would suffice as well such that I can create this json file manually?
Thanks @WardBrian this would solve my problem, I will give it a try. I am however interested in the details behind the file processing? What I see in the csv outputs is that the chains have the same json file as input?
Thanks Brian. I did compile my model with Stan threads, but only a single JSON file was referenced across all chains. I would have assumed that the foo_ID naming would apply in this case. Upon investigating, I found that if we compile the model with STAN_THREADS=False the foo_ID naming applies, but with STAN_THREADS=True all chains list the same init file?
Anyway, I have tried your suggestion of overwriting cmdstanpy._TMPDIR, but it had no effect. Files were still written to the /tmp directory. I delved into the source a bit and found that _TMPDIR is imported (upon importing cmdstanpy) into several different modules. These include cmdstan_args, utils.filesystem, stanfit.runset, and stanfit.mcmc. Because everything is imported at once, each of these modules holds its own local reference to the _TMPDIR value as it was initially generated, and since stanfit.runset.RunSet uses it, the rest of stanfit does too. This makes it nearly impossible to change _TMPDIR after it has been initialized.
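To illustrate why reassigning the package attribute does nothing, here is a minimal sketch of the general Python behavior. It does not use cmdstanpy at all: `fakepkg` and `fakepkg.runset` are hypothetical stand-in modules built in memory, and the paths are made up.

```python
import sys
import types

# Hypothetical stand-in for a package whose __init__ defines a module-level constant.
pkg = types.ModuleType("fakepkg")
pkg._TMPDIR = "/tmp/original"
sys.modules["fakepkg"] = pkg

# Hypothetical submodule that runs `from fakepkg import _TMPDIR` at import time,
# which binds its own name to the value that existed at that moment.
sub = types.ModuleType("fakepkg.runset")
submodule_source = (
    "from fakepkg import _TMPDIR\n"
    "\n"
    "def where():\n"
    "    return _TMPDIR\n"
)
exec(submodule_source, sub.__dict__)

# Rebinding the attribute on the package afterwards...
pkg._TMPDIR = "/home/user/scratch"

# ...changes the package's name but not the copy held by the submodule.
print(pkg._TMPDIR)   # /home/user/scratch
print(sub.where())   # /tmp/original
```

So every module that did a `from ... import` would have to be patched individually, which is fragile.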
I do have a solution for this, but since cmdstanpy was installed with root privileges I cannot apply it on the cluster. It involves changing the source of __init__.py slightly. We define a new environment variable, STAN_TMPDIR, which is an existing path where you would like Stan to write by default. That is, we create a new random sub-directory inside STAN_TMPDIR by changing the code as follows (keeping the current behaviour if STAN_TMPDIR is missing, incorrect, or non-existent):
...
import tempfile
import os  # used to check if STAN_TMPDIR is set and is an existing path

# Check if 'STAN_TMPDIR' exists as an environment variable
if 'STAN_TMPDIR' in os.environ:
    # Check if it's an absolute path that exists
    if os.path.isabs(os.environ['STAN_TMPDIR']) and os.path.exists(os.environ['STAN_TMPDIR']):
        dir = os.environ['STAN_TMPDIR']  # Use specified directory instead of /tmp
    else:
        dir = None  # Default to /tmp
else:
    dir = None  # Default to /tmp

_TMPDIR = tempfile.mkdtemp(dir=dir)
...
This should be a low-impact way to change the source: it gives users the ability to change the tmp directory while keeping all other source the same. We could probably add a warning when STAN_TMPDIR is supplied but is non-existent, incorrect, or not an absolute path?
Are you observing that they actually initialize at the same point, or just that the header comment says the same file for each? The header comment will be identical between different files when STAN_THREADS=true, even though the chains can still have different initializations, ids, etc.
This seems like a reasonable proposal, would you mind opening an issue or PR in the cmdstanpy repository?
Yes, the header comment is a faithful reconstruction of the command line given to the Stan executable, but in the multi-chain multi-threaded case, the command line uses shortcuts like foo.json being shorthand for foo_1.json, foo_2.json, etc. This is definitely confusing, and has a few open issues about it: id for each chain should be unique in multi chain · Issue #1257 · stan-dev/cmdstan · GitHub
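If you want to lay the per-chain files out by hand, a minimal sketch using only the standard library could look like the following. The helper name `write_chain_inits`, the parameter `mu`, and its values are made up for illustration, and the temporary directory is only a stand-in so the snippet runs anywhere; in practice you would point it at a directory under your home. Whether your CmdStan version picks the files up this way is something to check against its documentation.

```python
import json
import tempfile
from pathlib import Path


def write_chain_inits(directory, stem, inits_per_chain):
    """Write one JSON init file per chain, named <stem>_1.json, <stem>_2.json, ..."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    # Chain ids start at 1 to match the foo_1.json, foo_2.json naming above.
    for chain_id, inits in enumerate(inits_per_chain, start=1):
        path = directory / f"{stem}_{chain_id}.json"
        path.write_text(json.dumps(inits))
        paths.append(path)
    return paths


demo_dir = tempfile.mkdtemp()  # stand-in for e.g. a directory in your home
paths = write_chain_inits(demo_dir, "foo", [{"mu": 0.1}, {"mu": -0.3}])
print([p.name for p in paths])  # ['foo_1.json', 'foo_2.json']
```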
This was discussed in a GitHub issue, where we determined that setting os.environ['TMPDIR'] is the best way to control the Python behavior that cmdstanpy relies on.
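For reference, Python's tempfile module takes its default directory from the TMPDIR, TEMP, and TMP environment variables, in that order, and caches the result after the first lookup. A small sketch of that behavior, with a throwaway directory standing in for a scratch directory under your home:

```python
import os
import tempfile

# Stand-in for a scratch directory you own, e.g. somewhere under your home.
base = os.path.realpath(tempfile.mkdtemp())

# Point Python's temp machinery at it. In a real script this should happen
# before the first use of tempfile (and, given that _TMPDIR is created on
# import as described above, before `import cmdstanpy`).
os.environ["TMPDIR"] = base

# tempfile caches its choice; clear the cache so the variable is re-read.
# This is only needed here because mkdtemp() was already called above.
tempfile.tempdir = None

resolved = tempfile.gettempdir()
scratch = tempfile.mkdtemp()  # same call pattern as the _TMPDIR line earlier

print(resolved == base)                  # True
print(os.path.dirname(scratch) == base)  # True
```

On a cluster the simplest route is usually to export TMPDIR in the job script, so it is already set when Python starts.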