Operating System: linux cluster
Interface Version: cmdstan
Compiler/Toolkit: qsub, PBS
I just want to keep a record of how I did this (see the title) in case others need to do the same.
I attempted to run multiple chains following p.28 of the cmdstan-guide-2.19.1.pdf but the chains would run sequentially.
Following @bbbales2’s suggestion, I now run the multiple chains in parallel as batch jobs on the cluster:
In the .sh shell script submitted with the qsub command, put the following:
#PBS -t 1-4
cd ~/cmdstan-2.19.1
time ../SSM0.5.2dev sample algorithm=hmc metric=dense_e adapt delta=0.8 \
id=$PBS_ARRAYID data file=../SSM_data.dat output file=samples$PBS_ARRAYID.csv
The -t 1-4 option of the PBS command schedules an array of jobs (1 to 4). Each job's index is available in the environment variable $PBS_ARRAYID, as explained on this page.
The second line changes to the cmdstan directory, which is where I run the sampling command in the third line.
The time at the start of the third line keeps track of the wall time. The command itself runs the compiled version of the .stan file (in my case SSM0.5.2dev), which is located in my home directory, one level up from the cmdstan directory.
(In my case only, I need to use the metric=dense_e option, instead of the default choice metric=diag_e.)
The output file=samples$PBS_ARRAYID.csv argument writes the sampling output to samples1.csv, ..., samples4.csv in the cmdstan directory (a path could be added to place the output files elsewhere).
Note: Different clusters use different approaches to specify an array of batch jobs. See this post for the case of another cluster. See also this reply to the post.
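Before submitting, it can help to preview what each array task will run. The following is only a sketch of a dry run: build_cmd is a hypothetical helper name, and the array index is set by hand here, whereas on the cluster the scheduler sets $PBS_ARRAYID for each job.

```shell
#!/bin/sh
# Dry run: print the command each array task would execute, without calling qsub.
# build_cmd is a hypothetical helper; $1 stands in for the index that the
# scheduler would normally place in $PBS_ARRAYID.
build_cmd() {
  PBS_ARRAYID=$1
  echo "../SSM0.5.2dev sample algorithm=hmc metric=dense_e adapt delta=0.8" \
       "id=$PBS_ARRAYID data file=../SSM_data.dat output file=samples$PBS_ARRAYID.csv"
}

# Mimic the array 1-4 requested by "#PBS -t 1-4".
for i in 1 2 3 4; do
  build_cmd "$i"
done
```

Each printed line should differ only in the id= value and in the output file name, which is what keeps the four chains from overwriting each other's CSV files.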
It’s been a while since I ran cmdstan on our cluster w/ multithreading, but when I did, it looked like the times reported in the CSVs weren’t matching up with the actual walltime for each chain (CSV time was much longer than actual runtime), so I was using time to time them. Maybe OP is having a similar issue?
Indeed not necessary. The time part came from McElreath’s use of it in this post on a related topic. I didn’t realize the csv output also reports the info.
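For anyone who wants to compare the two, here is a rough sketch for pulling the timing lines out of the output files. It assumes the timings appear as comment lines containing "seconds (Warm-up)", "seconds (Sampling)" and "seconds (Total)" at the end of each CSV; elapsed_lines is a hypothetical helper name.

```shell
#!/bin/sh
# Print the elapsed-time comment lines from one or more cmdstan output CSVs.
# Assumption: the timings are written as comment lines ending in
# "seconds (Warm-up)", "seconds (Sampling)" and "seconds (Total)".
elapsed_lines() {
  grep -E 'seconds \((Warm-up|Sampling|Total)\)' "$@"
}

# Usage (run in the directory holding the output files):
#   elapsed_lines samples*.csv
```

With several files, grep prefixes each line with the file name, so the chains can be told apart and checked against what time reported in the job logs.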
Do you happen to know whether the command
bin/stansummary samples*.csv
has an option to report only the summary for selected parameters like rstan's
print(fit, c("alpha", "beta", ... ))
?
I am struggling to find a convenient way to examine the output from cmdstan.
I have a bunch of “derived” parameters due to non-centered reparametrization of correlated parameters, such as p_mu, p_sd, p_L, p_err for the original parameter p. My primary interest is in the original parameters. Selecting parameters by name, as in the rstan call above, lets me filter out those derived parameters.
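Whether or not stansummary has such an option, one workaround is to filter its text output by parameter name. This is a minimal sketch that assumes each summary row begins with the parameter name, optionally followed by an index such as [1] or [1,2]; select_params is a hypothetical helper name.

```shell
#!/bin/sh
# Keep only the summary rows whose first field is one of the wanted names,
# optionally followed by an index such as [1] or [2,3].
# $1 = wanted names joined with | (for example "alpha|beta");
# the summary text is read from standard input.
select_params() {
  grep -E "^[[:space:]]*($1)(\[[0-9,]+\])?[[:space:]]"
}

# Usage (from the cmdstan directory):
#   bin/stansummary samples*.csv | select_params 'alpha|beta'
```

Because the match requires whitespace or an index right after the name, asking for p keeps p and p[1] but drops p_mu, p_sd, p_L and p_err. The header row is dropped too, so the column names need to be read from the unfiltered output.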