This is to answer the evergreen question: “what diagnostics are available from the HMC sampler and how do I get me some?”
CmdStan can spit out different csv files:
- an output file in Stan csv format (e.g. output_file=my_sample.csv), which contains sampler draws on the constrained scale
- a diagnostic_file (e.g. diagnostic_file=my_diag.csv)
The diagnostic file contains the same set of initial and final comments as the output.csv file - the initial comments contain the CmdStan config, the final comments contain the timing information.
(Note - “diagnostic_file” is confusing, and we’re planning to call this “latent_dynamics_file” in the CmdStanPy and CmdStanR interfaces).
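For anyone who wants to poke at the file by hand, here's a rough Python sketch of separating the comment lines (config at the top, timing at the bottom) from the csv header and data rows. The SAMPLE text is made up for illustration and is not real CmdStan output, and split_stan_csv is a hypothetical helper name, not part of any Stan interface.

```python
import csv
import io

# Made-up stand-in for a diagnostic file; real files have many more comment lines.
SAMPLE = """\
# method = sample (Default)
# diagnostic_file = my_diag.csv
lp__,accept_stat__,theta,p_theta,g_theta
-7.0,0.9,0.5,0.1,-0.2
-6.8,1.0,0.4,-0.3,0.1
#
#  Elapsed Time: 0.01 seconds (Warm-up)
"""


def split_stan_csv(text):
    """Return (comments, header, rows) from Stan-csv-style text."""
    comments, data_lines = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            comments.append(line)  # config, timing, and any other comments
        else:
            data_lines.append(line)  # header line followed by draws
    reader = csv.reader(io.StringIO("\n".join(data_lines)))
    header = next(reader)
    rows = [[float(x) for x in row] for row in reader]
    return comments, header, rows


comments, header, rows = split_stan_csv(SAMPLE)
```

This collects every comment line in one list, so if you need the config and timing blocks separately you'd have to split on position (before vs. after the data rows).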
The actual csv data consists of the sampler state variables, followed by the parameter values on the unconstrained scale, followed by the momenta for each parameter (the p_ columns), followed by the gradients for each parameter (the g_ columns).
e.g., given a model with 2 parameters, theta and sigma, the data columns are:
lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,divergent__,energy__,theta,sigma,p_theta,p_sigma,g_theta,g_sigma
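If you want to pick those column groups apart programmatically, something like the sketch below should work. It assumes the layout shown in the header above: sampler columns end in a double underscore, and the remaining columns come in three equal-sized blocks (unconstrained values, then the p_ block, then the g_ block). group_columns is a hypothetical helper name, not part of CmdStan or its interfaces.

```python
HEADER = (
    "lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,divergent__,"
    "energy__,theta,sigma,p_theta,p_sigma,g_theta,g_sigma"
).split(",")


def group_columns(header):
    """Group diagnostic-file column names by role, using position only."""
    # Sampler state columns are the ones ending in a double underscore.
    sampler = [c for c in header if c.endswith("__")]
    rest = [c for c in header if not c.endswith("__")]
    # Assumption: the rest splits into three blocks of equal length.
    n, remainder = divmod(len(rest), 3)
    if remainder != 0:
        raise ValueError("expected three equal-sized column blocks")
    return {
        "sampler": sampler,
        "unconstrained": rest[:n],
        "p": rest[n:2 * n],
        "g": rest[2 * n:],
    }


groups = group_columns(HEADER)
```

Splitting by position rather than by the p_ and g_ prefixes avoids misfiling a model parameter whose own name happens to start with p_ or g_.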
There is 1 row per saved iteration, i.e., the config settings save_warmup and thin control how often the sampler writes to both the output and diagnostic files.
This code:
calls this code:
Hope this answers questions that @Bob_Carpenter and @s.maskell might have about gradient information for control variates, etc.