Sampler HMC diagnostics file

This is to answer the evergreen question: “what diagnostics are available from the HMC sampler and how do I get me some?”

CmdStan can write two different CSV files:

  • an output file in Stan csv format (e.g. output_file=my_sample.csv), which contains sampler draws on the constrained scale

  • a diagnostic_file (e.g. diagnostic_file=my_diag.csv)
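For anyone who wants to request both files from a script, here is a minimal sketch that assembles a CmdStan command line. It assumes the usual CmdStan argument layout, in which `file=` and `diagnostic_file=` sit under the `output` keyword and `save_warmup` and `thin` sit under `sample`. The executable name `./my_model` and the helper `cmdstan_sample_args` are hypothetical, made up for illustration.

```python
from typing import List, Optional


def cmdstan_sample_args(
    exe: str,
    output_csv: str,
    diagnostic_csv: str,
    data_json: Optional[str] = None,
    save_warmup: bool = False,
    thin: int = 1,
) -> List[str]:
    """Build an argument list for a compiled CmdStan model (hypothetical helper)."""
    # Sampler-level settings go under the `sample` keyword.
    args = [exe, "sample", f"save_warmup={int(save_warmup)}", f"thin={thin}"]
    # The data file is optional; models without a data block do not need it.
    if data_json is not None:
        args += ["data", f"file={data_json}"]
    # Both CSV files are requested under the `output` keyword.
    args += ["output", f"file={output_csv}", f"diagnostic_file={diagnostic_csv}"]
    return args


# "./my_model" is an assumed executable name, not something from this thread.
cmd = cmdstan_sample_args("./my_model", "my_sample.csv", "my_diag.csv")
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once a compiled model actually exists at that path.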

The diagnostic file contains the same set of initial and final comments as the output CSV file: the initial comments contain the CmdStan config, and the final comments contain the timing information.

(Note - “diagnostic_file” is confusing, and we’re planning to call this “latent_dynamics_file” in the CmdStanPy and CmdStanR interfaces).

The actual CSV data consists of:

the sampler state variables, followed by the parameter values on the unconstrained scale, followed by the momenta (the p_ columns), followed by the gradients (the g_ columns).

e.g., given a model with 2 parameters, theta and sigma, the data columns are:

lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,divergent__,energy__,theta,sigma,p_theta,p_sigma,g_theta,g_sigma
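If it helps, here is a rough sketch of reading such a file with only the Python standard library and splitting the header into its column groups. It assumes the layout described above: comment lines start with `#`, sampler state columns end in `__`, and the remaining columns come in three equal-length blocks (unconstrained values, then the `p_` columns, then the `g_` columns). The function names and the numbers in `SAMPLE` are made up for illustration; they are not real sampler output.

```python
import csv

# Toy stand-in for a diagnostic file; the numeric values are invented.
SAMPLE = """\
# model = my_model
lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,divergent__,energy__,theta,sigma,p_theta,p_sigma,g_theta,g_sigma
-7.1,0.95,0.8,2,3,0,8.2,0.3,-0.1,0.5,-1.2,0.7,0.2
-6.9,0.88,0.8,2,3,0,7.5,0.4,0.0,-0.3,0.9,0.1,-0.4
# Elapsed Time: (timing comments go here)
"""


def read_diagnostic_csv(text):
    """Return (header, rows), skipping blank lines and '#' comment lines."""
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    reader = csv.DictReader(lines)
    rows = [{name: float(value) for name, value in row.items()} for row in reader]
    return reader.fieldnames, rows


def split_columns(header):
    """Group column names by position rather than by prefix."""
    sampler = [c for c in header if c.endswith("__")]
    rest = [c for c in header if not c.endswith("__")]
    n, remainder = divmod(len(rest), 3)
    if remainder != 0:
        raise ValueError("expected three equal-length blocks of columns")
    return {
        "sampler": sampler,          # lp__, accept_stat__, ...
        "unconstrained": rest[:n],   # parameter values, unconstrained scale
        "p": rest[n:2 * n],          # p_* columns
        "g": rest[2 * n:],           # g_* columns (gradients)
    }


header, rows = read_diagnostic_csv(SAMPLE)
groups = split_columns(header)
print(groups["g"], [row["g_theta"] for row in rows])
```

Splitting by position instead of matching on the `p_` and `g_` prefixes is a deliberate choice here, since a model could have a parameter whose own name starts with `p_` or `g_`.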

There is 1 row per saved iteration, i.e., the config arguments save_warmup and thin control which iterations the sampler writes to both the output and diagnostic files.

This code:

calls this code:

Hope this answers questions that @Bob_Carpenter and @s.maskell might have about gradient information for control variates, etc.

@mitzimorris: Thanks. This will also be relevant to others working with me, notably @PhilClemson and @LJDevlin.

Thanks, that’s really useful. As expected, I was looking in completely the wrong place!