Sampler HMC diagnostics file

mitzimorris · May 26, 2020, 1:01am

This is to answer the evergreen question: “what diagnostics are available from the HMC sampler and how do I get me some?”

CmdStan can write two different CSV files:

an output file in Stan CSV format (e.g. output file=my_sample.csv), which contains the sampler draws on the constrained scale
a diagnostic file (e.g. diagnostic_file=my_diag.csv)

The diagnostic file contains the same set of initial and final comments as the output.csv file - the initial comments contain the CmdStan config, the final comments contain the timing information.
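Since the config and timing information are written as comment lines starting with `#`, a reader has to skip those before it gets to the header and the data. Here is a minimal sketch of that in C++; `StanCsv`, `split_csv_line` and `read_stan_csv` are hypothetical names made up for illustration, not part of CmdStan, and the sketch assumes every non-comment field after the header is numeric.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the parsed file; not a CmdStan type.
struct StanCsv {
  std::vector<std::string> header;        // column names
  std::vector<std::vector<double>> rows;  // one entry per saved iteration
};

// Split one comma-separated line into its fields.
inline std::vector<std::string> split_csv_line(const std::string& line) {
  std::vector<std::string> fields;
  std::stringstream ss(line);
  std::string field;
  while (std::getline(ss, field, ',')) fields.push_back(field);
  return fields;
}

// Read a Stan CSV stream, skipping '#' comment lines and blank lines.
// The first remaining line is taken as the header, the rest as data rows.
inline StanCsv read_stan_csv(std::istream& in) {
  StanCsv out;
  std::string line;
  while (std::getline(in, line)) {
    if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
    if (line.empty() || line[0] == '#') continue;               // comment or blank
    if (out.header.empty()) {
      out.header = split_csv_line(line);
      continue;
    }
    std::vector<double> row;
    for (const std::string& f : split_csv_line(line)) row.push_back(std::stod(f));
    out.rows.push_back(row);
  }
  return out;
}
```

To use it on a file, open a `std::ifstream` on the diagnostic file (e.g. my_diag.csv) and pass it to `read_stan_csv`. The same function should work on the output file, since both share the comment convention.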

(Note - “diagnostic_file” is confusing, and we’re planning to call this “latent_dynamics_file” in the CmdStanPy and CmdStanR interfaces).

The actual csv data consists of:

the sampler state variables, followed by the parameter values on the unconstrained scale, followed by the corresponding momenta (the p_ columns), followed by the gradients (the g_ columns).

e.g., given a model with 2 parameters, theta and sigma, the data columns are:

lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,divergent__,energy__,theta,sigma,p_theta,p_sigma,g_theta,g_sigma

There is 1 row per saved iteration, i.e., the save_warmup and thin settings control which iterations the sampler writes to both the output and diagnostic files.
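Given that column order, the header alone is enough to work out where each block starts. Below is a rough sketch; `DiagLayout` and `diag_layout` are hypothetical names, and it assumes the sampler state columns are exactly the leading columns whose names end in `__`, with the remaining columns forming three blocks of equal size (values, `p_`, `g_`).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical summary of the column layout; not a CmdStan type.
struct DiagLayout {
  std::size_t n_sampler = 0;  // leading columns whose names end in "__"
  std::size_t n_params = 0;   // number of unconstrained parameters
  std::size_t q_begin = 0;    // first unconstrained-parameter column
  std::size_t p_begin = 0;    // first "p_" column
  std::size_t g_begin = 0;    // first "g_" column
};

inline bool ends_with_double_underscore(const std::string& s) {
  return s.size() >= 2 && s.compare(s.size() - 2, 2, "__") == 0;
}

// Work out the block offsets from the header of a diagnostic file.
inline DiagLayout diag_layout(const std::vector<std::string>& header) {
  DiagLayout lay;
  while (lay.n_sampler < header.size() &&
         ends_with_double_underscore(header[lay.n_sampler])) {
    ++lay.n_sampler;
  }
  const std::size_t rest = header.size() - lay.n_sampler;
  if (rest % 3 != 0) {
    throw std::invalid_argument("expected three blocks of equal size");
  }
  lay.n_params = rest / 3;
  lay.q_begin = lay.n_sampler;
  lay.p_begin = lay.q_begin + lay.n_params;
  lay.g_begin = lay.p_begin + lay.n_params;
  // Check that the p_ and g_ names line up with the parameter names.
  for (std::size_t i = 0; i < lay.n_params; ++i) {
    const std::string& name = header[lay.q_begin + i];
    if (header[lay.p_begin + i] != "p_" + name ||
        header[lay.g_begin + i] != "g_" + name) {
      throw std::invalid_argument("unexpected p_/g_ column name");
    }
  }
  return lay;
}
```

For the 13-column header shown above, this would report 7 sampler columns and 2 parameters, with the `p_` block starting at index 9 and the `g_` block at index 11 (0-based).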

This code:

https://github.com/stan-dev/stan/blob/9c09195caad83a054d4dc053f347889900fff145/src/stan/mcmc/hmc/base_hmc.hpp#L57-L64


void get_sampler_diagnostic_names(std::vector<std::string>& model_names,
                                  std::vector<std::string>& names) {
  z_.get_param_names(model_names, names);
}

void get_sampler_diagnostics(std::vector<double>& values) {
  z_.get_params(values);
}

calls this code:

https://github.com/stan-dev/stan/blob/9c09195caad83a054d4dc053f347889900fff145/src/stan/mcmc/hmc/hamiltonians/ps_point.hpp#L27-L45


virtual inline void get_param_names(std::vector<std::string>& model_names,
                                    std::vector<std::string>& names) {
  names.reserve(q.size() + p.size() + g.size());
  for (int i = 0; i < q.size(); ++i)
    names.emplace_back(model_names[i]);
  for (int i = 0; i < p.size(); ++i)
    names.emplace_back(std::string("p_") + model_names[i]);
  for (int i = 0; i < g.size(); ++i)
    names.emplace_back(std::string("g_") + model_names[i]);
}

virtual inline void get_params(std::vector<double>& values) {
  values.reserve(q.size() + p.size() + g.size());
  for (int i = 0; i < q.size(); ++i)
    values.push_back(q[i]);
  for (int i = 0; i < p.size(); ++i)
    values.push_back(p[i]);
  for (int i = 0; i < g.size(); ++i)
    values.push_back(g[i]);
}

Hope this answers questions that @Bob_Carpenter and @s.maskell might have about gradient information for control variates, etc.

s.maskell · May 26, 2020, 3:00pm

@mitzimorris: Thanks. This will also be relevant to others working with me, notably @PhilClemson and @LJDevlin.

PhilClemson · May 26, 2020, 3:14pm

Thanks, that’s really useful. As expected, I was looking in completely the wrong place!
