Proposal for consolidated output

When discussing possible additional diagnostic outputs from Stan at Getting the location + gradients of divergences (not of iteration starting points), we realized that better plumbing of outputs from the algorithms is probably a necessary prerequisite. I checked the currently proposed solution for consolidated output and I believe it would not handle this use case very well. So I sketched an alternative proposal - the details are not completely fleshed out, but I hope it is enough to convey the rough idea and to see whether it is worth pursuing further. I have added it to the wiki:,-Optimize,-ADVI,-and-Diagnose#alternative-proposed-solution
Edit: the wiki is now at,-writers,-and-interrupts-in-one-object

1 Like
  • Re: log4j: in the past we’ve discussed this and catalogued some of the libraries involved but they didn’t seem like they were worth the added dependency burden. In C++ the burden of adding dependencies ends up being a little higher than in something like R/Python and their functionality (for a logging library) doesn’t mean much to us since most of the work is here (defining outputs, defining integration into C++ code base, defining how interfaces hook in) and most of that isn’t actual coding.

  • Overall I support the goal of making it easier to add different types of output without modifying (much of) the IO design and without adding ad-hoc streams (like we would currently need to). One part of your design that concerns me from a maintenance perspective is that it has to deal with generic tables explicitly. I think this bit may not be necessary:

template<class OutputStreams>
 class table_consumer {
    //Header defining functions, probably more combinations of type/arity needed
    virtual void add_double_column(const string& name) = 0;
    virtual void add_double_columns(const vector<string>& names) = 0;

    virtual void add_int_column(const string& name) = 0;
    virtual void add_string_column(const string& name) = 0;
    virtual void header_end() = 0;

    virtual void row_start(const typename OutputStreams::indexing& index) = 0;
    virtual void consume_double(double value) = 0;
    virtual void consume_doubles(const std::vector<double>& values) = 0;
    virtual void consume_doubles(const Eigen::VectorXd& values) = 0;
    virtual void consume_int(int value) = 0;
    virtual void row_end() = 0;
    virtual void close() = 0;
};
  • I think this is meant to handle things like the combined stepsize__ and divergent__ parameters that have different types. For those we could use tuples (s.t. everything is static and simple) or a templated class like the above but with compile-time resolution of column types (fancier use of parameter packs but it should work out and avoid run-time output bugs). For larger outputs (e.g. mass matrix) my thought was to wrap them in small classes and use overloads or templates to resolve exactly which output stream gets the item. For smaller things (e.g. the 9 or so algorithm parameters that NUTS might spit out) it probably makes more sense to pack a tuple and write that.

Re: streams within transitions: it might make sense to make these switches static s.t. there’s no penalty incurred unless a debug flag is set on compilation. For larger models I don’t think an ‘if’ would matter so that’s worth testing… The reason I’m suggesting making more of this design static is that if you’re adding outputs you can edit a few lines of code yourself (if we make it easy!), so the add_*_column code below is not necessary.

   relay.get_key_value_consumer(hmc_output_streams::algorithm_params).consume_value("adapt_delta", delta);
   table_consumer<hmc_output_streams::index> iterations_consumer = relay.get_table_consumer(hmc_output_streams::iterations);
   iterations_consumer.add_double_columns(<<model param names>>);
  • Re: simpler signatures: this could be (on relay/consumer construction): consumer<std::vector<double>, double, double, int>(std::ostream& samples, std::ostream& algorithm_parameters); (the std::ostream is just an example of where output might go). Then at sampler setup: consumer.set_names(std::string names), and this could be checked for the expected number of names.

  • Then the sampler has to call consumer(parameter_values, sampler.divergent) rather than individually adding things. With C++11 and parameter packs this ends up not even resulting in hard-coding anything. With the suggestions I’m making, if you want to add an output type, you would: 1) change the tuple that describes what the consumer handles; 2) include it in the calls to the consumer; and if necessary 3) modify the interface hook for how to handle the additional output.

I was hoping I’d get a chance to write this down in more detail over the weekend but I didn’t so you get the hand-wavy version :/

1 Like

Good point with the templates/parameter pack magic. I didn’t want to go into it because I am a bit out of my depth there, but maybe it’s time to increase my C++ fu :-) How hard can it be anyways? :-D.

Also I think that maybe the per-iteration data are just so frequently used that they may be worth some special care, e.g. treating them as a separate type of stream. The motivation is that I think it should be easy to turn individual streams of the per-iteration data (gen quants, unconstrained params, RNG state, …) on and off per run AND still pipe everything directly into a single CSV file. This would require fixed ordering of consume calls, but that seems manageable. Whereas for other types of output it is IMHO OK that every stream ends up in its own file/socket/… and repeats header columns.

I am not very afraid of the dynamic if clauses - they are runtime constant and will likely get ~100% correct branch prediction. Having to recompile the model to get different output seems annoying (and you can’t compile all variants as there is a combinatorial explosion of possibilities). Also everyone keeps saying that the main bottleneck is the autodiff stack - I didn’t check that myself, but if this is the case a single if should not be noticeable.

Good that someone did that. I guess rolling our own won’t be that crazy. My current plan would be to ignore the logging for a first implementation pass, keeping the current logger, and - if everything works out - to implement logging on top of the table output stream.

1 Like

I’ve done a fair number of demo projects with it at this point so I feel pretty confident about it. If I don’t write out signatures I sometimes miss things that can’t work and assume they can :/ …

I’d like to design this so that there’s a relay object that gets used within Stan, but that it’s parameterized (on construction with objects or statically using templates) so that the interfaces are responsible for only the code that gets plugged into the relay. We could write some standard relay plug-ins that write to .csv files or std::stream or whatever s.t. any interface could use them. I think those plug-ins would be the place to have, e.g., some generic implementations for per-iteration data.

Fair enough. I think sometimes they can cause code bloat that messes with inlining and caches (I think Stan has cache-size problems, from another thread), but I admit that I don’t know that part of the code base well enough, and in any case we could do some timings to verify your very reasonable suggestion.

When I looked at the logging design my main thought was that we could make it usable with what’s currently under discussion by adding some signatures; these things don’t have to be rolled into one framework, and keeping them modular has benefits too. We just need to push this along and see what it looks like further along the line.

I updated the wiki, hopefully in a way that integrates the work so far and separates this out from the other output problems (file format stuff).

1 Like

So I had a go at actually implementing the necessary template magic. And I had fun. Also it was a throwback to college courses in Prolog :-D I currently think that what we aim for is mostly possible, with some minor modifications.

The key-value output seems easy, so I focused on the heterogeneous tabular output (homogeneous tables could be made directly with the same code or with a very slightly specialized version). I however have no idea how to make relay.table<handler_type>(T1 v1, T2 v2, ...); work, that is, how to make handler_type decide the number of parameters. However, you can make do with relay.table<handler_type>({T1 v1, T2 v2, ...}); (added braces), so that seems OK.

The code I have now is below. You can also try it online and see it with syntax highlighting.

#include <iostream>
#include <vector>
#include <tuple>
#include <type_traits>
#include <cstddef>
#include <exception>
#include <string>
#include <algorithm>

/* Defining the output types */

//Helper for output types with fixed headers
struct no_header {
	static no_header nh;
};
no_header no_header::nh;

struct hmc_iter_stats {
	double energy;
	int treedepth;
	bool divergent;

	typedef no_header header_def;
};

struct constrained_params {
	std::vector<double> values;

	typedef std::vector<std::string> header_def;
};

/* data_stream - ancestor of all streams of data */

//I require all template parameters to be class so that
//a different name -> different type
template <class T> class data_stream {
public:
	typedef T output;
	// Couldn't figure out the syntax with typedef
	using header_def = typename T::header_def;

	virtual void send_headers(const header_def &header) = 0;
	virtual void send(const T &data) = 0;
	virtual ~data_stream() {}
};

/* Some utility templates to be able to define the relay */

template<class T> struct always_false : std::false_type {};

//Resolves to the first index of What in the parameter pack Where
template <typename What, typename... Where>
struct index_of;

//Case 0: type not found, providing at least a little meaningful error message
template <typename What>
struct index_of<What> {
  static_assert(always_false<What>::value, "The required type is not part of the type list");
};

//Case 1: Found the type
template <typename What, typename... Where>
struct index_of<What, What, Where...> : std::integral_constant<std::size_t, 0> {};

//Case 2: Recursion
template <typename What, typename Where_head, typename... Where_tail>
struct index_of<What, Where_head, Where_tail...> : std::integral_constant<std::size_t, 1 + index_of<What, Where_tail...>::value> {};

/* The relay and its instances */

template <class... Outputs>
class relay {
  //The default-constructed tuple value-initializes the pointers to null
  std::tuple<data_stream<Outputs>*...> streams;

  template<class T> data_stream<T>*& stream_ref() {
    return std::get<index_of<T, Outputs...>::value>(streams);
  }

public:
  template<class T> data_stream<T>* get_stream() {
    return stream_ref<T>();
  }

  template<class T> void set_stream(data_stream<T>* stream) {
    if(stream_ref<T>() != 0) {
      throw std::invalid_argument("Stream already set.");
    }
    stream_ref<T>() = stream;
  }

  template<class T> void send_headers(const typename T::header_def& header) {
    if(stream_ref<T>() != 0) {
      stream_ref<T>()->send_headers(header);
    }
  }

  template<class T> void send(const T& data) {
    if(stream_ref<T>() != 0) {
      stream_ref<T>()->send(data);
    }
  }
};

typedef relay<hmc_iter_stats, constrained_params> hmc_relay;

/* Some data_stream implementations */

//Helper for multiple types of possible outputs
template<class T>
class to_string_columns;

template<>
class to_string_columns<hmc_iter_stats> {
public:
    static void get_headers(const hmc_iter_stats::header_def&, std::vector<std::string>& out) {
      out.push_back("energy");
      out.push_back("treedepth");
      out.push_back("divergent");
    }

    static void get_values(const hmc_iter_stats& data, std::vector<std::string>& out) {
      out.push_back(std::to_string(data.energy));
      out.push_back(std::to_string(data.treedepth));
      out.push_back(std::to_string(data.divergent));
    }
};

template<>
class to_string_columns<constrained_params> {
public:
    static void get_headers(const constrained_params::header_def& header, std::vector<std::string>& out) {
      out.insert(out.end(), header.begin(), header.end());
    }

    static void get_values(const constrained_params& data, std::vector<std::string>& out) {
      std::for_each(data.values.begin(), data.values.end(), [&out](double val) {
        out.push_back(std::to_string(val));
      });
    }
};

template<class T>
class csv_stream : public data_stream<T> {
    std::ostream& out;
    std::vector<std::string> buffer;

public:
    csv_stream(std::ostream& out) : out(out), buffer() {}

    virtual void send_headers(const typename T::header_def &header) override {
      buffer.clear();
      to_string_columns<T>::get_headers(header, buffer);
      std::for_each(buffer.begin(), buffer.end(), [this](std::string& s) {
        out << s << ',';
      });
      out << std::endl;
    }

    virtual void send(const T &data) override {
      buffer.clear();
      to_string_columns<T>::get_values(data, buffer);
      std::for_each(buffer.begin(), buffer.end(), [this](std::string& s) {
        out << s << ',';
      });
      out << std::endl;
    }
};

int main() {
  hmc_relay r;
  csv_stream<constrained_params> params_stream(std::cout);
  r.set_stream<constrained_params>(&params_stream);

  csv_stream<hmc_iter_stats> stats_stream(std::cout);
  r.set_stream<hmc_iter_stats>(&stats_stream);

  std::vector<std::string> param_names;
  param_names.push_back("mu");
  param_names.push_back("sigma");
  r.send_headers<constrained_params>(param_names);
  r.send_headers<hmc_iter_stats>(no_header::nh);

  std::vector<double> param_values;
  param_values.push_back(1.0);
  param_values.push_back(2.5);
  r.send<constrained_params>({param_values});
  r.send<hmc_iter_stats>({0.5, 1, true});

  /* Uncomment to see error when stream is not defined
  struct undefined_output {
    typedef no_header header_def;
  };
  r.get_stream<undefined_output>();
  */
  return 0;
}

That’s awesome, thanks! I’ll read through this today and see if I can ditch the curly brackets.

While I’ll be happy to learn how to do this, I am no longer 100% sure getting rid of the curly braces is desirable. Having an explicit data type that represents the table row seems useful, especially if you cannot collect all the values at once. Also it might make sense for the interfaces, as I guess it might be possible to just directly access this C struct from the interface without any serialization.

I’m very much behind consolidating argument groups into structures (aka “parameter objects”). I created a separate topic that provides some context to the top-level decision being discussed here to bundle parameters.

Putting on the forum moderator hat for a sec.:

  1. The writers/loggers/interrupts are different because the interfaces have to specify a whole set of behaviors when building a “relay” for the loggers whereas for most other items the interface just supplies a few int/double values.
  2. I agree the high-level discussion of the builder pattern is on-topic here, we should share a builder pattern.
  3. Detailed discussion about which args belong in which bundle should be in its own thread or (ideally) a design doc on the Wiki
  4. var_context also brings up a host of unrelated issues (including performance IIRC) and belongs in the var_context thread: Var_context, what do we want from it?
  • Removed the slightly derogatory ‘bike-shedding’, I didn’t mean it like that.

I put this on the already existing Wiki and organized it a little:, that seems like a good place to put implementation suggestions.

I’m sure we could make that happen, though maybe in terms of order it goes after removing the transforms.

I updated the Wiki:,-writers,-and-interrupts-in-one-object

I also re-organized @martinmodrak’s code s.t. the separation of responsibilities I expect to occur in our repos is clearer:

1 Like

OK, I’ll create a separate issue and page for this. I just brought it up here to try to put the design in the context of the goal to bundle parameter arguments.

I don’t understand yet what the relays are going to do functionally other than act as simple structs to hold the relevant writers.

1 Like

I’ve been thinking a bit about what Dan and Bob said at the meeting today and I see a less elegant, but minimalist implementation:

  1. Create a relay class that contains all the writers the algorithm supports. This would be a different class per algorithm group, possibly with some structure shared via inheritance. This avoids changes to code when a new writer is added (e.g. for more info on divergent transitions).
  2. The relay class also stores information about which elements for individual writers are enabled, to allow more fine-grained control (e.g. whether diagnostic outputs contains momenta, RNG states, etc.)
  3. Create a new class key_value_writer that would supplant passing strings to the current writers (AFAIK these could all plausibly be considered key-value pairs like alg. name, version, initial seed, etc.). Add this to the relay.


Downsides of this minimalist approach:

  • We’ve lost some safety in the code, as all complex output is constructed by appending individual values.
  • We have to convert everything to double
  • Code for the algorithms is contaminated with non-trivial amounts of output specific code: the column names for lp__, divergent__, sampler params etc., checking which data should be sent (RNG state, …), etc.
  • It is hard to disentangle the output back into its individual elements on the C++ side, e.g. for row filtering or to provide more structured output for the interfaces (e.g., separating model params from sampler info)
  • To determine what kind of data and in what order the algorithm actually outputs, you have to deep dive into algorithm code

I also believe that there is a spectrum between this minimalist implementation and the full blown templated beast, i.e. resolving only some of the downsides might be possible with a smaller increase in complexity.

Just FYI re: the meeting. I’ll write what Bob requested this weekend, I think we can outline what the implementation would look like at higher level for the minimal and the more complex solutions.

Minimalism and elegance, while both on the nice-to-have list, are not strictly necessary.
What we really need to prioritize is maintenance, by which I mean testing, documentation, and extensions in the future. At least for this kind of pass-through code that shouldn’t have any performance impact.

What I’d like to do is expand our focus to a full 360 around the services API. The interfaces are the clients of the service API, so it’s their interest we most need to keep in mind from a design perspective. Drilling down a level, the algorithms are clients of the services API:

interfaces —clientof—> services —clientof—> algorithms

The callback structure inverts the interface-services relationship, so that the relation to the callbacks is:

interfaces —implement—> callbacks <—call— algorithms
What I’d like to see first is an inventory of

  1. what’s currently being written by the algorithms to the callbacks
    • by type: string, integer, double, vector, etc.,
    • by service: NUTS adapt diagonal, ADVI dense, etc.,
    • by frequency: once at data, once at adaptation, warmup, sampling, ad hoc, multiple per iteration, per iteration, etc., and
  2. how the interfaces currently implement the callbacks
    • duplicated code
    • pain points for integration, testing or maintenance
    • hacks like using strings to encode adaptation

Then I’d like to know

  1. what new things will this enable to be easily written (that is, what’s next)
  2. how does this impact config on the part of the interfaces
  3. how interfaces can better share implementations with a new design.

Some wish list items of mine would be something that makes it easier to turn off output other than the draws (RStan and PyStan both struggle with this, I think). This might be as easy as defining null writers that get plugged in as implementations by the callbacks.

This then gets wrapped up into selling points for the new interfaces focusing on maintainability and extensibility, and then we build it. By which I mean I hope you all build it!

Serialization is a very closely related concern. Exactly what is being written here is going to wind up being the target of serialization and deserialization. It might be useful as such to consider serialization an additional client alongside the interfaces (though such a serialization would be very closely related to what CmdStan does).

The current thinking is to define a protocol buffer format for these streams that could implement the callbacks from an interface perspective. Then the analysis API and interfaces could read it back in.

I’m still not clear on where the interrupts fit into all of this. Are they part of the design?

1 Like

Here are all cases where a writer is called directly, there are additional files where a writer is just passed down.

file: src/stan/model/test_gradients.hpp
description: This is writing only a table, and some string messages; each row of the table is a parameter

  • logger and parameter_writer get message string from log_prob_grad call
  • logger and parameter_writer get message string from finite_diff_grad call
  • logger and parameter_writer get header (“param_idx”, “value”, “model”, “finite diff”, “error”) as a string
  • logger and parameter_writer get gradient test info string (k, params_r[k], grad[k], grad_fd[k], and (grad[k] - grad_fd[k]))

file: src/stan/services/experimental/advi/meanfield.hpp,
description: this services method only sets up the header for parameter_writer and calls another method (passing writer)

  • parameter_writer gets lp__, followed by model.constrained_param_names(names, true, true);

file: src/stan/services/experimental/advi/fullrank.hpp,
description: as above, calls initialize, sets up header, calls something from variational::advi
usage: as above.

file: src/stan/services/sample/fixed_param.hpp
description: No iterations here, things written once.

  • util::mcmc_writer gets (sample_writer, diagnostic_writer, logger); this is relevant b/c any new relay is like the mcmc_writer
  • mcmc_writer writes header of sampler, writes header of diagnostic file
  • writer passed to util::generate_transitions
  • timing written to writer

file: src/stan/services/sample/standalone_gqs.hpp
description: not much done here since the main action is within gq_writer

  • util:gq_writer constructed from sample_writer, logger, and num_params, this is another ad-hoc thing like mcmc_writer that groups other writer_callback writers
  • writer.write_gq_names(model) called once
  • writer.write_gq_values(model, rng, draw) called per iteration

file: src/stan/services/optimize/bfgs.hpp
description: there’s a switch that decides if the per-iteration writing occurs.

  • parameter_writer called with std::vector<std::string> parameter names, once
  • parameter_writer called with std::vector parameter values, once
  • per iteration parameter_writer called with std::vector parameter values
  • parameter_writer called with std::vector parameter values, once

file: src/stan/services/optimize/lbfgs.hpp
description: lots of boilerplate, like bfgs

  • called like above for bfgs

file: src/stan/services/optimize/newton.hpp
description: like above

  • called like bfgs/lbfgs

file: src/stan/services/util/gq_writer.hpp
description: ad-hoc class created to add the standalone generated quantities calls, bundles the writers, lots of overlap with mcmc_writer class.

  • sample_writer called to write gq_names (std::vector<std::string>)
  • sample_writer called to write gq_values (std::vector<double>)
  • uses model.write_array

file: src/stan/services/util/run_sampler.hpp
description: sets up headers then uses mcmc_writer to wrap writers

  • writes std::vector<std::string> for parameter names once
  • writes std::vector<std::string> for diagnostic names once
  • passes mcmc_writer (and therefore all the writers) into util::generate_transitions for per-iteration writing of std::vector of parameter/diagnostic values
  • calls mcmc_writer::write_adapt_finish to write post-adaptation info (mass matrix) to writers, once
  • passes mcmc_writer (and therefore all the writers) into util::generate_transitions again for per-iteration post-warmup writing of std::vector of parameter/diagnostic values, further per-iteration usage of writers.
  • uses mcmc_writer to write timings (timing should not be mcmc_writer specific, or any writer specific)

file: src/stan/services/util/run_adaptive_sampler.hpp
description: same as non-adaptive sampler

  • calls sampler.write_sampler_state(sample_writer), not sure if non-adaptive sampler includes this.

file: src/stan/services/util/generate_transitions.hpp
description: does further per-iteration calls to writers

  • calls mcmc_writer.write_sample_params to write std::vector
  • calls mcmc_writer.write_diagnostic_params to write std::vector

file: src/stan/services/util/mcmc_writer.hpp
description: class for wrapping writers and logger. This is an MCMC-specific implementation of Bob’s simple-struct approach to wrapping writers, but also has unrelated functionality such as formatting the timing message.

  • mcmc_writer constructor takes logger, sample_writer, and diagnostic_writer as constructors
  • write_sample_names method: writes std::vector<std::string> to sample_writer constructed from sample.get_sample_param_names, sampler.get_sampler_param_names, and model.constrained_param_names(names, true, true)
  • write_sample_params method: writes std::vector<double> to sample_writer, follows same pattern as names for constructing the vector
  • write_adapt_finish method: writes the string “Adaptation terminated” to sample_writer
  • write_diagnostic_names is as above but uses model.unconstrained_param_names
  • write_diagnostic_params is as above but also uses sampler.get_sampler_diagnostics
  • write_timing munges together the strings with timing info (“Elapsed time”, “warm-up time”, etc…) and writes them to a writer passed in as an argument.
  • log_timing does same but writes to the logger
  • write_timing (different signature) writes timing info to sample_writer, diagnostic_writer, and logger, once

file: src/stan/services/util/initialize.hpp
description: HMC initialization

  • writes text to logger re: rejecting initialization b/c problems, “gradient evaluation took…” , “initialization between A and B failed”, etc…
  • writes unconstrained params to init_writer

file: src/stan/variational/advi.hpp
description: multiple ADVI-related functions stuffed into one file.

  • stochastic_gradient_ascent method:
    • uses logger and diagnostic writer
    • writes iter_counter, delta_t, elbo to diagnostic_writer, per-iteration
    • writes MEAN ELBO CONVERGED, MAY BE DIVERGING to logger, (once?)
    • writes informational messages to logger, once
  • run method: uses logger, diagnostic_writer, parameter_writer
    • calls diagnostic_writer(“iter, time in seconds ELBO”), once?
    • calls parameter_writer(“Stepsize adaptation complete, eta = ...”), once
    • calls stochastic_gradient_ascent with writers, therein are per-iteration writes (above)
    • writes std::vector<double> of parameter values to parameter_writer, once (final answer).

file: src/stan/mcmc/hmc/hamiltonians/softabs_point.hpp
description: writes out adaptation info

  • writes std::string to writer passed as argument

file: src/stan/mcmc/hmc/hamiltonians/dense_e_point.hpp

  • writes mass matrix (2-D) into a string then passes it to writer, once

file: src/stan/mcmc/hmc/hamiltonians/diag_e_point.hpp

  • writes mass matrix diagonal into a string and passes it to writer

Problems with this code:

  • The writers only have a loosely defined purpose so while MCMC does a decent job of dividing what goes where, I’m not sure there’s any good rationale for where ADVI writes things.
  • mcmc_writer and gq_writer are completely independent and they both bundle the writers (and unrelated functionality).
  • the hamiltonian functions are deciding to write out the mass matrix as strings so none of the interfaces can produce an accurate mass matrix for re-starts.
  • the services don’t control much of the usage of the writers because they just pass them down
  • this approach causes problems for writing interfaces because 1) interfaces don’t control output format; 2) interfaces need to pretend to know the order of calls to figure out when certain text output is coming or interfaces need to parse exact text to retrieve specific output
  • since we produce specific, formatted output as strings that interfaces rely on this introduces a testing burden (this was big for introducing services) because we needed to produce exact matching test output.

Interface: CmdStan

  • uses stan-dev/stan stream_logger and stream_writer

Interface: rstan

  • Implements a comment_writer for just string messages
  • Implements a filtered_values class that is a writer which picks out certain indices from the std::vector<double> that’s passed in, storage is in a rstan::values<InternalVector>

interface: pystan

  • is this all just borrowed rstan code?
  • value ( only for std::vector<double>)
  • values (for std::vector<InternalVector>)
  • filtered_values (like rstan)
  • sum_values (what it sounds like…)
  • comment_writer (rstan)
  • pystan_sample_writer (wraps the others…)


Problems on the interface side:

  • code duplication between rstan/pystan that hasn’t made it into stan-dev/stan, fixable with the current design
  • composition of writer classes… not sure why we’re doing this, seems like a mess. I think this is a problem with the current design: the writer concept doesn’t differentiate where a writer will be passed, so interfaces differentiate by having classes with only some capabilities (?)
  • rstan’s internal type (R’s list type) has a problem with destructor, probably just change to a standard in-memory writer with an array of double, ncol, and nrow would be fine/faster
  • can’t implement ADVI consistently because the messages are all over the place, this shows up with bugs and limited functionality

What could get better:

  • We could define stages for each algorithm and what output can/will come out of each stage, then whatever the design the interfaces could implement without needing to read the code or examine output.
  • We could bundle writers/loggers and drop gq_writer and mcmc_writer; then the other nonsensical mcmc_writer responsibilities could go to their own classes (e.g. formatting a timing message could be a free function and the writers could accept a struct that holds timing information).
  • We could have standard writer types: 1) csv; 2) key-value; 3) in-memory; 4) some binary type. We could provide interfaces with helper functions to transfer a standard in-memory format (array + dimensions) to the Python/R/whatever-specific formats. The interfaces would only be required to handle reading from these specific formats (and even then we could help).
  • The interfaces … wouldn’t have to change their config (?)

We could take care of this by having shared callbacks that can be configured at construction (or with a run-time switch) to have null behavior, I don’t see what’s hard here?

Based on how interfaces do things and the idea that we provide shared writer callbacks, I think we can safely punt on the binary format (clean up here first, then do the binary format).

I’d have to run another grep and do more reading. It does get passed into the algorithms but has nothing to do with output so maybe ignore it for now.


More for @Bob_Carpenter : I’m hoping we can push this along far enough to get back to @martinmodrak’s concerns. The one output decision we agreed on was to aggregate the logger and some number of writer types so I’ll assume we have an aggregate called a relay.

In the text below I assume a writer has the ability to a) ignore any given output; b) aggregate any given output; and c) be switched on/off. In an ideal case where there’s no cost to switching output on and off we have:


Common to all algorithms:

  • double values representing the time taken for P algorithm stages (warmup, sampling in MCMC).
  • struct holding the config.

Model-file defined:

  • K parameter names, provided once

Model-file defined for sampling:

  • K model (‘constrained’) parameter values, provided per-iteration, so ultimately K \times N, with N iterations.
  • K algorithm (‘unconstrained’) parameter values, provided per-iteration so ultimately K \times N

Model-file defined for optimization:

  • K model (‘constrained’) parameter values, provided once as an estimate
  • K model (‘constrained’) parameter values, provided per-iteration if requested
  • K \times K hessian matrix, provided once at a given point.
  • K gradient parameter values, provided per iteration if requested so ultimately K \times N … not sure that we actually have that plumbed r.n.

Model-file defined for ADVI:

  • K model (‘constrained’) parameter values, provided once as an estimate
  • K model (‘constrained’) parameter values, provided per iteration if requested (so K \times N).
  • K \times K covariance estimate, provided once at a given point.
  • K \times K covariance estimate, provided per-iteration if requested (not plumbed r.n.)

Model-file defined, for HMC:

  • K algorithm (‘unconstrained’) parameter values, provided per iteration if requested so ultimately K \times N
  • K momentum parameter values, provided per iteration if requested so ultimately K \times N
  • K \times K or 1\times K mass matrix, provided on request (at most K\times K \times N).

Model-file defined, for whatever algorithm:

  • K \times M algorithm parameter values holding state per N algorithm steps.
  • K gradient parameter values, provided per iteration if requested so ultimately K \times N (for any gradient-based algorithm).

Algorithm-defined for HMC:

  • iteration, per-iteration
  • adapted stepsize, double, per-iteration during warmup (fixed in sampling), so N_w
  • treedepth (per-iteration), N
  • acceptance rate, N, per-iteration
  • energy, N, per-iteration
  • log-density, N, per-iteration
  • number of integrator (leapfrog) steps, N, per-iteration
  • divergence, N, per-iteration

Algorithm-defined for BFGS/LBFGS:

  • iteration
  • log-density, N, per-iteration
  • ||dx||, per-iteration
  • ||grad||, per-iteration
  • \alpha, per-iteration
  • \alpha_0, per-iteration
  • number of evals, per-iteration
  • “notes” <- Huh, should be broken out
  • termination/convergence messages

Algorithm-defined, for Newton optimization:

  • … uh, same informational message on initialization failure as MCMC… (?)
  • iteration,
  • log-density, N
  • improvement in the log-density, N

Algorithm-defined, for ADVI:

  • iteration, N
  • ELBO, N
  • delta_ELBO_mean, N
  • delta_ELBO_med, N
  • notes, N
  • informational messages (once?)
  • “time in seconds” for iteration, N
  • ELBO, N
  • “eta” (adapted) , once
  • “drawing a sample of size X from approximate posterior”, once
  • “completed” , once

I went through all the algorithm files (with the help of grep and background on the algorithms and what I’ve seen in interfaces) and this is what we output. For me this is enough to say that we need to handle output of:

Model-file defined:

  • K \times N constrained/unconstrained/algorithm-defined augmented parameters into their own tables (the interface gets to say where each goes and uses a standard writer to say whether it ends up in a .csv, a {*double, n_col, n_row} structure, or someday a binary stream).
  • K \times K \times N for the same (HMC mass matrix, however many times you want to output it, or ADVI fullrank covariance matrix). I’d suggest the same options as above (column-major flat .csv, struct, or someday binary).
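A minimal sketch of what the in-memory {*double, n_col, n_row} target could look like (the struct name and layout are assumptions for illustration): a column-major flat buffer that a K \times N parameter table is written into, which R/Python can wrap without copying:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical in-memory table: column-major flat storage, so element
// (row, col) lives at data[col * n_row + row], matching how R and
// Eigen lay out matrices.
struct flat_table {
  std::vector<double> data;
  std::size_t n_col = 0;
  std::size_t n_row = 0;

  flat_table(std::size_t cols, std::size_t rows)
      : data(cols * rows, 0.0), n_col(cols), n_row(rows) {}

  double& at(std::size_t row, std::size_t col) {
    return data[col * n_row + row];
  }
};
```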


  • small tuples (Q metrics \times N iterations or sub-iterations where Q is 1-10 in size so not a performance issue). Can still use either .csv output or algorithm-specific in-memory structs (Python and R can read in-memory structs easily) or someday a binary stream.


  • timing information in a std::vector


  • logger messages in text,
  • expand logger to accept single typed output (e.g.- default to-text method but optionally e.g., extra divergence info or such).
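A sketch of what "logger accepts single typed output with a default to-text method" could mean (the `divergence_info` payload and method names are assumptions, not existing Stan types): an interface that only implements the text path still receives something readable, while one that overrides the typed method gets the structured values.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical payload for extra divergence info.
struct divergence_info {
  int iteration;
  std::vector<double> unconstrained;  // location of the divergence
};

class logger {
 public:
  virtual ~logger() {}
  virtual void info(const std::string& msg) = 0;

  // Default: flatten the typed payload to text and reuse info(string).
  // Interfaces wanting structured output override this instead.
  virtual void info(const divergence_info& d) {
    std::ostringstream ss;
    ss << "divergence at iteration " << d.iteration;
    info(ss.str());
  }
};

// An interface that only handles text still gets the message.
class string_logger : public logger {
 public:
  using logger::info;  // keep the typed overload visible
  void info(const std::string& msg) override { messages.push_back(msg); }
  std::vector<std::string> messages;
};
```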

That’s what stan-dev/stan will need to implement and provide interface-relevant helpers for.

Relay question:

  • How do we route stuff to these outputs in the relay object?
  • With the three algorithms we have we could honestly just write some classes, one for each of sampling, optimization, and ADVI. The specific sub-algorithms (e.g.-HMC with diagonal mass matrix and adaptation) would then just call their specific parts (e.g., relay.send_mass_matrix(std::vector<double> x)).
  • As long as we build these classes up in a modular way we should be able to avoid boilerplate.
  • The dev-side problem @martinmodrak ran into could be solved by deciding that the logger class could have more overrides with a default conversion to text. If an interface didn’t implement a specific type, it could always get the output as text; if it did implement a specific method, the output could arrive in whatever format was needed. Actually, that may be the right approach now too, even before this re-organization (sorry Martin, not sure why I didn’t just suggest that earlier; the logger should be plumbed through everywhere already).
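The routing question above could look something like this minimal sketch (class and method names are assumptions for illustration, building on the relay.send_mass_matrix example): one relay per top-level method, with named entry points that only the relevant sub-algorithms call.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical relay for the sampling method. Every sampler calls
// send_draw; only adaptive HMC variants ever call send_mass_matrix,
// so non-HMC samplers need no extra plumbing.
class sample_relay {
 public:
  void send_draw(const std::vector<double>& constrained) {
    draws_.push_back(constrained);
  }
  void send_mass_matrix(const std::vector<double>& inv_metric) {
    mass_matrix_ = inv_metric;
  }
  std::size_t num_draws() const { return draws_.size(); }
  const std::vector<double>& mass_matrix() const { return mass_matrix_; }

 private:
  std::vector<std::vector<double>> draws_;
  std::vector<double> mass_matrix_;
};
```

Building optimization and ADVI relays from the same small pieces (a table sink, a once-only sink, a logger) should keep the boilerplate down.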

Thanks for the thorough reporting, it must have been quite an effort!

I think you’ve missed a few things that are and/or should be piped through (all of them probably fit into the proposed output format, though):

Model-file defined:

  • generated quantities: Those differ from transformed params in that they can include integers but it might be excusable to convert them to doubles (all 32bit integers can be exactly represented as double). Also, those just have a different dimension than either K or Q.
  • transformed data: Similar to gen quants in that those have arbitrary dimension and may include ints. However, those are output only once per run. This is currently not piped through but should be useful for debugging.


  • RNG state. This could fit into other alg-defined params, but wanted to mention it. It also may be necessary to pay attention when/if converting those to doubles. Per-iteration should be enough.

This is very sensible.

1 Like

I started an issue for the refactor to keep track of all related issues in stan and across interfaces:

Most of those are technical details not worth discussing now. The only issue that affects the design IMHO is that we need to be able to determine when gen. quants should be executed (only when stored + force consistency between warmup and sampling if both are stored), so this is something the relay/writers need to expose.