When discussing possible additional diagnostic outputs from Stan at Getting the location + gradients of divergences (not of iteration starting points), we realized that a better plumbing of outputs from the algorithms is probably a necessary prerequisite. I checked the currently proposed solution for consolidated output and I believe it would not handle this use case very well. So I sketched an alternative proposal - the details are not completely fleshed out, but I hope it is enough to get the rough idea and see if it is worth pursuing further. I have added it to the wiki: https://github.com/stan-dev/stan/wiki/Design:-Consolidated-Output-for-Sample,-Optimize,-ADVI,-and-Diagnose#alternative-proposed-solution
Edit: the wiki is now at https://github.com/stan-dev/stan/wiki/Design:-loggers,-writers,-and-interrupts-in-one-object
-
Re: log4j: in the past we've discussed this and catalogued some of the libraries involved, but they didn't seem worth the added dependency burden. In C++ the burden of adding dependencies ends up being a little higher than in something like R/Python, and their functionality (for a logging library) doesn't mean much to us since most of the work is here (defining outputs, defining integration into the C++ code base, defining how interfaces hook in) and most of that isn't actual coding.
-
Overall I support the goal of making it easier to add different types of output without modifying (much of) the IO design and without adding ad-hoc streams (like we would currently need to). One part of your design that concerns me from a maintenance standpoint is that it has to deal with generic tables explicitly. I think this bit may not be necessary:
template<class OutputStreams>
class table_consumer {
//Header-defining functions, probably more combinations of type/arity needed
virtual void add_double_column(const std::string& name) = 0;
virtual void add_double_columns(const std::vector<std::string>& names) = 0;
virtual void add_int_column(const std::string& name) = 0;
virtual void add_string_column(const std::string& name) = 0;
virtual void header_end() = 0;
virtual void row_start(const typename OutputStreams::indexing& index) = 0;
virtual void consume_double(double value) = 0;
virtual void consume_doubles(const std::vector<double>& values) = 0;
virtual void consume_doubles(const Eigen::VectorXd& values) = 0;
virtual void consume_int(int value) = 0;
...
virtual void row_end() = 0;
virtual void close() = 0;
};
- I think this is meant to handle things like the combined `stepsize__` and `divergent__` parameters that have different types. For those we could use tuples (s.t. everything is static and simple) or a templated class like the above but with compile-time resolution of column types (fancier use of parameter packs, but it should work out and avoid run-time output bugs). For larger outputs (e.g. the mass matrix) my thought was to wrap them in small classes and use overloads or templates to resolve exactly which output stream gets the item. For smaller things (e.g. the 9 or so algorithm parameters that NUTS might spit out) it probably makes more sense to pack a tuple and write that.
Re: streams within transitions: it might make sense to make these switches static s.t. there's no penalty incurred unless a debug flag is set at compilation. For larger models I don't think an "if" would matter, so that's worth testing… The reason I'm suggesting making more of this design static is that if you're adding outputs you can edit a few lines of code yourself (if we make it easy!), so the add_*_column code below is not necessary.
relay.get_key_value_consumer(hmc_output_streams::algorithm_params).consume_value("adapt_delta", delta);
relay.get_key_value_consumer(hmc_output_streams::algorithm_params).close();
...
table_consumer<hmc_output_streams::index> iterations_consumer = relay.get_table_consumer(hmc_output_streams::iterations);
iterations_consumer.add_double_columns(<<model param names>>);
iterations_consumer.add_double_column("energy__");
iterations_consumer.add_int_column("divergent__");
iterations_consumer.header_end();
-
Re: simpler signatures: this could be (on relay/consumer construction):
consumer<std::vector<double>, double, double, int>(std::ostream& samples, std::ostream& algorithm_parameters);
(the std::ostream is just an example of where output might go). Then at sampler setup: `consumer.set_names(std::string names)`, and this could be checked for the expected number of names.
-
Then the sampler has to call `consumer(parameter_values, sampler.energy, sampler.divergent)` rather than individually adding things. With C++11 and parameter packs this ends up not even requiring hard-coding anything. With the suggestions I'm making, if you want to add an output type you would: 1) change the tuple that describes what the consumer handles; 2) include it in the calls to the consumer; and, if necessary, 3) modify the interface hook for how to handle the additional output.
I was hoping I'd get a chance to write this down in more detail over the weekend, but I didn't, so you get the hand-wavy version :/
Good point with the templates/parameter-pack magic. I didn't want to go into it because I am a bit out of my depth there, but maybe it's time to increase my C++ fu :-) How hard can it be anyway? :-D
Also I think that maybe the per-iteration data are just so frequently used that they may be worth some special care, e.g. treating them as a separate type of stream. The motivation is that I think it should be easy to turn individual streams of the per-iteration data (gen quants, unconstrained params, RNG state, …) on and off per run AND still pipe everything directly into a single CSV file. This would require a fixed ordering of `consume` calls, but that seems manageable. Whereas for other types of output it is IMHO OK that every stream ends up in its own file/socket/… and repeats header columns.
I am not very afraid of the dynamic `if` clauses - they are runtime constant and will likely get ~100% correct branch prediction. Having to recompile the model to get different output seems annoying (and you can't compile all variants, as there is a combinatorial explosion of possibilities). Also, everyone keeps saying that the main bottleneck is the autodiff stack - I didn't check that myself, but if that is the case a single `if` should not be noticeable.
Good that someone did that. I guess rolling our own won't be that crazy. My current plan would be to ignore the logging for a first implementation pass, keeping the current logger, and - if everything works out - to implement logging on top of the table output stream.
I've done a fair number of demo projects with it at this point, so I feel pretty confident about it. If I don't write out signatures I sometimes miss things that can't work and assume they can :/ …
I'd like to design this so that there's a `relay` object that gets used within Stan, but that it's parameterized (on construction with objects, or statically using templates) so that the interfaces are responsible only for the code that gets plugged into the relay. We could write some standard relay plug-ins that write to .csv files or std::ostream or whatever, s.t. any interface could use them. I think those plug-ins would be the place to have, e.g., some generic implementations for per-iteration data.
Fair enough. I think sometimes they can cause code bloat that messes with inlining and caches (I think Stan has cache-size problems, from another thread), but I admit that I don't know that part of the code base well enough, and in any case we could do some timings to verify your very reasonable suggestion.
When I looked at the logging design my main thought was that we could make it usable with what's currently under discussion by adding some signatures; these things don't have to be rolled into one framework, and keeping them modular has benefits too. We just need to push this along and see what it looks like further down the line.
I updated the wiki, hopefully in a way that integrates the work so far and separates this out from the other output problems (file format stuff).
So I had a go at actually implementing the necessary template magic. And I had fun. Also it was a throwback to college courses in Prolog :-D I currently think that what we aim for is mostly possible, with some minor modifications.
The key-value output seems easy, so I focused on the heterogeneous tabular output (homogeneous tables could be handled directly with the same code or with a very slightly specialized version). I however have no idea how to support `relay.table<handler_type>(T1 v1, T2 v2, ...);`, that is, how to make `handler_type` decide the number of parameters. However, you can make do with `relay.table<handler_type>({T1 v1, T2 v2, ...});` (added braces), so that seems OK.
The code I have now is below. You can also try it online and see it with syntax highlighting.
#include <iostream>
#include <vector>
#include <tuple>
#include <type_traits>
#include <cstddef>
#include <exception>
#include <string>
#include <algorithm>
/**************************************
* Defining the output types
**************************************/
//Helper for output types with fixed headers
struct no_header {
static no_header nh;
};
no_header no_header::nh;
struct hmc_iter_stats {
double energy;
int treedepth;
bool divergent;
typedef no_header header_def;
};
struct constrained_params {
std::vector<double> values;
typedef std::vector<std::string> header_def;
};
/**************************************
* data_stream - ancestor of all streams of data
**************************************/
//I require all template parameters to be class so that
//a different name -> different type
template <class T> class data_stream {
public:
typedef T output;
// Couldn't figure out the syntax with typedef
using header_def = typename T::header_def;
virtual void send_headers(const header_def &header) = 0;
virtual void send(const T &data) = 0;
};
/**************************************
* Some utility templates to be able define the relay
**************************************/
template<class T> struct always_false : std::false_type {};
//Resolves to the first index of What in the parameter pack Where
template <typename What, typename... Where>
struct index_of;
//Case 0: type not found, providing at least a little meaningful error message
template <typename What>
struct index_of<What> {
static_assert(always_false<What>::value, "The required type is not part of the type list");
};
//Case 1: Found the type
template <typename What, typename... Where>
struct index_of<What, What, Where...> : std::integral_constant<std::size_t, 0> {};
//Case 2: Recursion
template <typename What, typename Where_head, typename... Where_tail>
struct index_of<What, Where_head, Where_tail...> : std::integral_constant<std::size_t, 1 + index_of<What, Where_tail...>::value> {};
/**************************************
* The relay and its instances
**************************************/
template <class... Outputs>
class relay {
std::tuple<data_stream<Outputs>*...> streams;
template<class T> data_stream<T>*& stream_ref() {
return std::get<index_of<T, Outputs...>::value>(streams);
}
public:
template<class T> data_stream<T>* get_stream() {
return stream_ref<T>();
}
template<class T> void set_stream(data_stream<T>* stream) {
if(stream_ref<T>() != 0) {
throw std::invalid_argument("Stream already set.");
}
stream_ref<T>() = stream;
}
template<class T> void send_headers(const typename T::header_def& header) {
if(stream_ref<T>() != 0) {
stream_ref<T>()->send_headers(header);
}
}
template<class T> void send(const T& data) {
if(stream_ref<T>() != 0) {
stream_ref<T>()->send(data);
}
}
};
typedef relay<hmc_iter_stats, constrained_params> hmc_relay;
/**************************************
* Some data_stream implementations
**************************************/
//Helper for multiple types of possible outputs
template<class T>
class to_string_columns;
template<>
class to_string_columns<hmc_iter_stats> {
public:
static void get_headers(const hmc_iter_stats::header_def&, std::vector<std::string>& out) {
out.push_back("energy");
out.push_back("treedepth");
out.push_back("divergent");
}
static void get_values(const hmc_iter_stats& data, std::vector<std::string>& out) {
out.push_back(std::to_string(data.energy));
out.push_back(std::to_string(data.treedepth));
out.push_back(std::to_string(data.divergent));
}
};
template<>
class to_string_columns<constrained_params> {
public:
static void get_headers(const constrained_params::header_def& header, std::vector<std::string>& out) {
out.insert(out.end(), header.begin(), header.end());
}
static void get_values(const constrained_params& data, std::vector<std::string>& out) {
std::for_each(data.values.begin(), data.values.end(), [&out](double val) {
out.push_back(std::to_string(val));
});
}
};
template<class T>
class csv_stream : public data_stream<T> {
private:
std::ostream& out;
std::vector<std::string> buffer;
public:
csv_stream(std::ostream& out) : out(out), buffer() {}
virtual void send_headers(const typename T::header_def &header) override {
buffer.clear();
to_string_columns<T>::get_headers(header, buffer);
std::for_each(buffer.begin(), buffer.end(), [this](std::string& s) {
out << s << ',';
});
out << std::endl;
}
virtual void send(const T &data) override {
buffer.clear();
to_string_columns<T>::get_values(data, buffer);
std::for_each(buffer.begin(), buffer.end(), [this](std::string& s) {
out << s << ',';
});
out << std::endl;
}
};
int main() {
hmc_relay r;
csv_stream<constrained_params> params_stream(std::cout);
r.set_stream<constrained_params>(&params_stream);
csv_stream<hmc_iter_stats> stats_stream(std::cout);
r.set_stream<hmc_iter_stats>(&stats_stream);
std::vector<std::string> param_names;
param_names.push_back("alpha");
param_names.push_back("beta");
r.send_headers<constrained_params>(param_names);
r.send_headers<hmc_iter_stats>(no_header::nh);
std::vector<double> param_values;
param_values.push_back(0.1);
param_values.push_back(0.2);
r.send<constrained_params>({param_values});
r.send<hmc_iter_stats>({0.5, 1, true});
/* Uncomment to see error when stream is not defined
struct undefined_output {
typedef no_header header_def;
};
r.send_headers<undefined_output>(no_header::nh);
*/
}
That's awesome, thanks! I'll read through this today and see if I can ditch the curly brackets.
While I'll be happy to learn how to do this, I am no longer 100% sure getting rid of the curly braces is desirable. Having an explicit data type that represents the table row seems useful, especially if you cannot collect all the values at once. Also it might make sense for the interfaces, as I guess it might be possible to just directly access this C struct from the interface without any serialization.
I'm very much behind consolidating argument groups into structures (aka "parameter objects"). I created a separate topic that provides some context for the top-level decision being discussed here to bundle parameters.
Putting on the forum moderator hat for a sec:
- The writers/loggers/interrupts are different because the interfaces have to specify a whole set of behaviors when building a "relay" for the loggers, whereas for most other items the interface just supplies a few int/double values.
- I agree the high-level discussion of the builder pattern is on-topic here; we should share a builder pattern.
- Detailed discussion about which args belong in which bundle should be in its own thread or (ideally) a design doc on the wiki. `var_context` also brings up a host of unrelated issues (including performance IIRC) and belongs in the `var_context` thread: Var_context, what do we want from it?
- Removed the slightly derogatory "bike-shedding"; I didn't mean it like that.
I put this on the already existing wiki and organized it a little: Home · stan-dev/stan Wiki · GitHub; that seems like a good place to put implementation suggestions.
Iām sure we could make that happen, though maybe in terms of order it goes after removing the transforms.
I updated the Wiki: https://github.com/stan-dev/stan/wiki/Design:-loggers,-writers,-and-interrupts-in-one-object
I also re-organized @martinmodrak's code s.t. the separation of responsibilities I expect to occur in our repos is clearer: https://repl.it/@KrzysztofSakrej/TidyExtrasmallProlog-1
OK, Iāll create a separate issue and page for this. I just brought it up here to try to put the design in the context of the goal to bundle parameter arguments.
I don't understand yet what the relays are going to do functionally, other than act as simple structs to hold the relevant writers.
I've been thinking a bit about what Dan and Bob said at the meeting today, and I see a less elegant, but minimalist, implementation:
- Create a relay class that contains all the `writer`s the algorithm supports. This would be a different class per algorithm group, possibly with some structure via inheritance. This avoids changes to code when a new `writer` is added (e.g. for more info on divergent transitions).
- The relay class also stores information about which elements for individual writers are enabled, to allow more fine-grained control (e.g. whether diagnostic output contains momenta, RNG states, etc.)
- Create a new class `key_value_writer` that would supplant passing strings to the current writers (AFAIK these could all plausibly be considered key-value pairs: alg. name, version, initial seed, etc.). Add this to the relay.
Downsides:
- We've lost some safety in the code, as all complex output is constructed by appending individual values.
- We have to convert everything to double.
- Code for the algorithms is contaminated with non-trivial amounts of output-specific code: the column names for `lp__`, `divergent__`, sampler params, etc.; checking which data should be sent (RNG state, …); etc.
- It is hard to disentangle the output back into its individual elements on the C++ side, e.g. for row filtering or to provide more structured output for the interfaces (e.g., separating model params from sampler info).
- To determine what kind of data, and in what order, the algorithm actually outputs, you have to dive deep into the algorithm code.
I also believe that there is a spectrum between this minimalist implementation and the full blown templated beast, i.e. resolving only some of the downsides might be possible with a smaller increase in complexity.
Just FYI re: the meeting. I'll write what Bob requested this weekend; I think we can outline what the implementation would look like at a higher level for both the minimal and the more complex solutions.
Minimalism and elegance, while both on the nice-to-have list, are not strictly necessary.
What we really need to prioritize is maintenance, by which I mean testing, documentation, and future extensions. At least for this kind of pass-through code, that shouldn't have any performance impact.
What I'd like to do is expand our focus to a full 360 around the services API. The interfaces are the clients of the services API, so it's their interest we most need to keep in mind from a design perspective. Drilling down a level, the algorithms are clients of the services API:
interfaces -clientof-> services -clientof-> algorithms
The callback structure inverts the interface-services relationship, so that the relation to the callbacks is:
interfaces -implement-> callbacks <-call- algorithms
What I'd like to see first is an inventory of
- what's currently being written by the algorithms to the callbacks
  - by type: string, integer, double, vector, etc.,
  - by service: NUTS adapt diagonal, ADVI dense, etc.,
  - by frequency: once at data, once at adaptation, warmup, sampling, ad hoc, multiple per iteration, per iteration, etc., and
- how the interfaces currently implement the callbacks
  - duplicated code
  - pain points for integration, testing, or maintenance
  - hacks like using strings to encode adaptation
Then I'd like to know
- what new things this will enable to be easily written (that is, what's next)
- how this impacts config on the part of the interfaces
- how interfaces can better share implementations with a new design.
One wish-list item of mine would be something that makes it easier to turn off output other than the draws (RStan and PyStan both struggle with this, I think). This might be as easy as defining null writers that get plugged in as implementations by the callbacks.
This then gets wrapped up into selling points for the new interfaces, focusing on maintainability and extensibility, and then we build it. By which I mean I hope you all build it!
Serialization is a very closely related concern. Exactly what is being written here is going to wind up being the target of serialization and deserialization. It might be useful, as such, to consider serialization an additional client alongside the interfaces (though such a serialization would be very closely related to what CmdStan does).
The current thinking is to define a protocol buffer format for these streams that could implement the callbacks from an interface perspective. Then the analysis API and interfaces could read it back in.
I'm still not clear on where the interrupts fit into all of this. Are they part of the design?
Here are all the cases where a `writer` is called directly; there are additional files where a `writer` is just passed down.
file: src/stan/model/test_gradients.hpp
description: This writes only a table and some string messages; each row of the table is a parameter.
usage:
- `logger` and `parameter_writer` get message string from `log_prob_grad` call
- `logger` and `parameter_writer` get message string from `finite_diff_grad` call
- `logger` and `parameter_writer` get header ("param_idx", "value", "model", "finite diff", "error") as a string
- `logger` and `parameter_writer` get gradient test info string (`k`, `params_r[k]`, `grad[k]`, `grad_fd[k]`, and `(grad[k] - grad_fd[k])`)
file: src/stan/services/experimental/advi/meanfield.hpp
description: this services method only sets up the header for `parameter_writer` and calls another method (passing `writer`)
usage:
- `parameter_writer` gets `lp__`, followed by `model.constrained_param_names(names, true, true);`
file: src/stan/services/experimental/advi/fullrank.hpp
description: as above, calls `initialize`, sets up the header, calls something from `variational::advi`
usage: as above.
file: src/stan/services/sample/fixed_param.hpp
description: No iterations here; things are written once.
usage:
- `util::mcmc_writer` gets `sample_writer`, `diagnostic_writer`, `logger` (this is relevant b/c any new `relay` is like the `mcmc_writer`)
- `mcmc_writer` writes the header of the sampler, writes the header of the diagnostic file
- `writer` passed to `util::generate_transitions`
- timing written to `writer`
file: src/stan/services/sample/standalone_gqs.hpp
description: not much done here since the main action is within `gq_writer`
usage:
- `util::gq_writer` constructed from `sample_writer`, `logger`, and `num_params`; this is another ad-hoc thing like `mcmc_writer` that groups other `writer_callback` writers
- `writer.write_gq_names(model)` called once
- `writer.write_gq_values(model, rng, draw)` called per iteration
file: src/stan/services/optimize/bfgs.hpp
description: there's a switch that decides whether the per-iteration writing occurs.
usage:
- `parameter_writer` called with `std::vector<std::string>` parameter names, once
- `parameter_writer` called with `std::vector` parameter values, once
- per iteration: `parameter_writer` called with `std::vector` parameter values
- `parameter_writer` called with `std::vector` parameter values, once
file: src/stan/services/optimize/lbfgs.hpp
description: lots of boilerplate, like bfgs
usage:
- called like above for `bfgs`
file: src/stan/services/optimize/newton.hpp
description: like above
usage:
- called like `bfgs`/`lbfgs`
file: src/stan/services/util/gq_writer.hpp
description: ad-hoc class created to add the standalone generated quantities calls; bundles the writers; lots of overlap with the `mcmc_writer` class.
usage:
- `sample_writer` called to write `gq_names` (`std::vector<std::string>`)
- `sample_writer` called to write `gq_values` (`std::vector<double>`)
- uses `model.write_array`
file: src/stan/services/util/run_sampler.hpp
description: sets up headers, then uses `mcmc_writer` to wrap writers
usage:
- writes `std::vector<std::string>` for parameter names, once
- writes `std::vector<std::string>` for diagnostic names, once
- passes `mcmc_writer` (and therefore all the writers) into `util::generate_transitions` for per-iteration writing of `std::vector`s of parameter/diagnostic values
- calls `mcmc_writer::write_adapt_finish` to write post-adaptation output (mass matrix) to writers, once
- passes `mcmc_writer` (and therefore all the writers) into `util::generate_transitions` again for per-iteration post-warmup writing of `std::vector`s of parameter/diagnostic values; further per-iteration usage of writers.
- uses `mcmc_writer` to write timings (timing should not be `mcmc_writer`-specific, or specific to any writer)
file: src/stan/services/util/run_adaptive_sampler.hpp
description: same as the non-adaptive sampler
usage:
- calls `sampler.write_sampler_state(sample_writer)`; not sure if the non-adaptive sampler includes this.
file: src/stan/services/util/generate_transitions.hpp
description: does further per-iteration calls to `writer`s
usage:
- calls `mcmc_writer.write_sample_params` to write `std::vector`
- calls `mcmc_writer.write_diagnostic_params` to write `std::vector`
file: src/stan/services/util/mcmc_writer.hpp
description: class wrapping the `writer`s and `logger`. This is an MCMC-specific implementation of Bob's simple-struct approach to wrapping writers, but it also has unrelated functionality such as formatting the timing message.
usage:
- `mcmc_writer` constructor takes `logger`, `sample_writer`, and `diagnostic_writer` as constructor arguments
- `write_sample_names` method: writes `std::vector<std::string>` to `sample_writer`, constructed from `sample.get_sample_param_names`, `sampler.get_sampler_param_names`, and `model.constrained_param_names(names, true, true)`
- `write_sample_params` method: writes `std::vector<double>` to `sample_writer`; follows the same pattern as the names for constructing the vector
- `write_adapt_finish` method: writes the string "Adaptation terminated" to `sample_writer`
- `write_diagnostic_names` is as above but uses `model.unconstrained_param_names`
- `write_diagnostic_params` is as above but also uses `sampler.get_sampler_diagnostics`
- `write_timing` munges together the strings with timing info ("Elapsed time", "warm-up time", etc.) and writes them to a writer passed in as an argument; `log_timing` does the same but writes to the logger
- `write_timing` (different signature) writes timing info to `sample_writer`, `diagnostic_writer`, and `logger`, once
file: src/stan/services/util/initialize
description: HMC initialization
usage:
- writes text to logger re: rejecting initialization b/c of problems, "gradient evaluation took…", "initialization between A and B failed", etc.
- writes unconstrained params to `init_writer`
file: src/stan/variational/advi.hpp
description: multiple ADVI-related functions stuffed into one file.
usage:
- `stochastic_gradient_ascent` method:
  - uses `logger` and `diagnostic_writer`
  - writes `iter_counter`, `delta_t`, `elbo` to `diagnostic_writer`, per iteration
  - writes `MEAN ELBO CONVERGED`, `MAY BE DIVERGING` to `logger` (once?)
  - writes informational messages to `logger`, once
- `run` method: uses `logger`, `diagnostic_writer`, `parameter_writer`
  - calls `diagnostic_writer("iter, time in seconds ELBO")`, once?
  - calls `parameter_writer("Stepsize adaptation complete, eta = ...")`, once
  - calls `stochastic_gradient_ascent` with writers; therein are the per-iteration writes (above)
  - writes `std::vector<double>` of parameter values to `parameter_writer`, once (the final answer).
file: src/stan/mcmc/hmc/hamiltonians/softabs_point.hpp
description: writes out adaptation info
usage:
- writes `std::string` to `writer` passed as argument
file: src/stan/mcmc/hmc/hamiltonians/dense_e_point.hpp
usage:
- writes mass matrix (2-D) into a string, then passes it to a writer, once
file: src/stan/mcmc/hmc/hamiltonians/diag_e_point.hpp
usage:
- writes mass matrix diagonal into a string and passes it to a writer
Problems with this code:
- The writers have only a loosely defined purpose, so while MCMC does a decent job of dividing what goes where, I'm not sure there's any good rationale for where ADVI writes things.
- `mcmc_writer` and `gq_writer` are completely independent, and they both bundle the writers (and unrelated functionality).
- the `hamiltonian` functions decide to write out the mass matrix as strings, so none of the interfaces can produce an accurate mass matrix for re-starts.
- the `services` don't control much of the usage of the writers because they just pass them down
- this approach causes problems for writing interfaces because 1) interfaces don't control the output format; 2) interfaces need to pretend to know the order of calls to figure out when certain text output is coming, or need to parse exact text to retrieve specific output
- since we produce specific, formatted output as strings that interfaces rely on, this introduces a testing burden (this was big when introducing services) because we needed to produce exactly matching test output.
Interface: CmdStan
writers:
- uses `stan-dev/stan` `stream_logger` and `stream_writer`
Interface: rstan
writers:
- Implements a `comment_writer` for just string messages
- Implements a `filtered_values` class that is a `writer` which picks out certain indices from the `std::vector<double>` that's passed in; storage is in an `rstan::values<InternalVector>`
interface: pystan
writers:
- is this all just borrowed rstan code?
- `value` (only for `std::vector<double>`)
- `values` (for `std::vector<InternalVector>`)
- `filtered_values` (like `rstan`)
- `sum_values` (what it sounds like…)
- `comment_writer` (rstan)
- `pystan_sample_writer` (wraps the others…)
problems:
- code duplication between rstan/pystan that hasn't made it into `stan-dev/stan`; fixable with the current design
- composition of `writer` classes… not sure why we're doing this, seems like a mess. I think this is a problem with the current design: the `writer` concept doesn't differentiate where a writer will be passed, so interfaces differentiate by having classes with only some capabilities (?)
- rstan's internal type (R's list type) has a problem with the destructor; probably just changing to a standard in-memory writer with an array of `double`, `ncol`, and `nrow` would be fine/faster
- can't implement ADVI consistently because the messages are all over the place; this shows up as bugs and limited functionality
What could get better:
- We could define stages for each algorithm and what output can/will come out of each stage; then, whatever the design, the interfaces could implement it without needing to read the code or examine output.
- We could bundle writers/loggers and drop `gq_writer` and `mcmc_writer`; then the other nonsensical `mcmc_writer` responsibilities could go to their own classes (e.g., formatting a timing message could be a free function, and the writers could accept a struct that holds timing information).
- We could have standard writer types: 1) csv; 2) key-value; 3) in-memory; 4) some binary type. We could provide interfaces with helper functions to transfer a standard in-memory format (array + dimensions) to the Python/R/whatever-specific formats. The interfaces would only be required to handle reading from these specific formats (and even then we could help).
- The interfaces … wouldn't have to change their config (?)
We could take care of this by having shared callbacks that can be configured at construction (or with a run-time switch) to have null behavior; I don't see what's hard here.
Based on how the interfaces do things and the idea that we provide shared writer callbacks, I think we can safely punt on the binary format (clean up here first, then do the binary format).
I'd have to run another grep and do more reading. It does get passed into the algorithms but has nothing to do with output, so maybe ignore it for now.
More for @Bob_Carpenter: I'm hoping we can push this along far enough to get back to @martinmodrak's concerns. The one output decision we agreed on was to aggregate the `logger` and some number of `writer` types, so I'll assume we have an aggregate called a `relay`.
In the text below I assume a `writer` has the ability to a) ignore any given output; b) aggregate any given output; and c) be switched on/off. In the ideal case where there's no cost to switching output on and off, we have:
Services-defined:
- `double` values representing the time taken for P algorithm stages (warmup, sampling in MCMC).
- a `struct` holding the config.
Model-file defined:
- K parameter names, provided once
Model-file defined for sampling:
- K model ("constrained") parameter values, provided per-iteration, so ultimately K \times N, with N iterations.
- K algorithm ("unconstrained") parameter values, provided per-iteration, so ultimately K \times N.
Model-file defined for optimization:
- K model ("constrained") parameter values, provided once as an estimate
- K model ("constrained") parameter values, provided per-iteration if requested
- K \times K Hessian matrix, provided once at a given point.
- K gradient parameter values, provided per-iteration if requested, so ultimately K \times N … not sure that we actually have that plumbed r.n.
Model-file defined for ADVI:
- K model ("constrained") parameter values, provided once as an estimate
- K model ("constrained") parameter values, provided per-iteration if requested (so K \times N).
- K \times K covariance estimate, provided once at a given point.
- K \times K covariance estimate, provided per-iteration if requested (not plumbed r.n.)
Model-file defined, for HMC:
- K algorithm ("unconstrained") parameter values, provided per-iteration if requested, so ultimately K \times N
- K momentum parameter values, provided per-iteration if requested, so ultimately K \times N
- K \times K or 1 \times K mass matrix, provided on request (at most K \times K \times N).
Model-file defined, for whatever algorithm:
- K \times M algorithm parameter values holding state per N algorithm steps.
- K gradient parameter values, provided per iteration if requested so ultimately K \times N (for any gradient-based algorithm).
Algorithm-defined for HMC:
- iteration, per-iteration
- adapted stepsize, `double`, per-iteration during warmup (fixed in sampling), so N_w
- treedepth, N, per-iteration
- acceptance rate, N, per-iteration
- energy, N, per-iteration
- log-density, N, per-iteration
- number of integrator (leapfrog) steps, N, per-iteration
- divergence, N, per-iteration
Algorithm-defined for BFGS/LBFGS:
- iteration
- log-density, N, per-iteration
- ||dx||, per-iteration
- ||grad||, per-iteration
- \alpha, per-iteration
- \alpha_0, per-iteration
- number of evals, per-iteration
- "notes" <- huh, should be broken out
- termination/convergence messages
Algorithm-defined, for Newton optimization:
- … uh, same informational message on initialization failure as MCMC … (?)
- iteration,
- log-density, N
- improvement in the log-density, N
Algorithm-defined, for ADVI:
- iteration, N
- ELBO, N
- delta_ELBO_mean, N
- delta_ELBO_med, N
- notes, N
- "MEAN ELBO CONVERGED" (once)
- "MEDIAN ELBO CONVERGED" (once)
- "MAY BE DIVERGING… INSPECT ELBO" (once?)
- informational messages (once?)
- "time in seconds" for iteration, N
- ELBO, N
- "eta" (adapted), once
- "drawing a sample of size X from approximate posterior", once
- "completed", once
Solutions:
I went through all the algorithm files (with the help of grep and background on the algorithms and what I've seen in interfaces) and this is what we output. For me this is enough to say that we need to handle output of:
Model-file defined:
- K \times N constrained/unconstrained/algorithm-defined augmented parameters into their own tables (the interface gets to say where it goes and uses a standard writer to say if this ends up in a .csv or a `{*double, n_col, n_row}` structure, or someday a binary stream).
- K \times K \times N for same (HMC mass matrix, however many times you want to output it, or ADVI fullrank covariance matrix). I'd suggest the same options as above (column-major flat .csv, struct, or someday binary).
Algorithm-defined:
- small tuples (Q metrics \times N iterations or sub-iterations where Q is 1-10 in size so not a performance issue). Can still use either .csv output or algorithm-specific in-memory structs (Python and R can read in-memory structs easily) or someday a binary stream.
Services-defined
- timing information in a `std::vector`
Everywhere:
- logger messages in text,
- expand the logger to accept single typed output (e.g., a default to-text method but optionally extra divergence info or such).
That's what `stan-dev/stan` will need to implement and provide interface-relevant helpers for.
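The standard in-memory format mentioned above could be as simple as a flat array plus dimensions. A hypothetical sketch (column-major, matching the flat .csv suggestion; names made up):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical standard in-memory table: flat column-major storage plus
// dimensions, which R/Python helpers could wrap without extra copies.
struct in_memory_table {
  std::vector<double> data;  // column-major, size == n_row * n_col
  std::size_t n_row = 0;
  std::size_t n_col = 0;

  double at(std::size_t row, std::size_t col) const {
    return data[col * n_row + row];  // column-major indexing
  }
};
```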
Relay question:
- How do we route stuff to these outputs in the `relay` object?
- With the three algorithms we have we could honestly just write some classes, one for each of sampling, optimization, and ADVI. The specific sub-algorithms (e.g., HMC with diagonal mass matrix and adaptation) would then just call their specific parts (e.g., `relay.send_mass_matrix(std::vector<double> x)`).
- As long as we build these classes up in a modular way we should be able to avoid boilerplate.
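A rough sketch of what a per-algorithm-family relay class could look like (everything here is hypothetical except the `send_mass_matrix` idea from the example above):

```cpp
#include <vector>

// Hypothetical base relay for the sampling family; bundles the logger and
// writers behind named methods so algorithms never touch raw writers.
class mcmc_relay {
 public:
  virtual ~mcmc_relay() {}
  // Model-file defined per-iteration output.
  virtual void send_constrained_params(const std::vector<double>& values) = 0;
  // HMC-specific output; sub-algorithms only call the parts they use.
  virtual void send_mass_matrix(const std::vector<double>& inv_metric) = 0;
};

// Interface-side implementation that just counts calls, for illustration.
class counting_relay : public mcmc_relay {
 public:
  void send_constrained_params(const std::vector<double>&) override {
    ++n_draws_;
  }
  void send_mass_matrix(const std::vector<double>&) override { ++n_metrics_; }
  int n_draws() const { return n_draws_; }
  int n_metrics() const { return n_metrics_; }

 private:
  int n_draws_ = 0;
  int n_metrics_ = 0;
};
```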
- The dev-side problem @martinmodrak ran into could be solved by deciding that the `logger` class could have more overrides with a default conversion to text. If an interface didn't implement a specific type, it could always get the output as text. If it did implement a specific method, the output could arrive in whatever format was needed. Actually, that may be the right approach now too, even before this re-organization (sorry Martin, not sure why I didn't just suggest that earlier; the logger should be plumbed through everywhere already).
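The "more overrides with a default conversion to text" idea could look roughly like this (a sketch only; `divergence_info` and the method names are made up):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical extra payload attached to a divergence.
struct divergence_info {
  std::vector<double> unconstrained_params;
  std::vector<double> gradient;
};

class logger {
 public:
  virtual ~logger() {}
  virtual void info(const std::string& message) {}  // existing text channel

  // Typed override whose default degrades to text, so an interface that
  // only implements the text channel still sees something useful.
  virtual void info(const divergence_info& d) {
    std::ostringstream out;
    out << "divergence at " << d.unconstrained_params.size()
        << "-dimensional point";
    info(out.str());
  }
};

// An interface logger that only implements the text channel.
class text_logger : public logger {
 public:
  using logger::info;  // keep the typed overload visible
  void info(const std::string& message) override { last_message = message; }
  std::string last_message;
};
```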
Thanks for the thorough reporting, it must have been quite an effort!
I think that you've missed a few things that are and/or should be piped through (all of them probably fit into the proposed output format, though):
Model-file defined:
- generated quantities: those differ from transformed params in that they can include integers, but it might be excusable to convert them to doubles (all 32-bit integers can be exactly represented as doubles). Also, those just have a different dimension than either K or Q.
- transformed data: similar to gen quants in that those have arbitrary dimension and may include ints. However, those are output only once per run. This is currently not piped through but should be useful for debugging.
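On the int-to-double point: a 64-bit double has a 53-bit significand, so every 32-bit integer round-trips through `double` without loss. A quick check (helper name is mine):

```cpp
#include <cstdint>
#include <limits>

// Any 32-bit int fits in a double's 53-bit significand, so the
// round trip int -> double -> int is exact.
inline bool roundtrips_exactly(std::int32_t i) {
  double d = static_cast<double>(i);
  return static_cast<std::int32_t>(d) == i;
}
```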
Algorithm-defined:
- RNG state. This could fit into other alg-defined params, but wanted to mention it. It also may be necessary to pay attention when/if converting those to doubles. Per-iteration should be enough.
This is very sensible.
I started an issue for the refactor to keep track of all related issues in stan and across interfaces: https://github.com/stan-dev/stan/issues/2534
Most of those are technical details not worth discussing now. The only issue that affects the design IMHO is that we need to be able to determine when gen. quants should be executed (only when stored + force consistency between warmup and sampling if both are stored), so this is something the relay/writers need to expose.
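One way the relay/writers could expose that decision (a hypothetical sketch under my reading of the consistency rule: warmup gen quants are only run when sampling ones are stored too):

```cpp
// Hypothetical query object: the algorithm asks before running generated
// quantities, and the answer is derived from what the writers will store.
class gq_policy {
 public:
  gq_policy(bool store_warmup, bool store_sampling)
      // consistency: warmup gen quants only make sense if sampling
      // gen quants are stored as well (assumed interpretation)
      : store_warmup_(store_warmup && store_sampling),
        store_sampling_(store_sampling) {}

  bool need_gen_quants(bool in_warmup) const {
    return in_warmup ? store_warmup_ : store_sampling_;
  }

 private:
  bool store_warmup_;
  bool store_sampling_;
};
```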