Universal, static, logger-style output

Rather than hijacking the original topic, I’m opening a new one to discuss a radical rethinking of how we can handle output that @seantalts and I have been discussing. We brought it up briefly during this week’s Stan meeting, but I thought I’d write up what we’ve been thinking about.

Comments obviously welcome—that’s why it’s up here!

Current services output

The interfaces talk to Stan through the Stan services methods, which are in the stan-dev/stan repo in directory src/stan/services. There are about 20 such methods including HMC variants (NUTS/static, adaptation/not, metric shape), ADVI variants (dense and diagonal covariance), and optimization variants (L-BFGS, BFGS, Newton). Each of these 20 or so methods takes a slightly different set of callbacks, though there are a lot of commonalities.

Callbacks of this form impose a C++ burden on new implementations. The goal of the discussion in the original topic was to consolidate these C++ callback writers into some kind of object so it would be easier to add new ones without breaking backward compatibility.

This is difficult to do with appropriate type checking, etc., and the proposal has been stalled since @martinmodrak produced an original design for consolidated output and @sakrejda produced the excellent overview of what kind of I/O we’re currently doing.

Logging and other message output

We want to move our general console output to more of a logger-based style, separating debug output, informational output, warning output, and error output. We haven’t made any progress doing this, either.

Combo proposal

We can fix everything at once in a way that is embarrassingly forward compatible.

  • Remove the output callback arguments (nope, no replacement!)
  • Define a global, static logger
  • Replace calls to current callback functions with messages written to the global logger

Thread safety for multi-threaded log density evals will require a synchronized FIFO message queue to be implemented for the logger, but that only needs to be coded once, and there won’t be a huge amount of contention. And we have threading support in C++11 now.
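As a sketch of what that one-time implementation might look like (assuming C++11; the class and member names here are hypothetical, not existing Stan code), the synchronized FIFO could be as simple as:

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>

// Hypothetical sketch: sampler threads push formatted messages; a single
// consumer thread pops them and writes to the output stream.
class message_queue {
 public:
  void push(const std::string& msg) {
    std::lock_guard<std::mutex> lock(mutex_);
    queue_.push(msg);
    cv_.notify_one();
  }
  std::string pop() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return !queue_.empty(); });
    std::string msg = queue_.front();
    queue_.pop();
    return msg;
  }
 private:
  std::queue<std::string> queue_;
  std::mutex mutex_;
  std::condition_variable cv_;
};

A single consumer thread would drain the queue and do the actual writing, so sampler threads only ever block on the brief lock, not on I/O.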

The global logger will need to be able to handle a range of line-based message types, but each will be of the simple form:

<TAG> value1, ..., valueN

The values can be strings, booleans, integers, reals, vectors, and matrices.

For example, an informational log message could be written as

INFO "Adaptation terminated normally. Beginning sampling."

and a warning message could be:

WARN "Rejected proposed parameter value; normal_lpdf scale parameter was -1.2 and must be positive"

Our header, warmup draws, and actual draws can be:

PARAMETER-NAMES, "mu", "sigma"
...
WARMUP-DRAW, 1, 2, -3.976542874635, 1.545237485639
...
POSTERIOR-DRAW, 1, 1285, -3.222184747564, 2.974526204756

The draws have a chain ID (and a redundant iteration ID, to make things easier on the client side).

The warmup adapted metric can be something like this (say it’s adapted at iteration 40):

ADAPTED-METRIC 40, 0.59815757, 1.4333337656535

Whether it’s a matrix or vector is obvious because you know the parameters at this point.

This is just a sketch—don’t treat it as a serious proposal.

Cons

  • slower to read a human-readable ASCII output line and convert it to a double-precision floating point value than to read out of the callbacks

  • routing logic of messages to output sources remains up to the interfaces

  • we need to rebuild a bunch of things in CmdStan, RStan, and PyStan to get back to where we are now

Pros

  • everything is ordered within chain but asynchronous across chains, so we don’t need to worry about output periods (per iteration, multiple per iteration, intermittent, etc.). Everything’s intermittent by construction, and only convention makes it more regular.

  • no longer a type puzzle to figure out where to add which kinds of output to minimize effort across service functions

  • no longer a custom pipefitting C++ interface job for the interfaces to deal with new kinds of output

  • forward compatibility is easy because interfaces can ignore output they don’t care about (a sketch follows this list)
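To make that last point concrete, here is a minimal sketch of interface-side dispatch (the function and tag handling are hypothetical, not part of any current API); unknown tags just fall through:

#include <iostream>
#include <string>

// Hypothetical client-side routing: the tag runs up to the first space or
// comma; tags this interface doesn't know about are silently skipped.
void handle_line(const std::string& line) {
  std::string tag = line.substr(0, line.find_first_of(" ,"));
  if (tag == "POSTERIOR-DRAW") {
    // parse chain ID, iteration ID, and parameter values here
  } else if (tag == "WARN") {
    std::cerr << line << std::endl;
  }
  // a tag added in a later release lands here and is ignored,
  // so old interfaces keep working unchanged
}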

New output example

As an example of how simple new output would be, consider adding trajectory information—it’d have a tag, chain ID, iteration ID, and trajectory iteration ID (which may be confusing with NUTS going backwards and forwards in time!); again, we know the parameter names before we get a trajectory.

TRAJECTORY 1, 172, 1, 0.37459864683, 1.002347563523
TRAJECTORY 1, 172, 2, 0.426747582, 1.106353516464
TRAJECTORY 1, 172, 3, 0.5576465755955, 1.0776464628495

and divergences get chain IDs, iteration IDs, and position and momentum vectors, for example:

DIVERGENCE 3, 1204, <position>, <momentum>

Alternative

Instead of a human-readable, line-based UTF-8 format, we could use protocol buffers. The most recent spec allows streaming output with the kinds of structure we need. It looks like Rprotobuf (it’s part of the Rcpp suite) is a pretty full interface and I’d guess Python would have excellent protobuf support given its use at Google.

Just a Sketch!

Remember, this is just a sketch. The real tags and message formats need to be designed.

Mostly it stalled because I’ve been on vacation for three weeks.

My suggestion is that you take a look at the two issues I have open in stan-dev/stan (they include proposals for how output should be handled as well as doc of current usage).

Your sketch fits in with those issues as a very generic interface-side implementation. This is exactly what I tried to explain when I said that we could provide generic implementations shared among interfaces.

What we don’t need is to force a static god object for handling all output.

K

Will do.

Great. Thanks for writing back.

Static writers are the way loggers are usually handled.

This wouldn’t be a “god object” in the usual sense of being a dumping ground for heterogeneous components—it’s just something to write tags and a few string and numerical types wrapped in a message queue.

Also, didn’t mean to imply it wouldn’t be great if @sakrejda and/or @martinmodrak want to build this all out. I only meant that Sean, Mitzi, and I can help. It’s going to be one of the higher priorities after the GPU code’s merged along with cleaning up the model class and improving compile times.

What we don’t need is to force a static god object for handling all output.

Static writers are the way loggers are usually handled.

Are you trying to convince me that it’s standard in C++ to pipe ALL output through a static logger? I don’t think that’s true.

This wouldn’t be a “god object” in the usual sense of being a dumping ground for heterogeneous components—it’s just something to write tags and a few string and numerical types wrapped in a message queue.

But it would be a dumping ground. Got a string? Send it to the logger. Got a std::vector? Send it to the logger! Got a mass matrix? Stringify it and send it to the logger! Got an HMC trajectory? Send it to the logger!! That means it will accrue responsibility in god object style.

I’ll back up so I can be more specific:

  1. Our output has formatting, round-tripping, and read-speed issues that text doesn’t address, so I’m not even sure how what you’re proposing handles those basic requirements.

  2. What you’re describing is an approach that can work, but why force it on a C++ API that doesn’t need it?

  3. the nugget I see in this proposal that is tempting is to handle routing using text tags. That’s sort of convenient but not really needed, since in C++ you could use types for tags and have a static route::name string that the likes of R could rely on (a sketch follows this list).
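A minimal sketch of that type-as-tag idea (the struct and its fields are hypothetical illustrations) might look like:

#include <string>
#include <vector>

// Hypothetical sketch: the type itself is the routing tag, with a static
// name string that an interface like R could key its dispatch on.
struct trajectory_route {
  static const std::string name;
  int chain_id;
  int iteration;
  std::vector<double> position;
};
const std::string trajectory_route::name = "TRAJECTORY";

// One type-safe writer overload per route type:
// void write(const trajectory_route& msg);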

Also, didn’t mean to imply it wouldn’t be great if @sakrejda and/or @martinmodrak want to build this all out. I only meant that Sean, Mitzi, and I can help. It’s going to be one of the higher priorities after the GPU code’s merged along with cleaning up the model class and improving compile times.

More meta: I wrote much of a detailed spec solving an ongoing problem by coordinating with various team members and nobody had the time to read it before proposing an alternative. That makes it hard to contribute.

I just want to emphasize some of the reasoning for the current set up that I think has been lost in all of these discussions.

All of the algorithms work exclusively on the unconstrained parameter space – they do not know anything about the constrained parameter space, transformed parameters, or generated quantities, nor should they. Consequently each iteration of an algorithm is defined by a state that combines the unconstrained parameters, i.e. the state in parameter space, and any relevant algorithmic parameters, i.e. the state in algorithm space.

The interfaces are the opposite – they work exclusively on the constrained parameter space and its derivations, save for dumping diagnostic info into separate files that are never in immediate scope of the interface user. This allows the interfaces to do their analyses without having to know anything about the model object. rstan has crept away from this abstraction and incorporated more and more information about the model into its interface, but there have only ever been informal discussions about which abstraction is preferred by the interfaces.

The mcmc_writer was designed to bridge these two perspectives, taking in the raw algorithm output and converting it into something immediately usable by the interfaces with the model instance already available. This abstraction ensures that the algorithms don’t have to know anything about what the interfaces expect; they just store their local state and let the writer query and transform/pack that as necessary. The additional benefit of doing this packing on the stan side is that it unifies what information the interfaces receive and hence what they are expected to do, making the UX for each more consistent across environments.

Consequently a proposal like this is not simply a refactoring but rather a complete change to the contract between the stan API and the interfaces. The interfaces would then be responsible for packing everything together, elevating the model instance to something critical in each, and we’d have to decide how unified we’d want that packing to be. While I am partial to keeping the contract as it is, I’m not against changing it, but I do think that we have to have that conversation first and then figure out the best way to implement communication within the terms of that contract.

Finally I don’t think it’s fair to say that there is a huge burden at the moment on the algorithm side. Because of the strict standards we have established in algorithm performance we don’t have any new algorithms on the immediate horizon, and callbacks are the least of the concerns in the ones on the further horizon. If anything we should be cutting algorithms back, for example removing ADVI or moving it into an experimental section as it should have been in the first place.

I actually think this is the most important bit of the proposal - it is actually pretty common in C++ and other languages. Here is how the first Google result for “C++ logging library” does exactly that: 2. Creating loggers · gabime/spdlog Wiki · GitHub

  1. Regarding the big picture, I’d ask if @sakrejda’s work might be a welcome, incremental improvement, after which we might consider @Bob_Carpenter’s redesign.

  2. Narrowly, regarding @Bob_Carpenter’s sketch, sending data through a logging interface doesn’t bother me. Using a bespoke serialization format does. If there’s some established way to do this sort of thing that other projects have used, let’s use that. One idea would be to use Protobuf’s JSON serialization as the text format. Each line would be a JSON object which corresponds to a Protobuf message (for which a schema is available).

I think what’s unusual is dumping tons and tons of data through a static logger. I agree with @sakrejda that this seems unusual.

But that’s a logger, I don’t see non-log output pushed through a static object in most projects. I think the reason I don’t see it is that 1) it forces global state; 2) output tends to be big; and 3) output is produced in limited places in the code so passing in an object to route output is not a burden. I could see using that pattern for routing our logging output specifically, since we don’t want to pass the output routing object into every corner of the algorithms and analysis code.

Thanks for bringing up the ‘mcmc_writer’; I think the ‘relay’ concept we’ve been talking about is basically a generalization of this division to multiple algorithms. It’s not clear to me yet if we want to adjust the divisions, but we can cross that bridge when we get there. One problem is that I can’t find detailed doc on what the mcmc_writer is supposed to be responsible for, so if it’s clear to you it would help to have that written down.

Right. Logging output is often handled this way.

General output is often handled this way by web applications. For example, a RESTful server behaves exactly this way in terms of dumping everything through a single HTTP connection (or multiple connections for a multi-threaded app with multiple clients). Most software-as-a-service is set up this way.

Some of that’s done with inefficient, human-readable, schema-free standards like JSON. Some of it’s done with more efficient, binary output standards like protobuf. Others use custom app-to-app socket protocols. I don’t think it’s a big deal for the scale we’re working at. It’s not like there’s a JSON standard or a protobuf standard for matrices. We’d still have to make up our standard within their standard (like column-major vs. row-major output).
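For instance, even with JSON or protobuf carrying the bytes, the flattening order is a convention we’d have to pick ourselves. A sketch using Eigen (the function name is a hypothetical illustration):

#include <sstream>
#include <string>
#include <Eigen/Dense>

// Hypothetical sketch: flatten a matrix column-major into one tagged line.
// The column-major choice is our convention, not something JSON or
// protobuf fixes for us.
std::string matrix_line(const std::string& tag, const Eigen::MatrixXd& m) {
  std::stringstream ss;
  ss << tag;
  for (int j = 0; j < m.cols(); ++j)
    for (int i = 0; i < m.rows(); ++i)
      ss << (i == 0 && j == 0 ? " " : ", ") << m(i, j);
  return ss.str();
}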

That’s the feature of the design that will enable new output to be added. The alternative is to design a new writer, then add it to the service methods and plumb it down to where it’s needed. The result’s the same—we accrue writers, but the process is way more painful.

How so? It’s never been a bottleneck in CmdStan.

I’d say it does need it. But the answer to why do it is simplicity and extensibility of server-side design and client-side usage.

You could, and that would be the type-safe, textbook-recommended way to do things. But it’s a huge pain to extend and maintain on the server side, and every extension breaks the client.

I not only read it, I thought hard about it, along with the code that Martin sent and the discussions on Discourse. I really appreciate the survey of the current output, which I think was critical in moving forward.

It was the Discourse discussion, the hangout and the issue spec that led me to realize this whole approach is going to be difficult to manage. If we start backing off to generic writers, I don’t see why we don’t just keep going and make it really simple.

That’s a useful perspective. The one violation I’m aware of is the log density, which is reported on the unconstrained scale.

I don’t see why we couldn’t keep this perspective on constrained vs. unconstrained output. It’s just a matter of where things get routed.

What I meant is in terms of controlling I/O.

This all got kicked off when Martin wanted to add trajectory information to output (on the unconstrained scale, I think, but that’s a minor matter). This required changing a bunch of function signatures, plumbing arguments through a gazillion functions, and would’ve broken backward compatibility for the interfaces.

Please don’t attribute this just to me. I keep getting in trouble with people for over- or under-attributing things, but Sean is definitely behind a lot of this.

How heavy would it be?

It’s a cheap kind of bespoke, as it has a very limited set of data types. But we could go with something standard or even binary like protobuf.

To address Michael’s concern - we could still choose to have monostream output piped through a series of functions that are allowed to modify the output, potentially taking unconstrained parameters and replacing them with the constrained ones, etc., if that was something we wanted to make sure we could do. This is often called “middleware” I think.

I’m not incredibly worried about the line-level data formats, but maybe I’m just not that aware of the full breadth of what might pass through? I’m expecting some parsing to occur anyway, since a line can only encode a single row of (probably anything, but especially) the parameter draws. Are there very complicated objects to pass through the tube?

If there’s a static logger object that, one way or another, has access to all of the different tags and types of output you might want to put out, it could make sense to me to have an official enum or set of methods on the logger that hardcoded what message type you were using - maybe this helps address one of @sakrejda’s concerns and lets us avoid code like getLogger()->info("TRAJECTORY {}, {}, {}", ...); manually and instead could give us something like getLogger()->info->trajectory(data); (or I’d guess getLogger()->trajectory(logger.INFO, data); is actually more like what we’d use here to avoid reimplementing trajectory or info for all of the different combinations of elements from the data type and log level categories, respectively). I think that could be nice because it still avoids the plumbing issues we feel quite painfully with every attempt at passing along new types of data and it has, from the C++ side of things, type safety that people like and can help find typos. It doesn’t actually matter from the coder’s perspective here if that becomes an ASCII string or a protobuf or whatever.
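A sketch of that typed-method idea under hypothetical names (none of this is an existing API):

#include <iostream>
#include <string>
#include <vector>

// Hypothetical sketch: one method per official message type, so the
// compiler catches tag typos and the wire format lives in one place.
class logger {
 public:
  enum level { DEBUG, INFO, WARN, ERROR };
  void trajectory(level lvl, int chain, int iter, int step,
                  const std::vector<double>& theta) {
    std::cout << "TRAJECTORY " << chain << ", " << iter << ", " << step;
    for (double x : theta)
      std::cout << ", " << x;
    std::cout << std::endl;
    (void)lvl;  // the level could select a sink; ignored in this sketch
  }
  // ... divergence(), adapted_metric(), and so on
};

inline logger& get_logger() {  // the static, global instance
  static logger instance;
  return instance;
}

// usage: get_logger().trajectory(logger::INFO, 1, 172, 3, theta);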

I’m on a bad connection until Wednesday so just a quick note: I think this proposal is in some sense very close to what I had in mind, although it approaches the problem from a different angle. The difference may just boil down to whether you prefer the algs to call

“sendXXX(data1, …, dataN)”

“send(XXX, data1, …, dataN)”

or

“send(data1, …, dataN)”

In all cases I think you should have default conversion to string, but this should IMHO happen in a separate step of the process so that interface code may opt to not decode string messages and use the binary data directly.

For example, reconstructing the current CmdStan output from message strings as proposed would IMHO be more work than writing handlers that work with the raw data for each message type separately. OTOH this message format could be all that is needed for the proposed httpstan interface.

Also I am glad that someone keeps kicking the ball so that there is some impetus to move forward.

Yes it has:

  1. Can’t do exact re-starts with text output without making output files huge (that’s why we haven’t done it).
  2. The entire .csv-reading discussion (most recent part here: Alternative .csv reader); this really limits the utility of generated quantities.

On the library side we have a lot of table-writing calls defined by the model file, which makes them shared across all algorithms! That code is written once and mostly shared because it’s all tables of numbers. Then we have a few calls per algorithm that write algorithm-specific parameters with more complex types. That’s not a “huge pain” to extend.

On the interface side it’s not completely clear yet because I need to write what sort of constructor arguments would be used to parameterize the relay. I can see that this is not a straightforward problem.

You never articulated how it’s going to be difficult to manage; that’s the issue we need to address. So I’ll do some of that:

One issue that has come up repeatedly throughout the discussion about routing the messages is that algorithm-specific output is more heterogeneous than the model-file-defined output. Algorithm-specific output sometimes also needs to be generated much deeper in the algorithm, making it less convenient to pass down writers. A few examples from HMC: mass matrix, momenta, and Martin’s example of trajectories.

For that output the benefits of a static logger do become apparent because we would no longer have to pass the output relay object to some arbitrary depth in the stan-dev/stan code. I don’t think it’s a necessary pattern but I can see how it reduces the maintenance burden.

I still don’t see the point of stringifying everything so I suggest a signature like:

logger.debug(...);
logger.info(...);
logger.warn(...);
logger.error(...);
logger.fatal(...);

Where the ... are handled as a parameter pack (it’s a standard, simple use of templates, so hopefully not a burden). Then the default implementation (barring some more specific match that lets the interface handle the data in a binary format) can be:

template <typename T>
void logger::warn(const T& x) {
  ostream_ << "WARN: " << stan::stringify(x) << std::endl;
}

with stan-dev/stan defining how various standard types turn into strings, for example:

// Overload for real vectors; other standard types get analogous overloads.
std::string stan::stringify(const std::vector<double>& x) {
  std::stringstream ss;
  if (!x.empty())
    ss << x[0];
  for (size_t i = 1; i < x.size(); ++i) {
    ss << ", " << x[i];
  }
  return ss.str();
}

You get your default text stream and you don’t sabotage other methods of returning output. I didn’t get the text-tags in there but you could assume that the logger always gets a first argument that’s a std::string that can be pulled out as a tag.
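To fill in the parameter-pack detail from above, a hedged sketch of the variadic version with a leading tag argument (all names here are hypothetical):

#include <iostream>
#include <string>

// Base case: nothing left to write.
inline void write_values(std::ostream&) {}

// Hypothetical sketch: values are written comma-separated via C++11 pack
// recursion; stan::stringify overloads (as above) could slot in for <<.
template <typename T, typename... Rest>
void write_values(std::ostream& os, const T& x, const Rest&... rest) {
  os << ", " << x;
  write_values(os, rest...);
}

template <typename... Args>
void warn(const std::string& tag, const Args&... args) {
  std::cout << "WARN: " << tag;
  write_values(std::cout, args...);
  std::cout << std::endl;
}

// usage: warn("DIVERGENCE", 3, 1204, 0.5, -1.2);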

For the model-file-defined output (see the beginning of “signatures a relay needs to handle”), we 1) generate it at a limited set of points in the services code; 2) sometimes process it using the model (e.g., parameter transformations); and 3) have different preferences from the different interfaces for how to output it, all of which are compatible with some use of std::vector<double> or std::vector<int>.

For this model-defined output a static logger doesn’t address the main issues (the coupling with the model, the volume of output that makes stringifying everything awkward).

This is cool, though perhaps more suited to a general-purpose serialization library. Here we don’t actually want to log all data of a specific type the same way - in fact we will mostly want to be telling Stan how to log vectors in different ways depending on whether they are parameter draws, rows of an initialization matrix, initial values, trajectories, etc. So I think the call needs to somehow indicate which data it is that we are logging, perhaps as well as the level (though perhaps we could also just say that any trajectory is of level INFO or something).

So maybe it makes sense to have some logger methods defined on the model class that use the static logger under the hood. That way the model can decide how to process data before it’s passed on to the static logger (to untransform parameters or whatever).
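A hedged sketch of that idea (constrain() and send() are assumed names, not current API):

#include <string>
#include <vector>

// Hypothetical sketch: the model maps the draw back to the constrained
// scale before it reaches the static logger, so the algorithms stay on
// the unconstrained scale throughout.
template <class Model>
void log_draw(const Model& model, int chain, int iter,
              const std::vector<double>& theta_unconstrained) {
  std::vector<double> theta = model.constrain(theta_unconstrained);
  get_logger().send("POSTERIOR-DRAW", chain, iter, theta);
}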

I also just want to take two seconds to decouple some of the ideas here: having a single static logger does not imply we use ASCII strings as our serialization format, or even necessarily imply we only use one output stream of any kind. Just want to make sure we’re allowing ourselves to explore the full combinatoric possibilities; you could have a single static logger that outputs a different CSV file for every call to logger().trajectory(logger::INFO, v1); or whatever. Not saying that’s a good combination but that these are orthogonal ideas.

How about keeping the output interface simple and putting the burden on the algorithms to create appropriate output? That would involve writing functions to do transforms we need more than once, but critically wouldn’t embed these in the output API. Putting these transforms directly into the I/O seems like it would introduce more coupling than we want.

Right. But I think it would be unusual from a design perspective to do anything other than route output by tag. Obviously by the time we get to an interface, we want the draws to go in one place, the diagnostics in another, and the online output to go to a console or just be recorded for later inspection. But do we need to build all this logic into the output structure?

I had thought the key to the modularity here was to keep the output writers simple and let the interfaces do the dispatch based on tags.

You’re 100% right that that is the ultimate in modularity and decoupling (as far as I can tell). To reiterate, we’re not just looking to simplify the code here, we want to simplify the organizational concerns - we don’t want to have to do every C++ change in lock-step with every interface we want to support. I really love the way we can easily add new data streams to the output and still be backwards compatible with this monostream tag-based plan.

I guess I was looking for potential compromise ideas - we have pain points now at one extreme, but I’m not totally sure it necessitates moving to the complete other extreme. That’s a real “not sure,” not some kind of British understatement. Complete decoupling is certainly very attractive from the C++ side’s maintenance perspective, and probably my preferred approach at the current point in our discussion. But it’s definitely possible @sakrejda riffs off of these ideas to come up with something even better :)

I think that’s largely right, and I’ve been convinced that the send(XXX, data1, ...) is the most straightforward and easiest-to-code approach for the interfaces and for our algorithms.

Part of the proposal was to make everything human readable. As @seantalts pointed out, this isn’t critical to having dispatch by string tag and a static logger-type output device.

If we went with binary, we’d want to use something like a protobuf wrapper so we don’t need to worry about platform issues for binaries.

We can round trip with about 20 ASCII digits if we need scientific notation, so that’s a factor of 2.5 larger than double-precision binary.
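For reference, round tripping is guaranteed at max_digits10 significant digits (17 for IEEE doubles); a quick check:

#include <iomanip>
#include <iostream>
#include <limits>
#include <sstream>

// Writing max_digits10 significant digits recovers the exact bits on read;
// with sign, decimal point, and exponent the text runs roughly 20-24
// characters versus 8 bytes of binary.
int main() {
  double x = 1.0 / 3.0;
  std::stringstream ss;
  ss << std::setprecision(std::numeric_limits<double>::max_digits10) << x;
  double y;
  ss >> y;
  std::cout << (x == y ? "exact" : "lossy") << ": " << ss.str() << std::endl;
}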

We decided ages ago not to output 16 significant decimal digits because the latter half are basically random bits given the accumulated errors in our sequences of calculations. The only motivation I see are exact restarts, but then I don’t see the motivation for doing that beyond 6 decimal places of accuracy or so.

If it were free, then sure—I’m not suggesting we try to round if we use binary output!

Smallish integers are smaller in ASCII than in binary.

Side note: For exact restarts that will wind up matching results that weren’t restarted, we need to figure out how to serialize the PRNGs and reinitialize them.
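The hook for that already exists: the standard C++ engines (and the Boost engines Stan uses) can stream their full internal state. A small example with std::mt19937:

#include <iostream>
#include <random>
#include <sstream>

// Serialize the PRNG state at the restart point, then reinitialize a
// fresh engine from it; both produce identical subsequent draws.
int main() {
  std::mt19937 rng(1234);
  rng();  // advance past warmup, say
  std::stringstream state;
  state << rng;        // save full state
  std::mt19937 restored;
  state >> restored;   // exact reinitialization
  std::cout << (rng() == restored() ? "match" : "mismatch") << std::endl;
}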

There is time overhead. If everything’s binary, then it’s just network/file connections, but if it’s ASCII, then there’s a conversion that takes dozens of operations. It’s this that has never been a bottleneck in CmdStan, other than in reading huge data files that were fit with a very simple linear regression in optimization. In that case, CmdStan’s I/O takes up well over 90% of the total compute time.

This is just CmdStan, right? The other interfaces control their own output as of version 2.17. We could change that. We could write in binary, or we could write to some format other than .csv.

I believe Allen and Ari’s current plan for PyStan is to use JSON. Is that the kind of thing you’re thinking or are you thinking something binary?

Meta-commentary

I’m going to try to stay away from subjective terms like “huge” from now on, as they’re just getting in the way of the discussion:

My use was subjective and @sakrejda’s was a factor of two. I just don’t want to devolve into semantics here.

Can’t we just write functions for shared operations?

In modern C++, it’s much more common to write standalone functions rather than adding methods to a class. That’s partly because it’s more flexible for compilation and for typing. It’s not set in stone, so if it makes more sense as methods on a class, we can do that.

Sure. The current design (as of 2.17) requires changing all the signatures and breaks backward compatibility with every new output decision. I now wonder if we all even agree that the current situation is problematic.

The more recent designs from @martinmodrak and @sakrejda, in my understanding, were attempts to solve the problem of having to plumb in new writers and then plumb them down to where they’re used every time we needed new output. Like our original design, they were focused on type-safe, algorithm-aware, structured output with C++ handlers on the interface side.

@seantalts has been gradually convincing me that the focus on type safety is largely misguided in these contexts, as the interfaces want more flexibility in what they do with output without having to write new C++ handlers.

All else being equal, human readable output is preferable because then someone at a later date can just grab it and process it however they want.

It also makes it easy for the interfaces to process without getting third-party libraries like protobuf involved.

That’s the kind of thing I was imagining, with something like methods on the C++ side for all the different types of output. With a static logger controlled by tags and default conversions, this could be just

LOG_("WARN", x);

but as @betanalpha pointed out, sometimes that output needs to get converted back to constrained format, as in:

LOG_("DRAW", model_.constrain_(theta));

This is the same issue that @betanalpha brought up. The way we handle this doesn’t need to change. Assuming we plumb an mcmc_writer down to the algorithms, then we can just use a new form of mcmc_writer as in:

LOG_("DRAW", mcmc_writer.foo(theta));

You can construct a std::vector out of any iterator over values, so that’s no problem. There’s no more direct way to serialize to file that doesn’t involve writing a custom allocator for std::vector.
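For example (a self-contained illustration, not Stan code):

#include <iterator>
#include <sstream>
#include <vector>

// Build a std::vector<double> directly from a pair of input iterators,
// here reading whitespace-separated values off a stream.
int main() {
  std::istringstream in("-3.22 2.97 0.5");
  std::vector<double> v{std::istream_iterator<double>(in),
                        std::istream_iterator<double>()};
  return v.size() == 3 ? 0 : 1;
}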