Universal, static, logger-style output



It’s not that I’m personally offended, I just want these conversations to be based on references to actual code rather than our feelings about what we remember (that’s why I keep checking the actual code). I don’t think predicting future needs is necessary, but a good design is.

From my point of view I think we can punt on this question and talk about how to configure routing first. Either way an object gets constructed to handle routing.

I do think at some point we will have to decide about the trade-off. You want to make the logger static and global and you’re willing to pay the complexity cost of writing a high-performance multi-producer single-consumer queue. I want to make it a local object and pay the cost of passing a different one to every chain via the service layers. At that point if you can prototype something that will process 200k parameters from each of a dozen chains I guess it’s a fine design. I guess it could be thread_local and then you would be back to single-producer.


I have low confidence, but I don’t think we have any current multiprocessing stuff (or immediate plans) that require anything complicated here: chains are separate processes executing the same program but sharing nothing; map_rect threaded AD is all within a likelihood; MPI also has separate processes that don’t share memory; GPU is a single process… Am I forgetting anything? It’s Friday afternoon so please excuse (but point out) any omissions!


Using threading to run multiple chains via service methods has come up before, so it would be a choice to cut that possibility out. A fine choice I guess if y’all want to make it but still a choice.


Fair enough - my instincts in this kind of situation are to design to our current system and avoid too much future-proofing. Here, I think using the very simple static logger version is not a lot of effort wasted if we need to then code (or import code for) a high-performance multi-producer single-consumer queue when multithreaded chains end up going in. The writing API and the interface-side reading API should be the same both ways, so it should be easy to change just the internals to deal with threads efficiently.


I think I’m burned out on this topic so I’m going to stay out of it for a while.


I also kind of lost view of what is a possible next step that I could help with.


Right! I’m really excited about this project; I think it will be really innovative in the space and very helpful in our quest to make Stan more modular and developer-friendly. Eventually I hope people can subtree the Stan repo in and easily write their own inference algorithms on top of what we’ve built without requiring a ton of collaboration with us, and I think this reduction in complexity and increase in flexibility is a key component.

@martinmodrak, here is a more concrete proposal that I hope effectively summarizes the discussion above. I thought about it a little more and I think @sakrejda was right and this needs to be thread-safe right away. We already allow threaded logging in, for example, multi-threaded map_rect on a UDF with a print call. Not sure how well this works right now - I’d imagine it has race conditions and can lead to garbled messages.

Please comment on the proposal! I’m hoping to get comments on it and then get the finished design approved by Bob. I’m hoping @martinmodrak (or someone else!) wants to flex their C++ muscles and take a crack at implementing the end result.


@seantalts Great that things move forward. I am currently in a bit of a time pressure before leaving for vacation (and then to StanCon!), so will be able to put time in this only after StanCon.


C++11 apparently got rid of the potential for race conditions allowed by the C++03 spec for std::cout. See: https://stackoverflow.com/questions/6374264/is-cout-synchronized-thread-safe

How did anyone ever get anything done before StackOverflow?

Concurrent access to a synchronized (§ standard iostream object’s formatted and unformatted input (§ and output (§ functions or a standard C stream by multiple threads shall not result in a data race (§1.10). [ Note: Users must still synchronize concurrent use of these objects and streams by multiple threads if they wish to avoid interleaved characters. — end note ]

No guarantees on message coherence!

Just interleave some spec-like numbers (§ into a sentence with “shall” and you can convince a developer of anything. (There’s probalby an xkcd on this.)


Overall, I think this is the functional direction we need to go. I specifically like that this will simplify C++ clients. I also think the tehcnical proposal to use something like spdlog makes sense. More on both below.

The scope of the functional spec needs to be made clearer. I’m pretty sure the intent is to

  • replace the existing writer interfaces in the services methods with a static logger type output

That just needs to front and center and the implications spelled out.

These are currently used for output of configuration, output of values (like draws, iterations, etc.), and for messages from the interface (refresh, errors and warnings, etc.) Given that there was previous discussion on making these separate, I think it needs to be made clear this proposal will combine everything into one type of output. Presumably the interrupts will be handled separately from all of this.

The question is then how easy it is to route back out again. The draws need to go into memory, say, and the console messages to the console. This is already implicit in the code in the technical proposal that sets up a console logger and data logger, but an example of a message would help here.

Given what we need to implement, it makes sense to use a third party library. Obviously, we’d like to know how reliable that package is. Are other people using it and is it being actively maintained? Is the source easy enough to follow that we could potentially fix it? Do they have a responsive mailing list?

Is there a way to control precision of floating-point output?

My only concern is how fast it’s going to be. I can imagine dispatch on tags and thread safety and converting to ASCII and time stamping getting expensive, but then I can also totally imagine that’s all dominated by ASCII to floating point.

A final technical design’s going to have to make sure all of our current output messages can be handled. Krzysztof laid down a survey somewhere that should be very useful in this. And it’s not just me. We have to make sure this is usable from the interface point of view, so I’m specifically going to want to hear from:


I’m looking at it now and it seems decently well maintained (from traffic on issues and PR’s). I’ll try to catalogue some of the things we should track about it. Two things I see now:

The lead maintainer leaves people hanging on moderately badly phrased questions but it’s more or less what you’d expect for an open source c++ lib with one maintainer. The usage examples are good.

Looks like good support for user-defined types (you get direct acess to the stream so can do stream.write() calls for binary output.

I see performance is addressed in terms of messages per second but not size per message so that’s worth checking before treating this as a viable library to use.


We may still need to pass loggers around: https://github.com/gabime/spdlog/wiki/2.-Creating-loggers#accessing-loggers-using-spdlogget


I guess I was writing this within the context of this thread, rather than as something for posterity. We should fix that. Do you want to suggest edits to it or should I go in and add a sentence or two to the top explaining this and linking to @sakrejda’s posts inventorying the current writers?

Which part are you referring to with that? I think only the plug-in part of the proposal involves dispatch on tags, right? If people take a look at this and think we don’t need the plug-in part we can just not worry about dispatch on tags. If we end up needing it, I hope we can come up with a smarter tag collection that could help resolve those plugin calls early, maybe even at compile time.

PS Google docs have a great workflow for commenting on sections of the documents :P But if you give it a shot and hate it we can switch to PRs after this design review.

I agree with @sakrejda’s assessment above. Re: the point around passing loggers around: I hope we can store logger pointers in relatively few locations - calling spdlog::get per iteration might end up being infrequent enough for performance to be relatively unimpacted. Also, interfaces could use the single-threaded versions of loggers if threading won’t be enabled (or we can provide wrappers that do this automatically), and we can use the same trick we use in the current threading stuff for the AD stack to get cheap and unsafe global static singleton without threading but the safe singleton if threading is enabled.

What kind of benchmarks can we run now before coding begins that might help us assess performance concerns now? Do we want to compare how long one of the realistic stat_comp_benchmark models log_prob takes vs. both the threaded and unthreaded writing of the equivalent of 4000 draws?


Per iteration is definitely no big deal with HMC since we push most computation within-iteration.


I went and added some further context to the beginning of the doc but would welcome further feedback! Putting on my “new developer of the far future” hat it might be worth even adding a new section explaining all the problems we have encountered with the existing writers, other designs we considered, and highlighting how we think this design will specifically solve these problems. And then people now with that context can skim or skip that section, haha.


Not for posterity—just self contained in terms of what it’s going to do. .

The thread mentions lots of possible options. What we need is a concrete plan of what’s being proposed. It doesn’t need to be super fine-grain. But it should have decisions on things like whether we’re going with ASCII or binary output.

Like I said, not for posterity and not for dev of the future. It’s for making sure we all understand what is being proposed. So I don’t think any of the history, etc., is relevant. That may be relevant for pitching it to people, but I don’t think it should be part of the spec itself. Functionality, yes—history, no.

I thought we were going to dispatch on tags in the sense of having some things go to one file and some go to the other. I only worry that if it’s done with something like a hash, it’s going to add extra overhead. The tables will be small, though.

You can time writing to file and stdout for CmdStan.

I’m not sure what the relevant comparison would be for RStan.


The proposal right now just has one file for data and one for messages and the developer chooses how to dispatch.

Yes, I think the proposal linked above should be something close to that level of granularity. I already have code samples in there and have a really rough proof of concept working locally (not in the Stan code base, just emitting Eigen types and testing to make sure we’d be able to do everything we need to do).


Again I think draws is less important than number of parameters. A few hundred thousand parameters would be a nice test (via CmdStan is fine, I don’t think this needs to be interface-dependent.


I’m hoping for some benchmarks we can run before anyone starts implementing the proposal with the goal of verifying that performance shouldn’t end up being a consideration. So that’d mean we wouldn’t have it hooked up to CmdStan or anything yet. So let’s say writing out an Eigen vector of 500,000 doubles compared maybe with calling the relevant CmdStan methods with the same number of parameters? Something like that?


yeah, agreed, no need to hook it up to CmdStan.