Logging in Stan services

I agree it’s likely not a bottleneck and we should do the simplest thing first, which is why I never said anything about this on the PR.

Can you be a little more specific about where these are coming from? I didn’t think there were per-iteration logger calls unless there’s an exception. I don’t care about the calls that are triggered by exceptions b/c spewing huge numbers of messages usually indicates something is wrong with the model…

Whoops, I didn’t type that correctly. I meant per-iteration calls when there’s an exception.

And I think we’re on the same page… I think the templated thing will come in handy for writers.

The simplest interface for clients is not the simplest interface to implement. It’s much simpler in logger client code to write

logger.info("a=", a);

instead of

std::stringstream ss;
ss << "a=" << a;
logger.info(ss.str());

Many people might want to write a logger implementation and it’s much
easier to do that if you don’t have to deal with parameter packs (even if I
like the template solution personally). Simpler for them matters too.
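For what it’s worth, the two goals don’t have to conflict. Here is a minimal sketch, assuming a hypothetical logger base class (the names logger, write_info and vector_logger are made up for illustration, not the actual Stan API): callers get the variadic info(...) call, while implementers only override a single function that takes a string.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical names throughout; this is not the actual Stan logger API.
class logger {
 public:
  virtual ~logger() {}

  // Caller-facing convenience: stream every argument into one message,
  // then hand the finished string to the implementation hook below.
  template <typename... Args>
  void info(const Args&... args) {
    std::ostringstream ss;
    // C++11 pack expansion; the braced list is evaluated left to right.
    int expand[] = {0, ((void)(ss << args), 0)...};
    (void)expand;
    write_info(ss.str());
  }

 protected:
  // The only function a logger implementation has to override.
  // No parameter packs are visible at this level.
  virtual void write_info(const std::string& message) = 0;
};

// Example implementation that just collects messages in memory.
class vector_logger : public logger {
 public:
  std::vector<std::string> messages;

 protected:
  void write_info(const std::string& message) override {
    messages.push_back(message);
  }
};
```

With this, client code can write log.info("a=", a) on a vector_logger, and an interface author only ever sees the string version. The hook has a different name from the template on purpose, so that overriding it in a derived class does not hide info(...).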

Who do you think is going to be writing logger implementations besides PyStan and RStan and CmdStan?

Anybody who wants to write an interface. I guess it’s the same people who have to deal with calling the thing, more or less.

What I’m asking is: are we designing this for ourselves (PyStan, RStan and CmdStan), or do you anticipate other people wanting to write interfaces using this kind of thing? In the former case, we don’t have to speculate, we can just ask ourselves!

My wish list for an interface is something like:

  • these loggers

  • sane input format, binary and text

  • typed output for the stuff like mass matrix / warmup / config messages

  • streamed binary output that can be read during the run

  • utilities to read/validate config

  • config you can read from a file

  • threading

  • job scheduling based on config files

I could do most of this in an interface by going off on my own, but I think
it’s better to push core Stan in a direction that makes it possible.

K

I thought the callback writers + logging also (implicitly) define an interface which contributors of new algorithms (e.g. ADVI) will have to use if they want *Stan to be able to use the algorithms.

That’s too restrictive. We’ve already seen issues with ADVI having to hack the output by:

  1. adding a dummy value of 0 for lp__
  2. putting the variational solution as the first “draw”
  3. having the rest of the draws represent the posterior.

We should give inference algorithms some flexibility in what they produce. There are even differences in the output between static HMC and NUTS, and between the unit, diagonal, and dense choices of metric. We do similar things once we have the output, but we should be able to do some different things too, like reporting Rhat for draws, the max for tree depth, and counts for divergences.

So I don’t think we should be forcing everything to go through the same writers. We should be defining the output for each inference algorithm. There will be a lot of similar output, but not everything will need to adhere to the exact same output.
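To make that concrete, here is a rough sketch, assuming hypothetical types (nuts_draw, advi_output and nuts_summary are invented for illustration and are not Stan’s actual output structures). Each algorithm gets its own output type, so ADVI can keep the variational solution separate from the draws instead of faking an lp__ column, and NUTS can carry the fields its summaries need.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical types; these are not Stan's actual output structures.

// One NUTS draw, including sampler diagnostics that other algorithms lack.
struct nuts_draw {
  std::vector<double> params;
  double lp;
  int treedepth;
  bool divergent;
};

// ADVI output: the variational solution is stored on its own rather than
// as the first "draw", and there is no dummy lp__ value.
struct advi_output {
  std::vector<double> mean;
  std::vector<std::vector<double> > approx_draws;
};

// Summary that only makes sense for NUTS output.
struct nuts_summary {
  int max_treedepth;
  std::size_t n_divergent;
};

// Report the max tree depth and the number of divergent draws.
inline nuts_summary summarize(const std::vector<nuts_draw>& draws) {
  nuts_summary s = {0, 0};
  for (const nuts_draw& d : draws) {
    s.max_treedepth = std::max(s.max_treedepth, d.treedepth);
    if (d.divergent) ++s.n_divergent;
  }
  return s;
}
```

Shared post-processing (Rhat and so on) could still work on the parameter values from any of these types; only the algorithm-specific parts differ.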

Hopefully that makes sense. More discussion is welcome (maybe on the other thread discussing CSV writers?).