Analysis API

Thanks.

I’m not exactly sure what you mean by “algorithm API,” but it doesn’t much matter, because both the algorithm code in C++ and the services (the C++ API for the interfaces) throw exceptions.

For example, see src/stan/services/sample/hmc_nuts_dense_e_adapt.hpp line 85.

We could handle that a different way and catch all exceptions, but right now, we handle std::domain_error within the methods and allow any other exceptions to propagate.
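That pattern can be sketched roughly as follows; the function names here are hypothetical, not the actual service signatures:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical sampler step: throws std::domain_error for recoverable
// numerical problems and std::runtime_error for anything fatal.
void sampler_step(int mode) {
  if (mode == 1) throw std::domain_error("divergent iteration");
  if (mode == 2) throw std::runtime_error("unrecoverable failure");
}

// Sketch of the service-level pattern described above: handle
// std::domain_error locally (record the message and return an error
// code) and let every other exception type propagate to the caller.
int run_service(int mode, std::string& log) {
  try {
    sampler_step(mode);
  } catch (const std::domain_error& e) {
    log = e.what();  // recoverable: report it and return an error code
    return 1;
  }
  return 0;
}
```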

All good points, but I was thinking about messages in conjunction with exceptions. Of course, I’m assuming that writers can either flush immediately or buffer and flush appropriately even when handling exceptions.

For warnings, I was thinking that there might be non-exceptional cases that don’t warrant failure of computing a value, but should indicate some warning. Of course, we could just say it either works or doesn’t and the only way to communicate output from one of these functions is through the message in the exception.

Just need some clarification… what are you calling the “algorithm API”?

That doesn’t sound like it should be an exception. But it also doesn’t seem like the return should be Error when it’s not an error.

For example, an exception should be thrown if the chains aren’t all the same length passed to R-hat (until we generalize, that is). We shouldn’t throw an exception if R-hat is above some nominal operating threshold like 1.1.
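A minimal sketch of that distinction, with a hypothetical name and a placeholder return value standing in for the real computation:

```cpp
#include <stdexcept>
#include <vector>

// Sketch: throw for structurally invalid input (chains of unequal
// length), but never for a "bad" diagnostic value -- a large R-hat is
// a result to report, not an error to throw.
double split_r_hat_stub(const std::vector<std::vector<double>>& chains) {
  for (const auto& chain : chains)
    if (chain.size() != chains.front().size())
      throw std::invalid_argument("chains must all be the same length");
  // ... the real statistic would be computed here; a value above a
  // nominal threshold like 1.1 is simply returned to the caller.
  return 1.3;  // placeholder standing in for the actual computation
}
```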


I think this would be o.k. in a writer or logger if we add typed messages. Otherwise an interface (e.g., ShinyStan) will need to parse text messages in order to decide whether it wants to show a different sequence of plots when models don’t converge. There’s no reason typed messages can’t decay to text, but the inverse doesn’t work.

These functions need to return indicators as to whether or not the diagnostic has passed or not so that users can programmatically check if the diagnostics pass instead of having to parse through the returned messages. Okay, Warning, and Error return codes refer to the diagnostic output, not anything to do with internal operation. Exceptions are fine for the execution of the diagnostic itself failing.
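A sketch of what such typed status codes might look like; the enum name, thresholds, and function name here are illustrative, not a proposal for the exact API:

```cpp
#include <stdexcept>
#include <vector>

// Typed return code for the *diagnostic outcome*; failures of the
// diagnostic computation itself still surface as exceptions.
enum class diagnostic_status { okay, warning, error };

// Hypothetical checker: classifies a set of R-hat values.
diagnostic_status check_r_hat_stub(const std::vector<double>& r_hats) {
  if (r_hats.empty())
    throw std::invalid_argument("no r_hat values supplied");  // execution failure
  diagnostic_status status = diagnostic_status::okay;
  for (double r : r_hats) {
    if (r > 1.2) return diagnostic_status::error;      // badly mixed
    if (r > 1.1) status = diagnostic_status::warning;  // borderline
  }
  return status;
}
```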

Sounds like we’re fine with the signature for simple functions like ess. That’s progress.

As for the more complex diagnostic function, I see @betanalpha’s point.
This is tricky. Clearly we need to return different things depending on
what happens. And, equally clearly, there are multiple ways of doing this.

I’d like to add to the list of options returning a single string, which
is a JSON-encoded mapping of the relevant information. Since all
conceivable interfaces support parsing JSON and Boost supports writing
it, this could be a simple solution. It would, in my opinion, be simpler
than having multiple writers.
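To illustrate the idea, here is a minimal stdlib-only sketch of encoding a diagnostic summary as one JSON string; a real implementation would presumably use Boost or another JSON library, and the keys here are made up:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Sketch: serialize a diagnostic summary into a single JSON string
// that any interface can parse. Field names are illustrative only.
std::string diagnostics_to_json(double r_hat,
                                const std::vector<int>& divergent) {
  std::ostringstream out;
  out << "{\"r_hat\":" << r_hat << ",\"divergent\":[";
  for (size_t i = 0; i < divergent.size(); ++i)
    out << (i ? "," : "") << divergent[i];
  out << "]}";
  return out.str();
}
```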

We already have a reasonable client/API relationship for passing messages in the algorithm API and I don’t see any reason to change that here. Let’s maintain a relatively uniform interface here, and if there’s interest in changing it then we can have a separate discussion to change everything uniformly.

Programmatic status codes sound good.

I’d like to try to scope out what we have to support for current functionality in a client-facing API. To summarize, I’m totally with Michael on reorganizing this into an interface client API.

Where I’m not sure there’s agreement is with

  • whether this is an I/O based interface (protobuf, etc.) or whether it’s a C++ API or whether it’s the former layered on top of the latter,

  • how much of the functionality should go in the math lib, and

  • who will manage it (related to where it goes, I suppose).

From the current client perspective, I think a bunch of functions do depend on chain structure:

  • R-hat
  • n_eff
  • mcmc std error

Then there are some functions that can take consolidated output across chains:

  • posterior quantiles
  • posterior means
  • posterior std. dev.

Things where I’m unclear whether they need chain identities, and what the output looks like:

  • BFMI (???)
  • divergences (binary flag per draw)

That was just for HMC, of course.

+ treedepth exceedences

How about optimizing and ADVI? They return a set of independent draws from the approximating parametric distribution. When combined with importance sampling, there would be additional importance weights. Relevant diagnostic functions:

  • k-hat
  • n_eff
  • IS std error

and the same summary functions as for MCMC:

  • posterior quantiles
  • posterior means
  • posterior std. dev.

Thanks for the summary, @Bob_Carpenter. And thanks for adding to the list, @avehtari. Some of my responses.

For these methods, it should just be a C++ API. Earlier in the thread, I believe we agreed that the input would be:

std::vector<double*> draws, std::vector<size_t> sizes

(the vector of sizes for generality; I think we could also have a convenience method that accepts std::vector<double*>, size_t.)
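As a sketch of how a client might produce those inputs from chains it already holds, without copying the draws (the helper name here is hypothetical):

```cpp
#include <cstddef>
#include <vector>

// Sketch: a client storing chains as std::vector<std::vector<double>>
// can pack them into the agreed (pointer, size) form. The double*
// entries are views into the client's own storage, so no draws are
// copied and the client keeps ownership.
void pack_chains(std::vector<std::vector<double>>& chains,
                 std::vector<double*>& draws, std::vector<size_t>& sizes) {
  draws.clear();
  sizes.clear();
  for (auto& chain : chains) {
    draws.push_back(chain.data());
    sizes.push_back(chain.size());
  }
}
```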

I think for the most part, the returns are straightforward, so we should use return values as the result and exceptions as exceptions to computing the function at hand.

For example, I think the R-hat function signature should look like (name can change, of course):

double split_r_hat(std::vector<double*> draws, std::vector<size_t> sizes);

There is an alternative return here, which could be something like std::pair<double, std::vector<double>> if we wanted to return the overall R-hat and then also one per chain, but I’m looking at the list and most of the things will either return a single value or one per chain, which is easy to return.
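As an illustration of what that signature could compute, here is a sketch using the textbook split-R̂ recipe: split each chain in half, then compare between- and within-chain variance. This is not the Stan implementation, and it assumes even chain lengths and no input validation for brevity:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of split R-hat over the agreed input types. Each chain is
// split into two halves; R-hat compares the between-half-chain
// variance B to the within-half-chain variance W.
double split_r_hat(std::vector<double*> draws, std::vector<size_t> sizes) {
  std::vector<std::vector<double>> halves;
  for (size_t c = 0; c < draws.size(); ++c) {
    size_t half = sizes[c] / 2;
    halves.emplace_back(draws[c], draws[c] + half);
    halves.emplace_back(draws[c] + half, draws[c] + 2 * half);
  }
  size_t m = halves.size(), n = halves.front().size();
  std::vector<double> means(m, 0.0);
  double grand = 0;
  for (size_t j = 0; j < m; ++j) {
    for (double x : halves[j]) means[j] += x;
    means[j] /= n;
    grand += means[j] / m;
  }
  double B = 0, W = 0;  // between- and within-chain variance estimates
  for (size_t j = 0; j < m; ++j) {
    B += (means[j] - grand) * (means[j] - grand) * n / (m - 1);
    double s2 = 0;
    for (double x : halves[j]) s2 += (x - means[j]) * (x - means[j]);
    W += s2 / (n - 1) / m;
  }
  double var_plus = (n - 1.0) / n * W + B / n;
  return std::sqrt(var_plus / W);
}
```

For well-mixed chains the statistic is close to 1; divergent chain means drive it above 1.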

I believe we already have buy-in for these input types from CmdStan (me), RStan (@bgoodri), and PyStan (@ariddell).

I’d suggest these live in the stan-dev/stan repo. They’re not basic math functions. They’re functions that operate on the output of the algorithms, which are in that repo.

One place they could go is under src/stan/services/analysis/ or something else under src/stan/services/.

I think functions like autocorrelation() might make sense to live inside the math library.

Hopefully by simplifying the function signatures, it won’t need to be managed much. A lot of it hasn’t needed to change over the years.

I think the arguments can be the same. For convenience, perhaps we offer an additional function signature (the second one):

std::vector<double> quantiles(std::vector<double*> draws, std::vector<size_t> sizes, std::vector<double> probs);
std::vector<double> quantiles(double* draws, size_t size, std::vector<double> probs);
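A sketch of the single-sequence overload, using simple nearest-rank quantiles; the real implementation might well interpolate instead:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of the convenience signature: sort a copy of the draws and
// pick the ceil(p * size)-th smallest value for each probability p
// (the nearest-rank definition of a quantile).
std::vector<double> quantiles(double* draws, size_t size,
                              std::vector<double> probs) {
  std::vector<double> sorted(draws, draws + size);
  std::sort(sorted.begin(), sorted.end());
  std::vector<double> result;
  for (double p : probs) {
    size_t idx = static_cast<size_t>(std::ceil(p * size));
    if (idx > 0) --idx;  // convert 1-based rank to 0-based index
    result.push_back(sorted[idx]);
  }
  return result;
}
```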

Thoughts? I think the steps are to build one, integrate it all the way through, then build out the rest.

I believe that everyone agreed on a basic std::vector based API which will make it straightforward to build new routes.

The first routes will be for MCMC/HMC and mirror the functionality in https://github.com/betanalpha/knitr_case_studies/blob/master/fitting_the_cauchy/stan_utility.R. Right now we’re still iterating on the k_hat diagnostic, hence the lack of movement on the API itself.

The code should stay in the stan repo, unless there are modular bits that would be useful to expose to the Stan language, and I propose that this functionality fall under algorithms management.

Once we have the basic MCMC routes up then we can consider routes for other algorithms as @avehtari suggested. If there is interest in establishing an example sooner rather than later then I can work with volunteers to get some of the more established routes up (n_eff, R-hat, divergences, etc.).

There are two k_hat diagnostics:

  • one for Monte Carlo error for each parameter (MC SE doesn’t make sense if true variance might be infinite)
  • one for joint posterior distributional approximations like ADVI and Laplace (optimizing)

Thanks for the summaries, @betanalpha and @syclik. I think this means we’re in agreement on the big picture and just need to flesh out details.

I agree that starting with HMC makes sense in terms of exact design, but it might still help to take an inventory of what’s going to be needed for all the algorithms so we make sure the basic infrastructure has the right coverage.

As for things we’re still working out the kinks of, I’d rather not put those in Stan. I want the standard for going into Stan to be something that we’ll support going forward for at least five or six years. This stops us from pulling the rug out from under users, who very much resent changes in releases.

Now, if there’s something we’re trying to estimate, there’s no reason we can’t put in our best shot at an estimator now and then try to make it better. That’s essentially the path we’re taking with n_eff, and that’s a good thing. So the key thing is to settle on the interfaces, not the precise implementations or estimators.

If it’s OK with you all, I’m interested in working on this.

In my mind, a reasonable next step is to file an issue in the stan repository, which includes an outline of a proposed directory structure and some initial HMC function signatures – essentially summarizing this thread’s main points. We can focus on building out the rest if this first issue (and subsequent PR) works out well for all.

Please let me know if you see a preferred next step.

@roualdes, can you hold off until next week to talk about design? I have a preliminary design in mind but I’ve been traveling with limited access to the internet. Please feel free to direct message or email me directly. Thanks.

Sure, not a problem. I’ll email you if I don’t hear from you in a week or so.

The first question is where the diagnostic routes go. I propose that we put them in stan-dev/stan:src/stan/diagnose/* where * then mirrors the algorithm directory structure. For example, we’d have

stan-dev/stan:src/stan/diagnose/mcmc/compute_ess.hpp
stan-dev/stan:src/stan/diagnose/mcmc/hmc/compute_divergences.hpp
stan-dev/stan:src/stan/diagnose/advi/compute_k_hat.hpp

Each diagnostic comes in two variants: a compute variant that computes the diagnostic and a check variant that compares the computed value to set thresholds. The compute variant returns the computed value,

return_type compute_diagnostic(...);

while the check variant returns an boolean specifying “Warning” or “No Warning” results as well as passing a string message to a given writer,

bool check_diagnostic(..., writer& writer);

The check variants can take in additional values for thresholds, for example the exact E-FMI threshold used or the current max_treedepth.

Both functions throw exceptions for bad inputs or if the calculation of the diagnostic fails.

Here ... refers to the necessary input information which will vary based on the information required by the diagnostic. For example, the effective sample size estimator discriminates between the chains and so we would need

... = std::vector<std::vector<double>>& states

while to check divergences we would just need one monolithic vector of 0/1 flags

... = std::vector<double>& divergent

The clients are responsible for packing available information into these shapes, but are then free to store the fit information however they deem appropriate. In particular, the interfaces are responsible for packing only a limited number of parameters into these inputs if the user does not want to run diagnostics over everything.
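The compute/check pairing can be sketched for divergences as follows; std::function stands in here for Stan’s writer types, and the names and message format are illustrative:

```cpp
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Compute variant: returns the computed value (number of divergent
// iterations) from a monolithic vector of 0/1 flags.
int compute_divergences(const std::vector<double>& divergent) {
  int n = 0;
  for (double d : divergent) n += (d != 0);
  return n;
}

// Check variant: returns true for "No Warning" and false for
// "Warning", passing a human-readable message to the writer.
bool check_divergences(const std::vector<double>& divergent,
                       std::function<void(const std::string&)> writer) {
  int n = compute_divergences(divergent);
  if (n > 0) {
    std::ostringstream msg;
    msg << n << " of " << divergent.size() << " iterations diverged";
    writer(msg.str());
    return false;
  }
  return true;
}
```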

We can start with the effective sample size and split \hat{R} as an example and then we can build other diagnostic routes for MCMC/HMC and other algorithms from that design. In particular I ultimately want to move all of the functionality currently in the chains object from the mcmc code to these diagnostics and then leave the mcmc code using simple data types throughout.

I should also mention that some check functions will also require parameter names in order to tag which parameters failed the diagnostics. Putting this all together, the signature of the function checking effective sample sizes would be

bool check_incr_ess(std::vector<std::vector<double>>& states,
                    std::vector<std::string>& names,
                    double incr_ess_threshold,
                    writer& writer);

Also, given that the two variants both analyze the inputs (compute_ess, compute_split_rhat) and quantify those inputs (check_incr_ess, check_split_rhat) we should probably call the base folder analyze instead of diagnose to better denote the comprehensive purpose.

Thanks for writing that up. The proposed locations in the source tree, stan/diagnose or stan/analyze, look OK to me.

Based on earlier conversation and buy-in from RStan and PyStan devs, I would suggest modifying the input to std::vector<double*> (or size_t* if looking at something like treedepth) with an accompanying std::vector<size_t> for sizes.

I don’t think we need to include a vector of names for everything that can be done on a parameter-by-parameter basis. I don’t mind having a convenience function to wrap the check functions across multiple parameters, but I think we can do without the parameter names for this API.

Can you summarize this point? Would be good if the output refactor took this into account.