Analysis API


#5

No, I am suggesting that we define an I/O convention for how the interfaces have to wrap the samples and diagnostic info from a fit object into std::vectors or POD arrays to be analyzed by routes and then what they expect in return (probably strings and error codes if not numerical results like ESS and rhat).
Bonus points if the convention minimized excess copying on the C++ side.

None of this would be exposed to users – the interfaces would call these routes internally.

I am hesitant to put this stuff into math for the same reason I don’t think the Welford stuff really belongs there – the math library is essentially an autodiff library, not a general purpose math library. This stuff would be just as well in a separate auxiliary math library.


#6

That’s a nice way to think about it. Effective sample size estimators and R-hat are more like the sampling algorithms—subject to change as we learn more. So it makes sense to leave those with the algorithms. Plus, I think it makes sense to have you manage them.

If we ever get around to adding derivatives to FFT, that’ll go in the math lib.

Welford’s algorithm is one of those oddball things where it’s at an API level of an accumulator rather than being like our ordinary math functions. So I’m OK with that going either way. I could see some of the algorithms that do need to be in the math library, like calculating sample variance, needing it, though. If that happens, then whatever they use in common should be in the math lib.

OK, that’s what we’re already doing. See this pull request for n_eff.

I’ll reverse link that back to here, too.


#7

Yes, this is what incited my post. We shouldn’t do this piecemeal but put something systematic together on which we can add all of the other diagnostic calculations.


#8

Do you think we won’t be able to use vector<double*> everywhere and wrap in Map<Matrix<...>> where necessary for Eigen?


#9

But how would we get the variable/diagnostic x draw x chain shape into a single vector? If we used vectors of vectors then we couldn’t guarantee memory locality for map. We’re probably have to do a copy at least once to put everything together, but let’s figure out the signature that will make it easiest for the interfaces to pack everything together and for the C++ to unpack into Eigen objects.


#10

Chains are stored in separate holders in Python and R. And the variable/diagnostic \times draw is stored using memory-contiguous column-major-ordered holders (e.g., numpy array in Python) by the interfaces so those can be referenced with a double*.


#11

Is there a place where we operate on a Chain x Draw object any other way than as a vector for each chain? I don’t see a need for Variable x Chain x Draw.

If you have vector<double*>, it’s easy to map each entry to an Eigen vector or row vector. So if the answer to the above is that all the underlying operations use Eigen vectors, this is the most economical way to pass things that doesn’t implicate Eigen on the interface side, which is what Allen was requesting.


#12

For the ESS calculations as their are implemented we’d need matrices with shape draw x integration – so can the interfaces guarantee that the info from each chain points to contiguous memory? If so then we’d be looking at a signature like

error_code_type diagnostic_function(std::vector<double*>&, 
                                    std::vector<std::string>&, 
                                    int n_var, int n_draw, writer& writer);

#13

I think we could, or at least request and assume that it does.

What’s the motivation for the std::string, n_var, and writer? Can it not just return a value for most diagnostics?


#14

For example, I think ess could have the signature:

double ess(std::vector<const double*>& x, int size);

And I see why the writers would be useful, so maybe:

double ess(std::vector<const double*>& x, int size, writer& error, writer& warning, writer& log);

For this to really work, the interfaces need to store each variable within a chain in a contiguous block of memory. I think we could require that.


#15

I’m not sure we need the writers for ess. They add a lot of complexity.

Could we use exceptions instead? Or return a std::pair with the first
value being the result and the second value being a string containing
the error message (or empty if no error). (This is a pattern Go uses.)


#16

Yes, was already thinking that exceptions could handle exceptional events. I don’t think ess has any output other than the return (we don’t need any writers), but other diagnostic functions might?


#17

Nothing that couldn’t be returned and then passed to a specialized writer method.


#18

Yes, the double* for each chain would be contiguous, but they wouldn’t be contiguous with each other.

I’d much prefer exceptions if that’s workable from the interface perspectives.

If there’s an error, the value won’t make sense. Returning both the value and error message is then clunky as only one will ever get used. This error code return is also the tradition in C with functions with signatures of the form int foo(..., result_t* result) where the return is a return code indicating error or not and the function setting result with the actual value calculated.


#19

Exceptions are well-supported in Python (Cython) and R (Rcpp). The Stan
library already uses them in a few places.


#20

Unfortunately the conversation has diverged a bit here. Let me try to bring it back a bit.

  1. We do not throw exceptions to the interfaces for the algorithm API. This thread is not about redoing the API but adding additional routes for common diagnostic calculations. If there is interest in changing the client relationship and how error information is propagated then a new thread should be started.

  2. Diagnostic functions return various statuses depending on the thresholds we set (and possibly modifiable by the user). Okay, Warning, and Error messages do not indicate that the function has failed but rather that the input fit information is good or suspect. Exceptions would be more appropriate for failures to calculate the diagnostics.

  3. The question introduced in this thread is designing a uniform interface for all diagnostic functions that is sufficiently robust for current (and immediate future diagnostics) that is compatible with the algorithm API.


#21

Thanks.

I’m not exactly sure what you mean by “algorithm API,” but it doesn’t quite matter because both the algorithm code in C++ and the services (C++ API for the interfaces) both throw exceptions.

For example, see src/stan/services/sample/hmc_nuts_dense_e_adapt.hpp line 85.

We could handle that a different way and catch all exceptions, but right now, we handle std::domain_error within the methods and allow any other exceptions to propagate.

All good points, but I was thinking about messages in conjunction with exceptions. Of course, I’m assuming that writers can either flush immediately or buffer and flush appropriately even when handling exceptions.

For warnings, I was thinking that there might be non-exceptional cases that don’t warrant failure of computing a value, but should indicate some warning. Of course, we could just say it either works or doesn’t and the only way to communicate output from one of these functions is through the message in the exception.

Just need some clarification… what are you calling the “algorithm API”?


#22

That doesn’t sound like it should be an exception. But it also doesn’t seem like the return should be Error when it’s not an error.

For example, an exception should be thrown if the chains aren’t all the same length passed to R-hat (until we generalize, that is). We shouldn’t throw an exception if R-hat is above some nominal operating threshold like 1.1.


#23

I think this would be o.k. in a writer or logger if we add typed messages. Otherwise an interface (e.g.- ShinyStan) will need to parse text messages in order to decide if it wants to show a different sequence of plots when models don’t converge. No reason typed messages can’t decay to text but the inverse doesn’t work.


#24

These functions need to return indicators as to whether or not the diagnostic has passed or not so that users can programmatically check if the diagnostics pass instead of having to parse through the returned messages. Okay, Warning, and Error return codes refer to the diagnostic output, not anything to do with internal operation. Exceptions are fine for the execution of the diagnostic itself failing.