Okay, we ended up having that meeting. To attempt to summarize some of our discussion:
- I personally want to see stanc3’s internal representation of distributions and bijectors become a little more formalized and structured; that has its own benefits and might shed light on what working with it would be like in Stan
- People had no immediate objections to thinking about a distribution as a function that returns a record containing the relevant functions (rng, lpdf, cdf, ccdf). Example: `normal(mu, sigma).lpdf` or `lpdf(normal(mu, sigma))`. The first version assumes we end up with some record access syntax like that in Stan; the second could be done with tuples.
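To make the "distribution as a function returning a record" idea concrete, here is a rough sketch in Python rather than Stan, since the Stan syntax is still undecided. The `Distribution` record type and the free-standing `lpdf` accessor are hypothetical names made up for illustration; nothing here is actual stanc3 or Stan API.

```python
import math
import random
from typing import Callable, NamedTuple


class Distribution(NamedTuple):
    """Hypothetical record of the functions a distribution exposes."""
    lpdf: Callable[[float], float]
    cdf: Callable[[float], float]
    ccdf: Callable[[float], float]
    rng: Callable[[], float]


def normal(mu: float, sigma: float) -> Distribution:
    """Return a record of closures that capture mu and sigma."""
    def lpdf(y: float) -> float:
        z = (y - mu) / sigma
        return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

    def cdf(y: float) -> float:
        return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))

    def ccdf(y: float) -> float:
        return 1.0 - cdf(y)

    def rng() -> float:
        return random.gauss(mu, sigma)

    return Distribution(lpdf=lpdf, cdf=cdf, ccdf=ccdf, rng=rng)


# Record-access style, analogous to normal(mu, sigma).lpdf
record_style = normal(0.0, 1.0).lpdf(0.5)


# Function-on-record style, analogous to lpdf(normal(mu, sigma))
def lpdf(dist: Distribution) -> Callable[[float], float]:
    return dist.lpdf


function_style = lpdf(normal(0.0, 1.0))(0.5)
```

Both spellings pull the same closure out of the record, so the choice between them is purely about surface syntax.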
I want to take this a step further and think about what user-defined gradients and user-defined transformations/bijectors would look like as well, since we’d like to re-use patterns as much as possible. Transformations can follow the distribution path pretty seamlessly: to define a transformation, define a function that returns a record with `forward`, `inverse`, and `log_det_jacobian` (or some similar name) for the Jacobian adjustment. This requires closures, which will hopefully arrive soon.
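A transformation built the same way might look roughly like the following Python sketch. The `Transform` record, the `lower_bound` example, and the choice that `forward` maps unconstrained to constrained are all assumptions for illustration, not anything we settled on. Note that the three functions close over `lb`, which is why closures are needed.

```python
import math
from typing import Callable, NamedTuple


class Transform(NamedTuple):
    """Hypothetical record for a transformation / bijector."""
    forward: Callable[[float], float]
    inverse: Callable[[float], float]
    log_det_jacobian: Callable[[float], float]


def lower_bound(lb: float) -> Transform:
    """Map an unconstrained x to a value above lb via y = lb + exp(x)."""
    return Transform(
        forward=lambda x: lb + math.exp(x),
        inverse=lambda y: math.log(y - lb),
        # d/dx (lb + exp(x)) = exp(x), so log|J| evaluated at x is just x.
        log_det_jacobian=lambda x: x,
    )


t = lower_bound(2.0)
y = t.forward(0.3)          # constrained value, strictly greater than 2.0
x_back = t.inverse(y)       # recovers 0.3 up to rounding
adjustment = t.log_det_jacobian(0.3)
```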
User-defined gradients COULD follow a similar path, giving a function that returns a record containing `value` and `derivative` functions. @bbbales2 what are your thoughts about this? Then, to construct a user-defined distribution that has custom gradients, you do something like this:
def normal(real mu, real sigma) {
    auto value = (real y) { 1/sqrt(2*pi... };
    auto derivative = (real y) { { (y - mu) / sigma, (mu - y) / sigma, ... } };
    return {lpdf = {value = value, derivative = derivative}};
}
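For comparison, here is the same nested-record shape as a Python sketch that actually runs. The record names `WithGradient` and `Distribution` are hypothetical. Since the pseudocode above elides the bodies, this sketch assumes `value` is the normal log density and `derivative` returns its partials with respect to `y`, `mu`, and `sigma`, in that order.

```python
import math
from typing import Callable, NamedTuple, Tuple


class WithGradient(NamedTuple):
    """Hypothetical record pairing a function with its hand-written gradient."""
    value: Callable[[float], float]
    derivative: Callable[[float], Tuple[float, float, float]]


class Distribution(NamedTuple):
    """Hypothetical distribution record; only lpdf is shown."""
    lpdf: WithGradient


def normal(mu: float, sigma: float) -> Distribution:
    def value(y: float) -> float:
        z = (y - mu) / sigma
        return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

    def derivative(y: float) -> Tuple[float, float, float]:
        z = (y - mu) / sigma
        # Partials of the log density with respect to (y, mu, sigma).
        return (-z / sigma, z / sigma, (z * z - 1.0) / sigma)

    return Distribution(lpdf=WithGradient(value=value, derivative=derivative))


d = normal(1.0, 2.0)
log_density = d.lpdf.value(0.5)
d_y, d_mu, d_sigma = d.lpdf.derivative(0.5)
```

Even in Python the double nesting (`d.lpdf.derivative`) is a bit heavy, which is roughly the complaint about the Stan version.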
I think this is kinda bloated syntactically. We’re in a DSL, after all. And we’ve already made a decision to use string suffixes to group things before, namely `_lpdf`, `_rng`, and `_cdf`. We could pull that back in Stan 3, or we could lean in, adding something like `_grad` for user-defined gradients and `_forward`, `_inverse`, and `_log_det_jacobian` for transformations. And then we can represent this stuff internally in the MIR as nice nested records or something.
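As a rough illustration of the "lean in" option, the suffix convention could be turned into nested records with something like the Python sketch below. This is not how stanc3 or its MIR works today; `SUFFIXES` and `group_by_suffix` are made-up names, and the sketch only groups function names without touching signatures or types.

```python
from typing import Dict, Iterable

# Suffixes mentioned above (without the leading underscore).
SUFFIXES = ("log_det_jacobian", "forward", "inverse", "lpdf", "grad", "rng", "cdf")


def group_by_suffix(names: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Group function names like 'normal_lpdf' into {'normal': {'lpdf': 'normal_lpdf'}}."""
    records: Dict[str, Dict[str, str]] = {}
    for name in names:
        for suffix in SUFFIXES:
            tail = "_" + suffix
            # Require a non-empty base name in front of the suffix.
            if name.endswith(tail) and len(name) > len(tail):
                base = name[: -len(tail)]
                records.setdefault(base, {})[suffix] = name
                break
        # Names without a recognized suffix are left out of the grouping.
    return records


grouped = group_by_suffix([
    "normal_lpdf",
    "normal_rng",
    "softplus_forward",
    "softplus_inverse",
    "softplus_log_det_jacobian",
    "helper",
])
```

Here `grouped["normal"]` holds the `lpdf` and `rng` entries and `grouped["softplus"]` holds the three transformation entries, which is the nested-record shape without users ever writing a record literal.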
I’m going to post a separate thread about syntax for user defined gradients and user defined transformations…