Feature request: Jacobians of unconstraining transform

For those cases, input the unconstrained thing as an argument to the function in generated quantities, to_var() it internally, and transform.

Yes, but how do I access the unconstrained variables or unconstraining functions in a Stan script? As far as I can tell, those are only available as methods of writer objects. Sorry if we're talking past each other.

If you declare things with constraints in the parameters block, then there is currently no way to access the unconstrained parameters from within a Stan program. But you can just declare real, vector, etc. without constraints in the parameters block and then transform them yourself into the constrained space. The only thing is that you have to deal with the Jacobians manually.

The current density function looks like this:

template <bool propto, bool use_jacobian>
T log_prob(vector<T>& params_unc);

Data and transformed data are stored as member variables in
the class. We didn't get const correctness right the first
time around; that should be const in both ways, but isn't.

We could add

template <bool propto>
void lpdf_unconstrained(const vector<T>& params_unc,
                        T& jacobian,
                        T& log_density_constrained) const;

and one that doesn't bother with the Jacobian (I don't know
how to name the second argument here; it's the constrained
density value without the Jacobian, given the unconstrained
parameters as input):

template <bool propto>
void lpdf_unconstrained(const vector<T>& params_unc,
                        T& lpdf_constrained) const;

I'm less clear on how to present the constrained log density and make it
easy to use with gradients, because we can only read structured data like
this with a var_context:

template <bool propto>
void lpdf_constrained(const var_context& params_con,
                      double& lpdf_constrained,
                      vector<double>& grad_lpdf); 

and we could do one without the gradient, too.

  • Bob

@bgoodri, that would be fine. But

"transform them yourself into the constrained space"

How does one do that in a Stan model? Of course, I can take the Cholesky decomposition myself, etc., but it would be important that I be using the exact same constraining function that Stan uses, since I would be using the result to transform Stan's unconstrained derivatives.

I'm really sorry if I'm being slow here.

@Bob_Carpenter, that is a good idea. But I really do want to emphasize that my request is the ability to get Jacobians out of Stan models, and to do so using the underlying constraining / unconstraining functions. The ability to get constrained derivatives of the log probability is just an example use case.

For constraining functions that are not one-to-one (e.g. simplexes), the unconstrained derivatives might not even be well-defined. And as you pointed out, the Jacobians might be huge, and the user might not actually need to transform every single derivative in the gradient. For these reasons it might be better to let the user do it themselves.

They're all documented in a chapter of the manual near the end.
But the Cholesky ones in particular are tedious and error-prone
to implement.

  • Bob

Maybe if you could propose a function we could add?
I'm still not sure what you want.

The simplex transform is one to one. With N constrained
dimensions there are only N-1 unconstrained dimensions.

  • Bob

Right, I meant it's not 1:1 between R^N and R^{N-1}. The constrained gradient would live in R^N. But we're on the same page.

Yes, I can implement the unconstraining functions myself. I mean, I'm already using the C++ libraries directly to get Jacobians. The point of this request is to make it easy (and foolproof) to use the full suite of Stan's autodiff tools using only the high-level language.

From my post above, here's what I think would be the ideal solution:

"Here's a solution that could work. Suppose there were an option to calculate the Jacobian of parameters defined in the generated quantities block. The columns of the Jacobian matrix could be referenced with param_names, and you could define something like generated_param_names to reference the rows, just like you do with unconstrained_param_names. Then the user could make a separate Stan program, copy the transformations they need into the generated quantities block, omit the rest of the model, and calculate the Jacobians they need."

"In fact, if you exposed the functions that are used to convert an unconstrained parameter to a constrained parameter (maybe they already are exposed?), then this solution would also solve my original, more narrowly scoped feature request."

Right ā€” itā€™s only one-one between N-simplexes and R^{N-1}.

Would you mind trying to convert the prose into the function
declaration(s) you'd like to see implemented?

  • Bob

Sure, here's some pseudo-Stan code.

Example 1
Jacobians of the unconstraining functions:

// transform_model.stan:
data {
    int <lower=0> K;
}
parameters {
    vector[K] mu_mean;
    matrix[K, K] mu_info;
}
model {
}
generated quantities {
    // I'm not sure how best to get K_free := K * (K + 1) / 2 here.
    vector[K_free] mu_info_free;
    vector[K] mu_mean_free;
    mu_info_free = unconstrain_cov_matrix(mu_info);
    mu_mean_free = unconstrain_vector(mu_mean, lower=0);
}

Then in R:

# Right now I get a stan fit object with stan_model() then sampling(iter=1),
# but whatever would work.
data_dat <- list(K=2, mu_mean=c(0.5, 0.7), mu_info=matrix(c(1, 0.1, 0.1, 1), 2, 2))
transform_model <- get_stan_fit_object("transform_model.stan")

transform_model$unconstrained_param_names()
# Result (as now) something like:
# c("mu_mean_1", "mu_mean_2", "mu_info_1_1", "mu_info_2_1", "mu_info_1_2", "mu_info_2_2")

transform_model$generated_param_names()
# Result something like:
# c("mu_info_free_1", "mu_info_free_2", "mu_info_free_3", "mu_mean_free_1","mu_mean_free_2")

jac <- transform_model$generated_param_jacobian()
# Returns a matrix containing, e.g.
# jac[1, 1] = d mu_info_free_1 / d mu_mean_1 = 0
# jac[1, 2] = d mu_info_free_1 / d mu_mean_2 = 0
# jac[1, 3] = d mu_info_free_1 / d mu_info_1_1 = something non-zero
# ...
# jac[4, 1] = d mu_mean_free_1 / d mu_mean_1 = something non-zero
# ...
# etc.

Example 2
Jacobians of user functions wrt the free parameters:

data {
    int <lower=0> K;
    int <lower=0> N;
}
parameters {
    vector<lower=0>[K] mu_mean;
    cov_matrix[K] mu_info;
}
model {
}
generated quantities {
    vector[K] mu_product;
    mu_product = mu_info * mu_mean;
}

Then in R:

data_dat <- list(K=2, mu_mean=c(0.5, 0.7), mu_info=matrix(c(1, 0.1, 0.1, 1), 2, 2))
transform_model <- get_stan_fit_object("transform_model.stan")

transform_model$unconstrained_param_names()
# Result (as now) something like:
# c("mu_mean_1", "mu_mean_2", "mu_info_1", "mu_info_2", "mu_info_3")

transform_model$generated_param_names()
# Result something like:
# c("mu_product_1", "mu_product_2")

jac <- transform_model$generated_param_jacobian()
# Returns a matrix containing, e.g.
# jac[1, 1] = d mu_product_1 / d mu_mean_1 = derivative wrt the unconstrained parameters
# jac[2, 3] = d mu_product_2 / d mu_info_1 = derivative wrt the unconstrained parameters
# ...
# etc.

There are no derivatives for generated quantities. They are
evaluated with pure double input based on the values of the
parameters. That'd be a big issue to change.

I guess I don't see why you want to do all of this, so I'm
having a hard time breaking it down into things we could implement.

For K_free = K * (K + 1) / 2, Ben added the integer choose function, so it'd be:

K_free = choose(K + 1, 2);

  • Bob

Understood. Perhaps you can keep this possibility on the table for long-term directions.

It would be great if Stan could be a go-to autodifferentiated modeling language for general purpose inference algorithms rather than being so closely tied to a few (high-quality) methods like HMC and BFGS.

I'm not sure what you mean by being tied to L-BFGS
or HMC. It's really not.

A Stan program translates to a C++ class that provides:

  • log density and derivatives w.r.t. parameters on the unconstrained
    scale

    • with or without additive constants

    • with or without Jacobian adjustment for unconstrained
      to constrained change of variables

  • structured data loading from a base class callback
    (var_context)

  • transform from constrained (var_context) to unconstrained
    (Eigen vector) and query methods for variable names, shapes
    and sizes

  • a way to compute generated quantities from unconstrained
    parameters

There's nothing specific to L-BFGS or HMC; both of those
algorithms need only the log density and gradients. We've used
the same class for Metropolis (easy) and ensemble sampling
(also easy), but neither seemed worth adding to Stan given their
poor scaling with dimensionality.

  • Bob

Right. I'm a fan of what Stan has done! I'm just pointing out that for it to be a full-featured modeling and autodiff tool serving all my inference needs, it needs a few more things: Hessians, Jacobians, and derivatives in both the constrained and unconstrained spaces, all without going into C++.

Mike Betancourt showed me how to get Hessians out with a little RStan hacking. The Jacobians and transform functions are missing, though, and they're a deal-breaker for my needs. Which is fine! The Stan team has other priorities and is doing amazing work. It was worth an ask. If there's evidence that I'm not alone, then maybe you can think about this feature at some future date.

I'm very confused about what you actually want. Maybe someone
else understands and can help translate your comments into something
that looks like a feature request to me.

Specifically, I can't break your requests down into things we can add
to the Stan language or generated C++ code. Stan is C++, with everything
else being an interface layer on top of it. So new features go into C++
and then get exposed through the interfaces (though some of the interfaces
race ahead because we can't keep up in C++, and they can hack stopgaps
in R, like exposing functions and computing Hessians with finite diffs).

If you want all those Hessians, availability of transforms,
evaluations on unconstrained parameters, etc., that's all possible
if we can understand what you want and break it down into actionable
feature requests. But we're unlikely to ever differentiate through
generated quantities. And if you want hooks into partials in the autodiff
tree, we'd have to figure out how to expose that.

  • Bob

The pseudo-code makes sense, though, right? That's what I'm asking for. As you say, it would require templating the generated quantities rather than just treating them as doubles. (I think you would also need a writer object and to expose the constraining / unconstraining functions to the model interface.)

This is a point of contention that we've struggled with in the past. The Stan Math Library is a richly featured C++ library that makes it very easy to design, experiment with, and release powerful, state-of-the-art algorithms... in C++. The interfaces are meant to expose these implementations to various other environments, not the entire math library. In other words, Stan was never meant to facilitate the design of and experimentation with algorithms in R or Python or your other favorite environment, which is why so many of these requests are so challenging even when they seem like they shouldn't be. Indeed, trying to support the exposure of the entire math library would be a huge development and maintenance effort for which we really don't have the resources.

betanalpha Developer
December 22
This is a point of contention that we've struggled with in the past. The Stan Math Library is a richly featured C++ library that makes it very easy to design, experiment with, and release powerful, state-of-the-art algorithms... in C++. The interfaces are meant to expose these implementations to various other environments, not the entire math library. In other words, Stan was never meant to facilitate the design of and experimentation with algorithms in R or Python or your other favorite environment,

Yes it was! That's why we made it possible to compile
a model and access the unconstrained log density and
gradients.

which is why so many of these requests are so challenging even if they seem like they shouldnā€™t be.

The developer-level access is at the C++ level. At the very
least, in R, you'd need to write Rcpp code to access the
Stan C++ API, just like RStan does.

One of the reasons these things are challenging is that we
don't want to just add things willy-nilly; we want to roll
out things we know we'll want to support into the future. Otherwise,
it's very difficult for people to develop against our API. So,
like other projects that a lot of other things depend on, we
don't change very quickly.

Indeed, trying to support the exposition of the entire math library would be a huge development and maintenance effort for which we really donā€™t have the resources.

The first effort in this direction is the autodiff paper. Then we
need to write the higher-order autodiff paper when that's all working
and tested. We could also stand to document the math (and stan) API better.
It's been a moving target so far, because we didn't originally have it
in its own repo for independent use. Now we're trying to stabilize
the C++ API as well as our interface APIs.

Now, having said all this, I think we should think about exposing
the unconstrained density, and separating the log density into the
constrained log density and the Jacobian. And we already plan
to expose all the transforms, but we'll have to think about how
to deal with the built-ins.

  • Bob

I might add that I'm really requesting a change to what the compiled C++ model looks like. Once the compiled model object supports Jacobians and accessing generated parameters with readers and writers, exposing that functionality to R or Python will be relatively easy.

When using the Stan math library in C++, I've found myself writing (and debugging) quite a lot of boilerplate that's already been implemented to very high standards in the core Stan model class. Specifically, I'm talking about the boilerplate of packing named parameters into and out of a single long vector, and taking that vector into and out of the unconstrained space. You can easily take advantage of Stan's implementation of this boilerplate for gradients and Hessians of log probs (and hence of basically any function) with respect to the unconstrained space in C++, and hence in R or Python. But even in C++, right now, if you want Jacobian matrices of any kind, you have to roll all your own parameter-wrapping code.