Feature request: Jacobians of unconstraining transform

For those cases, input the unconstrained thing as an argument to the function in generated quantities, to_var() it internally, and transform.

Yes, but how do I access the unconstrained variables or unconstraining functions in a Stan script? As far as I can tell, those are only available as methods of writer objects. Sorry if we're talking past each other.

If you declare things with constraints in the parameters block, then there is currently no way to access the unconstrained parameters from within a Stan program. But you can just declare real, vector, etc. without constraints in the parameters block and then transform them yourself into the constrained space. The only thing is that you have to deal with the Jacobians manually.

The current density function looks like this:

template <bool propto, bool use_jacobian>
T log_prob(vector<T>& params_unc);

Data and transformed data are stored as member variables in
the class. We didn't get const correctness right the first
time around; that should be const in both ways, but isn't.

We could add

template <bool propto>
void lpdf_unconstrained(const vector<T>& params_unc,
                        T& jacobian,
                        T& log_density_constrained) const;

and one that doesn't bother with the Jacobian (I don't know
how to name the second argument here; it's the constrained
density value without the Jacobian, given the unconstrained
parameters as input):

template <bool propto>
void lpdf_unconstrained(const vector<T>& params_unc,
                        T& lpdf_constrained) const;

I'm less clear on how to present the constrained log density and make it
easy to use with gradients, because we can only read structured data like
this with a var_context:

template <bool propto>
void lpdf_constrained(const var_context& params_con,
                      double& lpdf_constrained,
                      vector<double>& grad_lpdf); 

and we could do one without the gradient, too.

  • Bob

@bgoodri, that would be fine. But

"transform them yourself into the constrained space"

How does one do that in a Stan model? Of course, I can take the Cholesky decomposition myself, etc., but it would be important that I be using the exact same constraining function that Stan uses, since I would be using the result to transform Stan's unconstrained derivatives.

I'm really sorry if I'm being slow here.

@Bob_Carpenter, that is a good idea. But I really do want to emphasize that my request is the ability to get Jacobians out of Stan models, and to do so using the underlying constraining / unconstraining functions. The ability to get constrained derivatives of the log probability is just an example use case.

For constraining functions that are not one-to-one (e.g. simplexes), the unconstrained derivatives might not even be well-defined. And as you pointed out, the Jacobians might be huge, and the user might not actually need to transform every single derivative in the gradient. For these reasons it might be better to let the user do it themselves.

They're all documented in a chapter of the manual near the end.
But the Cholesky ones in particular are tedious and error-prone
to implement.

  • Bob

Maybe if you could propose a function we could add?
I'm still not sure what you want.

The simplex transform is one to one. With N constrained
dimensions there are only N-1 unconstrained dimensions.

  • Bob

Right, I meant it's not 1:1 between R^N and R^{N-1}. The constrained gradient would live in R^N. But we're on the same page.

Yes, I can implement the unconstraining functions myself. I mean, I'm already using the C++ libraries directly to get Jacobians. The point of this request is to make it easy (and foolproof) to use the full suite of Stan's autodiff tools using only the high-level language.

From my post above, here's what I think would be the ideal solution:

"Here's a solution that could work. Suppose there were an option to calculate the Jacobian of parameters defined in the generated quantities block. The columns of the Jacobian matrix could be referenced with param_names, and you could define something like generated_param_names to reference the rows, just like you do with unconstrained_param_names. Then the user could make a separate Stan program, copy the transformations they need into the generated quantities block, omit the rest of the model, and calculate the Jacobians they need."

"In fact, if you exposed the functions that are used to convert an unconstrained parameter to a constrained parameter (maybe they already are exposed?), then this solution would also solve my original, more narrowly scoped feature request."

Right ā€” itā€™s only one-one between N-simplexes and R^{N-1}.

Would you mind trying to convert the prose into the function
declaration(s) you'd like to see implemented?

  • Bob

Sure, here's some pseudo-Stan code.

Example 1
Jacobians of the unconstraining functions:

// transform_model.stan:
data {
    int <lower=0> K;
}
parameters {
    vector[K] mu_mean;
    matrix[K, K] mu_info;
}
model {
}
generated quantities {
    // I'm not sure how best to get K_free := K * (K + 1) / 2 here.
    vector[K_free] mu_info_free;
    vector[K] mu_mean_free;
    mu_info_free = unconstrain_cov_matrix(mu_info);
    mu_mean_free = unconstrain_vector(mu_mean, lower=0);
}

Then in R:

# Right now I get a stan fit object with stan_model() then sampling(iter=1),
# but whatever would work.
data_dat <- list(K=2, mu_mean=c(0.5, 0.7), mu_info=matrix(c(1, 0.1, 0.1, 1), 2, 2))
transform_model <- get_stan_fit_object("transform_model.stan")

transform_model$unconstrained_param_names()
# Result (as now) something like:
# c("mu_mean_1", "mu_mean_2", "mu_info_1_1", "mu_info_2_1", "mu_info_1_2", "mu_info_2_2")

transform_model$generated_param_names()
# Result something like:
# c("mu_info_free_1", "mu_info_free_2", "mu_info_free_3", "mu_mean_free_1","mu_mean_free_2")

jac <- transform_model$generated_param_jacobian()
# Returns a matrix containing, e.g.
# jac[1, 1] = d mu_info_free_1 / d mu_mean_1 = 0
# jac[1, 2] = d mu_info_free_1 / d mu_mean_2 = 0
# jac[1, 3] = d mu_info_free_1 / d mu_info_1_1 = something non-zero
# ...
# jac[4, 1] = d mu_mean_free_1 / d mu_mean_1 = something non-zero
# ...
# etc.

Example 2
Jacobians of user functions wrt the free parameters:

data {
    int <lower=0> K;
    int <lower=0> N;
}
parameters {
    vector<lower=0>[K] mu_mean;
    cov_matrix[K] mu_info;
}
model {
}
generated quantities {
    vector[K] mu_product;
    mu_product = mu_info * mu_mean;
}

Then in R:

data_dat <- list(K=2, mu_mean=c(0.5, 0.7), mu_info=matrix(c(1, 0.1, 0.1, 1), 2, 2))
transform_model <- get_stan_fit_object("transform_model.stan")

transform_model$unconstrained_param_names()
# Result (as now) something like:
# c("mu_mean_1", "mu_mean_2", "mu_info_1", "mu_info_2", "mu_info_3")

transform_model$generated_param_names()
# Result something like:
# c("mu_product_1", "mu_product_2")

jac <- transform_model$generated_param_jacobian()
# Returns a matrix containing, e.g.
# jac[1, 1] = d mu_product_1 / d mu_mean_1 = derivative wrt the unconstrained parameters
# jac[2, 3] = d mu_product_2 / d mu_info_1 = derivative wrt the unconstrained parameters
# ...
# etc.

There are no derivatives for generated quantities. They are
evaluated with pure double input based on the values of the
parameters. That'd be a big issue to change.

I guess I don't see why you want to do all of this, so I'm
having a hard time breaking it down into things we could implement.

For K_free = K * (K + 1) / 2, Ben added the integer choose function, so it'd be:

K_free = choose(K + 1, 2);

  • Bob

Understood. Perhaps you can keep this possibility on the table for long-term directions.

It would be great if Stan could be a go-to autodifferentiated modeling language for general purpose inference algorithms rather than being so closely tied to a few (high-quality) methods like HMC and BFGS.

I'm not sure what you mean by being tied to L-BFGS
or HMC. It's really not.

A Stan program translates to a C++ class that provides:

  • log density and derivatives w.r.t. parameters on the unconstrained
    scale

    • with or without additive constants

    • with or without Jacobian adjustment for unconstrained
      to constrained change of variables

  • structured data loading from a base class callback
    (var_context)

  • transform from constrained (var_context) to unconstrained
    (Eigen vector) and query methods for variable names, shapes
    and sizes

  • a way to compute generated quantities from unconstrained
    parameters

There's nothing specific to L-BFGS or HMC; both of those
algorithms need only the log density and gradients. We've used
the same class for Metropolis (easy) and ensemble sampling
(also easy), but neither seemed worth adding to Stan given their
poor scaling with dimensionality.

  • Bob

Right. I'm a fan of what Stan has done! I'm just pointing out that for it to be a full-featured modeling and autodiff tool serving all my inference needs, it needs a few more things: Hessians, Jacobians, and derivatives in both the constrained and unconstrained spaces, all without going into C++.

Mike Betancourt showed me how to get Hessians out with a little RStan hacking. The Jacobians and transform functions are missing, though, and they're a deal-breaker for my needs. Which is fine! The Stan team has other priorities and is doing amazing work. It was worth an ask. If there's evidence that I'm not alone, then maybe you can think about this feature at some future date.

I'm very confused about what you actually want. Maybe someone
else understands and can help translate your comments into something
that looks like a feature request to me.

Specifically, I can't break your requests down into things we can add
to the Stan language or generated C++ code. Stan is C++, with everything
else being an interface layer on top of it. So new features go into C++
and then get exposed through the interfaces (though some of the interfaces
race ahead because we can't keep up in C++, and they can hack stopgaps
in R, like exposing functions and computing Hessians with finite diffs).

If you want all those Hessians, availability of transforms,
evaluations on unconstrained parameters, etc., that's all possible
if we can understand what you want and break it down into actionable
feature requests. But we're unlikely to ever differentiate through
generated quantities. And if you want hooks into partials in the autodiff
tree, we'd have to figure out how to expose that.

  • Bob

The pseudo-code makes sense, though, right? That's what I'm asking for. As you say, it would require templating the generated quantities rather than just treating them as doubles. (I think you would also need a writer object and to expose the constraining / unconstraining functions to the model interface.)

This is a point of contention that we've struggled with in the past. The Stan Math Library is a richly featured C++ library that makes it very easy to design, experiment with, and release powerful, state-of-the-art algorithms... in C++. The interfaces are meant to expose these implementations to various other environments, not the entire math library. In other words, Stan was never meant to facilitate the design of and experimentation with algorithms in R or Python or your other favorite environment, which is why so many of these requests are so challenging even when they seem like they shouldn't be. Indeed, trying to support the exposure of the entire math library would be a huge development and maintenance effort for which we really don't have the resources.

betanalpha Developer
December 22
This is a point of contention that we've struggled with in the past. The Stan Math Library is a richly featured C++ library that makes it very easy to design, experiment with, and release powerful, state-of-the-art algorithms... in C++. The interfaces are meant to expose these implementations to various other environments, not the entire math library. In other words, Stan was never meant to facilitate the design of and experimentation with algorithms in R or Python or your other favorite environment,

Yes it was! That's why we made it possible to compile
a model and access the unconstrained log density and
gradients.

which is why so many of these requests are so challenging even if they seem like they shouldnā€™t be.

The developer-level access is at the C++ level. At the very
least, in R, you'd need to write Rcpp code to access the
Stan C++ API, just like RStan does.

One of the reasons these things are challenging is that we
don't want to just add things willy-nilly; we want to roll
out things we know we'll want to support into the future. Otherwise,
it's very difficult for people to develop against our API. So,
like other projects that a lot of other things depend on, we
don't change very quickly.

Indeed, trying to support the exposition of the entire math library would be a huge development and maintenance effort for which we really donā€™t have the resources.

The first effort in this direction is the autodiff paper. Then we
need to write the higher-order autodiff paper when that's all working
and tested. We could also stand to document the math (and stan) API better.
It's been a moving target so far, because we didn't originally have it
in its own repo for independent use. Now we're trying to stabilize
the C++ API as well as our interface APIs.

Now, having said all this, I think we should think about exposing
the unconstrained density, and separating the log density into the
constrained log density and the Jacobian. And we already plan
to expose all the transforms, but we'll have to think about how
to deal with the built-ins.

  • Bob

I might add that I'm really requesting a change to what the compiled C++ model looks like. Once the compiled model object supports Jacobians and accessing generated parameters with readers and writers, exposing that functionality to R or Python will be relatively easy.

When using the Stan math library in C++, I've found myself writing (and debugging) quite a lot of boilerplate that's already been implemented to very high standards in the core Stan model class. Specifically, I'm talking about the boilerplate of packing named parameters into and out of a single long vector, and taking that vector into and out of the unconstrained space. You can easily take advantage of Stan's implementation of this boilerplate for gradients and Hessians of log probs (and hence of basically any function) with respect to the unconstrained space in C++, and hence in R or Python. But even in C++, right now, if you want Jacobian matrices of any kind, you have to roll all your own parameter-wrapping code.