For those cases, pass the unconstrained value as an argument to a function in generated quantities, to_var() it internally, and transform.
Yes, but how do I access the unconstrained variables or unconstraining functions in a Stan script? As far as I can tell, those are only available as methods of writer objects. Sorry if we're talking past each other.
If you declare things with constraints in the parameters block, then there is currently no way to access the unconstrained parameters from within a Stan program. But you can just declare real, vector, etc. without constraints in the parameters block and then transform them yourself into the constrained space. The only thing is that you have to deal with the Jacobians manually.
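For a single lower-bounded scale parameter, that manual-transform approach looks roughly like this (a sketch; `sigma_unc` and `sigma` are illustrative names, and the exp transform with its log Jacobian is the one documented in the manual for lower bounds):

```stan
parameters {
  real sigma_unc;                        // declared without constraints
}
transformed parameters {
  real<lower=0> sigma = exp(sigma_unc);  // manual constraining transform
}
model {
  // Jacobian adjustment done by hand:
  // log |d sigma / d sigma_unc| = sigma_unc for the exp transform
  target += sigma_unc;
  sigma ~ normal(0, 1);                  // example prior on the constrained scale
}
```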
The current density function looks like this:
template <bool propto, bool use_jacobian>
T log_prob(vector<T>& params_unc);
Data and transformed data are stored as member variables in
the class. We didn't get const correctness right the first
time around; that should be const in both ways, but isn't.
We could add
template <bool propto>
void lpdf_unconstrained(const vector<T>& params_unc,
T& jacobian,
T& log_density_constrained) const;
and one that doesn't bother with the Jacobian (I don't know
how to name the second argument here; it's the constrained
density value without the Jacobian, given the unconstrained
parameters as input):
template <bool propto>
void lpdf_unconstrained(const vector<T>& params_unc,
T& lpdf_constrained) const;
I'm less clear on how to present the constrained log density and make it
easy to use with gradients, because we can only read structured data like
this with a var_context:
template <bool propto>
void lpdf_constrained(const var_context& params_con,
double& lpdf_constrained,
vector<double>& grad_lpdf);
and we could do one without the gradient, too.
- Bob
@bgoodri, that would be fine. But
"transform them yourself into the constrained space"
How does one do that in a Stan model? Of course, I can take the Cholesky decomposition myself, etc., but it would be important that I be using the exact same constraining function that Stan uses, since I would be using the result to transform Stan's unconstrained derivatives.
I'm really sorry if I'm being slow here.
@Bob_Carpenter, that is a good idea. But I really do want to emphasize that my request is the ability to get Jacobians out of Stan models, and to do so using the underlying constraining / unconstraining functions. The ability to get constrained derivatives of the log probability is just an example use case.
For constraining functions that are not one-to-one (e.g. simplexes), the unconstrained derivatives might not even be well-defined. And as you pointed out, the Jacobians might be huge, and the user might not actually need to transform every single derivative in the gradient. For these reasons it might be better to let the user do it themselves.
They're all documented in a chapter of the manual near the end.
But the Cholesky ones in particular are tedious and error prone
to implement.
- Bob
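To illustrate why the Cholesky-based transforms are tedious by hand, here is a plain-C++ sketch of the covariance-matrix unconstraining transform the manual describes (Cholesky factor with log-transformed diagonal). The packing order is illustrative, and this is not the stan::math implementation:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sketch of the covariance-matrix unconstraining transform from the Stan
// manual: Cholesky-factor the matrix, log-transform the diagonal, and pack
// the lower triangle into K * (K + 1) / 2 free values.
// Illustration only; Stan's own implementation lives in stan::math.
std::vector<double> cov_matrix_unconstrain(
    const std::vector<std::vector<double>>& S) {
  int K = static_cast<int>(S.size());
  // Cholesky decomposition: S = L * L^T with L lower triangular.
  std::vector<std::vector<double>> L(K, std::vector<double>(K, 0.0));
  for (int j = 0; j < K; ++j) {
    for (int i = j; i < K; ++i) {
      double sum = S[i][j];
      for (int k = 0; k < j; ++k) sum -= L[i][k] * L[j][k];
      L[i][j] = (i == j) ? std::sqrt(sum) : sum / L[j][j];
    }
  }
  // Pack column-major: log of diagonal entries, raw off-diagonal entries.
  std::vector<double> free_params;
  for (int j = 0; j < K; ++j)
    for (int i = j; i < K; ++i)
      free_params.push_back(i == j ? std::log(L[j][j]) : L[i][j]);
  return free_params;
}
```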
Maybe you could propose a function we could add?
I'm still not sure what you want.
The simplex transform is one to one. With N constrained
dimensions there are only N-1 unconstrained dimensions.
- Bob
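Concretely, the stick-breaking construction from the manual can be sketched in plain C++ (illustration only; the real implementation is stan::math::simplex_constrain, and this version omits the log-Jacobian accumulation):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sketch of the stick-breaking simplex transform from the Stan manual:
// maps y in R^{K-1} to a point on the K-simplex.
std::vector<double> simplex_constrain(const std::vector<double>& y) {
  int K = static_cast<int>(y.size()) + 1;
  std::vector<double> x(K);
  double stick = 1.0;  // length of the remaining stick
  for (int k = 0; k < K - 1; ++k) {
    // The offset log(1 / (K - k - 1)) centers y = 0 on the uniform simplex.
    double z = 1.0 / (1.0 + std::exp(-(y[k] - std::log(double(K - k - 1)))));
    x[k] = stick * z;   // break off a fraction z of the remaining stick
    stick -= x[k];
  }
  x[K - 1] = stick;     // whatever is left is the last coordinate
  return x;
}
```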
Right, I meant it's not 1:1 between R^N and R^{N-1}. The constrained gradient would live in R^N. But we're on the same page.
Yes, I can implement the unconstraining functions myself. I mean, I'm already using the C++ libraries directly to get Jacobians. The point of this request is to make it easy (and foolproof) to use the full suite of Stan's autodiff tools from the high-level language.
From my post above, here's what I think would be the ideal solution:
"Here's a solution that could work. Suppose there were an option to calculate the Jacobian of parameters defined in the generated quantities block. The columns of the Jacobian matrix could be referenced with param_names, and you could define something like generated_param_names to reference the rows, just like you do with unconstrained_param_names. Then the user could make a separate Stan program, copy the transformations they need into the generated quantities block, omit the rest of the model, and calculate the Jacobians they need."
"In fact, if you exposed the functions that are used to convert an unconstrained parameter to a constrained parameter (maybe they already are exposed?), then this solution would also solve my original, more narrowly scoped feature request."
Right: it's only one-to-one between N-simplexes and R^{N-1}.
Would you mind trying to convert the prose into the function
declaration(s) you'd like to see implemented?
- Bob
Sure, here's some pseudo-Stan code.
Example 1
Jacobians of the unconstraining functions:
// transform_model.stan:
data {
  int<lower=0> K;
}
parameters {
  vector[K] mu_mean;
  matrix[K, K] mu_info;
}
model {
}
generated quantities {
  // I'm not sure how best to get K_free := K * (K + 1) / 2 here.
  vector[K_free] mu_info_free;
  vector[K] mu_mean_free;
  mu_info_free = unconstrain_cov_matrix(mu_info);
  mu_mean_free = unconstrain_vector(mu_mean, lower=0);
}
Then in R:
# Right now I get a stan fit object with stan_model() then sampling(iter=1),
# but whatever would work.
data_dat <- list(K=2, mu_mean=c(0.5, 0.7), mu_info=matrix(c(1, 0.1, 0.1, 1), 2, 2))
transform_model <- get_stan_fit_object("transform_model.stan")
transform_model$unconstrained_param_names()
# Result, as now, something like:
# c("mu_mean_1", "mu_mean_2", "mu_info_1_1", "mu_info_2_1", "mu_info_1_2", "mu_info_2_2")
transform_model$generated_param_names()
# Result something like:
# c("mu_info_free_1", "mu_info_free_2", "mu_info_free_3", "mu_mean_free_1","mu_mean_free_2")
jac <- transform_model$generated_param_jacobian()
# Returns a matrix containing, e.g.
# jac[1, 1] = d mu_info_free_1 / d mu_mean_1 = 0
# jac[1, 2] = d mu_info_free_1 / d mu_mean_2 = 0
# jac[1, 3] = d mu_info_free_1 / d mu_info_1_1 = something non-zero
# ...
# jac[4, 1] = d mu_mean_free_1 / d mu_mean_1 = something non-zero
# ...
# etc.
Example 2
Jacobians of user functions wrt the free parameters:
data {
  int<lower=0> K;
  int<lower=0> N;
}
parameters {
  vector<lower=0>[K] mu_mean;
  cov_matrix[K] mu_info;
}
model {
}
generated quantities {
  vector[K] mu_product;
  mu_product = mu_info * mu_mean;
}
Then in R:
data_dat <- list(K=2, mu_mean=c(0.5, 0.7), mu_info=matrix(c(1, 0.1, 0.1, 1), 2, 2))
transform_model <- get_stan_fit_object("transform_model.stan")
transform_model$unconstrained_param_names()
# Result, as now, something like:
# c("mu_mean_1", "mu_mean_2", "mu_info_1", "mu_info_2", "mu_info_3")
transform_model$generated_param_names()
# Result something like:
# c("mu_product_1", "mu_product_2")
jac <- transform_model$generated_param_jacobian()
# Returns a matrix containing, e.g.
# jac[1, 1] = d mu_product_1 / d mu_mean_1 = derivative wrt the unconstrained parameters
# jac[2, 3] = d mu_product_2 / d mu_info_1 = derivative wrt the unconstrained parameters
# ...
# etc.
There are no derivatives for generated quantities. They are
evaluated with pure double input based on the values of the
parameters. That'd be a big issue to change.
I guess I don't see why you want to do all of this, so I'm
having a hard time breaking it down into things we could implement.
For K * (K + 1) / 2, Ben added the integer choose function, so it'd be:
K_free = choose(K + 1, 2);
- Bob
Understood. Perhaps you can keep this possibility on the table for long-term directions.
It would be great if Stan could be a go-to autodifferentiated modeling language for general purpose inference algorithms rather than being so closely tied to a few (high-quality) methods like HMC and BFGS.
I'm not sure what you mean by being tied to L-BFGS
or HMC. It's really not.
A Stan program translates to a C++ class that provides:
- log density and derivatives w.r.t. parameters on the unconstrained scale
  - with or without additive constants
  - with or without the Jacobian adjustment for the unconstrained-to-constrained change of variables
- structured data loading from a base class callback (var_context)
- transform from constrained (var_context) to unconstrained (Eigen vector) and query methods for variable names, shapes, and sizes
- a way to compute generated quantities from unconstrained parameters
There's nothing specific to L-BFGS or HMC: both of those
algorithms only need the log density and gradients. We've used it
for Metropolis (easy) and ensemble sampling (also easy),
but neither seemed worthwhile adding to Stan given their
poor scaling with dimensionality.
- Bob
Right. I'm a fan of what Stan has done! I'm just pointing out that for it to be a full-featured modeling and autodiff tool serving all my inference needs, it needs a few more things: Hessians, Jacobians, and derivatives in both the constrained and unconstrained spaces, all without going into C++.
Mike Betancourt showed me how to get Hessians out with a little RStan hacking. The Jacobians and transform functions are missing, though, and they're a deal-breaker for my needs. Which is fine! The Stan team has other priorities and is doing amazing work. It was worth an ask. If there's evidence that I'm not alone, then maybe you can think about this feature at some future date.
I'm very confused about what you actually want. Maybe someone
else understands and can help translate your comments into something
that looks like feature requests to me.
Specifically, I can't break your requests down into things we can add
to the Stan language or generated C++ code. Stan is C++, with everything
else being an interface layer on top of it. So new features go into C++
and then get exposed through the interfaces (though some of the interfaces
race ahead because we can't keep up in C++, and they can hack stopgaps
in R, like exposing functions and computing Hessians with finite differences).
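A finite-difference Hessian stopgap of the kind mentioned here is only a few lines of generic code (a sketch, not RStan's actual implementation):

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

// Central-difference Hessian of f: R^n -> R, the kind of stopgap an
// interface can use when analytic second derivatives aren't exposed.
std::vector<std::vector<double>> fd_hessian(
    const std::function<double(const std::vector<double>&)>& f,
    std::vector<double> x, double h = 1e-4) {
  size_t n = x.size();
  std::vector<std::vector<double>> H(n, std::vector<double>(n, 0.0));
  for (size_t i = 0; i < n; ++i) {
    for (size_t j = 0; j < n; ++j) {
      // (f(x + h e_i + h e_j) - f(x + h e_i - h e_j)
      //  - f(x - h e_i + h e_j) + f(x - h e_i - h e_j)) / (4 h^2)
      double fpp, fpm, fmp, fmm;
      x[i] += h; x[j] += h; fpp = f(x);
      x[j] -= 2 * h;        fpm = f(x);
      x[i] -= 2 * h;        fmm = f(x);
      x[j] += 2 * h;        fmp = f(x);
      x[i] += h; x[j] -= h;  // restore x
      H[i][j] = (fpp - fpm - fmp + fmm) / (4 * h * h);
    }
  }
  return H;
}
```

Central differences are exact for quadratics up to rounding, which makes them easy to sanity-check; in general the accuracy is O(h^2).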
If you want all those Hessians, availability of transforms,
evaluations on unconstrained parameters, etc., that's all possible
if we can understand what you want and break it down into actionable
feature requests. But we're unlikely to ever differentiate through
generated quantities. And if you want hooks into partials in the autodiff
tree, we'd have to figure out how to expose that.
- Bob
The pseudo-code makes sense, though, right? That's what I'm asking for. As you say, it would require templating the generated quantities rather than just treating them as doubles. (I think you would also need a writer object and to expose the constraining / unconstraining functions in the model interface.)
This is a point of convention that we've struggled with in the past. The Stan Math Library is a richly featured C++ library that makes it very easy to design, experiment with, and release powerful, state-of-the-art algorithms... in C++. The interfaces are meant to expose these implementations to various other environments, not the entire math library. In other words, Stan was never meant to facilitate the design of and experimentation with algorithms in R or Python or your other favorite environment, which is why so many of these requests are so challenging even when they seem like they shouldn't be. Indeed, trying to support the exposition of the entire math library would be a huge development and maintenance effort for which we really don't have the resources.
betanalpha Developer
December 22
This is a point of convention that we've struggled with in the past. The Stan Math Library is a richly featured C++ library that makes it very easy to design, experiment with, and release powerful, state-of-the-art algorithms... in C++. The interfaces are meant to expose these implementations to various other environments, not the entire math library. In other words, Stan was never meant to facilitate the design of and experimentation with algorithms in R or Python or your other favorite environment,
Yes it was! That's why we made it possible to compile
a model and access the unconstrained log density and
gradients.
which is why so many of these requests are so challenging even when they seem like they shouldn't be.
The developer-level access is at the C++ level. At the very
least, in R, you'd need to write Rcpp code to access the
Stan C++ API, just like RStan does.
One of the reasons these things are challenging is that we
don't want to just add things willy-nilly; we want to roll
out things we know we'll want to support into the future. Otherwise,
it's very difficult for people to develop against our API. So,
like other projects that a lot of other things depend on, we
don't change very quickly.
Indeed, trying to support the exposition of the entire math library would be a huge development and maintenance effort for which we really don't have the resources.
The first effort in this direction is the autodiff paper. Then we
need to write the higher-order autodiff paper when that's all working
and tested. We could stand to document the math (and stan) APIs better.
They've been a moving target so far, because we didn't originally have
the math library in its own repo for independent use. Now we're trying
to stabilize the C++ API as well as our interface APIs.
Now, having said all this, I think we should think about exposing
the unconstrained density, and about separating the log density into
the constrained log density and the Jacobian. And we already plan
to expose all the transforms, but we'll have to think about how
to deal with the built-ins.
- Bob
I might add that I'm really requesting a change to what the compiled C++ model looks like. Once the compiled model object supports Jacobians and accessing generated parameters with readers and writers, exposing that functionality to R or Python will be relatively easy.
When using the Stan math library in C++, I've found myself writing (and debugging) quite a lot of boilerplate that's already been done to very high standards in the core Stan model. Specifically, I'm talking about the boilerplate of packing named parameters into and out of a single long vector and taking that vector into and out of the unconstrained space. You can easily take advantage of Stan's implementation of this boilerplate for gradients and Hessians of log probs (and hence of basically any function) with respect to the unconstrained space in C++, and hence in R or Python. But even in C++, right now, if you want Jacobian matrices of any kind, you have to roll all your own parameter-wrapping code.
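For concreteness, here is the sort of hand-rolled parameter-wrapping boilerplate being described, reduced to two parameters with illustrative names (a positive scalar and an unconstrained vector):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sketch of pack/unpack boilerplate: a model with a positive scalar `sigma`
// and an unconstrained vector `mu`, flattened into one unconstrained vector
// the way Stan's generated model class does internally via its reader and
// writer objects. Names and layout are illustrative.
struct Params {
  double sigma;             // constrained: sigma > 0
  std::vector<double> mu;   // unconstrained
};

std::vector<double> pack_unconstrained(const Params& p) {
  std::vector<double> out;
  out.push_back(std::log(p.sigma));  // log transform removes the bound
  out.insert(out.end(), p.mu.begin(), p.mu.end());
  return out;
}

Params unpack_constrained(const std::vector<double>& unc) {
  Params p;
  p.sigma = std::exp(unc[0]);        // inverse (constraining) transform
  p.mu.assign(unc.begin() + 1, unc.end());
  return p;
}
```

The round trip pack-then-unpack recovers the original parameters, which is exactly the invariant one ends up re-testing by hand for every new model.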