Hey all,
I’m having trouble getting the hang of the autodiff capabilities of Stan Math.
I’m attempting to derive the gradient and the Hessian of the function f(L) = Tr(LL^{T}A), where A is a known matrix and L is a lower triangular matrix. I’m using the log-Cholesky parameterization here for convenience because, ultimately, I would like to solve an optimization problem (much more complex than this one) with a unique solution.
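As a sanity check, the analytic gradient I derived by hand (assuming my matrix calculus is right) is

$$
\frac{\partial f}{\partial L} = (A + A^{T})L,
$$

with the diagonal entries picking up an extra factor of $L_{ii}$ from the chain rule, since $L_{ii} = \exp(\ell_{ii})$ under the log-Cholesky parameterization.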
Below is my code to compute the gradient:
// [[Rcpp::export]]
Eigen::VectorXd log_cholesky_grad(const Eigen::VectorXd& l_params, const Eigen::MatrixXd& A) {
  int n = A.rows(); // assuming A is square

  // Function to calculate Tr(LL^T A) with log-Cholesky parameterization
  auto func = [&](const auto& l_params) -> auto {
    // Use Eigen matrix with stan::math::var type to support autodiff
    Eigen::Matrix<stan::math::var, Eigen::Dynamic, Eigen::Dynamic> L(n, n);
    L.setZero();

    // Fill the lower triangular matrix L with the log-Cholesky parameterization
    int index = 0;
    for (int i = 0; i < n; ++i) {
      for (int j = 0; j <= i; ++j) {
        if (i == j) {
          L(i, j) = stan::math::exp(l_params(index)); // exponentiate diagonal elements
        } else {
          L(i, j) = l_params(index); // off-diagonal elements remain the same
        }
        index++;
      }
    }

    // Compute the trace of L * L^T * A
    Eigen::Matrix<stan::math::var, Eigen::Dynamic, Eigen::Dynamic> LLT = L * L.transpose();
    stan::math::var result = (LLT.cwiseProduct(A)).sum(); // Tr(LL^T A)
    return result;
  };

  // Prepare storage for the gradient
  double fx;
  Eigen::VectorXd grad_fx;

  // Compute the gradient
  stan::math::gradient(func, l_params, fx, grad_fx);
  return grad_fx; // Return the gradient with respect to log-Cholesky parameters
}
This code works perfectly. To compute the Hessian, I modified the function above as follows:
// [[Rcpp::export]]
Eigen::MatrixXd log_cholesky_H(const Eigen::VectorXd& l_params, const Eigen::MatrixXd& A) {
  int n = A.rows(); // assuming A is square

  // Function to calculate Tr(LL^T A) with log-Cholesky parameterization
  auto func = [&](const auto& l_params) -> auto {
    // Use Eigen matrix with stan::math::var type to support autodiff
    Eigen::Matrix<stan::math::var, Eigen::Dynamic, Eigen::Dynamic> L(n, n);
    L.setZero();

    // Fill the lower triangular matrix L with the log-Cholesky parameterization
    int index = 0;
    for (int i = 0; i < n; ++i) {
      for (int j = 0; j <= i; ++j) {
        if (i == j) {
          L(i, j) = stan::math::exp(l_params(index)); // exponentiate diagonal elements
        } else {
          L(i, j) = l_params(index); // off-diagonal elements remain the same
        }
        index++;
      }
    }

    // Compute the trace of L * L^T * A
    Eigen::Matrix<stan::math::var, Eigen::Dynamic, Eigen::Dynamic> LLT = L * L.transpose();
    stan::math::var result = (LLT.cwiseProduct(A)).sum(); // Tr(LL^T A)
    return result;
  };

  // Prepare storage for the gradient and Hessian
  double fx;
  Eigen::VectorXd grad_fx;
  Eigen::MatrixXd hess_fx;

  // Compute the Hessian
  stan::math::hessian(func, l_params, fx, grad_fx, hess_fx);
  return hess_fx; // Return the Hessian with respect to log-Cholesky parameters
}
However, whenever I try to compile this code, I get the error:
/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/StanHeaders/include/stan/math/mix/functor/hessian.hpp:50:10: error: assigning to 'double' from incompatible type 'stan::math::var' (aka 'var_value<double>')
fx = f(x);
^~~~
poly_grad.cpp:100:15: note: in instantiation of function template specialization 'stan::math::hessian<(lambda at poly_grad.cpp:70:15)>' requested here
stan::math::hessian(func, l_params, fx, grad_fx, hess_fx);
^
poly_grad.cpp:80:19: error: no viable overloaded '='
L(i, j) = stan::math::exp(l_params(index)); // exponentiate diagonal elements
~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I’m guessing this is because hessian requires a function template for some reason, while gradient will work with a function that returns a double? If that’s the case, I’m not clear on how to fix it. Could anyone guide me?
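For what it’s worth, my current guess at a fix (untested sketch, with the rest of log_cholesky_H unchanged) is to stop hard-coding stan::math::var inside the lambda and instead deduce the scalar type from the argument, since the fx = f(x) line in the error suggests hessian.hpp also calls the functor with a plain double vector:

```cpp
// Untested sketch: deduce the scalar type from the functor's argument
// instead of hard-coding stan::math::var, so the same lambda can be
// called with double and forward-mode autodiff inputs.
auto func = [&](const auto& theta) {
  using T = typename std::decay_t<decltype(theta)>::Scalar;
  Eigen::Matrix<T, Eigen::Dynamic, Eigen::Dynamic> L(n, n);
  L.setZero();

  // Fill the lower triangle of L from the log-Cholesky parameters
  int index = 0;
  for (int i = 0; i < n; ++i) {
    for (int j = 0; j <= i; ++j) {
      if (i == j) {
        L(i, j) = stan::math::exp(theta(index)); // exponentiate diagonal elements
      } else {
        L(i, j) = theta(index); // off-diagonal elements remain the same
      }
      index++;
    }
  }

  // Cast A to T so the elementwise product sees matching scalar types
  Eigen::Matrix<T, Eigen::Dynamic, Eigen::Dynamic> LLT = L * L.transpose();
  return T((LLT.cwiseProduct(A.cast<T>())).sum()); // Tr(LL^T A)
};
```

The A.cast<T>() is just my attempt at keeping the scalar types in the elementwise product matched; I haven’t verified that this compiles.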
By the way, for reproducibility my package is here: GitHub - eweine/stanAD
Thanks!