Issue getting Hessian with Stan Math

Hey all,

I’m having trouble getting the hang of the autodiff capabilities of Stan Math.

I’m attempting to derive the gradient and the Hessian of the function f(L) = Tr(LL^{T}A), where A is a known matrix and L is a lower triangular matrix. I’m using the log-cholesky parameterization here for convenience because ultimately I would like to solve an optimization problem (that is much more complex than this) with a unique solution.
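Concretely, the parameterization and the closed-form gradient (handy as a sanity check on the autodiff output) are:

f(L) = Tr(LL^{T}A), with L_{ii} = exp(l_{ii}) on the diagonal and L_{ij} = l_{ij} below it,

df/dL = (A + A^{T})L restricted to the lower triangle, with the diagonal entries additionally scaled by L_{ii} from the chain rule through exp.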

Below is my code to compute the gradient:

// [[Rcpp::export]]
Eigen::VectorXd log_cholesky_grad(const Eigen::VectorXd& l_params, const Eigen::MatrixXd& A) {
  int n = A.rows(); // assuming A is square

  // Function to calculate Tr(LL^T A) with log-Cholesky parameterization
  auto func = [&](const auto& l_params) -> auto {
    // Use Eigen matrix with stan::math::var type to support autodiff
    Eigen::Matrix<stan::math::var, Eigen::Dynamic, Eigen::Dynamic> L(n, n);
    L.setZero();

    // Fill the lower triangular matrix L with the log-Cholesky parameterization
    int index = 0;
    for (int i = 0; i < n; ++i) {
      for (int j = 0; j <= i; ++j) {
        if (i == j) {
          L(i, j) = stan::math::exp(l_params(index)); // exponentiate diagonal elements
        } else {
          L(i, j) = l_params(index); // off-diagonal elements remain the same
        }
        index++;
      }
    }

    // Compute the trace of L * L^T * A
    Eigen::Matrix<stan::math::var, Eigen::Dynamic, Eigen::Dynamic> LLT = L * L.transpose();
    stan::math::var result = (LLT.cwiseProduct(A)).sum(); // Tr(LL^T A)
    return result;
  };

  // Prepare storage for gradient
  double fx;
  Eigen::VectorXd grad_fx;

  // Compute gradient
  stan::math::gradient(func, l_params, fx, grad_fx);

  return grad_fx; // Return the gradient with respect to log-Cholesky parameters
}

This code works perfectly. So, to compute the Hessian, I modified the function above as follows:

// [[Rcpp::export]]
Eigen::MatrixXd log_cholesky_H(const Eigen::VectorXd& l_params, const Eigen::MatrixXd& A) {
  int n = A.rows(); // assuming A is square

  // Function to calculate Tr(LL^T A) with log-Cholesky parameterization
  auto func = [&](const auto& l_params) -> auto {
    // Use Eigen matrix with stan::math::var type to support autodiff
    Eigen::Matrix<stan::math::var, Eigen::Dynamic, Eigen::Dynamic> L(n, n);
    L.setZero();

    // Fill the lower triangular matrix L with the log-Cholesky parameterization
    int index = 0;
    for (int i = 0; i < n; ++i) {
      for (int j = 0; j <= i; ++j) {
        if (i == j) {
          L(i, j) = stan::math::exp(l_params(index)); // exponentiate diagonal elements
        } else {
          L(i, j) = l_params(index); // off-diagonal elements remain the same
        }
        index++;
      }
    }

    // Compute the trace of L * L^T * A
    Eigen::Matrix<stan::math::var, Eigen::Dynamic, Eigen::Dynamic> LLT = L * L.transpose();
    stan::math::var result = (LLT.cwiseProduct(A)).sum(); // Tr(LL^T A)
    return result;
  };

  // Prepare storage for gradient
  double fx;
  Eigen::VectorXd grad_fx;
  Eigen::MatrixXd hess_fx;

  // Compute gradient
  stan::math::hessian(func, l_params, fx, grad_fx, hess_fx);

  return hess_fx; // Return the Hessian with respect to log-Cholesky parameters
}

However, for some reason, whenever I try to compile this code, I get the error:

  /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/StanHeaders/include/stan/math/mix/functor/hessian.hpp:50:10: error: assigning to 'double' from incompatible type 'stan::math::var' (aka 'var_value<double>')
       fx = f(x);
            ^~~~
   poly_grad.cpp:100:15: note: in instantiation of function template specialization 'stan::math::hessian<(lambda at poly_grad.cpp:70:15)>' requested here
     stan::math::hessian(func, l_params, fx, grad_fx, hess_fx);
                 ^
   poly_grad.cpp:80:19: error: no viable overloaded '='
             L(i, j) = stan::math::exp(l_params(index)); // exponentiate diagonal elements
             ~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I’m guessing this is because the Hessian requires a function template, whereas the gradient will work with a function that returns a var? If that’s the case, I’m not clear on how to fix it. Could anyone guide me?

By the way, for reproducibility my package is here: GitHub - eweine/stanAD

Thanks!


We’re all just getting back from StanCon, so have been a bit slow to catch up on the mailing lists. This is one that @stevebronder should be able to answer.

For gradients, you need an instantiation that accepts a var argument and returns a var. For Hessians, you need an instantiation that accepts an fvar<var> argument and returns an fvar<var>. So even if you write custom gradients for the var version, you still need at least one version templated enough to accept fvar<var>. You can also precompute gradients for the fvar version.
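For example, here’s a minimal sketch (not from the original post) of a functor that satisfies both requirements, because its scalar type is deduced from the argument:

#include <stan/math/mix.hpp>

// One generic lambda covers both instantiations:
// stan::math::gradient calls it with an Eigen vector of var,
// stan::math::hessian calls it with an Eigen vector of fvar<var>.
auto f = [](const auto& x) { return stan::math::dot_self(x); }; // x^T x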

The error arises from trying to assign an fvar<var> (the type of l_params(index)) to a var (the type of L(i, j)). The compiler’s error message should say somewhere what it thinks the left-hand and right-hand side types are (these messages can be long and hard to parse).

What you need to do is rewrite this so that the scalar type of L matches the scalar type of the argument l_params.

Also, if you want more efficiency, you might want to define args and returns with move semantics. But I’d get it working in the simpler case first. One thing you can do is copy our built-in function definitions.

I asked ChatGPT how to do this, since I can never remember how to use decltype:

#include <type_traits>
...
using ScalarType = typename std::decay_t<decltype(l_params)>::Scalar;

Then you would declare

Eigen::Matrix<ScalarType, Eigen::Dynamic, Eigen::Dynamic> L(n, n);

Or you can declare the argument type explicitly rather than relying on auto, and then reuse the scalar type from that declaration; the function would then be templated.
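Putting that together, here’s an untested sketch of the Hessian function. The only substantive change from the original is that L’s scalar type is deduced from the lambda argument instead of being hard-coded to stan::math::var (I’ve also renamed the lambda parameter to params to avoid shadowing the outer l_params):

#include <type_traits>

// [[Rcpp::export]]
Eigen::MatrixXd log_cholesky_H(const Eigen::VectorXd& l_params, const Eigen::MatrixXd& A) {
  int n = A.rows(); // assuming A is square

  auto func = [&](const auto& params) {
    // Deduce the autodiff scalar type from the argument:
    // var when called by stan::math::gradient, fvar<var> by stan::math::hessian
    using ScalarType = typename std::decay_t<decltype(params)>::Scalar;

    Eigen::Matrix<ScalarType, Eigen::Dynamic, Eigen::Dynamic> L(n, n);
    L.setZero();

    // Fill the lower triangular matrix L with the log-Cholesky parameterization
    int index = 0;
    for (int i = 0; i < n; ++i) {
      for (int j = 0; j <= i; ++j) {
        if (i == j) {
          L(i, j) = stan::math::exp(params(index)); // exponentiate diagonal elements
        } else {
          L(i, j) = params(index); // off-diagonal elements remain the same
        }
        index++;
      }
    }

    // Tr(L L^T A) as an elementwise product with A
    return ((L * L.transpose()).cwiseProduct(A)).sum();
  };

  double fx;
  Eigen::VectorXd grad_fx;
  Eigen::MatrixXd hess_fx;
  stan::math::hessian(func, l_params, fx, grad_fx, hess_fx);
  return hess_fx;
}

The same templated lambda can be passed to stan::math::gradient, so one definition serves both the gradient and Hessian functions.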

Alternatively, if you’re working in R, Python, or Julia, you can directly access a Stan program’s Hessians, gradients, and transforms through BridgeStan: BridgeStan – efficient, in-memory access to the methods of a Stan model — BridgeStan documentation

You can hack this to compute any function of the parameters by just using target += in a Stan model; nothing checks that it’s a proper log density function. You can include or exclude the Jacobian for the constraining transforms if there are constrained parameters.