Matrix Autodiff - Implementation Question

anon79882417 · March 16, 2019, 2:28am

I have a question about the matrix autodiff. Say I have a function that depends on some parameters, for example a squared exponential kernel:

k(x, x') = \sigma^2 \exp(-\frac{d(x, x')}{2l^2}), with, for example, the partial derivative with respect to \sigma:

\frac{\partial k}{\partial \sigma} = 2\sigma \exp(-\frac{d(x, x')}{2l^2}).

This derivative takes a different value at each element of the lower triangle; for example, a 3x3 covariance matrix has 3(2)/2 = 3 strictly lower-triangular elements. So what does the Jacobian look like for this function? If we have some function f that depends on x, \sigma, and l, written f(x, \sigma, l), then we can write out the Jacobian as follows:

J_f = [\frac{df}{dx} \frac{df}{d\sigma} \frac{df}{dl}],

But this function is evaluated at every position in the matrix, so J_f above really has \frac{n(n+1)}{2} evaluations for a symmetric n x n matrix (the diagonal plus either the lower or upper triangle).
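To make the per-element picture concrete, the partials of a single matrix entry can be written out. This is a sketch under two assumptions that the post does not state explicitly: the exponent carries the usual negative sign of the squared exponential kernel, and d_{ij} is shorthand (introduced here) for d(x_i, x_j).

```latex
K_{ij} = \sigma^2 \exp\left(-\frac{d_{ij}}{2 l^2}\right), \qquad d_{ij} = d(x_i, x_j)

\frac{\partial K_{ij}}{\partial \sigma} = 2 \sigma \exp\left(-\frac{d_{ij}}{2 l^2}\right) = \frac{2 K_{ij}}{\sigma}

\frac{\partial K_{ij}}{\partial l} = \frac{d_{ij}}{l^3} K_{ij}
```

Each entry has its own partial because d_{ij} differs from entry to entry. If d(x, x) = 0, then on the diagonal K_{ii} = \sigma^2 and the partial with respect to l vanishes there.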

So, for the specialization, do I need to calculate the partial derivative with respect to each parameter, at each point in the covariance matrix? If so, that brings me to another question. The current rev mode implementation of this covariance function looks like this, just the part where we’re chaining the partial derivatives:

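  virtual void chain() {
    // Running sums of the adjoint contributions to l and sigma.
    double adjl = 0;
    double adjsigma = 0;

    // Lower-triangular entries: each contributes adj * val,
    // weighted by its stored distance term for l.
    for (size_t i = 0; i < size_ltri_; ++i) {
      vari *el_low = cov_lower_[i];
      double prod_add = el_low->adj_ * el_low->val_;
      adjl += prod_add * dist_[i];
      adjsigma += prod_add;
    }
    // Diagonal entries contribute to sigma only.
    for (size_t i = 0; i < size_; ++i) {
      vari *el = cov_diag_[i];
      adjsigma += el->adj_ * el->val_;
    }
    // Apply the factors common to every entry once, outside the loops.
    l_vari_->adj_ += adjl / (l_d_ * l_d_ * l_d_);
    sigma_vari_->adj_ += adjsigma * 2 / sigma_d_;
  }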

But here’s what I noticed. If we need different gradient evaluations at each point in the covariance matrix for, say, sigma, why am I doing this operation: adjsigma += prod_add;? Is the partial derivative of the matrix w.r.t. \sigma only one value? That would then be the same value at all points in the lower triangle of the covariance matrix, correct? Am I missing something?

Alternatively, shouldn’t I initialize sigma_vari_ as a vector of vari that has a different value at each point in the covariance matrix?

Anything incorrect in my understanding?

avehtari · March 16, 2019, 10:08pm

Yes. But you probably don’t want to store all those at the same time.
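One possible reading of this reply: every matrix entry does have its own partial derivative, but reverse mode only needs the adjoint-weighted sum of those partials for each scalar parameter, so they can be accumulated one at a time instead of being stored. The following is a minimal standalone sketch of that idea, not Stan Math code. The names `KernelAdjoints`, `se_kernel`, `accumulate_adjoints`, and `weighted_sum` are hypothetical, and the kernel is assumed to be sigma^2 * exp(-d / (2 l^2)) with d a squared distance.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the two accumulated parameter adjoints.
struct KernelAdjoints {
  double sigma;
  double l;
};

// One kernel entry, assuming K = sigma^2 * exp(-d / (2 l^2)).
inline double se_kernel(double d, double sigma, double l) {
  return sigma * sigma * std::exp(-d / (2.0 * l * l));
}

// Given the upstream adjoint dL/dK_i of every stored entry, sum the
// per-entry contributions into one adjoint per scalar parameter.
inline KernelAdjoints accumulate_adjoints(const std::vector<double>& dist,
                                          const std::vector<double>& k_adj,
                                          double sigma, double l) {
  assert(dist.size() == k_adj.size());
  double adj_sigma = 0.0;
  double adj_l = 0.0;
  for (std::size_t i = 0; i < dist.size(); ++i) {
    const double prod = k_adj[i] * se_kernel(dist[i], sigma, l);
    adj_sigma += prod;        // dK/dsigma = 2 K / sigma; factor applied below
    adj_l += prod * dist[i];  // dK/dl = K d / l^3; factor applied below
  }
  return {adj_sigma * 2.0 / sigma, adj_l / (l * l * l)};
}

// Scalar test function L = sum_i w_i K_i, so that dL/dK_i = w_i.
inline double weighted_sum(const std::vector<double>& dist,
                           const std::vector<double>& w,
                           double sigma, double l) {
  assert(dist.size() == w.size());
  double total = 0.0;
  for (std::size_t i = 0; i < dist.size(); ++i) {
    total += w[i] * se_kernel(dist[i], sigma, l);
  }
  return total;
}
```

Comparing `accumulate_adjoints` against central finite differences of `weighted_sum` in sigma and l is a quick way to check the accumulation. An entry with distance 0 plays the role of a diagonal element and contributes to the sigma adjoint only.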

Can you add a bit more math?

anon79882417 · March 23, 2019, 3:03am

I was doing something dumb with includes… I’ve solved this issue. It was poor programming. Now I can start getting the reverse mode in.
