So question regarding the matrix autodiff. Say I have a function that’s dependent on some parameters. Let’s say we have a squared exponential kernel:

\sigma^2 exp(\frac{d(x, x')}{2l^2}), with for example the partial derivative WRT \sigma:

\frac{d}{d\sigma} = 2\sigma exp(\frac{d(x,x')}{2l^2}).

This will have a different value of the lower triangular, for example if the covariance matrix is 3x3 there are 3(2)/2 = 3 elements. So what’s the Jacobian look like for this function? If we have some function f, that’s dependent on x, \sigma and l so that we have f(x, \sigma, l), then we can write out the Jacobian as follows:

J_f = [\frac{df}{dx} \frac{df}{d\sigma} \frac{df}{dl}],

But this function exists on a matrix, so really, we have these evaluations at each position in the matrix. So really, J_f above will have \frac{n(n+1)}{2} evaluations, for a symmetric matrix (the diag and either the lower or upper triangular).

So, for the specialization, do I need to calculate the partial derivative with respect to each parameter, at each point in the covariance matrix? If so, that brings me to another question. The current `rev`

mode implementation of this covariance function looks like this, just the part where we’re chaining the partial derivatives:

```
virtual void chain() {
double adjl = 0;
double adjsigma = 0;
for (size_t i = 0; i < size_ltri_; ++i) {
vari *el_low = cov_lower_[i];
double prod_add = el_low->adj_ * el_low->val_;
adjl += prod_add * dist_[i];
adjsigma += prod_add;
}
for (size_t i = 0; i < size_; ++i) {
vari *el = cov_diag_[i];
adjsigma += el->adj_ * el->val_;
}
l_vari_->adj_ += adjl / (l_d_ * l_d_ * l_d_);
sigma_vari_->adj_ += adjsigma * 2 / sigma_d_;
}
```

But here’s what I noticed. If we need different gradient evaluations at each point in the covariance matrix, for say, sigma, why am I doing this operation: `adjsigma += prod_add;`

. The partial deritive for a matrix, w.r.t. \sigma is only one value? This will then be the same value at all points in the lower triangular of the covariance matrix, correct? Am I missing something?

Alternatively shouldn’t initialize `sigma_vari_`

as a vector of `vari`

that has a different value at each point in the covariance martix?

Anything incorrect in my understanding?