Quick question hopefully. If this is documented somewhere and I’m just missing it lemme know.

I’ve been reading through the fwd mode autodiff code to figure out what it does. Does someone mind checking if I’m making sense here?

I think I get what regular fvar<double> is and functions of this.

I want to be clear on fvar<fvar<double>> though.

fvar<fvar<double>> (call it ffd) allows for up to two derivatives.
The elements are:
ffd.v.v – value of the variable
ffd.v.d – derivative of the variable with respect to something (call it a)
ffd.d.v – derivative of the variable with respect to something (call it b)
ffd.d.d – second derivatives

I think my biggest question is do ‘a’ and ‘b’ need to be the same thing? From looking around the code, I think not, and the fact that ffd.v.d need not equal ffd.d.v leads me to this conclusion as well.

So in math, if ‘v’ is my variable, then fvar<fvar<double>> just gives me access to:

v
dv/da
dv/db
d^2v/dadb

And if I pass v through a function ‘f’, then the fvar<fvar<double>> that gets spit out gives me access to:

f
df/da
df/db
d^2f/dadb

And if I’m using fvar<fvar> instead of double, then instead of these things being numbers I can call grad in the reverse mode autodiff and get the derivatives of all of these guys with respect to any of my parameters? (so if I have a parameter c, for instance):

With forward mode only all you can get out are directional derivatives, so you can associate each component with a certain inner product of a derivative with vectors. And at higher-order those inner products mix many different orders of derivatives making things very confusing.

For example, consider a many-to-one function, f : R^{N} -> R. Evaluating the function with a fvar<double> will require 2N inputs: the N function inputs, call them x, and the N sensitivities, call them u.

The fvar<double> returned by the evaluation will contain two values. The first value will be
the function value, f(x), which can be thought of as 0th-order derivative, if you will. The second value will be the inner product <u, df/dx(x)>.

For fvar<fvar<double>> you’ll need 4N inputs, the input x and three vectors which I’ll call u, w, and z. The evaluation of the function then yields

In particular, ffd.d.d braids together first and second-order derivatives. This is related to the fact that the compositional algebra of higher-order Taylor expansions is wacky.

Ha good stuff 10/10. Would read again. I dig the documentation around this place.

The reason I was looking at this is I wanted to know if casting an fvar to an fvar<fvar> made sense.

I think the answer is yes, and the first order value and gradient get copied while the second order value and gradient would just be set to zero.

–

If I’m reading this right, the magic of this is that if we just do math with these fvar<fvar<double>> things (using functions that have been expanded to work on dual numbers) then the different gradients we might need just show up in the appropriate places in the output (as defined by the rules that are enumerated in the tables)?

So to use this to get derivatives of some weird function I need, should I just look at the output equations in Table 1.1, pick the one that has the thing I need, and then just set the first value/first gradient/second value/second gradients of my inputs to achieve that? Is this the right way to think about this?

I still need to think through the adjoint stuff more.

Assuming that you mean fvar<double> into an fvar<fvar<double>> here. There’s no explicit casting mechanism, so you’d be relying on constructors. Looking at https://github.com/stan-dev/math/blob/develop/stan/math/fwd/core/fvar.hpp you can see that while there are constructors for a T argument, the sensitivity is always set to zero so I don’t think you’ll preserve all of the values correctly.

Basically. You just have to figure out how to decompose the gradient operator you want into the gradient inner products that forward mode gives you. For example, in the fvar<fvar<double>> I mentioned above, if you set z = 0 and u and w to all zeros except for the ith and jth elements respectively, then ffd.d.d will give you d^{2} f / dx_{i} dx_{j}. Do that N^{2} more times and you get yourself a full Hessian.

Note that reverse mode is much more efficient for many-to-one functions and we can compute the Hessian in only N sweeps. Again, see the Nomad manual, this time section 1.3.4.

Alright good stuff. Looks like there’s a special fvar<T> constructor that handles assigning fvar<double>s to fvar<fvar<double>>s and I think it makes sense.

@betanalpha: C-style casts (or even C+±style static casts) will call implicit constructors.

@bbbales2: In Stan, we will never ever cast anything up to an autodiff type other than a primitive int or double. You can see this by recursion because all of the parameters start as autodiff types and all the data as double or int. Every operation returns either a double (if arguments are all double or int) or the autodiff type. You never need to cast up the lower types. The constructor’s there because sometimes this happens internally because of the way some of the operations are written. But you have to be very very careful doing this because of the interpetation of the “casting”.

@Bob_Carpenter: But how do those implicit constructors work for fvar<fvar<double>>? Don’t you have to define how the casting works somehow or it is left ambiguous?

Makes sense. Which means I coded up the wrong tests in append array (appending vectors of fvar<var> with fvar<fvar<var>>). Overly verbose and the wrong thing! 2/2 right on target :P.

I can replace them with the simpler double/fvar<var>double/fvar<fvar<var>> versions when I work on the Stan append_array hooks. Makes sense why I was having trouble with the previous fvar stuff. No way something as simple as append_array tests shoulda been giving issues.