# Precomputed Jacobians or vector-Jacobian products?

I had previously discovered `precomputed_gradients` for a function such as `real foo(vector theta);`, and now I am looking to generalize to more complex arguments such as `vector theta, real a, matrix b`. If I understand the corresponding test correctly, `precomputed_gradients(val, _, _, ops, grads)` can be used for that. However, what should I do if the output is not scalar, e.g. `vector foo(vector theta)` or `matrix bar(vector theta)`? For instance, the inner time-stepping loop of an SDE model calls a function such as

``````
vector diff_coupling(matrix weights, vector x) {
  vector[rows(x)] out = rep_vector(0, rows(x));
  for (i in 1:rows(x))
    out[i] = weights[, i]' * (x - x[i]);
  return out;
}
``````

which I guess generates at least `rows(x)` AD graph nodes, while the vector-Jacobian product for the `vector x` operand is just a vector-matrix multiply. From reading the Stan Math paper, it seems the only choice is then to implement a `vari.chain()` with the vector-Jacobian product, and then instantiate it in a function in the model header. Are there any shortcuts like `precomputed_gradients` for this, or is that how it works?

A related, less concrete question is whether it is also worth hand-coding these partials for time-stepping models, to improve performance (especially when there are many time steps). Bugs aside, I would assume so, but then I went looking at the source for `gaussian_dlm_obs_log` and saw it's using AD as well. My guess is this is OK because the time spent in Eigen vector and matrix arithmetic dominates the pointer-chasing time in AD (or the C++ compiler optimizes all that away?), but it would be nice to know if this guess is off, and how to tell when it's worth reimplementing in C++ with AD versus reimplementing in C++ with the partials done by hand. Thanks in advance for any advice.


There are better alternatives for that. Look for the reverse pass callback facility in Stan Math. The `softmax` function uses it, so head there. This is very new stuff, and others may give you more help getting started.


I agree with @wds15. Use `reverse_pass_callback`. There is a blog post by @stevebronder about this: https://blog.mc-stan.org/2020/11/23/thinking-about-automatic-differentiation-in-fun-new-ways/

Take a look at `softmax` or `inverse` for functions with one argument; in your case you will have more than one argument, so maybe take a look at `multiply`.

For Stan Math functions with more arguments we need to cover all cases [(arg1 data, arg2 var), (arg1 var, arg2 data), (both var)]. If you are writing a custom C++ function, you have a pretty good idea which arguments are going to be var, so there is no need to cover the other cases as well.


@maedoc yeah, the doc isn't ready for this stuff yet. You'll want to use the develop version of Math if you're going to work on this stuff, cause it changes rapidly (for the better).

The API doc is here and it's updated to develop I believe: https://mc-stan.org/math/. So any time you're curious about a weird-looking function, check there. That'll get you the doxygen in the code.

Hereâs something like your function. I didnât check it or finish it, but you can start here:

``````
template <typename T1, typename T2,
          require_matrix_t<T1>* = nullptr,
          require_col_vector_t<T2>* = nullptr,
          require_any_st_var<T1, T2>* = nullptr>
auto myfunc(const T1& m, const T2& v) {
  // pick the right autodiff matrix return type from the input types
  using ret_type = return_var_matrix_t<Eigen::VectorXd, T1, T2>;
  if (!is_constant<T1>::value && !is_constant<T2>::value) {
    // both arguments are autodiff types
    arena_t<promote_scalar_t<var, T1>> arena_m = m;
    arena_t<promote_scalar_t<var, T2>> arena_v = v;
    arena_t<ret_type> ret = arena_m.val() * arena_v.val()
        - (arena_v.val().array() * arena_m.val().rowwise().sum().array()).matrix();
    reverse_pass_callback([ret, arena_m, arena_v]() mutable {
      // TODO: increment arena_m.adj() and arena_v.adj() from ret.adj() here
    });
    return ret_type(ret);
  } else if (!is_constant<T1>::value) {
    // only the matrix is an autodiff type
    arena_t<promote_scalar_t<var, T1>> arena_m = m;
    arena_t<promote_scalar_t<double, T2>> arena_v = v;
    arena_t<ret_type> ret = arena_m.val() * arena_v
        - (arena_v.array() * arena_m.val().rowwise().sum().array()).matrix();
    reverse_pass_callback([ret, arena_m, arena_v]() mutable {
      // TODO: increment arena_m.adj() from ret.adj() here
    });
    return ret_type(ret);
  } else {
    // only the vector is an autodiff type (both-double is excluded by the requires)
    arena_t<promote_scalar_t<double, T1>> arena_m = m;
    arena_t<promote_scalar_t<var, T2>> arena_v = v;
    arena_t<ret_type> ret = arena_m * arena_v.val()
        - (arena_v.val().array() * arena_m.rowwise().sum().array()).matrix();
    reverse_pass_callback([ret, arena_m, arena_v]() mutable {
      // TODO: increment arena_v.adj() from ret.adj() here
    });
    return ret_type(ret);
  }
}
``````
1. The first new thing is the requires. Instead of typing types explicitly we use C++ template SFINAE-sorta stuff to accept arguments. This is because for any argument there are a ton of different types that work. The first here says, "Make sure the first argument is a matrix". The second says, "Make sure the second is a column vector". The last says, "Make sure at least one is an autodiff type (has scalar type var)". Search for these in the upper right of https://mc-stan.org/math/

2. `return_var_matrix_t`: there are actually two types of autodiff matrices in Stan now (design doc for the second one here), and this picks the right return type given the types of the input autodiff variables.

3. There are a lot of `arena_t<T>` expressions in the code. `arena_t<T>` says "give me a variable equivalent to the type `T` that is stored in the autodiff arena". This is used to save variables in the forward pass that are needed in the reverse pass.

4. Because your function has two arguments and requires at least one argument to be an autodiff type, you need to handle the three combinations of double/autodiff types. That's what the `if/else` stuff with `is_constant` is handling (`is_constant<T1>::value == true` means the first argument is not an autodiff type).

5. `reverse_pass_callback` takes a lambda argument. This function will be called in the reverse pass. It's the equivalent of `chain`, but thanks to lambda captures it's way easier to write. Capture everything by copying; `arena_t` types are cheap to copy by design. The code inside this is responsible for incrementing the adjoints of the input variables (which you saved copies of in `arena_m` and `arena_v`). The `mutable` is required on the lambda so that you can write the adjoints (otherwise you'll get an error about the adjoints being `const`).

6. There are a lot of accessors `.val()`, `.val_op()`, `.adj()`, and `.adj_op()` that are Stan extensions to Eigen types (so you won't find them in the Eigen docs). `val()` and `adj()` give you the values and adjoints of a Stan matrix type. Sometimes with multiplications you'll get compile errors, and in that case use `.val_op()` and `.adj_op()` (I'm not sure of the difference, and hopefully this gets cleaned up cuz it's confusing).

7. How we test these things is the `expect_ad` function in test/unit/math/mix. The assumption of `expect_ad` is that you've written a version of your function that works with doubles that you trust. It compares finite-difference gradients to the ones computed with autodiff, up through third derivatives I think. The tests for `elt_multiply` (elementwise multiplication) are here; look at them as an example. `expect_ad_matvar` tests the new matrix autodiff type.

8. Hereâs a recently completed new function as an example of all this

I would start with a one-argument version of your function and work from there. Just set the vector or the matrix equal to a constant inside the function. The one-argument versions of this are much simpler. The multiple-argument thing adds some annoying complexity (that isn't so bad once you know what it is, but can be pretty annoying to work through).


Thanks for the replies! Marking @bbbales2's answer as the solution, as it anticipates several other questions I had in mind after having grepped around in the develop branch a bit. The blog post makes it look quite easy, but I see why the arena is important for container types. I will follow up with this function, work through some multi-argument cases, and see if additional confusions come up.


A couple years ago I did some investigation (no actual coding/benchmarking) into whether it could be worthwhile to hand-code the partials of state space models, and drew the conclusion that for time-invariant matrices there exist tricks that could potentially result in better performance (see e.g. https://doi.org/10.1007/s00180-012-0352-y). But for time-varying matrices (which is what I needed), AD is probably just as performant as hand-coded partials.


Oh yeah, be careful not to return an `arena_t` (this can lead to silent errors). We should add tests to `test_ad` to make sure that doesn't happen, but for now you need to check that manually.


It took me a while to try this out, but for a nonlinear state space model (174 ODEs with a Heun scheme), a C++ version of the ODEs + Heun scheme is 30-50% faster than plain Stan. In my case, though, I'm not building the whole matrices at every time step but doing VJPs through the ODEs themselves, which may be more compact memory-wise ¯\\_(ツ)_/¯
