Operands and Partials: partials_ vs partials_vec_


#1

I’m working on the gradients for a function where one of the inputs has the signature std::vector<Eigen::Matrix<T, Eigen::Dynamic, 1> > (i.e. an array of vectors), and I’m having some trouble with operands_and_partials.

The gradients are entered as:

      operands_and_partials<T_theta, T_lam> ops_partials(theta, lambda);
        ops_partials.edge1_.partials_ = theta_deriv;
        ops_partials.edge2_.partials_ = lam_deriv;

Where lambda and lam_deriv are arrays of vectors.

The function and test runs fine when working with math/prim, but both fwd and rev fail with the error “no member named partials_

However, both fwd and rev work perfectly with partials_vec_:

        ops_partials.edge2_.partials_vec_ = lam_deriv;

This then causes the prim test to fail, because there’s no definition for partials_vec_ in the prim/scal version of operands_and_partials (but it’s mentioned in the doxygen, so it looks like it was there at some point?).

Is there a better way of entering the gradients in this case?


#2

hmm… have you found a solution to this?

I am trying to rewrite the multi_normal_cholesky using operands and partials and have a very similar issue. Things work if the input to the density are simple vectors, but passing in arrays of vectors causes the edge template specialization to be used to not anymore define a partials_, but it instead only defines a partials_vec_.

I wonder if there are multi-variate functions in stan-math which do use operands_and_partials in its current form. Maybe @seantalts knows of an example?


#3

Ah, the idea between partials vs partials_vec is that the latter is explicitly just for multivariate / containers of containers operands, meaning that the operands are either a container or a container of containers and you want to always view as a container of containers.

@Matthijs wrote some multivariate functions that use operands and partials in this way - check out some of his _glm functions. I can find the specifics when I’m back at a computer.


#4

@andrjohns - Is the idea that the function you’re working on will take std::vector<Eigen::Matrix<T, Eigen::Dynamic, 1> > and Eigen::Matrix<T, Eigen::Dynamic, 1> for the same argument, and you always want to treat it like std::vector<Eigen::Matrix<T, Eigen::Dynamic, 1> >? If so, I think partials_vec_ is intended for you, but I just looked and it seems like no one uses partials_vec_.

I think you might be right - there used to be a prim/mat/operands_and_partials that got refactored away at one point during the pull request. I think we might need it again for partials_vec_.

I think we might even need to move the prim/scal version into prim/mat (because it will need to refer to std::vector and Eigen), which might be kind of an annoying change. Hopefully soon (I will attack it in the next few weeks if no one else does) we can refactor away from this scal/arr/mat split and avoid issues like this.


#5

Yeah, I had similar issues with this. So the input can be Eigen vectors or arrays of Eigen vectors and the function should “just work”. I found a solution which could work in general. That is, I added a partials_vec_ to the basic ops_partials_edge like this:

template <typename ViewElt, typename Op>
class ops_partials_edge {
 public:
  typedef empty_broadcast_array<ViewElt, Op> partials_t;
  partials_t partials_;
  broadcast_array<partials_t> partials_vec_;

  ops_partials_edge() : partials_vec_(partials_) {}
  explicit ops_partials_edge(const Op&)
      : partials_vec_(partials_) {}
...

adding this and in addition extending the empty_broadcast fake definition in stan/math/prim/mat/meta/broadcast_array.hpp with a specialisation for a std::vector<Eigen::Matrix> makes things work for me as it looks like.

This should be consolidated at some point. The solution from @Matthijs to introduce assign functions is another possibility, but I am not sure if assign functions scale well.


#6

I think that’s the right idea, though shouldn’t it be empty_broadcast_array<partials_t> partials_vec_;

What’s the assign functions solution?


#7

Maybe it should be the empty thing, but I did not quite understand the need for the two template args of the empty broadcast thing (it has a ViewElt and a Op template parameters). So the broadcast_array appeared simpler to use for me and it did the job. By now I have a working implementation of multi_normal_cholesky with analytic gradients. It is about 2x faster than the built in version (exact timing depends on dimensionality; d=200 is ~2x; d=400 is ~2.7x; d=600 is ~3.3x). Right now the code works for rev only, but I should be able to extend this to fwd and mix (hopefully with the same speed gain).

Any suggestions on how to proceed with getting this into stan-math? I guess this will a two-step pull which first adds the operands_and_partials stuff to make it work with nested multi-variate containers and then the actual refactored multi normal; or would it make sense to have a branch up on stan-math where you can look at all at once (maybe things can be done even better).

A faster multi normal will be so useful (I find it inconvenient to rely on reparametrizations which represent the density via std-normals).

@Matthijs did work around changing types with functions, see here.


#8

Oh yeah, since you could have code that indexes into it twice, broadcast_array is better. Huh.

I think it’d be fine to include the operands_and_partials stuff in your existing PR, but if you want it in faster a separate PR just for that is probably better.


#9

Ok, I created a new branch for the operands_and_partials. Things are a bit more involved as it looks. That is, the size tests which you wrote did fail with the new things in place. The object is now a bit bigger (80 instead of 5). I am not sure if this is a problem or not; I guess this is performance related? Note that I commented out the size tests to confirm that everything else works in order.

I also pushed the multi-variate normal with analytic gradients to stan-math. This branch includes the changes to operands_and_partials and the rewritten density.


#10

@wds15 Thanks for sorting this one out, it makes one of my projects significantly easier. Much appreciated!


#11

You are welcome!

I was also stuck with the old code as I was adding analytic gradients for the multi_normal_cholesky. Now both changes are in math.

Sounds like you are doing interesting projects…