ODE refactor now has a path forward

Hi!

If I am not mistaken, I need to revise my suggestion for the jacobian_ode function, as the current design would need to modify the AD stack during ODE integration even if a user supplies analytic Jacobians. So here is what would happen now:

In the ode_rhs_sensitivity operator() function, Jy would be obtained by doing something like
// use Jy
std::vector<double> Jy(N_ * N_);
const std::vector<var> y_v(y, y + N_);
jacobian_ode(rhs_, y_v, theta_d_, ydot, Jy.begin());

The problem is that creating y_v from y (which is an iterator into a double-only vector) will inadvertently access and modify the AD stack, which I really do not want to happen, because it would make trivial parallelization very hard. So to ensure that the ODE integration is a double-only operation whenever the user supplies an analytic Jacobian, we cannot rely on function overloading. Moreover, we wanted to mark these Jacobians with a type which expresses whether they are analytic or use AD, so we should probably use

template <class ode_rhs>
struct jacobian_ode_initials {

  typedef boost::true_type use_AD;

  void operator()(const ode_rhs& rhs,
                  const std::vector<double>& y_d,
                  const std::vector<double>& theta_d,
                  std::vector<double>::iterator ydot,
                  std::vector<double>::iterator Jy) {
    // use AD to calculate Jy and stream out to Jy. Will write NxN
    // elements (N = length of the y_d vector)
  }
};

// this one corresponds to Jtheta
template <class ode_rhs>
struct jacobian_ode_params {

  typedef boost::true_type use_AD;

  void operator()(const ode_rhs& rhs,
                  const std::vector<double>& y_d,
                  const std::vector<double>& theta_d,
                  std::vector<double>::iterator ydot,
                  std::vector<double>::iterator Jtheta) {
    // use AD to calculate Jtheta
  }
};

This version has double-only arguments. Access to the AD stack will only ever happen inside these functions, and should a user specialize these functions to provide analytic Jacobians, then the ODE integration will not touch the AD stack. I hope that makes sense; if so, I will update the dummy code on the wiki.
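For example, a user-supplied analytic Jacobian could then look like this - just a rough sketch, with my_decay_rhs being a made-up RHS functor for dy/dt = -k * y:

```
// Hypothetical RHS functor for dy/dt = -k * y (one state, one parameter).
struct my_decay_rhs { /* ... */ };

// Rough sketch of a user-supplied analytic specialization: no AD stack
// access anywhere inside.
template <>
struct jacobian_ode_initials<my_decay_rhs> {

  typedef boost::false_type use_AD;  // analytic, not AD-based

  void operator()(const my_decay_rhs& rhs,
                  const std::vector<double>& y_d,
                  const std::vector<double>& theta_d,
                  std::vector<double>::iterator ydot,
                  std::vector<double>::iterator Jy) {
    *ydot = -theta_d[0] * y_d[0];  // RHS evaluated with doubles only
    *Jy = -theta_d[0];             // d(ydot)/dy, a 1x1 Jacobian here
  }
};
```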

One more question with respect to what is to come: in the future we would certainly like users to be able to specify the Jacobians inside the Stan language. For that purpose, is it OK to make both jacobian_ode_* structs functor objects? Anything else to consider in this regard?

Best,
Sebastian

Hi Sebastian,

I’m lost again. Here’s what I think it should look like:

class ode_rhs;

template <class ode_rhs, class T1, class T2>
class Jy {
  public:
    std::vector<double> operator()(const double t, // initial time
             const std::vector<T1>& y, //initial positions
             const std::vector<T2>& parms, // parameters
             const std::vector<double>& sx, // double data
             const std::vector<int>& sx_int);
};

That should be the same interface as Jtheta. I don’t know if T1 and T2 both have to be templated, but my guess is yes.

I don’t understand in your example why you’re passing in a rhs object. That should be taken care of on construction. I don’t think using iterators in that way is natural.

I don’t think putting flags in is a good idea. It’s
a typical anti-pattern in OO and templated programming because
rather than using the OO and templating mechanisms in the languages,
it puts in a check based on the flag.

You want the implementation to be supplied by the class, but how it does it
should be encapsulated from the outside if at all possible.
That’s why I was suggesting using a templated concept: it avoids
virtual function call overhead and allows different implementations
to be plugged and played.

  • Bob

Hi Bob!

Flag is the evil word for it and is not OO-like, yes. I would call it a trait - and then it is a cool, modern OO thing :-).

The idea was to mark Jacobian implementations according to whether they use AD or not. Then you can decide at compile time whether using some parallelization would be appropriate. At least that was the idea.
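For instance, the trait could select between a serial and a parallel integration path at compile time - a rough sketch with made-up names:

```
#include <boost/utility/enable_if.hpp>

// Rough sketch: dispatch on the use_AD trait so that only Jacobians which do
// not touch the AD stack go down a parallel path.
template <class Jacobian>
typename boost::enable_if<typename Jacobian::use_AD, void>::type
integrate_all(const Jacobian& jac) {
  // AD-based Jacobian: integrate the ODE systems serially
}

template <class Jacobian>
typename boost::disable_if<typename Jacobian::use_AD, void>::type
integrate_all(const Jacobian& jac) {
  // analytic Jacobian: safe to run several ODE integrations in parallel
}
```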

Best,
Sebastian

Hi Daniel!

My thoughts on your suggestion:

  • What really scares me performance-wise is the copying of Jy as a vector out of the function. This copying will happen very often and will certainly cost some performance. See below for a possible solution.

  • I think T1 and T2 should always be doubles.

  • The double and int data should be made members of ode_rhs. Then ode_rhs can take care of feeding that into the functor.

  • I still prefer to call the class jacobian_ode_states, but name variables which get the output of this as Jy.

To avoid many copies of the Jacobian floating around and to make good use of cache locality how about this:

template <class ode_rhs>
class jacobian_ode_states {
private:
  Eigen::MatrixXd Jy_;
  const ode_rhs& rhs_;

public:
  void update(const double t, const std::vector<double>& y,
              const std::vector<double>& params) {
    // use AD to calculate Jy and populate Jy_
  }
  const Eigen::MatrixXd& value() { return Jy_; }
};

Doing so ensures that the storage for Jy is inside this object and that we can use Jy in a read-only manner by querying the get function. This avoids reallocation of the Jy storage such that it will stay at the same location in the cache, and moreover we do not need to make copies of Jy to use it.
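Usage would then look roughly like this (a sketch with illustrative names, assuming a constructor that binds the rhs_ reference):

```
// Sketch: the integrator keeps one jacobian_ode_states object around and
// reuses its internal Jy_ storage across calls.
template <class ode_rhs>
void sensitivity_step(jacobian_ode_states<ode_rhs>& jac, double t,
                      const std::vector<double>& y,
                      const std::vector<double>& theta) {
  jac.update(t, y, theta);                  // recompute Jy_ in place
  const Eigen::MatrixXd& Jy = jac.value();  // read-only view, no copy
  // ... use Jy to form the sensitivity RHS ...
}
```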

BTW… how can I format code so nicely as you did??

Best,
Sebastian

Wouldn’t the parallelization be internal to those implementations?
What I don’t like is having flags on classes that are used externally
in conditionals to decide how to use the classes. It breaks their
encapsulation.

  • Bob

wds15 Developer, October 5:
Hi Daniel!

My thoughts on your suggestion:

• What really scares me performance-wise is the copying of Jy as a vector out of the function. This copying will happen very often and will certainly cost some performance. See below for a possible solution.

The compiler can reduce a lot of this copying on return,
but it is a concern. It’ll be better with the move semantics
of C++11, because all we really need to do is use the pointer
to the heap inside of the Eigen::Matrix.

• I think T1 and T2 should always be doubles.

• The double and int data should be made members of ode_rhs. Then ode_rhs can take care of feeding that into the functor.

Does this “data” ever change?

• I still prefer to call the class jacobian_ode_states, but name variables which get the output of this as Jy.

OK by me.

To avoid many copies of the Jacobian floating around and to make good use of cache locality how about this:

template <class ode_rhs>
class jacobian_ode_states {
private:
  Eigen::MatrixXd Jy_;
  const ode_rhs& rhs_;

public:
  void update(const double t, const std::vector<double>& y,
              const std::vector<double>& params) {
    // use AD to calculate Jy and populate Jy_
  }
  const Eigen::MatrixXd& value() { return Jy_; }
};

I don’t see how that helps with caching, the goal of which is to
stream data in from memory to the CPU in a local way so that you
don’t have to keep paging in big blocks. An Eigen::Matrix just
stores its data on the heap and the locality will be determined by
whether it’s row major or column major.
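For what it’s worth, the storage order is just a template argument on Eigen::Matrix (column major being the default):

```
#include <Eigen/Dense>

// Column-major is Eigen's default; row-major has to be requested explicitly.
typedef Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic> MatrixColMajor;
typedef Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic,
                      Eigen::RowMajor> MatrixRowMajor;
```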

Doing so ensures that the storage for Jy is inside this object and that we can use Jy in a read-only manner by querying the get function.

We can just return a const reference to the matrix itself, too.

This avoids reallocation of the Jy storage such that it will stay at the same location in the cache

That’s not how caching works. And I think you may be worrying more
about static addressing vs. dynamic (is the memory location known at
compile time). And I’m afraid you’re stuck with Eigen::Matrix and
std::vector because they both point into the heap rather than allocating
memory as part of the object, so it’s always dynamic.

and moreover we do not need to make copies of Jy to use it.

You just have to be careful to keep the object with the handle on Jy
consistent.

BTW… how can I format code so nicely as you did??

I want to know that, too.

  • Bob

Hi!

Ah, then how about this version which eliminates the get function:

Yeah, C++11 with its move feature will make things a lot more straightforward - but should we use this already? The above code would be a reasonable compromise, I find.

To your question about the data: yes, the double and int data of the ode_rhs object never change, which is why I think it is a good thing to throw them into the ode_rhs object.

A parallelization (as I have it in mind) could use all of the objects we introduce here as long as AD is not used, which requires providing analytic Jacobians. The intent was to produce a compile-time failure if parallelization is attempted in a case without analytic Jacobians. However, this is somewhat down the road, and maybe we should not worry about it too much now and just get the basics right (which to me means that AD can be optional).

One last thing which I consider worthwhile: pass into the jacobian function a reference to a vector storing ydot, which is the result of the ODE RHS. ydot gets calculated anyway whenever the Jacobian is evaluated, so not storing the result is a waste, no?

Sebastian

Hi Sebastian,

Just wanted to mention something generally: let’s try to design this with the cleanest interface possible. Worry about hypothetical and real optimization issues later. Please. Especially at this stage, where we’re trying to just get the design right at a high level. Once we implement, the design may change to something that’s faster, but let’s keep this discussion to the best possible design we can put together.

I’m having trouble keeping requirements separated from optimizations.

Daniel

Hi Daniel,

sure, let’s do that first.

What do you think about using Eigen MatrixXd as the return type of the jacobian functions?

Sebastian

That sounds great!

Regarding code blocks: triple back ticks.

```
int main() {
}
```

Shows up as:

int main() {
}

Yes, you can do this, but it will mean that your returned
value has only the lifespan of the object that is managing the
memory, until you make a copy.

  • Bob

Yes, I think matrix structures should be MatrixXd.
The vector<vector<double> > approach is really inefficient.
And we don’t want to return bare arrays.

  • Bob

@wds15, I’ve been working through the code again. I think this is what we need:

  1. a structure for the jacobian with respect to states
  2. a structure for the jacobian with respect to parameters
  3. a structure for the jacobian with respect to states and parameters (I’ve looked at the code and I’m now convinced that there are gains to be made)
  4. a structure to act as an adapter to odeint
  5. a structure to act as an adapter to CVODES

For 1, 2, and 3, they should look like the C++ functor that’s generated:

template <class F>
struct ode_jacobian_states {
  Eigen::MatrixXd operator()(double t,
                             const std::vector<var>& y,
                             const std::vector<double>& theta,
                             const std::vector<double>& x,
                             const std::vector<int>& x_int,
                             std::ostream* pstream__) { ... }
};

template <class F>
struct ode_jacobian_parameters {
  Eigen::MatrixXd operator()(double t,
                             const std::vector<double>& y,
                             const std::vector<var>& theta,
                             const std::vector<double>& x,
                             const std::vector<int>& x_int,
                             std::ostream* pstream__) { ... }
};

template <class F>
struct ode_jacobian_states_and_parameters {
  Eigen::MatrixXd operator()(double t,
                             const std::vector<var>& y,
                             const std::vector<var>& theta,
                             const std::vector<double>& x,
                             const std::vector<int>& x_int,
                             std::ostream* pstream__) { ... }
};

This is what we need to provide Boost:

struct odeint_ode_system {
  void operator()(const std::vector<double>& y,
                  std::vector<double>& dy_dt,
                  double t) {
  }
};
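Just to make that concrete, a minimal (hypothetical) sketch of how such an adapter plugs into odeint’s free integrate() function, with a placeholder RHS:

```
#include <boost/numeric/odeint.hpp>
#include <vector>

struct odeint_ode_system {
  void operator()(const std::vector<double>& y,
                  std::vector<double>& dy_dt,
                  double t) {
    dy_dt[0] = y[1];   // placeholder RHS: simple harmonic oscillator
    dy_dt[1] = -y[0];
  }
};

int main() {
  std::vector<double> y(2, 1.0);  // initial state
  odeint_ode_system system;
  // integrate from t = 0 to t = 10 with an initial step size of 0.1
  boost::numeric::odeint::integrate(system, y, 0.0, 10.0, 0.1);
  return 0;
}
```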

And this is what we need to provide CVODES (I think we could rename the functions so they’re clearer):

struct cvodes_ode_system {
  static int ode_rhs(double t, N_Vector y, N_Vector ydot, void* user_data) { ... }

  static int ode_rhs_sens(int Ns, realtype t,
                          N_Vector y, N_Vector ydot,
                          N_Vector *yS, N_Vector *ySdot, void *user_data,
                          N_Vector tmp1, N_Vector tmp2) { ... }

  static int dense_jacobian(long int N, 
                            realtype t, N_Vector y, N_Vector fy,
                            DlsMat J, void *user_data,
                            N_Vector tmp1, N_Vector tmp2, N_Vector tmp3) { ... }  
};
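And, roughly, how those static functions would get wired into CVODES - a sketch against the CVODES 2.x style API, with setup details and error checking omitted:

```
#include <cvodes/cvodes.h>
#include <cvodes/cvodes_dense.h>

// Rough sketch: register the static members of the adapter with CVODES.
void* setup_cvodes(cvodes_ode_system& system, double t0, N_Vector y0,
                   long int N, int Ns, N_Vector* yS0) {
  void* cvode_mem = CVodeCreate(CV_BDF, CV_NEWTON);
  CVodeInit(cvode_mem, &cvodes_ode_system::ode_rhs, t0, y0);
  CVodeSetUserData(cvode_mem, &system);  // recovered via user_data in the callbacks
  CVDense(cvode_mem, N);                 // dense linear solver
  CVDlsSetDenseJacFn(cvode_mem, &cvodes_ode_system::dense_jacobian);
  CVodeSensInit(cvode_mem, Ns, CV_STAGGERED,
                &cvodes_ode_system::ode_rhs_sens, yS0);
  return cvode_mem;
}
```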

@wds15, is this right so far?

Hi Daniel!

From your list, we do not need item (2). Whenever the parameter Jacobian is needed, you always need the Jacobian with respect to the states as well. There is no case where only the parameter Jacobian is needed.

A few questions:

  • What motivates you to go from jacobian_ode_* to ode_jacobian_*?

  • Originally we had the idea to introduce an ode_rhs object, as I understood it. This would have the advantage that we can place references to x, x_int and pstream__ there - and theta as well. As a result, the signature of ode_jacobian_* would not need to carry these arguments anymore.

  • If we want to keep the ODE system free from manipulations of the AD stack in the case of user-supplied Jacobians, then we should not have any var references as part of the function arguments. So there may not be a std::vector<var> argument to these functions. Putting things on the AD stack has to happen inside the jacobian functions. At least I think this is necessary to guarantee that we get truly AD-free ODE integration should the user be so kind as to specify the Jacobians. This is important for hassle-free parallelization of the ODE code.

  • How about we align the jacobian functions with the math/rev/mat/functor/jacobian.hpp interface to some degree? That would mean having it like

template <class F>
struct ode_jacobian_states {
  void operator()(double t,
                  const std::vector<double>& y,
                  const std::vector<double>& theta,
                  const std::vector<double>& x,
                  const std::vector<int>& x_int,
                  std::ostream* pstream__,
                  std::vector<double>& fy,
                  Eigen::MatrixXd& Jy) { ... }
};

OR if we go with an ode_rhs object, we have

template <class F>
struct ode_jacobian_states {
  void operator()(const ode_rhs<F>& rhs,
                  double t,
                  const std::vector<double>& y,
                  std::vector<double>& fy,
                  Eigen::MatrixXd& Jy) { ... }
};

  • Should you not want to introduce an ode_rhs object, then we could put the arguments to f which stay constant throughout into the jacobian structs themselves, so we end up with
template <class F>
struct ode_jacobian_states {
  const F& f_;
  const std::vector<double>& theta_;
  // ... same for x, x_int and pstream ...

  ode_jacobian_states(const F& f, const std::vector<double>& theta,
                      const std::vector<double>& x,
                      const std::vector<int>& x_int,
                      std::ostream* pstream);

  void operator()(double t,
                  const std::vector<double>& y,
                  std::vector<double>& fy,
                  Eigen::MatrixXd& Jy) { ... }
};

  • Where do you want to place the decoupling operation? Note that the output generated by each integrator can easily be put into a format such that the decoupling operation is no longer integrator-specific. Hence, it is easy to design the interfaces to the integrators so that the double output is set up in a way such that the decoupling operation is always the same. Where do we put this?

Good to move this forward!

Best,
Sebastian

Hi Sebastian,

That’s not correct. We have 4 cases that can be generated from Stan. Each of theta and y0 can be {data, parameters}.

I think we can just define 1 and 2 and get 3 without needing to implement something different, but it might be too slow. This might be something we test empirically.

Note: if we only define 1 and 2, then it’s much easier to implement custom jacobians.

Wasn’t thinking? But maybe it’s because the first thing I think about is that it’s related first to the ODE and then to the Jacobian.

No wonder I’ve been so confused in the past!

No.

I don’t know why we’d introduce another structure for that. There’s no need for that extra level of indirection unless it gains us something. Here, it doesn’t as far as I can tell.

I’ll start with: “we should not have any var references as part of the function arguments.”

I think you’re right. When I was typing it up, I was basing it on what I saw in our current implementation. But I think you’re right.

Can we first design the interface into something clean and natural?

I would really like to push off optimization until later. We’re already dealing with so many aspects and I’m trying really hard to get the design of all the pieces right; having the design change based on optimization and making it look unnatural is really distracting to me. It might not bother you so much, but it’s really slowing me down in keeping up with what we’re talking about. I want us to have something that we can maintain, extend, and is maximally clear first. Then we can change everything to things that are fast. I’m not sure this will matter so much, but I could be wrong. If it doesn’t matter at all, then I don’t want to do what you’re suggesting. If it does matter, we’ll shift that direction after we determine that it does matter.

I don’t see this the same way you do. I think it belongs in the adapters and it’s not general. There’s one operation for boost and one for CVODES and they’re not the same. Let’s get the interfaces down and then worry about this. I don’t see the adapters having this function exposed anymore.

Hi Daniel!

I am fine with going from jacobian_ode_* to ode_jacobian_* - I just wanted to understand your logic here.

We really never need the case of calculating only the parameter sensitivities. If you go through the math, you see that whenever any sensitivities are calculated, you need Jy. You need Jtheta on top of that if the sensitivities of the parameters are needed. I think I documented this on the wiki.

My thought of writing fy out from the jacobian functions was driven by keeping it consistent with the jacobian function we have (and, I admit, also because it is more efficient) - but I will try to hold off on optimization thoughts as best I can.

Now, I do not quite see the need to make the adapters handle the decoupling operation in isolation. To my understanding, what is specific to the integrators is how they output the solution data (odeint has a functor for this, CVODES lets you do a loop and then you read from those structures). However, the output can be written into a simple standardized form so that the output from any integrator can be decoupled in a generic way.

Moreover, when you look through what odeint and CVODES need as input, it does make sense to create a generic adapter class or template (not sure which), which then needs only a very lightweight adapter to the specific integrator.

If I understand you right, you intend to build 4 adapters per integrator - one for each case we have. This can be avoided by writing a generic adapter which is specialized to the 4 cases we care about. Doing so lets you write the integrator adapter in a way that is agnostic to the 4 cases, because the 4 different cases can be managed by the generic thing - this was my motivation for defining an ode_rhs_sensitivity object.

However, I may be missing points which you have in mind, so please go ahead and I will comment. Maybe that is more efficient - I have the impression that I am causing a lot of confusion and I should probably just review (as long as we are in the design stage).

Best,
Sebastian

No. See: https://github.com/stan-dev/math/blob/develop/stan/math/rev/arr/functor/coupled_ode_system.hpp#L62

Each of the 3 functions is different. You can convince me that it’s not necessary, but it’s clearly different in the code.

THANK YOU!

No. One adapter for odeint and one adapter for CVODES. (There might be four separate template specializations underneath, but I’m not sure… we’re trying to get the design right now.)

Hi Daniel!

Have a look at what I wrote here:

Basically, the sensitivity states are crammed into a matrix S. Stheta is an N x M matrix, i.e. for each parameter theta_m for which we take the sensitivity there is a column of N rows in that matrix (one per state).

In that notation the sensitivity ODE RHS is given by

  • for the initials as
    dSy0/dt = Jy * Sy0

  • for the parameters as
    dStheta/dt = Jy * Stheta + Jtheta

That means for the 3 cases with any sensitivities we end up with:

  1. v,d = varying initials, fixed parameters

    • states (y, Sy0)
    • Jy needed only

  2. d,v = fixed initials, varying parameters

    • states (y, Stheta)
    • Jy AND Jtheta needed

  3. v,v = both are varying

    • states (y, Sy0, Stheta)
    • Jy AND Jtheta needed

The code you refer to buries the math behind this - while I think it would be good to take advantage of the nice structure of the matrix-vector products, which would ease readability of the code a lot (at least for me).
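For illustration, here is a rough Eigen sketch of the v,v case written directly with those matrix products (names made up, not taken from the existing code):

```
#include <Eigen/Dense>

// Sketch: sensitivity RHS for the v,v case, using the matrix form above.
// Jy is N x N, Jtheta is N x M, Sy0 is N x N, Stheta is N x M.
void sensitivity_rhs(const Eigen::MatrixXd& Jy,
                     const Eigen::MatrixXd& Jtheta,
                     const Eigen::MatrixXd& Sy0,
                     const Eigen::MatrixXd& Stheta,
                     Eigen::MatrixXd& dSy0_dt,
                     Eigen::MatrixXd& dStheta_dt) {
  dSy0_dt = Jy * Sy0;                  // sensitivities w.r.t. the initials
  dStheta_dt = Jy * Stheta + Jtheta;   // sensitivities w.r.t. the parameters
}
```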

I was saying that we have one adapter per integrator, yes. However, I do think that we should design things such that we do not need 4 different specializations underneath each adapter. I think that can be done, and it would make maintenance easier going forward.

Best,
Sebastian

In your doc, what’s p?