Sequence view design


#1

I think this should all go on discourse, so I’m posting Sean’s query here.

… I have a design for vector_seq_view that I’d let to get your feedback on. It’s here: https://github.com/stan-dev/math/commit/6bada4147c24aa8b7c1147423b177f8974fe1394

What this code does feels potentially a little dirty; it defines a base template class (is that the right phrase?) vector_seq_view that defaults to returning the singular item it was passed in for any type T Then there is a specialization just for std::vector<T> that will actually index into the std::vector on operator[]. From reading the docs for the multivariate distributions, it seems like they only need to handle this case.

What do you think? Is this “cheating” somehow?


#2

It’s not cheating—anything’s fair game in C++! And yes, “template base class” is the right terminology.

I’d prefer to see something that’s concretely specified for containing Eigen objects. Here’s what I was thinking for vectors:

template <typename C>
vector_seq_view { };

template <typename T>
vector_seq_view<Eigen::Matrix<T, -1, 1> > {
  vector_seq_view(const Eigen::Matrix<T, -1, 1>&);
  Eigen::Matrix<T, -1, 1> operator[](int);
};

template <typename T>
vector_seq_view<std::vector<Eigen::Matrix<T, -1, 1> > > {
  vector_seq_view(const std::vector<Eigen::Matrix<T, -1, 1> >&);
  Eigen::Matrix<T, -1, 1> operator[](int);
};

It just makes the usage much more transparent from the client perspective. And it can’t be accidentally specified (what I’m guessing you meant by “cheating”). Now we could generalize to other shapes of Eigen matrices easily enough and still stupport the above API, though now with a new less transparent name:

template <typename C>
matrix_seq_view { };

template <typename T, int R, int C>
matrix_seq_view<Eigen::Matrix<T, R, C> > {
  matrix_seq_view(const Eigen::Matrix<T, R, C>&);
  Eigen::Matrix<T, R, C> operator[](int);
};

template <typename T>
matrix_seq_view<std::vector<Eigen::Matrix<T, R, C> > > {
  matrix_seq_view(const std::vector<Eigen::Matrix<T, R, C> >&);
  Eigen::Matrix<T, R, C> operator[](int);
};

I do not see how to easily (key is clarity here) combine this with the scalar sequence views. You’d need to specify the scalar vs. matrix type somewhere in order to resolve how to construct such a view. And the scalar form has this odd generalization to vectors and row vectors that this version doesn’t have (it could, but we don’t ever instantiate Eigen objects with types other than scalars).


#3

Hm, you’d rather see something specific to Eigen types because it could be confusing if people were allowed to lift just any old type into the std::vector monad? This is interesting; I think in Haskell the principle is something like “write the most generic function you can, and narrow down the types later if you need to.” But here you think it helps with documentation on the main intended usage? I could see test cases serving a similar purpose with a more generic implementation… I’m not sure how I feel about it; std::vector seems somewhat arbitrary as a collection type but on the other hand it might be the most reasonable form of generalized “multivariate” we have in the code base (given that all our multivariate distributions use std::vectors of Eigen vectors).


#4

If we were in a typed functional language with homogeneous sequences, then that approach would unquestionably be the right one. But we’re not quite in that environment.

The problem we have is with return types. With a single polymorphic seq_view, things would have to look like this:

/** adapts container type C to be a sequence of Ts */
template <typename C, typename T>
class seq_view;

vector<double> xs(...);
seq_view<vector<double>, double> v1(xs);
seq_view<vector<double>, vector<double> > v2(xs);

Then we know that v1[n] is type double and v2[n] is vector<double>.

We also need to implement Eigen vector containers, to allow:

VectorXd ys(...);
seq_view<VectorXd, double> u1(ys);
seq_view<VectorXd, VectorXd> u2(ys);

and now u1[n] is type double and u2[n] is of type VectorXd.

But now we don’t know if we have a single VectorXd if we want a u1 type thing (sequence of double) or we want a u2 (sequence of VectorXd). We need the former for the mean in univariate normal and the latter for the mean in multivariate normal.

What we have now looks like this:

scalar_seq_view<double> w1(xs);  // double operator[](int)
scalar_seq_view<vector<double> > w2(xs);  // double operator[](int)
scalar_seq_view<VectorXd> w3(xs);   // double operator[](int)

VectorXd zs;
vector_seq_view<VectorXd> b1(zs);  // VectorXd operator[](int)

vector<VectorXd> as;
vector_seq_view<vector<VectorXd> > b2(as);  // VectorXd operator[](int)

It moves the specialization from the template system to the name of the function and reduces an awkward template parameter.

The reason the explicit typing of the return type (as in the first option) is awkward compared to the one with it built into the name is that it is sometimes double and sometimes var and sometimes fvar<T>. But in our use case, we invariably only have a template parameter for the container type and then have to put the scalar type back together using template metaprogramming.

I don’t think either approach is awful, but I think the one with names is easier to understand from a client perspective, easier to use in our primary use case (no template metaprogramming) and easier to code (much simpler template specializations, and while there’s a bit more code and doc, it’s simpler code and simpler doc).


#5

Ah yeah, sorry I was maybe confusing in my question. I’m on board with the API / use cases as presented in the last code sample (“What we have now looks like this:”), and I like the tradeoff that example makes w.r.t. name of the function vs. awkward template parameter. There’s definitely an elegance to the mental model of a template class that takes two types and finds any of the second inside the first for iteration, but I think there’s too many corner cases and confusing implementation details involved for that to be the path forward.

The executive summary of the slightly unwieldy text below might be “Why concretely specialize vector_seq_view for Eigen objects?”).

In my head my question was more about the rationale behind the different implementations of vector_seq_view discussed two posts up. One way to look at it is that scalar_seq_view focuses on diving into whatever you give it and iterating over any scalars present (I don’t think the implementation of scalar_type allows us to quite live up to this view in all cases, but it does for all of our use cases). That way of looking at scalar_seq_view sort of implies that perhaps we want another class, vector_seq_view, that dives into whatever containers you give it to serve up a sequence of Eigen vectors. That view also implies something similar to scalar_type (vector_type?) that finds the nested Vector or RowVector in a type parameter and uses that as the return type.

Another way to look the situation is that we need scalar_seq_view as above, but we could also use something that’s only a little bit related to it that allows us to automatically lift something into a container type, if it isn’t in that type already (in Haskell this would have a signature like Either a [a] -> [a]). I think this requires picking some set of container types (here, just std::vector) we will assume are “already lifted” and so we will just index through those as is without adding any additional layers, and then for all other types we pretend to wrap them (by having operator[] always return the single element).

It seems like the code you have two posts up with vector_seq_view only implemented for Matrix and vector<Matrix> is maybe adhering to the first paradigm I described while adhering to the YAGNI principle and not implementing any cases besides those which we definitely use. Is that close?

And so I’m wondering if the Either a [a] -> [a] paradigm has any place in our world? To me it feels like a more general abstraction that could be more broadly useful, but you know the lay of the land better than I.


#6

I hadn’t seen “YAGNI” before, and that’s partly it, but it’s also the concreteness and safety of not allowing construction in ways that aren’t intended. You had something like this:

template <typename T>
struct seq_view {
  const T& x_;
  seq_view(const T& x) : x_(x) { }
  const T& operator[](int n) { return x_; }
}

template <typename T>
struct seq_view<vector<T> > {
  const vector<T>& xs_;
  seq_view(const vector<T>& x) : xs_(x) { }
  const T& operator[](int n) { return xs_[n]; }
}

That would work for all the views of Eigen vectors that we currently have and all the views of Eigen matrices we are currently thinking about.

I was suggesting two more specific versions that would have the same effect as instantiating T to Eigen::Matrix<S, -1, 1> and Eigen::Matrix<S, -1, 1> but would more clearly specify the return type. Mainly because I think it’ll be cleaner in the code to use vector_seq_view and matrix_seq_view to clearly indicate what the return type of operator[](int) is going to be.

The automatic lifting in the general case is problematic, because we can have std::vector<double> wrapped to give a scalar sequence and also need to wrap VectorXd, etc. to provide scalar sequence views. Where it won’t work is if we have sequence views of std::vector<T>, but then what I proposed won’t work there either.

It’s funny—I think about these problems all the time, but don’t think about them much in these functional terms (like generic type lifting) very much, whereas I used to think about type lifting all the time (mainly in the map sense of lifting a functor on scalars up to a functor on sequences, which inverts control relative to what we’re talking about here).

I think what you’re suggesting will work for everythign we need, and if you don’t think it’ll be too confusing in client code (by which I mean things like multi_normal_lpdf) then go ahead and do it that way.

I don’t know the Either construction, but see it’s a variant type spec. We use that all over the grammars. It raises control as well being implemented with a functor that operates on each of the variants. But the problem here in that language is that while this

Either T vector<T> -> T

will work for cases where T is an Eigen container, what we need for cases where T is a scalar look like:

Either double 
       vector<double>
       Eigen::Matrix<double, -1, 1>
       Eigen::Matrix<double, 1, -1>
  -> double

and same for var and fvar<T> in place of double. That is, our containers in this pattern aren’t homogeneous in their usage. This is what I was trying to get at before. But it’s only the scalar vs. matrix views that matter.

To end a long story, go ahead and implement this however you think is best. I think it’ll work either way. Just because I’d do it one way doesn’t mean you have to!


#7

Okay. I think I’ve got a handle on the difficulties in unifying scalar_seq_view with vector_seq_view due to what you’re saying above, was mostly trying to discuss the tradeoffs of solutions to the vector_seq_view use case.

I like that there’s some compiler checked mechanism for showing what the intended use cases are for your version. When you do write a really general function like Either a [a] -> [a] in e.g. Haskell usually there are IDE (or REPL) facilities for understanding what types will actually pop out in your use case, which makes it actually usable. in C++ land, I don’t think we have anything like that, so it seems best to do as you’re saying and be explicit. It’s too bad there’s no way we can do the same type-as-documentation thing for scalar_seq_view; seems like that would require var, double, int, etc. sharing a supertype. At least the names will line up this way though and that’s pretty nice.


#8

Yes, it’ll definitely only unify what I was calling vector_seq_view and matrix_seq_view (and also row_vector_seq_view if we ever need that).

In the Boost variant types that we use all over the parser, C++ works just the way you want it to from a type-theoretic perspective. Let’s say you want to define a variant type A | B | C. You can construct an instance with an A, a B or a C object. Now if you want to define a function f:(A | B | C) -> D, then what you do is define a functor which is able to apply to an A, a B, or a C. In signature terms, that means it must implement the three methods defined below for foo.

struct foo {
  D operator()(A a) const;
  D operator()(B b) const;
  D operator()(C c) const;
};

Then, you can apply an instance of foo to an object of type A | B | C.


#9

Very cool.

So I’ve updated my branch to use your vector_seq_view design above and provided specializations for RowVector and Vector and I see now that the tests are running that (at least some of) the distribution tests explicitly want to pass in a Matrix<double, Dynamic, Dynamic> for the distribution parameters. The manual says “The multivariate normal probability function is overloaded to allow the variate vector y and location vector μ to be vectors or row vectors (or to mix the two types).” Should we still allow full matrices to be passed in as well? If so, it seems like we’d need to move to a unified vector_seq_view + matrix_seq_view that works on all 3 Eigen types. Do you think one name makes more sense than the other here?


#10

I was thinking about this and had one reaction, then the opposite, then I realized that I don’t know.

I always thought that it was cleaner conceptually having a collection of vectors and a collection of row vectors and those being separate from matrix, but I think I’m in the minority and more importantly, I think people unfamiliar with the language will assume those are the same. Part of my guiding force in the past was the implementation, but it looks like you’re on your way to sort out those issues.

[sorry – this comment might have just added noise instead of something useful.]


#11

We decided at the meeting to not support matrix y in normal_lpdf(y, mu, sigma) in Stan, so we should get rid of it in the tests and Stan math, too.


#12

Each instantiation of the template class Sean is suggesting will be for a specific type. The implementation will work for sequence views of any matrix type. It’s just that it won’t be clear that’s the case from the class definition as it’ll be more generic.

The problem I foresaw is that clients would confuse the scalar_seq_view with the more generic seq_view (or if we name it matrix_seq_view and restrict to Eigen types, then it might not be clear that it also works for vectors, though I take back that objection, because we should be assuming our clients know the Eigen interface for matrices before they start using Stan math).

We can’t consolidate those due to ambiguity (at least w/o another template parameter).


#13

Oh right, of course. A+