Switch from `std::vector` to `Eigen::Tensor` in compiled models?

In the now C++11 era, I wonder if it makes sense for stanc to use an Eigen::Tensor where we have previously used a std::vector for arrays. I think an Eigen::Tensor supports everything we are using arrays for in the Stan language, plus more

https://bitbucket.org/eigen/eigen/src/1f0f8337c029a3a8f2720bfcabbed13579bc9093/unsupported/Eigen/CXX11/src/Tensor/README.md?at=default

And there is more room to grow in terms of using GPUs and ThreadPools and stuff.

1 Like

Where exactly would std:vector be replaced?

Inside the log_prob method and in the public members of the model class.

In that case, I’d be in favor of sticking to std::vector arguments in
public methods. Cython automatically converts to and from Python types
and std::vector.

I would expect that std::vector arguments would be easier than Eigen
arguments for most languages (e.g., Julia, Rust, Go, Haskell) which are
trying or might eventually try to interface with Stan C++.

The signature of log_prob is

    template <bool propto__, bool jacobian__, typename T__>
    T__ log_prob(vector<T__>& params_r__,
                 vector<int>& params_i__, // this is size 0 anyway
                 std::ostream* pstream__ = 0) const {

That can stay the same. The segments of params_r__ get mapped to real scalars, vectors, row_vectors, matrices, and arrays thereof inside log_prob, so the question is whether to use nested std::vectors like we do now or Eigen::Tensors for arrays.

I think that makes most sense for our array data structures. And yes, it’d be a good idea compared to what we do now for the most part. It’d change efficiency profiles, requiring copying to get an element of a multidimensional array (or a huge generalization of our argument passing, which is also a good idea).

Right now, with

real a[M, N];

if you use a[m], there’s no copy.

Isn’t there also a signature with Eigen vectors? What we probably want to do for generality is just use iterators.

Just thinking about the practical implications, shouldn’t we try to stick with std::vector since we’d now have to depend on an unsupported part of Eigen? I still see:

  • installation difficulties with RStan and things like BH and RcppEigen every so often on the list.
  • PyStan still trying to trim parts of the distribution to keep the size down.

This just seems like we’re making those things more difficult for not a real gain.

+1

To say that Eigen::Tensor is unsupported is really a misnomer. It is in the unsupported/ directory and the core Eigen developers have not committed to support it. But it is almost totally a Google sub-project, making it way more supported than the FFT module used by Stan, which is also in unsupported/ and is basically the work of one guy.

3 Likes

In my mind, that makes it worse for dependency management.

Do you have a sense of how fast the code in Eigen moves? And does RcppEigen already pack the unsupported Eigen::Tensor module?

If the API is stable and it’s easy to use, then it’s reasonable. But if either of those aren’t true, then I’d lean towards waiting.

We already depend on at least two “unsupported” aspects of Eigen:

  • FFT
  • algebraic solver

The real gain would come from memory locality rather than pointer chasing.

The downside would be that we wouldn’t be able to return elements of multi-dimensional arrays by reference because we’d have to either use an expression template or copy to pull out of the contiguous memory for tensors.

Yes. I’m wondering if Eigen::Tensor is not like those other aspects of Eigen. It is under active development, which is good and bad. If the interface is solid, then it’s fine. But I’m thinking about the pain we have every time Eigen changes their interfaces. If this is changing frequently, I’d hold off before wanting to depend on it. Unless we have a person that will be accountable for changing all the code every time it breaks and is willing to support code that works across a few reasonable versions of Eigen. That seems like a heavy requirement.

Awesome! I was just thinking about the interface and didn’t think about the deeper implications.

Although, how would we set up memory locality with the tensor module? If it’s what I’m envisioning (see sort of pre-allocation of memory in a chunk), then I don’t see why we couldn’t do the same with std::vector if we were clever. (Except for the row-major thing.)

When we declare

real a[3, 4];

the type in C++ is

std::vector<std::vector<double> > a;

The problem is that each std::vector instance allocates its own memory using malloc and will realloc in some cases (like pushing a bunch of elements). So rather than getting a single double array for all the values, you wind up with one top-level malloc and M lower-level mallocs (one per row).

If you look at an Eigen::Matrix on the other hand, it’st just a single array of double values and a single malloc. So everything’s in column-major order for memory-local traversal.

That’s where the “if we were clever” part comes in. If we wanted to, we could use a custom allocator for std::vector and use a double array, in this case double[3 * 4], for the underlying memory for std::vector<std::vector<double> >. If we put in that work, I think we’d have approximately the same behavior as Eigen::Tensor.

But yes, it’s much easier to do with the Eigen::Tensor interface than with nested std::vector. Of course, it looks like if we went to Eigen::Tensor, ragged arrays looks like it’s out because the constructor takes the sizes of each of the dimensions on construction.

Btw, I tend to heed warnings put up by package developers when they say something like this in bold up front:

IMPORTANT NOTE: The current developement version of Eigen (post-3.2) supports Tensors. This support is experimental and a moving target.

Too clever for me.

The allocator for vector<vector<vector<double>>> would have to link all the nested allocators. Then I’m pretty sure operations like push_back would break everythnig and you can’t rewrite them.

If I thought that level of memory locality was our bottleneck, I would have suggested and implemented it a long time ago.

That’s right. But you can’t do those types of resizing operators using Eigen::Tensor either (I believe it lets you replace the whole block of memory to a different shape with the same dimensionality, but no resizing). And, we really shouldn’t be using push_back or doing anything dynamic resizing in the code we generate. I don’t believe we have any operations that allow that, right? So it’s not a problem if they were to be mapped directly to memory somehow.

It’s not yet, because we don’t have tensor operations. But it’s totally the bottleneck for matrix operations. No way we could achieve anything like the matrix speed we have now with matrices represented as vector<vector<double>>. The data structure by itself isn’t enough, though—it needs efficient use of that data structure, too.

Don’t get me wrong—I’m not advocating doing this. I can, though, see the motivation.

There might be places where we reserve and push back in the generated code. There are definitely places where we do that in code under the hood in the math lib.

1 Like

Yes, but I thought we were talking about something slightly different. We already have the Stan matrix type map to Eigen::Matrix instead of std::vector<std::vector<T> >.

If we replaced real[...] with Eigen::Tensor (instead of nested std::vector), it still doesn’t naturally extend to an efficient use of the data structure. I would have thought that to do something like a Kronecker product, we wouldn’t be declaring the Stan data structure as real[M,N,L]. (that said, I think we’d have a better shot at an M-array of matrix[M,N]).

Hmm. I’ll need to dig into the generated code to know if that’s true. I’m going to assume we could reserve and assign instead of push back if that’s the case.

The code under the hood in the math library shouldn’t affect the stuff that maps from the Stan data and parameter blocks, though. At least I’m pretty sure we don’t do too much dynamic resizing of arguments as part of output. (Maybe we do… if we do, maybe we shouldn’t?)

Good point. We’d probably be declaring it as tensor[M, N, L] and then using efficient tensor operations on it. I don’t know that even the matrix gurus transcend three dimensions in their designs.

Nope, and that’s the problem. If we reserve, we have to push back. We have to actually resize, and that causes a bunch of allocation of null constructor autodff objects, which means overhead on the stack for us.

I’m not 100% sure what happens with this:

vector<var> a(10);

I am sure you get a vector of 10 var-objects constructed with var(). I’m just not sure if it creates one, then copies (only one extra object on stack) or if it constructs 10 fresh ones. This is why I was careful in some places to create a single dummy object to fill into arrays and vectors in the Stan blocks.

P.S. Eigen works differently—you size an object but there’s no guarantee on construction that it’s in a coherent state—you have to assign for that.

1 Like

Thanks for putting up the example! I now remember and you’re absolutely right. (I guess we could try to be clever here too, but it’d probably be inefficient.)

Just to be clear: I don’t think the language has to change to do this. Someone can still declare real x[M,N,L] in their Stan programs, but stanc could generate code for an Eigen::Tensor named x of rank 3 and a double or var as the scalar type.