Now that we're in the C++11 era, I wonder if it makes sense for stanc to use an Eigen::Tensor where we have previously used a std::vector for arrays. I think Eigen::Tensor supports everything we are using arrays for in the Stan language, plus more.
That can stay the same. The segments of params_r__ get mapped to real scalars, vectors, row_vectors, matrices, and arrays thereof inside log_prob, so the question is whether to use nested std::vectors like we do now or Eigen::Tensors for arrays.
I think that makes most sense for our array data structures. And yes, it’d be a good idea compared to what we do now for the most part. It’d change efficiency profiles, requiring copying to get an element of a multidimensional array (or a huge generalization of our argument passing, which is also a good idea).
Right now, with
real a[M, N];
if you use a[m], there’s no copy.
Isn’t there also a signature with Eigen vectors? What we probably want to do for generality is just use iterators.
To say that Eigen::Tensor is unsupported is really a misnomer. It is in the unsupported/ directory and the core Eigen developers have not committed to support it. But it is almost totally a Google sub-project, making it way more supported than the FFT module used by Stan, which is also in unsupported/ and is basically the work of one guy.
We already depend on at least two “unsupported” aspects of Eigen:
The real gain would come from memory locality rather than pointer chasing.
The downside would be that we wouldn’t be able to return elements of multi-dimensional arrays by reference because we’d have to either use an expression template or copy to pull out of the contiguous memory for tensors.
Yes. I’m wondering if Eigen::Tensor is not like those other aspects of Eigen. It is under active development, which is good and bad. If the interface is solid, then it’s fine. But I’m thinking about the pain we have every time Eigen changes their interfaces. If this is changing frequently, I’d hold off before wanting to depend on it. Unless we have a person that will be accountable for changing all the code every time it breaks and is willing to support code that works across a few reasonable versions of Eigen. That seems like a heavy requirement.
Awesome! I was just thinking about the interface and didn’t think about the deeper implications.
Although, how would we set up memory locality with the tensor module? If it's what I'm envisioning (some sort of pre-allocation of memory in a chunk), then I don't see why we couldn't do the same with std::vector if we were clever. (Except for the row-major thing.)
The problem is that each std::vector instance allocates its own memory using malloc and will realloc in some cases (like pushing a bunch of elements). So rather than getting a single double array for all the values, you wind up with one top-level malloc and M lower-level mallocs (one per row).
If you look at an Eigen::Matrix on the other hand, it's just a single array of double values and a single malloc. So everything's in column-major order for memory-local traversal.
That’s where the “if we were clever” part comes in. If we wanted to, we could use a custom allocator for std::vector and use a double array, in this case double[3 * 4], for the underlying memory for std::vector<std::vector<double> >. If we put in that work, I think we’d have approximately the same behavior as Eigen::Tensor.
But yes, it’s much easier to do with the Eigen::Tensor interface than with nested std::vector. Of course, it looks like if we went to Eigen::Tensor, ragged arrays looks like it’s out because the constructor takes the sizes of each of the dimensions on construction.
Btw, I tend to heed warnings put up by package developers when they say something like this in bold up front:
IMPORTANT NOTE: The current developement version of Eigen (post-3.2) supports Tensors. This support is experimental and a moving target.
If I thought that level of memory locality was our bottleneck, I would have suggested and implemented it a long time ago.
That’s right. But you can’t do those types of resizing operators using Eigen::Tensor either (I believe it lets you replace the whole block of memory to a different shape with the same dimensionality, but no resizing). And, we really shouldn’t be using push_back or doing anything dynamic resizing in the code we generate. I don’t believe we have any operations that allow that, right? So it’s not a problem if they were to be mapped directly to memory somehow.
It’s not yet, because we don’t have tensor operations. But it’s totally the bottleneck for matrix operations. No way we could achieve anything like the matrix speed we have now with matrices represented as vector<vector<double>>. The data structure by itself isn’t enough, though—it needs efficient use of that data structure, too.
Don’t get me wrong—I’m not advocating doing this. I can, though, see the motivation.
There might be places where we reserve and push back in the generated code. There are definitely places where we do that in code under the hood in the math lib.
Yes, but I thought we were talking about something slightly different. We already have the Stan matrix type map to Eigen::Matrix instead of std::vector<std::vector<T> >.
If we replaced real[...] with Eigen::Tensor (instead of nested std::vector), it still doesn’t naturally extend to an efficient use of the data structure. I would have thought that to do something like a Kronecker product, we wouldn’t be declaring the Stan data structure as real[M,N,L]. (that said, I think we’d have a better shot at an M-array of matrix[M,N]).
Hmm. I’ll need to dig into the generated code to know if that’s true. I’m going to assume we could reserve and assign instead of push back if that’s the case.
The code under the hood in the math library shouldn’t affect the stuff that maps from the Stan data and parameter blocks, though. At least I’m pretty sure we don’t do too much dynamic resizing of arguments as part of output. (Maybe we do… if we do, maybe we shouldn’t?)
Good point. We’d probably be declaring it as tensor[M, N, L] and then using efficient tensor operations on it. I don’t know that even the matrix gurus transcend three dimensions in their designs.
Nope, and that’s the problem. If we reserve, we have to push back. We have to actually resize, and that causes a bunch of allocation of null constructor autodff objects, which means overhead on the stack for us.
I’m not 100% sure what happens with this:
I am sure you get a vector of 10 var-objects constructed with var(). I’m just not sure if it creates one, then copies (only one extra object on stack) or if it constructs 10 fresh ones. This is why I was careful in some places to create a single dummy object to fill into arrays and vectors in the Stan blocks.
P.S. Eigen works differently—you size an object but there’s no guarantee on construction that it’s in a coherent state—you have to assign for that.
Just to be clear: I don’t think the language has to change to do this. Someone can still declare real x[M,N,L] in their Stan programs, but stanc could generate code for an Eigen::Tensor named x of rank 3 and a double or var as the scalar type.