- How quickly can new additions be integrated - We will probably have a lot of non-standard matrix/vector operations that need to be parallelized and executed on the GPU(for instance adding to diagonal, etc ...)?
I believe a lot of this is in ViennaCL. We could go through the ViennaCL docs and see what things Eigen can do that ViennaCL cannot
How optimized is ViennaCL for dense matrix operations?
They don't explicitly list matrix operations, but you can see some benchmarks here.
Though I would suppose that for simple things it would be reasonably optimized, we can create our own benchmarks as well.
We have created our own Cholesky decomposition that still has some room for improvement as it was done quickly.
I just looked at it and it is very cool!
The code is currently almost twice as fast as the ViennaCL for Cholesky.
This is awesome! Though looking at the outputs it seems like your version loses a bit of precision. It's also odd that the precision is exact for the largest matrix, but is off for two of the smaller matrices. Given that your team whipped that version up pretty quickly, would you be able to make a version that maintains equal precision to the Eigen version?
A hybrid of the two above?
* It would not be costly to detect during compilation what data can be permanently moved to GPU and what partially computed quantities can remain on GPU. Maybe this can be done very effectively.
Maybe an alternative stan-math-gpu library? That would be a lot of work, but it seems like the norm is to create a separate package for the GPU version of packages. This is a design choice that I can't really comment on with any level of expertise.
Overall I think we could approach this in stages, first doing (1) then (2) and then testing if (3) would be useful.
The third question that we had is if we are restricted to double precision arithmetic?
@betanalpha has told me that having double precision is pretty important. I think the goal should be to maintain high precision, then later we can start testing whether we can use floats.
Your Cholesky decomposition is very impressive! Have you thought about speaking to the ViennaCL team about adding it to their library? They mention in this issue on their github that they would be very welcoming of this sort of thing. Anybody please correct me if I'm wrong, but while stan uses a lot of linear algebra I would not necessarily call it a linear algebra library. I think a big win win for everyone would be if you could have your routines integrated into ViennaCL. Then stan can still pass linear algebra to another library and your code gets a larger amount of exposure since other people will be able to use it.