Neat autodiff C++ package Enoki

Found a neat C++ package the other day for autodiff called Enoki. Some of the things I liked in particular:

  1. Instead of using a serial container to store the autodiff operations, they use an unordered map and indices to store the expression graph in its real graph form.

  2. Uses structs of arrays instead of arrays of structs

  3. Can operate on the expression graph to simplify expressions
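To make point 1 concrete, here is a rough sketch of what "an unordered map plus indices" could look like for reverse mode. This is not Enoki's code or API: `Graph`, `Node`, `Edge`, and `Index` are names made up for illustration, and it assumes scalar doubles and only `add` and `mul`.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical names throughout; this is an illustration, not Enoki's API.
using Index = std::uint32_t;

struct Edge {
    Index source;   // index of the operand node
    double weight;  // local partial derivative d(node)/d(source)
};

struct Node {
    double value = 0.0;
    double grad = 0.0;
    std::vector<Edge> edges;  // edges pointing back at the operands
};

struct Graph {
    std::unordered_map<Index, Node> nodes;  // the graph itself, keyed by index
    Index next = 0;                         // next unused index

    Index variable(double v) {
        Index id = next++;
        nodes[id].value = v;
        return id;
    }

    Index add(Index a, Index b) {
        double va = nodes.at(a).value, vb = nodes.at(b).value;
        Index id = next++;
        Node& n = nodes[id];
        n.value = va + vb;
        n.edges.push_back(Edge{a, 1.0});
        n.edges.push_back(Edge{b, 1.0});
        return id;
    }

    Index mul(Index a, Index b) {
        double va = nodes.at(a).value, vb = nodes.at(b).value;
        Index id = next++;
        Node& n = nodes[id];
        n.value = va * vb;
        n.edges.push_back(Edge{a, vb});  // d(a*b)/da = b
        n.edges.push_back(Edge{b, va});  // d(a*b)/db = a
        return id;
    }

    // Reverse sweep. Indices are handed out in creation order, so walking
    // from the root index down to 0 visits every node after its consumers.
    void backward(Index root) {
        for (auto& kv : nodes) kv.second.grad = 0.0;
        nodes.at(root).grad = 1.0;
        for (Index i = root + 1; i-- > 0;) {
            auto it = nodes.find(i);
            if (it == nodes.end()) continue;  // node was removed, skip it
            const Node& n = it->second;
            for (const Edge& e : n.edges)
                nodes.at(e.source).grad += e.weight * n.grad;
        }
    }
};
```

Building `f = x * y + x` with `g.add(g.mul(x, y), x)` and calling `g.backward(f)` leaves the gradients in `g.nodes.at(x).grad` and `g.nodes.at(y).grad`. Because nodes are looked up by index rather than by position in a vector, one can be erased or rewired without shifting the others, which is presumably what makes graph simplification practical. The price is a hash lookup on every access.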

Cool to look at. There's a lot there, so I'm scrolling through it a bit at a time. I've been thinking lately that if Stan kept track of where vars are in the expression tree, we could turn the whole expression tree into a task graph via the TBB for free parallelization. Would be a big rewrite, though.


Thanks for the pointer.

That sounds really expensive. How’s its performance?

Does it allow modifying entries in the arrays? Forwarding those without chains of pointers has always been the stumbling block in my trying to work through this.

Lots of autodiff systems do this. But it only makes sense if you have static autodiff graphs (like the original TensorFlow), not the dynamic autodiff that we do (where each iteration can take different branches), or that PyTorch or the new TensorFlow module does.
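A small sketch of what "each iteration can take different branches" means in practice. The `Tape` type below is a hypothetical toy, not Stan's implementation; it only records `add` and `mul` on doubles. The point is that the recorded graph depends on the input value, so a simplification computed for one evaluation does not carry over to the next.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy tape for illustration; not Stan's actual autodiff.
struct Tape {
    struct Op {
        std::size_t lhs, rhs;  // indices of the two operands
        double dlhs, drhs;     // local partials with respect to each operand
    };
    std::vector<double> values;  // values[i] is the result of ops[i]
    std::vector<Op> ops;

    std::size_t variable(double v) {
        values.push_back(v);
        ops.push_back(Op{0, 0, 0.0, 0.0});  // leaf: zero partials
        return values.size() - 1;
    }
    std::size_t add(std::size_t a, std::size_t b) {
        double v = values[a] + values[b];
        ops.push_back(Op{a, b, 1.0, 1.0});
        values.push_back(v);
        return values.size() - 1;
    }
    std::size_t mul(std::size_t a, std::size_t b) {
        double va = values[a], vb = values[b];
        ops.push_back(Op{a, b, vb, va});
        values.push_back(va * vb);
        return values.size() - 1;
    }
    // Reverse sweep over the tape, from the root back to the first entry.
    std::vector<double> gradient(std::size_t root) const {
        std::vector<double> adj(values.size(), 0.0);
        adj[root] = 1.0;
        for (std::size_t i = root + 1; i-- > 0;) {
            adj[ops[i].lhs] += ops[i].dlhs * adj[i];
            adj[ops[i].rhs] += ops[i].drhs * adj[i];
        }
        return adj;
    }
};

// f(x) = x * x when x > 0, otherwise x + x + x.
// Which operations land on the tape depends on the runtime value of x.
std::size_t f(Tape& t, std::size_t x) {
    if (t.values[x] > 0.0) return t.mul(x, x);
    return t.add(t.add(x, x), x);
}
```

Calling `f` with a positive `x` records a single multiply, while a negative `x` records two adds, so the two tapes have different shapes and sizes. A static-graph system has to see both branches up front to optimize them.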

The current plan for Stan is to reduce the program graph, rather than the autodiff graph. That’s reusable even with dynamic autodiff.

I worked through some of these details in a tutorial autodiff from scratch thread:

They don’t have any benchmarks posted, but I just filed an issue on the repo asking for them.

I’m pretty interested in seeing if all the extra specialization on instruction sets and tree simplification outweighs the cost of the unordered_map. They also don’t have a local allocator, which feels like it would be useful if they mainly work on static graphs.
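For anyone unfamiliar with the term, by "local allocator" I mean something like an arena (bump) allocator. Below is a minimal sketch under my own assumptions: the `Arena` class and its members are hypothetical names, it only handles trivially destructible types, and it is not taken from Enoki or Stan.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <utility>
#include <vector>

// Hypothetical bump ("arena") allocator, for illustration only.
// Assumes every T placed in it is trivially destructible.
class Arena {
public:
    explicit Arena(std::size_t bytes) : buffer_(bytes), used_(0) {}

    // Constructs a T inside the arena; returns nullptr if there is no room.
    template <typename T, typename... Args>
    T* make(Args&&... args) {
        const std::uintptr_t align = alignof(T);
        const auto base = reinterpret_cast<std::uintptr_t>(buffer_.data());
        // Round the current position up to the alignment T needs.
        const std::uintptr_t aligned =
            (base + used_ + align - 1) & ~(align - 1);
        const std::size_t offset = static_cast<std::size_t>(aligned - base);
        if (offset + sizeof(T) > buffer_.size()) return nullptr;
        used_ = offset + sizeof(T);
        return new (buffer_.data() + offset) T(std::forward<Args>(args)...);
    }

    // Frees everything at once by rewinding the bump pointer.
    void reset() { used_ = 0; }

    std::size_t used() const { return used_; }

private:
    std::vector<unsigned char> buffer_;  // one up-front allocation
    std::size_t used_;                   // bytes handed out so far
};
```

The appeal for a static graph is that every evaluation needs the same amount of memory, so the nodes could be carved out of one buffer and `reset()` called between evaluations instead of going through the general-purpose heap each time.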

Does it allow modifying entries in the arrays? Forwarding those without chains of pointers has always been the stumbling block in my trying to work through this.

It looks like you can; the link below is their autodiff type

The current plan for Stan is to reduce the program graph, rather than the autodiff graph. That’s reusable even with dynamic autodiff.

Are these the optimizations in the new compiler?

I worked through some of these details in a tutorial autodiff from scratch thread:

Nice! Been meaning to reread that thread