How to use AD for types with a nontrivial destructor?

Hi Stan team,
Recently I started playing around with Stan to use it for automatic differentiation in tensor networks.
I have a custom tensor class which represents tensors which are block sparse due to symmetries.
I got the gradient calculation working when the tensor class has a trivial destructor, in a similar way to arena_matrix. However, the tensor class has an optional backend that uses distributed-memory matrices from a third-party library, so there is no way to avoid direct malloc calls.
I guess it should still be possible to use the AD framework with the help of the chainable_alloc class.
But I did not find any clear documentation. I have read the paper [1509.07164] The Stan Math Library: Reverse-Mode Automatic Differentiation in C++, but it seems a little outdated.
Do you have any hints, code samples, or documentation on this?
For example, I was wondering why math/rev/core/set_zero_all_adjoints.hpp traverses only two of the three stacks of AutodiffStackStorage and skips the var_alloc_stack_.
Thanks in advance for any help!

The adjoint ODE solver uses chainable_alloc, for example.

Okay, is this solver also in the Stan Math C++ repo? I could not find the source code.
I also saw that it's used in vari_value<Eigen::SparseMatrix>, but there the class inherits from both vari_base and chainable_alloc.

Thanks, I also found this by grepping but did not realize that it was the ODE solver. I can look into it, but it seems quite tricky because it's a nested struct within another class that inherits from vari_base.
Nevertheless, thanks for the hint!

I think one has to proceed exactly as in the vari_value<Eigen::SparseMatrix> case. Clearly, one still has to inherit from vari_base so that the static data of the class lives on the AD stack. If the class has dynamic data (e.g. via malloc()), one additionally has to inherit from chainable_alloc to ensure that the dynamic data is properly destroyed. This also makes it clear why set_zero_all_adjoints() does not traverse the var_alloc_stack_: that stack only tracks objects waiting to be cleaned up, not adjoints.
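To make the double-inheritance idea concrete, here is a minimal, self-contained sketch. It does not use Stan Math at all: every toy_* name is a hypothetical stand-in for the role the corresponding Stan class plays, and the real classes differ in detail (for instance in how the objects themselves are allocated). The sketch only assumes what is described in this thread: one base class puts the object on the stack used for the reverse pass, the other registers it for destruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// All toy_* names are hypothetical stand-ins, not Stan Math's real classes.
struct toy_vari_base;
struct toy_chainable_alloc;

struct toy_stack {
  std::vector<toy_vari_base*> var_stack;              // visited in the reverse pass
  std::vector<toy_chainable_alloc*> var_alloc_stack;  // destroyed on recovery
};

inline toy_stack& toy_ad_stack() {
  static toy_stack s;
  return s;
}

// Role of vari_base: the object takes part in the reverse pass.
struct toy_vari_base {
  virtual void set_zero_adjoint() = 0;
  virtual ~toy_vari_base() = default;
};

// Role of chainable_alloc: register the object so its destructor runs later.
struct toy_chainable_alloc {
  toy_chainable_alloc() { toy_ad_stack().var_alloc_stack.push_back(this); }
  virtual ~toy_chainable_alloc() = default;
};

int live_buffers = 0;  // number of calloc'ed buffers not yet freed

// A tensor-like vari whose storage comes from calloc, as with an external backend.
struct toy_tensor_vari : toy_vari_base, toy_chainable_alloc {
  std::size_t size;
  double* val;
  double* adj;

  explicit toy_tensor_vari(std::size_t n)
      : size(n),
        val(static_cast<double*>(std::calloc(n, sizeof(double)))),
        adj(static_cast<double*>(std::calloc(n, sizeof(double)))) {
    live_buffers += 2;
    toy_ad_stack().var_stack.push_back(this);
  }
  ~toy_tensor_vari() override {
    std::free(val);
    std::free(adj);
    live_buffers -= 2;
  }
  void set_zero_adjoint() override {
    for (std::size_t i = 0; i < size; ++i) adj[i] = 0.0;
  }
};

// Toy cleanup step: forget the varis, then destroy every registered object.
void toy_recover_memory() {
  toy_ad_stack().var_stack.clear();
  for (toy_chainable_alloc* p : toy_ad_stack().var_alloc_stack) delete p;
  toy_ad_stack().var_alloc_stack.clear();
}

// Returns the number of buffers still alive after recovery.
int demo_tensor_cleanup() {
  new toy_tensor_vari(4);  // owned by the toy stack from here on
  assert(live_buffers == 2);
  toy_recover_memory();
  return live_buffers;
}
```

Calling demo_tensor_cleanup() should return 0: the destructor, and with it the free() calls, runs when the cleanup stack is emptied, without the caller ever deleting the tensor explicitly.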

The basic approach is still the same. The main difference is that we’re now mostly using lambdas to build our closures rather than writing custom closures on a case-by-case basis (yay for C++11).

In set_zero_all_adjoints, we only need to reset the adjoints we’ve set up back to zero before starting another reverse pass.
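A toy model may help show why the third stack can be left alone. The sketch below is not Stan's code: the toy_* types are hypothetical, and only the three stack names are borrowed from this discussion. It assumes that entries on the cleanup stack carry no adjoint of their own.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy types; only the stack names mirror the ones discussed here.
struct toy_vari {
  double val;
  double adj;
};

struct toy_cleanup {
  virtual ~toy_cleanup() = default;
};

struct toy_stack {
  std::vector<toy_vari*> var_stack;           // varis whose chain() is called
  std::vector<toy_vari*> var_nochain_stack;   // varis whose chain() is never called
  std::vector<toy_cleanup*> var_alloc_stack;  // objects waiting for destruction
};

// Reset every adjoint so that another reverse pass starts from zero.
void toy_set_zero_all_adjoints(toy_stack& s) {
  for (toy_vari* v : s.var_stack) v->adj = 0.0;
  for (toy_vari* v : s.var_nochain_stack) v->adj = 0.0;
  // var_alloc_stack is skipped: its entries carry no adjoint.
}

// Returns true if all adjoints are zero after the reset.
bool demo_zeroing() {
  toy_vari x{2.0, 0.0};
  toy_vari y{6.0, 0.0};  // pretend y = 3 * x
  toy_cleanup resource;
  toy_stack s;
  s.var_nochain_stack.push_back(&x);
  s.var_stack.push_back(&y);
  s.var_alloc_stack.push_back(&resource);

  y.adj = 1.0;           // seed the reverse pass
  x.adj += 3.0 * y.adj;  // propagate dy/dx = 3
  assert(x.adj == 3.0);

  toy_set_zero_all_adjoints(s);
  return x.adj == 0.0 && y.adj == 0.0;
}
```

A class that appears on both a vari stack and the cleanup stack still gets its adjoints reset, because it is reached through the vari stack.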

If the third-party library allows a pluggable malloc, you could also use our arena allocator, which would mean you wouldn’t need to do cleanup manually by fiddling with the stacks. If the class has a destructor that cleans up its memory RAII-pattern style, then you just need to push the class with the destructor onto the stack of variables to be cleaned up (the var_alloc_stack_). You’ll see how they’re deleted in stan/math/rev/core/recover_memory.hpp.
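For the first option, the idea of a pluggable allocator backed by an arena can be sketched as follows. This is a toy under stated assumptions, not Stan's arena allocator: toy_arena and toy_arena_allocator are hypothetical names, the arena is a fixed-size buffer, and a std::vector stands in for the third-party container that accepts a custom allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical toy arena: a fixed buffer handed out bump-pointer style.
class toy_arena {
 public:
  explicit toy_arena(std::size_t bytes) : buf_(bytes), used_(0) {}

  void* alloc(std::size_t bytes) {
    constexpr std::size_t align = alignof(std::max_align_t);
    std::size_t start = (used_ + align - 1) / align * align;  // round up
    if (start + bytes > buf_.size()) throw std::bad_alloc();
    used_ = start + bytes;
    return buf_.data() + start;
  }
  void recover_all() { used_ = 0; }  // release everything at once
  std::size_t used() const { return used_; }

 private:
  std::vector<unsigned char> buf_;
  std::size_t used_;
};

// Minimal standard-conforming allocator that draws from the toy arena.
template <typename T>
struct toy_arena_allocator {
  using value_type = T;
  toy_arena* arena;

  explicit toy_arena_allocator(toy_arena* a) : arena(a) {}
  template <typename U>
  toy_arena_allocator(const toy_arena_allocator<U>& other) : arena(other.arena) {}

  T* allocate(std::size_t n) {
    return static_cast<T*>(arena->alloc(n * sizeof(T)));
  }
  void deallocate(T*, std::size_t) noexcept {}  // no-op: the arena frees in bulk
};

template <typename T, typename U>
bool operator==(const toy_arena_allocator<T>& a, const toy_arena_allocator<U>& b) {
  return a.arena == b.arena;
}
template <typename T, typename U>
bool operator!=(const toy_arena_allocator<T>& a, const toy_arena_allocator<U>& b) {
  return !(a == b);
}

// Returns the number of arena bytes in use after recovery.
std::size_t demo_arena() {
  toy_arena arena(1 << 16);
  {
    toy_arena_allocator<double> alloc(&arena);
    std::vector<double, toy_arena_allocator<double>> block(100, 1.0, alloc);
    assert(arena.used() >= 100 * sizeof(double));
  }  // the vector's destructor hands nothing back to the arena
  arena.recover_all();
  return arena.used();
}
```

The design point is that deallocate() does nothing, so a trivially destructible element type needs no per-object cleanup; all memory comes back in one recover_all() call. That is why this route avoids touching the cleanup stack at all.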

The class has a destructor that calls free, so the last option is the best.
And if I have understood correctly, pushing the class onto the var_alloc_stack_ can be achieved by additionally inheriting from chainable_alloc.