Hi stan-team,

Recently, I started playing around with Stan to use it for automatic differentiation in tensor networks.

I have a custom tensor class representing tensors that are block sparse due to symmetries.

I got the gradient calculation working when the tensor class has a trivial destructor, in a similar way as `arena_matrix`. However, the tensor class has an optional backend that uses distributed-memory matrices from a third-party library, so there is no way of avoiding direct `malloc` calls.

I guess it should still be possible to use the AD framework with the help of the `chainable_alloc` class.

But I did not find any clear documentation. I have read the paper [1509.07164] The Stan Math Library: Reverse-Mode Automatic Differentiation in C++, but it seems a little outdated.

Do you have any hints on code samples or documentation about this?

For example, I was wondering why in `math/rev/core/set_zero_all_adjoints.hpp` only two of the three stacks of `AutoDiffStackStorage` are traversed, and why the `var_alloc_stack_` is missing.

Thanks in advance for any help!

The adjoint ODE solver uses this `chainable_alloc` thing, for example.

Okay, is this solver also in the Stan Math C++ repo? I could not find the source code.

I also saw that it's used in `vari_value<Eigen::SparseMatrix>`, but there the class inherits from both `vari_base` and `chainable_alloc`.

Thanks, I found this by grepping as well but did not realize that it was the ODE solver. I can look into it, but it seems quite tricky because it's a nested struct within some other class that inherits `vari_base`.

Nevertheless, thanks for the hint!

I think one has to proceed exactly as in the `vari_value<Eigen::SparseMatrix>` case. It is clear that one still has to inherit from `vari_base` so that the static data of the class is on the AD stack. If the class has dynamic data (e.g. via `malloc()`), one has to additionally inherit from `chainable_alloc` to ensure proper destruction of the dynamic data. With this it also becomes clear why the `set_zero_all_adjoints()` function does not traverse the `var_alloc_stack_`: that stack holds no adjoints, only objects awaiting destruction.

The basic approach is still the same. The main difference is that we’re now mostly using lambdas to build our closures rather than writing custom closures on a case-by-case basis (yay for C++11).

In `set_zero_all_adjoints`, we only need to set the adjoints we've set up back to zero before starting another reverse pass.

If it allows a pluggable malloc, you could also use our arena allocator, which would mean you wouldn't need to do cleanup manually by fiddling with the stacks. If it has a destructor that cleans up its memory RAII-style, then you just need to push the class with the destructor onto the stack of variables to be cleaned up (the `var_alloc_stack_`). You'll see how they're deleted in `stan/math/rev/core/recover_memory.hpp`.

Thanks!

The class has a destructor which calls `free`, so the last option is the best.

And if I have understood it correctly, pushing the class onto the `var_alloc_stack_` can be achieved by additionally inheriting from `chainable_alloc`.