Function call stack of default HMC with NUTS -> Shared memory parallelisation

I found this thread [(older) parallel AD tape ideas] and am currently reading it, hoping it will shed more light on how AD tapes are created and handled by the code.