The nesting is for nested autodiff. That means we start a new stack but reuse the same underlying memory for it. It's used in the ODE solver to compute Jacobians in a nested fashion. When the nested computation finishes, everything it allocated is popped off the stack and freed, and we return to the enclosing level of autodiff. That's described in the paper, but maybe not well, because I didn't want to dive into crazy low-level details and obscure the bigger picture.
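To make the mark/release idea concrete, here's a minimal sketch (not the actual implementation; `toy_arena`, `start_nested`, and `recover_nested` are hypothetical names): starting a nested level just records the current top of the shared stack, and recovering that level resets the top back to the mark, freeing everything the nested computation allocated in one shot.

```cpp
#include <cstddef>
#include <vector>

// Toy bump allocator with nesting; no overflow checks in this sketch.
class toy_arena {
  std::vector<char> block_;         // one shared slab of memory
  std::size_t top_ = 0;             // current allocation offset
  std::vector<std::size_t> marks_;  // saved tops, one per nesting level
 public:
  explicit toy_arena(std::size_t bytes) : block_(bytes) {}
  void* alloc(std::size_t n) {      // bump allocation; no per-variable free
    void* p = block_.data() + top_;
    top_ += n;
    return p;
  }
  void start_nested() { marks_.push_back(top_); }
  void recover_nested() {           // frees the whole nested level at once
    top_ = marks_.back();
    marks_.pop_back();
  }
};
```

The point is that cleaning up a nested Jacobian computation costs nothing: no per-variable destructors, just resetting an offset.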
And yes, the second stack is for variables that don't get autodiffed, meaning their derivative-propagation method is never called in the reverse pass. We use that in matrix operations to reduce the number of virtual function calls. I believe that was also discussed, though again probably too vaguely, in the autodiff paper.
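Here's a hedged sketch of the two-stack layout (the names are illustrative, not necessarily the real ones): only variables on the main stack get a virtual `chain()` call during the reverse pass, so a matrix operation can park its N elementwise variables on the second stack and register one summary variable that propagates adjoints for the whole operation, turning N virtual calls into one.

```cpp
#include <vector>

// Illustrative node type: chain() is the virtual derivative-propagation hook.
struct toy_vari {
  double val_;
  double adj_ = 0;
  explicit toy_vari(double v) : val_(v) {}
  virtual void chain() {}
  virtual ~toy_vari() = default;
};

std::vector<toy_vari*> var_stack_;          // chain() is called on these
std::vector<toy_vari*> var_nochain_stack_;  // kept alive, but chain() is skipped
```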
std::vector objects work as usual and encapsulate their own mallocs, following the RAII pattern (Eigen matrices work the same way). Those vectors keep track of the variables so we know how to work back through the stack. I'm not actually sure we still need to keep that bookkeeping.
var_stack_ is what gets traversed during derivative propagation (the reverse pass) of autodiff.
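Concretely (reusing the toy types from the sketch above), the reverse pass just seeds the result's adjoint and walks the stack back to front; last-pushed-first is a valid reverse topological order because nodes were pushed in the order they were created during the forward pass.

```cpp
// Toy reverse pass over var_stack_.
void toy_grad(toy_vari* result) {
  result->adj_ = 1;  // d(result)/d(result) = 1
  for (auto it = var_stack_.rbegin(); it != var_stack_.rend(); ++it)
    (*it)->chain();  // each node adds its contribution to its operands' adjoints
}
```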
The sizing of everything is complicated, because we use an increasing sequence of underlying memory blocks rather than copying everything into a bigger array. That might not have been the best choice, but when we profiled, it didn't add measurable overhead at run time, because it's the stacks that get traversed, not the arrays directly. And we already blow memory locality because there's no way to preserve it in an expression graph.
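Here's a minimal sketch of the growing-block scheme (again with illustrative names, not the real implementation): when the current block fills up, we chain on a new, larger block and leave the old ones where they are, so no live pointer is ever invalidated by a copy.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy arena built from an increasing sequence of blocks.
class toy_block_alloc {
  std::vector<char*> blocks_;  // all blocks allocated so far
  std::size_t cur_size_;       // size of the newest block
  std::size_t used_ = 0;       // bytes used in the newest block
 public:
  explicit toy_block_alloc(std::size_t first = 1 << 16) : cur_size_(first) {
    blocks_.push_back(new char[cur_size_]);
  }
  void* alloc(std::size_t n) {
    if (used_ + n > cur_size_) {  // newest block full: add a bigger one
      cur_size_ = std::max(2 * cur_size_, n);
      blocks_.push_back(new char[cur_size_]);
      used_ = 0;
    }
    void* p = blocks_.back() + used_;
    used_ += n;
    return p;
  }
  ~toy_block_alloc() {
    for (char* b : blocks_)
      delete[] b;
  }
};
```

The tail of a just-filled block gets wasted, but that's the price of never moving anything.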
You can see why this all needs better doc!