Depends on what you’re doing, of course.

You need memory for the data for all of the chains if you’re running in parallel. That’s easy to compute as the C++ storage for it has negligible overhead.

Each iteration, you have to compute multiple leapfrog steps to simulate the Hamiltonian. The number of such steps is determined by the geometry of your problem, which can get better or worse as data size grows and with model misspecification (badly fitting models are much harder to fit—the Folk Theorem in action).

Each leapfrog step computes the density and the gradient w.r.t. parameters. This is usually fairly constant in memory and time, but if there are nested iterative algorithms, like ODE solvers, then where you’re at in the parameter space can be a big deal. For instance, some portions of parameter space may render the ODE stiff and kill solver performance. For storage, you need to store the entire expression graph (node for each operation, edge for each operand to the operation). You also need to store log N copies of the parameters where N is the number of leapfrog steps required, but this is usually negligible compared to other overhead.

Then in R, I think you need about twice the space required to store all fo the draws in memory.