Check out the math paper for details on this stuff: https://arxiv.org/abs/1509.07164
Stan uses the same sorta algorithms that everyone else does – so the theoretical scalings should be comparable between frameworks. The split is forward mode vs. reverse mode autodiff.
Forward mode scales better for problems with few inputs, reverse for problems with few outputs. When sampling a log density in HMC, you’ve just got one output so reverse mode autodiff works nicely. Same argument for optimization problems.
There’s probably room for confusion on how complexity is written out. One gradient descent step takes one reverse mode autodiff pass, but the things you do inside that pass could be complicated things. Depends on what you’re looking – if you care about the calculations inside your model or just how the autodiff is arrange.
edit: Edited to clarify a couple things, hope that helps!