What helps is to make an educated guess at each parameter's posterior sd and then scale the parameter so that its posterior sd is roughly 1. You can't know the posterior sd in advance, but a proxy suffices.
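As a rough illustration of the rescaling idea, here is a minimal Python sketch. The draws and the guessed sd are made-up numbers, and `rescale` is a hypothetical helper, not part of any Stan interface. In a Stan model the same idea would amount to sampling a raw parameter and multiplying it by the guessed sd.

```python
import statistics


def rescale(draws, sd_guess):
    """Divide draws by a guessed sd so the rescaled quantity has sd near 1."""
    return [d / sd_guess for d in draws]


# Illustrative numbers only (not from any real model).
theta_draws = [0.001, 0.003, 0.005, 0.007, 0.009]
sd_guess = 0.003  # a proxy; it does not need to be exact

theta_raw = rescale(theta_draws, sd_guess)
raw_sd = statistics.stdev(theta_raw)  # close to 1 if the guess was reasonable
```

The guess only has to be within a small factor of the truth for the rescaled parameter to land on a unit-ish scale.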

The other thing is ensuring a good posterior geometry, which hinges on your parametrisation. Explore the pairs plots for that.

Finally, you can save the mass matrix and have a look at it (get_adaptation_info in rstan). In cmdstan you can even reuse it.

I tried get_adaptation_info. Do I understand correctly that if any of these values are far away from 1.0, then the corresponding parameters are badly scaled?
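For context, with a diagonal metric the adapted values are warmup estimates of the posterior variances on the unconstrained scale, so their square roots are implied posterior sds. A minimal Python sketch of checking them follows. The parameter names, the values, and the factor-of-10 threshold are all assumptions chosen for illustration, and `flag_badly_scaled` is a hypothetical helper.

```python
import math


def flag_badly_scaled(inv_metric_diag, names, factor=10.0):
    """Return (name, implied_sd) pairs whose implied sd is off from 1 by more than `factor`."""
    flagged = []
    for name, variance in zip(names, inv_metric_diag):
        implied_sd = math.sqrt(variance)
        if implied_sd > factor or implied_sd < 1.0 / factor:
            flagged.append((name, implied_sd))
    return flagged


# Hypothetical adapted variances, copied by hand from the adaptation info.
names = ["alpha", "beta", "sigma"]
inv_metric_diag = [1.2, 0.00004, 250.0]

flagged = flag_badly_scaled(inv_metric_diag, names)
flagged_names = [name for name, _ in flagged]
```

Anything flagged here is a candidate for rescaling. The threshold is a judgment call, not a rule.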

I've enjoyed success with this idea using cmdstanr. I run the model once to figure out approximate values for the diagonal mass matrix. Subsequent runs can complete in half the time!
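A minimal sketch of the "save it and reuse it" step, in Python. It assumes the adapted diagonal from the first run is already in hand and writes it as JSON under the key `inv_metric`, which is the variable name CmdStan looks for in a metric file. The values and `write_inv_metric` are hypothetical. How the file is passed to the second run depends on the interface you use.

```python
import json
import os
import tempfile


def write_inv_metric(path, inv_metric_diag):
    """Write a diagonal inverse metric as JSON under the key "inv_metric"."""
    with open(path, "w") as f:
        json.dump({"inv_metric": [float(v) for v in inv_metric_diag]}, f)


# Hypothetical values standing in for the adapted diagonal from a first run.
first_run_diag = [1.3, 0.02, 4.7]

metric_path = os.path.join(tempfile.mkdtemp(), "metric.json")
write_inv_metric(metric_path, first_run_diag)

# Read it back to confirm the file holds what the second run should start from.
with open(metric_path) as f:
    reloaded = json.load(f)["inv_metric"]
```

Starting from a reasonable metric means the second run's warmup has less to learn, which is presumably where the time saving comes from.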

Just for completeness for people searching for this later:
Another trick that is easy to carry out, but only helps in some specific scenarios, is to restrict the treedepth to just above what the sampler uses after adaptation. During adaptation the sampler sometimes spends a lot of time at high treedepths because it is not adapted yet, whereas the same adaptation could be reached faster by running more iterations at a lower treedepth. So the adaptation may need more iterations when using this trick, but it should sample faster at the beginning (in those specific cases where the trick helps).
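To make "just above what it uses after adaptation" concrete, here is a small Python sketch. It assumes you have the post-warmup `treedepth__` values from an earlier run. The numbers are invented, and the margin of 1 is one possible reading of "just above".

```python
def suggest_max_treedepth(post_warmup_treedepths, margin=1):
    """Return a treedepth cap slightly above the largest depth seen after warmup."""
    if not post_warmup_treedepths:
        raise ValueError("need at least one post-warmup treedepth value")
    return max(post_warmup_treedepths) + margin


# Hypothetical treedepth__ values from the sampling phase of a previous run.
observed = [3, 4, 4, 5, 4, 3, 5]

cap = suggest_max_treedepth(observed)  # pass this as max_treedepth in the next run
```

If the next run then reports many iterations hitting the cap after warmup, the cap was too tight and should be raised.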

Hopefully this trick will be obsolete when the new "campfire" adaptation routine comes out.