I’m trying to speed up convergence of my model. One idea I’m considering is to find the posterior mode with Stan’s L-BFGS optimizer and then initialize the sampler chains either at the mode or at a small perturbation of it.
At first glance, I would expect this to be a relatively standard thing to do. However, I couldn’t find any reports of people doing this, either in this forum or elsewhere online.
Are there in fact advantages in initializing the sampler close to the mode? What are the drawbacks and caveats one should be aware of?
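
To make the "perturbation" part concrete, here is a rough sketch of what I have in mind. It is plain Python and not tied to any Stan interface. The function name `perturbed_inits`, the `rel_scale` knob, and the parameter values in `mode` are all made up for illustration; it assumes scalar parameters only and adds the jitter on the constrained scale, which is probably one of the caveats.

```python
import random


def perturbed_inits(mode, n_chains=4, rel_scale=0.1, seed=1):
    """Return one init dict per chain, each a jittered copy of `mode`.

    `mode` maps scalar parameter names to their values at the mode.
    Hypothetical helper: the name and the noise rule are assumptions.
    """
    rng = random.Random(seed)
    inits = []
    for _ in range(n_chains):
        chain_init = {}
        for name, value in mode.items():
            # Noise is proportional to the parameter's magnitude, with a
            # floor of 1.0 so that values near zero still get moved.
            sd = rel_scale * max(abs(value), 1.0)
            # Caveat: the jitter is applied on the constrained scale, so a
            # bounded parameter (e.g. a scale > 0) could leave its support.
            chain_init[name] = value + rng.gauss(0.0, sd)
        inits.append(chain_init)
    return inits


# Made-up mode values standing in for the optimizer's output.
mode = {"mu": 1.3, "tau": 0.8}
inits = perturbed_inits(mode, n_chains=4, rel_scale=0.1, seed=1)
for chain_id, init in enumerate(inits, start=1):
    print(chain_id, init)
```

If I understand the CmdStanPy docs correctly, the mode could come from something like `model.optimize(algorithm="lbfgs").stan_variables()`, and the resulting inits would be passed through the `inits` argument of `model.sample`. I haven't checked whether a list of per-chain dicts is accepted in every version, so treat that part as an assumption.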