(I know this is quite a generic, model- and data-dependent question, and I will probably need to do a little study of my own, but…) is there any rule of thumb from people's experience for the data regime in which I can somewhat safely start using optimization to calculate expected values, instead of sampling?
For non-hierarchical generalized linear models with a Gaussian prior on the weights, if the number of observations is more than 10-100 times the number of weights (covariates), and the data are not Binomial with all observations being failures or all being successes, then optimizing and using a normal approximation at the mode is likely to work. For other models there is no generic rule.
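To make "optimizing and normal approximation at the mode" concrete, here is a minimal sketch for a logistic regression with a Gaussian prior on the weights. Everything in it is an assumption chosen for illustration and not taken from the posts above: the simulated data, the prior scale `prior_sd=2.5`, the use of NumPy/SciPy rather than Stan, and the function names. It finds the posterior mode, uses the inverse Hessian there as the covariance, and then estimates an expected value from draws of that normal approximation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit


def neg_log_posterior_and_grad(w, X, y, prior_sd):
    """Negative log posterior (up to a constant) and its gradient."""
    z = X @ w
    # Bernoulli-logit log likelihood: sum(y * z - log(1 + exp(z)))
    log_lik = np.sum(y * z - np.logaddexp(0.0, z))
    # Independent zero-mean Gaussian prior on each weight
    log_prior = -0.5 * np.sum((w / prior_sd) ** 2)
    grad = X.T @ (y - expit(z)) - w / prior_sd**2
    return -(log_lik + log_prior), -grad


def laplace_approximation(X, y, prior_sd=2.5):
    """Return the posterior mode and the normal-approximation covariance."""
    k = X.shape[1]
    res = minimize(neg_log_posterior_and_grad, np.zeros(k),
                   args=(X, y, prior_sd), jac=True, method="BFGS")
    w_map = res.x
    p = expit(X @ w_map)
    # Hessian of the negative log posterior at the mode
    hessian = X.T @ (X * (p * (1.0 - p))[:, None]) + np.eye(k) / prior_sd**2
    return w_map, np.linalg.inv(hessian)


# Simulated data: about 670 observations per weight (illustrative only)
rng = np.random.default_rng(0)
n, k = 2000, 3
X = rng.normal(size=(n, k))
w_true = np.array([0.5, -1.0, 0.25])
y = rng.binomial(1, expit(X @ w_true))

w_map, cov = laplace_approximation(X, y)

# Expected value of a nonlinear quantity under the approximation:
# the success probability at a new covariate vector.
draws = rng.multivariate_normal(w_map, cov, size=4000)
x_new = np.array([1.0, 0.0, -1.0])
p_new_mean = expit(draws @ x_new).mean()
```

Comparing such an estimate against a sampling run on the same data is a cheap way to check whether the rule of thumb holds for a particular model.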
To emphasize @avehtari’s point: optimization is often just impossible. Hierarchical and latent models cannot be directly optimized; they need marginalization at some point, and direct joint optimization will just put you into one of many, many local maxima. For non-latent problems, optimization is often ‘fine’ (although it may depend on the optimizer). For trivial latent problems, you can sometimes reparameterize the model into an analytically marginalized one, which can be optimized. For most problems where Bayes is particularly useful, you are going to struggle with optimization no matter how much data you have.
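As a toy illustration of the marginalization point (my own example, not one from the posts above), take a normal hierarchical model with `y_j ~ N(theta_j, sigma)`, `theta_j ~ N(mu, tau)`, known `sigma`, and flat priors on `mu` and `tau`. This shows one specific failure mode: the joint density over `(mu, tau, theta)` grows without bound as `tau` shrinks with every `theta_j` collapsed onto `mu`, so joint optimization has no sensible answer. Integrating out `theta` analytically gives `y_j ~ N(mu, sqrt(sigma^2 + tau^2))`, which can be optimized. The data, names, and the choice of Nelder-Mead are all assumptions for the sketch.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Simulated data for J groups, one observation each (illustrative only)
rng = np.random.default_rng(1)
J, sigma = 8, 1.0
theta_true = rng.normal(0.0, 2.0, size=J)
y = rng.normal(theta_true, sigma)


def joint_log_density(mu, tau, theta):
    """Joint log density of data and latent theta (flat priors on mu, tau)."""
    return (norm.logpdf(y, theta, sigma).sum()
            + norm.logpdf(theta, mu, tau).sum())


def neg_marginal_log_lik(params):
    """Negative log likelihood with theta integrated out analytically."""
    mu, log_tau = params
    tau = np.exp(log_tau)
    return -norm.logpdf(y, mu, np.sqrt(sigma**2 + tau**2)).sum()


# Joint density: collapse every theta_j onto mu and shrink tau.
# The value keeps increasing, so there is no finite joint mode.
mu0 = y.mean()
taus = (1.0, 1e-2, 1e-4, 1e-6)
joint_values = [joint_log_density(mu0, tau, np.full(J, mu0)) for tau in taus]

# Marginalized model: a well-posed two-parameter optimization.
fit = minimize(neg_marginal_log_lik, x0=np.array([0.0, 0.0]),
               method="Nelder-Mead")
mu_hat, tau_hat = fit.x[0], np.exp(fit.x[1])
```

Printing `joint_values` shows the joint log density climbing as `tau` goes to zero, while `mu_hat` and `tau_hat` from the marginalized fit stay finite. Most hierarchical or latent models have no such closed-form marginal, which is where sampling comes back in.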