Forgive me if I am missing something obvious or if there is another thread about this topic. Background: I was relating to a colleague what I had understood to be a take-home message from several posts on the older users group. That is: one should strive to have posterior distributions (for "core parameters", if that makes any sense?) with roughly the same scale.
But when asked why, I realized that I could not give a succinct reason, nor did I have a reference handy that discusses this issue. Can anyone help me here, either with some nice bullet points supporting this idea or by pointing me to a good reference that discusses it?
Thanks in advance for any assistance you care to offer.
If the scales are consistent, the resulting posterior will be much, much easier to fit. Both numerically and algorithmically, working with numbers that are all O(1) is much easier than working with numbers that are unbalanced, say O(0.001) and O(10000).
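To make the algorithmic side of this concrete, here is a minimal sketch in plain Python. It is not how Stan is implemented; it only runs a bare leapfrog integrator (the kind of integrator used in Hamiltonian Monte Carlo) on an independent Gaussian target, assuming a unit mass matrix and no adaptation. The function name `leapfrog_energy_error` and the specific scales and step size are made up for illustration. For this kind of target the integrator is only stable when the step size is below twice the smallest scale, so a step size that is comfortable for an O(1) parameter blows up as soon as one parameter is O(0.001).

```python
def leapfrog_energy_error(scales, step_size, n_steps):
    """Worst absolute energy error of leapfrog on an independent Gaussian.

    Target: q[i] ~ Normal(0, scales[i]). A unit mass matrix is assumed,
    which is a simplification made for this sketch.
    """
    def energy(q, p):
        potential = sum(qi * qi / (2.0 * s * s) for qi, s in zip(q, scales))
        kinetic = sum(pi * pi / 2.0 for pi in p)
        return potential + kinetic

    def grad(q):
        # Gradient of the potential (negative log density) for each coordinate.
        return [qi / (s * s) for qi, s in zip(q, scales)]

    q = list(scales)          # start one standard deviation out in each direction
    p = [0.0] * len(scales)   # start at rest
    initial = energy(q, p)
    worst = 0.0
    for _ in range(n_steps):
        p = [pi - 0.5 * step_size * gi for pi, gi in zip(p, grad(q))]  # half step in momentum
        q = [qi + step_size * pi for qi, pi in zip(q, p)]              # full step in position
        p = [pi - 0.5 * step_size * gi for pi, gi in zip(p, grad(q))]  # half step in momentum
        worst = max(worst, abs(energy(q, p) - initial))
    return worst


# Same step size, same number of steps; only the scales differ.
balanced_error = leapfrog_energy_error([1.0, 1.0], step_size=0.5, n_steps=10)
unbalanced_error = leapfrog_energy_error([0.001, 1.0], step_size=0.5, n_steps=10)

print("balanced:  ", balanced_error)
print("unbalanced:", unbalanced_error)
```

In the balanced case the energy error stays small, while in the unbalanced case it explodes. To keep the unbalanced case stable the step size would have to drop below about 0.002, and then the O(1) direction needs on the order of a thousand times more steps to be explored. Adaptation of the mass matrix can compensate for some of this, but it has less work to do when the scales are already comparable.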
Note that computational performance depends on the posterior scales, which we cannot always control. Often we control only the prior scales.
Setting the prior scales consistently (for example, by choosing appropriate units) can go a long way toward improving the scale consistency of the posterior, and it also makes it easier to reason about weakly informative priors. See http://mc-stan.org/documentation/case-studies/weakly_informative_shapes.html for more discussion.
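As a small illustration of the point about units, the sketch below fits a least-squares slope to the same made-up data twice, once with the predictor in grams and once in kilograms. The data, the variable names, and the helper `ols_slope` are all hypothetical, and the least-squares slope is used only as a rough stand-in for where a posterior on the coefficient would concentrate.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x (simple linear regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical measurements, invented purely for this example.
mass_grams = [1200.0, 1500.0, 1800.0, 2100.0, 2400.0]
response = [2.5, 3.1, 3.4, 4.3, 4.7]

# The same predictor expressed in kilograms.
mass_kg = [m / 1000.0 for m in mass_grams]

slope_per_gram = ols_slope(mass_grams, response)
slope_per_kg = ols_slope(mass_kg, response)

print("slope per gram:    ", slope_per_gram)
print("slope per kilogram:", slope_per_kg)
```

The two slopes describe exactly the same relationship but differ by a factor of 1000. With the predictor in kilograms the coefficient is O(1), so a unit-scale prior is a plausible weakly informative choice and the coefficient sits on a similar scale to other O(1) parameters. With the predictor in grams the coefficient is O(0.001), and the same unit-scale prior would be far wider than anything the data could support.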