How to deal with extremely small sample sizes in MRP/MrsP?

I am writing a paper on province-level subjective well-being. One reviewer argues that some provinces have extremely small sample sizes (10 in Qinghai, 11 in Ningxia, and 39 in Inner Mongolia). Even with the “borrowing strength” mechanism of the MRP/MrsP model, over-shrinkage may still pull the estimates for these provinces toward the overall mean, concealing true differences.

They should be shrunk toward the mean adjusted for province-level covariates, not necessarily the overall mean (though maybe that is what you meant).

On a somewhat philosophical level, the reviewer is technically correct – this is what MRP does, and it is certainly possible to “overshrink” in some sense. However, the alternative would be to fit a model without regularization, which would produce possibly large and highly uncertain estimates for the small-sample groups. Or simply to conclude that you don’t have enough data and not fit a model at all. While MRP has its drawbacks, these alternatives may not be desirable either – ultimately the data you have is the data you have, and one has to do something with it. Put differently, while it is possible that an extremely large effect estimate from a small group reflects an actually large effect, the sample size is so small that such an estimate isn’t very reliable.

Some possible points you could discuss in the response:

  • If you haven’t already, plot the crude vs. MRP estimates to see exactly how much shrinkage is actually happening
  • Cross-validation showing that the (estimated) out-of-sample performance is better for the regularized model than for the unregularized one (this is not without its own challenges; see https://sites.stat.columbia.edu/gelman/research/published/final_sub.pdf)
  • A simulation study showing that the regularized method is more efficient at estimating the ground-truth parameters in small groups
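The simulation-study suggestion can be sketched in a few lines. This toy version stands in for a full MRP fit: a normal model with known variances, hypothetical group sizes and scales, and shrinkage toward the grand mean rather than a covariate-adjusted mean. It compares raw group means against partially pooled estimates:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy simulation: true province effects drawn from a population distribution,
# observed with noise. Group sizes and variances are hypothetical.
n_groups, n_sims = 30, 500
sizes = np.array([10, 11, 39] + [300] * (n_groups - 3), dtype=float)
tau, sigma = 1.0, 5.0  # between- and within-group sd (assumed known here)

err_raw = err_pooled = 0.0
for _ in range(n_sims):
    theta = rng.normal(0.0, tau, n_groups)                  # true group means
    ybar = theta + rng.normal(0.0, sigma / np.sqrt(sizes))  # observed group means
    w = tau**2 / (tau**2 + sigma**2 / sizes)                # per-group shrinkage weight
    pooled = w * ybar + (1 - w) * ybar.mean()               # partial pooling
    err_raw += np.mean((ybar - theta) ** 2)
    err_pooled += np.mean((pooled - theta) ** 2)

print(err_pooled < err_raw)  # shrinkage lowers MSE, driven by the tiny groups
```

In a real response you would replace the normal toy model with the actual MRP specification, but the qualitative result – lower error for the regularized estimates, concentrated in the small provinces – is the point the reviewer needs to see.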

The point of multilevel regression is that you reduce variance by introducing a little bias (relative to the maximum likelihood estimate—this is not necessarily bias if the population model is correct).
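To make that trade-off concrete, here is the classic normal partial-pooling weight on a group's raw mean versus the population mean. The variance values below are made up for illustration; in a fitted model they would be estimated from the data:

```python
# Weight on a province's raw mean vs. the population mean under the
# classic normal partial-pooling formula. Variances here are invented.
sigma2 = 25.0  # within-province variance (assumed)
tau2 = 1.0     # between-province variance (assumed)

weights = {}
for n in (10, 39, 300):
    w = (n / sigma2) / (n / sigma2 + 1 / tau2)
    weights[n] = round(w, 2)
    print(n, weights[n])  # small n -> small weight -> strong shrinkage
```

With these numbers, a province of n = 10 puts under 30% of the weight on its own raw mean, while a province of n = 300 keeps over 90% – which is exactly the behavior the reviewer is describing, applied deliberately.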

If they’re balking at Bayesian statistics, point them to Stein’s result that this kind of shrinkage generally reduces error in frequentist settings, too. This is why frequentists commonly employ empirical Bayes, as in the work of Efron and Morris in the 1970s (empirical Bayes is only approximate Bayes, not full Bayes, and despite the name it is neither more empirical nor less Bayesian than standard Bayesian approaches).
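Stein’s result can be demonstrated directly. This sketch compares the MLE (the raw observations) against the positive-part James-Stein estimator for a vector of normal means; the dimension and scales are arbitrary choices for the illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n_sims = 10, 2000             # Stein dominance requires p >= 3
theta = rng.normal(0.0, 2.0, p)  # fixed, arbitrary vector of true means

se_mle = se_js = 0.0
for _ in range(n_sims):
    x = rng.normal(theta, 1.0)                       # X ~ N(theta, I)
    shrink = max(0.0, 1.0 - (p - 2) / np.sum(x**2))  # positive-part James-Stein
    se_mle += np.sum((x - theta) ** 2)               # MLE error
    se_js += np.sum((shrink * x - theta) ** 2)       # shrunk-toward-zero error

print(se_js < se_mle)  # James-Stein beats the MLE in total squared error
```

Here the estimator shrinks toward zero, but the same dominance holds when shrinking toward any fixed point – or, as in a multilevel model, toward a data-estimated group-level mean.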

Even without hierarchical models, you get the same kind of shrinkage in frequentist settings using lasso or ridge estimators. Both trade bias (relative to the MLE) for a reduction in variance, leading to an overall improvement in the accuracy of the estimates.
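The same trade-off shows up in a plain ridge-vs-OLS comparison. In this toy regression the sizes are arbitrary and the penalty is picked by hand (in practice you would choose it by cross-validation); estimation error is averaged over repeated draws:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 30, 20                   # few observations relative to predictors
beta = rng.normal(0.0, 0.5, p)  # true coefficients, held fixed
lam = 5.0                       # ridge penalty, picked by hand here

err_ols = err_ridge = 0.0
for _ in range(200):
    X = rng.normal(size=(n, p))
    y = X @ beta + rng.normal(0.0, 2.0, n)
    b_ols = np.linalg.solve(X.T @ X, X.T @ y)  # OLS = MLE under Gaussian noise
    b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    err_ols += np.sum((b_ols - beta) ** 2)
    err_ridge += np.sum((b_ridge - beta) ** 2)

print(err_ridge < err_ols)  # biased but lower-variance ridge wins on MSE
```

Ridge is the penalized-likelihood counterpart of a normal prior on the coefficients, so this is the same partial-pooling story told in frequentist language.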