How to perform robustness and sensitivity analyses in MRP for small area estimation

I am using multilevel regression and poststratification (MRP) for small area estimation and want to run robustness and sensitivity analyses to validate my results. My dataset has two key limitations: three provinces with completely missing data and two provinces with sample sizes of fewer than 20 participants.
To address these issues, I have designed the following three-step validation strategy:
1. Exclude only the three provinces with missing data from the analysis;
2. Exclude both the three provinces with missing data and the two provinces with sample sizes < 20;
3. Run leave-one-province-out cross-validation (LOPO-CV) on the full dataset.
If the provincial rankings from these three checks remain consistent, I would take that as evidence that the MRP results are robust. Does this strategy seem reasonable?

When you say three provinces have “completely missing data,” do you mean that the data are missing or that the counts are zero? Similarly, with “two provinces with sample sizes of fewer than 20 participants,” is that the true count, or is there missingness there, too, in that you think the counts are underreported? If it’s a problem in 5 provinces, aren’t you worried it’s a problem in the other provinces, too?

If it’s just that you have five provinces with messed up data, it might indeed be best to just exclude them. Otherwise, you need to include a measurement error model.

You can do leave-one-out cross-validation (LOO), but that’s mainly useful for comparing models unless you have a good understanding of what the expected log predictive density (ELPD) should be. I don’t think the rankings staying consistent will tell you much, because leaving one province out is only going to change the hierarchical prior.

What I’d recommend instead is posterior predictive checks. That’s the closest you get to chi-squared goodness of fit tests in conventional frequentist regressions.

Thank you for your insightful comments. The three provinces with completely missing data and the two provinces with sample sizes < 20 are limitations of the survey data only; their population-level covariates (e.g., age, gender, education distribution) in the poststratification frame are fully observed.

Regarding posterior predictive checks (PPCs): I have successfully implemented PPCs for the logistic regression stage (e.g., comparing predicted vs. observed probabilities of the outcome for survey respondents). However, I am unsure how to extend PPCs to the poststratification step. Do you have practical recommendations for adapting PPCs to MRP settings?

Not really, but maybe @andrewgelman will.

PPCs just give you a way of generating new data $\tilde{y}$ given observed data $y$, i.e., $p(\tilde{y} \mid y)$. The problem with MRP here is that only the multilevel regression part is generating data.
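To make that concrete, here is a minimal sketch of the poststratification step, assuming you already have posterior draws of the cell-level probabilities from the multilevel regression. The cell names, population counts, and draws below are made up for illustration. The step is only a population-weighted average applied draw by draw, so it generates no new data of its own.

```python
# Hypothetical poststratification cells for one province:
# population counts N_j taken from the poststratification frame.
cell_counts = {"young_f": 1200, "young_m": 1100, "old_f": 900, "old_m": 800}

# Hypothetical posterior draws of Pr(y = 1) per cell (3 draws only, for brevity).
# In practice these come from the fitted multilevel regression.
cell_draws = {
    "young_f": [0.60, 0.62, 0.58],
    "young_m": [0.55, 0.57, 0.54],
    "old_f": [0.70, 0.68, 0.72],
    "old_m": [0.65, 0.66, 0.63],
}


def poststratify(cell_counts, cell_draws):
    """Return one province-level estimate per posterior draw."""
    total = sum(cell_counts.values())
    n_draws = len(next(iter(cell_draws.values())))
    return [
        sum(cell_counts[c] * cell_draws[c][s] for c in cell_counts) / total
        for s in range(n_draws)
    ]


province_draws = poststratify(cell_counts, cell_draws)
print(province_draws)
```

Because the weights are fixed, all posterior uncertainty in the province-level estimate comes from the regression draws.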

A lot depends on what you’re interested in. A new model can fit the data better and make better predictions for small demographic slices of the population, but if all you care about is state-level averages, then maybe that’s not important in practice.

I fully concur with this view that PPCs are not suitable for evaluating small area estimation. My objective here is not model comparison but model validation, and my current approach involves altering prior specifications to conduct robustness checks. Are there any more reviewer-friendly methods for conducting robustness analysis in the context of small area estimation?

I disagree with your statement, “PPCs are not suitable for evaluating small area estimation.” PPCs are very suitable for evaluating small area estimation! My point was just that you should look at graphs and other predictive quantities that are relevant to the small-area questions you are asking. And this is all about model validation, or, more precisely, exploring the ways in which the model does not fit the data.
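As a rough illustration of a check aimed at a small-area quantity, the sketch below uses the observed province-level mean as the test statistic and compares it with the same statistic computed on replicated data. Everything here is hypothetical: the province labels, the outcomes, and the posterior draws of respondent-level probabilities, which are simulated from a beta distribution purely as a stand-in for draws from a fitted model.

```python
import random

random.seed(1)

# Hypothetical survey data: (province, observed binary outcome).
observed = [("A", 1), ("A", 0), ("A", 1), ("B", 0), ("B", 0), ("B", 1), ("B", 1)]

# Hypothetical posterior draws of Pr(y = 1) for each respondent,
# one list per draw, in the same order as `observed`.
# In practice these come from the fitted multilevel regression.
prob_draws = [[random.betavariate(6, 4) for _ in observed] for _ in range(1000)]


def province_means(provinces, ys):
    """Mean outcome per province."""
    sums, counts = {}, {}
    for p, y in zip(provinces, ys):
        sums[p] = sums.get(p, 0) + y
        counts[p] = counts.get(p, 0) + 1
    return {p: sums[p] / counts[p] for p in sums}


provinces = [p for p, _ in observed]
obs_means = province_means(provinces, [y for _, y in observed])

# For each posterior draw, simulate replicated outcomes and recompute
# the province-level mean, which is the test statistic here.
rep_means = []
for probs in prob_draws:
    y_rep = [1 if random.random() < q else 0 for q in probs]
    rep_means.append(province_means(provinces, y_rep))

# Share of replications whose province mean is at least the observed one.
# Values near 0 or 1 flag provinces where the model fits poorly.
ppc_p = {
    p: sum(r[p] >= obs_means[p] for r in rep_means) / len(rep_means)
    for p in obs_means
}
print(obs_means)
print(ppc_p)
```

Plotting the replicated province means against the observed ones is usually more informative than the tail proportions alone.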

As part of a sensitivity analysis within the MRP framework, I varied the prior distributions for the regression coefficients (normal, Cauchy, and Student’s t). To determine whether these prior choices led to significantly different estimates of provincial happiness levels, can I employ the Friedman test for related samples?

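I don’t recommend doing a hypothesis test for statistical significance. In your case, normal and Cauchy are special cases of the t distribution: the Cauchy is the t with 1 degree of freedom, and the normal is the limit as the degrees of freedom go to infinity. So I recommend fitting the t, with a hyperprior on the degrees of freedom parameter.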
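A small numerical check of that nesting, as a sketch only: the function names are illustrative, and the degrees of freedom are fixed at 1 and at a very large value just to show the two limits. In the actual model, the degrees of freedom would be a parameter with its own prior.

```python
import math


def student_t_pdf(x, nu):
    """Density of the standard Student-t with nu degrees of freedom."""
    log_norm = (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
    )
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(x * x / nu))


def cauchy_pdf(x):
    """Density of the standard Cauchy."""
    return 1.0 / (math.pi * (1.0 + x * x))


def normal_pdf(x):
    """Density of the standard normal."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


# Columns: x, t(nu=1), Cauchy, t(nu=1e6), normal.
for x in (0.0, 1.0, 2.5):
    print(x, student_t_pdf(x, 1), cauchy_pdf(x), student_t_pdf(x, 1e6), normal_pdf(x))
```

With the degrees of freedom estimated, the data can inform how heavy the tails should be, instead of you comparing three separate fits.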

That is a sound idea. A key step, however, is comparing the posterior distributions of parameters across different small areas, and I am unsure how to proceed. For instance, how should one compute the difference between estimates, $\hat{\theta}_{\text{normal}} - \hat{\theta}_{t}$, to make this comparison?

It depends on what you’re trying to do. It looks like your expression

$$\widehat{\theta}_{\text{normal}} - \widehat{\theta}_{t}$$

is trying to measure the difference in point estimates. If you have posteriors for one or both of these, you can plot histograms of the difference, which will be more enlightening about posterior uncertainty than trying to reduce to point estimates.
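A minimal sketch of that comparison, assuming you have posterior draws of one province's poststratified estimate under each prior. The draws below are simulated from normal distributions purely as placeholders, and the variable names are hypothetical.

```python
import random
import statistics

random.seed(7)

# Hypothetical posterior draws of one province's estimate under two priors.
# In practice these come from the two fitted models after poststratification.
theta_normal = [random.gauss(0.62, 0.03) for _ in range(4000)]
theta_t = [random.gauss(0.60, 0.04) for _ in range(4000)]

# The two fits are separate, so draws are paired arbitrarily here;
# this treats the two posteriors as independent.
diff = [a - b for a, b in zip(theta_normal, theta_t)]

cuts = statistics.quantiles(diff, n=40)  # 39 cut points at 2.5% steps
lo, med, hi = cuts[0], cuts[19], cuts[38]
print(f"median difference {med:.3f}, 95% interval ({lo:.3f}, {hi:.3f})")

# Crude text histogram of the differences.
n_bins = 12
d_min, d_max = min(diff), max(diff)
width = (d_max - d_min) / n_bins
counts = [0] * n_bins
for d in diff:
    counts[min(int((d - d_min) / width), n_bins - 1)] += 1
for i, c in enumerate(counts):
    left = d_min + i * width
    print(f"{left:+.3f} {'#' * (c // 20)}")
```

Repeating this per province shows where the prior choice matters and where it does not, without reducing anything to a single test.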