MRP to correct for differences between population and sample?

This question is more conceptual than about the specifics of coding up a model in Stan.

I’ve read about the use of MRP to adjust for known differences between a sample and a population (see here for example). However, MRP seems to be used mainly when we simply want to estimate an outcome and have an unrepresentative sample, such as trying to make an election forecast using Xbox users.

Question: Can we also use MRP when we have an association question? That is, what if I want to know whether Y is associated with X (both are continuous measures) in a population, and all I have is an unrepresentative sample? In that case, the regression estimated by stan_glm(Y ~ X, data = data) seems like it needs to be adjusted to account for the sample’s unrepresentativeness.

Is answering this type of question possible using MRP? If so, does anyone have any references, or better yet, code on how to do so?

Thank you in advance!

Whether you can use MRP depends on the specific problem. In particular, if a stratification variable is a collider in the directed acyclic graph (DAG) that describes your association and selection model, then MRP (or “standardization”, as some call poststratification) will introduce bias, because stratifying on a collider conditions on it and opens a non-causal path between X and Y. In this case you would need to use weighting (which has its own set of problems).

For details see Hernan (2004) https://pubmed.ncbi.nlm.nih.gov/15308962/

or here: https://pubmed.ncbi.nlm.nih.gov/31451995/

Note that although weighting works, in theory, in every case in which MRP works, there are practical issues with estimating good weights, propagating the uncertainty of the weights, and so on. I would therefore prefer MRP whenever I can be certain that no stratification variable is a collider.
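To make the collider point concrete, here is a minimal simulation sketch. It is plain Python rather than Stan or R so that it runs on its own, and everything in it is an assumption made for illustration: the data-generating process, the binary stratification variable Z, the selection probabilities in `P_SELECT` (treated as known, which sidesteps the weight-estimation problem mentioned above), and the helper names. It also uses plain poststratification of stratum-specific slopes, without the multilevel regression part of MRP. In the first scenario Z only modifies the slope and drives selection; in the second, Z is a collider (X -> Z <- Y) and X and Y are unrelated in the population.

```python
import random

# Hypothetical selection probabilities by stratum, treated as known here.
P_SELECT = {0: 0.1, 1: 0.8}

def slope(rows):
    """Weighted least-squares slope of y on x; rows are (x, y, weight)."""
    total = sum(w for _, _, w in rows)
    mean_x = sum(w * x for x, _, w in rows) / total
    mean_y = sum(w * y for _, y, w in rows) / total
    sxy = sum(w * (x - mean_x) * (y - mean_y) for x, y, w in rows)
    sxx = sum(w * (x - mean_x) ** 2 for x, _, w in rows)
    return sxy / sxx

def simulate(collider, n=100_000, seed=1):
    rng = random.Random(seed)
    population, sample = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        if collider:
            # X and Y are independent; Z is a common effect of both.
            y = rng.gauss(0, 1)
            z = int(x + y + rng.gauss(0, 1) > 0)
        else:
            # Z is independent of X and only modifies the slope of Y on X.
            z = int(rng.random() < 0.5)
            y = (0.5 + z) * x + rng.gauss(0, 1)
        population.append((x, y, z))
        # Selection into the sample depends on Z only.
        if rng.random() < P_SELECT[z]:
            sample.append((x, y, z))

    # Population stratum shares (in practice these would come from a census).
    share = {k: sum(z == k for _, _, z in population) / n for k in (0, 1)}
    # Slope estimated separately within each stratum of the sample.
    by_stratum = {
        k: slope([(x, y, 1.0) for x, y, z in sample if z == k]) for k in (0, 1)
    }
    return {
        "population": slope([(x, y, 1.0) for x, y, _ in population]),
        "naive": slope([(x, y, 1.0) for x, y, _ in sample]),
        "poststratified": sum(share[k] * by_stratum[k] for k in (0, 1)),
        "weighted": slope([(x, y, 1.0 / P_SELECT[z]) for x, y, z in sample]),
    }

benign = simulate(collider=False)
collided = simulate(collider=True)
for name, res in (("Z not a collider", benign), ("Z a collider", collided)):
    print(name, {k: round(v, 2) for k, v in res.items()})
```

Under these assumptions, the naive sample slope should be off in both scenarios. Poststratification and inverse-probability weighting should both land near the population slope when Z is not a collider, whereas only weighting should do so when Z is a collider.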

(There are additional complications depending on whether or not you are running a logistic regression. In some situations you can get unbiased results from a “non-representative” sample, but I don’t remember the details. You could search for the term “collapsibility”.)
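As a rough illustration of what “collapsibility” refers to (not of the specific result alluded to above), the following sketch uses made-up risks P(Y=1 | X, Z) with a binary X and a binary Z that is independent of X. All numbers and names are hypothetical. The odds ratio is the same within each stratum of Z, yet the marginal odds ratio differs, even though Z is not a confounder here; the risk difference does not show this behaviour.

```python
def odds(p):
    return p / (1 - p)

# Hypothetical risks P(Y=1 | X=x, Z=z), keyed by (x, z).
risk = {(0, 0): 0.2, (1, 0): 0.5, (0, 1): 0.5, (1, 1): 0.8}
# Z is 50/50 in the population and independent of X.
p_z = {0: 0.5, 1: 0.5}

def marginal_risk(x):
    """P(Y=1 | X=x), averaging the stratum risks over Z."""
    return sum(p_z[z] * risk[(x, z)] for z in p_z)

# Stratum-specific (conditional) and marginal odds ratios.
conditional_or = {z: odds(risk[(1, z)]) / odds(risk[(0, z)]) for z in p_z}
marginal_or = odds(marginal_risk(1)) / odds(marginal_risk(0))

# Stratum-specific and marginal risk differences.
conditional_rd = {z: risk[(1, z)] - risk[(0, z)] for z in p_z}
marginal_rd = marginal_risk(1) - marginal_risk(0)

print("conditional OR:", conditional_or, "marginal OR:", marginal_or)
print("conditional RD:", conditional_rd, "marginal RD:", marginal_rd)
```

So with a logistic model, a stratum-specific coefficient and a population-level (marginal) one can be different quantities, which is worth keeping in mind when deciding what exactly is being poststratified.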

Using DAGs implies that one is interested in a causal question. Bias then refers to the situation in which the association between two variables is influenced by something other than the direct or indirect effect of one variable on the other.

This response largely takes the DAG perspective on selection bias. Econometrics takes a different approach (see e.g. the Heckman selection model), but I know too little about it. Maybe others have some more concrete suggestions here.

(Edit: Removed some typos)
