Improving precision when post-stratifying with MRP

I am estimating the effect of a randomized binary treatment on a univariate outcome in a Bayesian linear model. I allow the treatment effect to vary with the gender of the participant:

y ~ 1 + treat + gender + treat:gender

While my sample is not representative of the population, my data include “sampling weights” (wts) proportional to a unit’s probability of sample selection. For present purposes, assume the model for the weights is correctly specified but unknown. We’ll make some additional assumptions:

  • Each of the K unique values of wts maps to a unique cell in an unobserved post-stratification table
  • The sample contains at least one observation for each cell of that post-stratification table, which partitions the population.
  • There may be as few as one observation for each of the K unique values (cells) of wts.

Given these assumptions, a practical way to adjust the sample would be to post-stratify on the cells defined by the unique values of wts and estimate the population treatment effect by weighting the posterior estimate for each cell in inverse proportion to its wts value.
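As a rough sketch of that weighting step (cell_draws and wts_k are placeholder names for a draws-by-K matrix of per-cell effect draws and the K unique weight values; they aren’t defined anywhere above):

# Minimal sketch of the post-stratification step. cell_draws is assumed to be a
# (posterior draws x K) matrix of per-cell treatment-effect draws and wts_k the
# K unique values of wts (both placeholder names).
w_k <- 1 / wts_k
w_k <- w_k / sum(w_k)                             # normalized post-stratification weights
ate_draws <- cell_draws %*% w_k                   # population ATE, one value per posterior draw
quantile(ate_draws, probs = c(0.025, 0.5, 0.975))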

To infer the treatment effect in the population, we’ll let the treatment effect vary with wts. There are several ways to do this with an interaction:

wts <- as.factor(wts) # wts is discrete!

y ~ 1 + treat + gender + treat:gender + (treat | wts) #1: hierarchical treat slope
y ~ 1 + treat + wts + gender + treat:wts + treat:gender #2: non-hierarchical interaction
y ~ 1 + treat*wts*gender #3: saturated triple interaction

My guess is that many would advocate for model #1 above, since it places a hierarchical prior on the treatment effects across wts, allowing heterogeneity while guarding against overfitting. In contrast, without an enormous amount of data, models #2 and #3 would likely overfit with so many interaction terms and might not even converge.
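For concreteness, a minimal sketch of how I would fit model #1 in brms (the data frame name, priors, and sampler settings are illustrative only, not something I’m committed to):

library(brms)

# Model #1: hierarchical treatment slope across the K wts cells.
# Priors and settings are illustrative placeholders.
fit1 <- brm(
  y ~ 1 + treat + gender + treat:gender + (treat | wts),
  data   = dat,
  family = gaussian(),
  prior  = c(prior(normal(0, 1), class = "b"),
             prior(exponential(1), class = "sd")),
  chains = 4, cores = 4
)

The cell-level draws for the weighting step above could then come from posterior_epred() evaluated at each wts cell with treat set to 1 and to 0, taking the difference.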

My concern, however, is that I simply lack the data necessary to get reasonably precise estimates from such a model. Data collection was planned around the simple interaction model at the start of this post, and adding parameters will likely inflate uncertainty. Of course, this variance inflation is the price to pay for a less biased population ATE estimate.

In light of this, I’m wondering if anyone has advice for reducing variance while post-stratifying. Reducing the number of post-stratification cells seems appealing, but in the case I described, where I only have sampling weights, I’m not aware of a principled way to do so. Or perhaps I am mistaken that the post-stratified estimate will necessarily have greater variance? Thanks.

This can work OK for estimating means, but it can be problematic for variances. That is, if you probability weight a sample, you’ll get the right means, but not the right variances. So you may have other problems with variance due to weighting.
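One rough way to see that cost is Kish’s effective sample size for a set of weights (a generic diagnostic, not specific to this model):

# Kish's approximation: n_eff = (sum of weights)^2 / (sum of squared weights).
# Very unequal weights push n_eff well below the nominal sample size.
kish_n_eff <- function(w) sum(w)^2 / sum(w^2)
kish_n_eff(1 / wts)   # weights inversely proportional to wts, as in the question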

Can you say more about why you’re trying to reduce variance? And is it the variance of the poststrat cells or the variance of the parameters you’re worried about?

If you change the model, you change the meaning of the parameters, so it’s not increasing the variance on the same thing. For example, let’s say you enforce complete pooling. All of a sudden you get very tight posteriors around the single pooled effect. This isn’t going to be a useful reduction in variance if complete pooling isn’t appropriate for the data. The way to check that is to test calibration, either with posterior predictive checks or cross-validation.
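For example, with a brms fit (fit1 here is just a stand-in for whichever candidate model you fit), those checks might look like:

library(brms)

# Posterior predictive checks: overall, and per wts cell, to see whether the
# model reproduces cell-level means.
pp_check(fit1, type = "dens_overlay", ndraws = 100)
pp_check(fit1, type = "stat_grouped", stat = "mean", group = "wts")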

You may find that the poststrat variance doesn’t go up nearly as much as the parameter variance when you introduce a hierarchical model. For example, if I replace parameter mu with parameters mu1 + mu2, my model is now completely unidentified. If I add an identifying prior on mu1 and mu2, then the posterior variances on mu1 and mu2 will be large compared to that of mu, but the variance of the sum mu1 + mu2 should be similar, as should downstream predictions.
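Concretely, if draws holds the posterior draws (a placeholder name) with columns mu1 and mu2:

# Each component can have a wide marginal posterior, while their sum,
# which is what downstream predictions actually use, stays tight.
sd(draws$mu1)               # can be large
sd(draws$mu2)               # can be large
sd(draws$mu1 + draws$mu2)   # typically comparable to the sd of the original mu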

I would recommend choosing the model that looks best under posterior predictive checks and cross-validation, and not worrying about the variance of individual parameters.
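For example, with fits fit1, fit2, fit3 of the three candidate formulas (names assumed for illustration):

library(brms)

# Compare the candidate models by approximate leave-one-out cross-validation;
# higher elpd is better.
loo1 <- loo(fit1)
loo2 <- loo(fit2)
loo3 <- loo(fit3)
loo_compare(loo1, loo2, loo3)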

Overfitting isn’t such a problem in Bayesian models if you have reasonable priors: you will only fit as tightly as your data allow, and the additional variance in fitting flows through to downstream inference in Bayes. If I add an extra parameter to a regression that’s just noise, it will tend to get a wide posterior and wash out during inference.