I am estimating the effect of a randomized binary treatment on a univariate outcome in a Bayesian linear model. I allow the treatment effect to vary with the gender of the participant:
y ~ 1 + treat + gender + treat:gender
While my sample is not representative of the population, my data include “sampling weights” (wts) proportional to a unit’s probability of sample selection. For present purposes, assume the model for the weights is correctly specified but unknown. We’ll make some additional assumptions:
- Each of the K unique values of wts maps to a unique cell in an unobserved post-stratification table.
- The sample contains at least 1 observation for each cell of that post-stratification table, which partitions the population.
- There may be as few as 1 observation for each of the K unique values (cells) of wts.
Given these assumptions, a practical way to adjust our sample would be to post-stratify on the cells defined by the unique values of wts and estimate the population treatment effect by weighting the posterior estimate for each cell inversely to its wts value.
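To make this concrete, here is a minimal sketch of the aggregation step, assuming I already have cell-level posterior draws of the treatment effect. The objects cell_effects and wts_k are hypothetical placeholders, and the weighting simply follows the "inverse to its wts value" rule above:

# cell_effects: draws x K matrix of posterior draws of the cell-specific treatment effects
# wts_k:        length-K vector of the unique sampling weights (proportional to selection probability)

# Cell weights inverse to wts, normalized to sum to 1
pop_share <- (1 / wts_k) / sum(1 / wts_k)

# Post-stratified population ATE, one value per posterior draw
ate_draws <- as.vector(cell_effects %*% pop_share)

# Posterior summary of the population ATE
c(mean = mean(ate_draws), quantile(ate_draws, probs = c(0.025, 0.975)))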
To infer the treatment effect in the population, we’ll let the treatment effect vary with wts. There are several ways to do this with an interaction:
wts <- as.factor(wts) # wts is discrete!
y ~ 1 + treat + gender + treat:gender + (treat | wts) #1: hierarchical treat slope
y ~ 1 + treat + wts + gender + treat:wts + treat:gender #2: non-hierarchical interaction
y ~ 1 + treat*wts*gender #3: saturated triple interaction
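For completeness, here is roughly how I would fit Model #1 in brms and extract the cell-specific effects (cell_effects) used in the weighting sketch above. This is only a sketch under my assumptions: dat is a hypothetical data frame with columns y, treat (coded 0/1), gender, and wts, and I'm using posterior_epred() on counterfactual copies of the sample to get cell-level effects:

library(brms)

# Hypothetical data frame dat with columns y, treat (0/1), gender, wts
dat$wts <- as.factor(dat$wts)   # wts is discrete!

# Model #1: hierarchical treatment slope across the wts cells
fit <- brm(y ~ 1 + treat + gender + treat:gender + (treat | wts), data = dat)

# Counterfactual predictions for every sampled unit under treat = 1 and treat = 0
nd1 <- transform(dat, treat = 1)
nd0 <- transform(dat, treat = 0)
eff <- posterior_epred(fit, newdata = nd1) - posterior_epred(fit, newdata = nd0)   # draws x n

# Average within each wts cell to get the draws x K matrix of cell-specific effects
cell_effects <- sapply(levels(dat$wts), function(k)
  rowMeans(eff[, dat$wts == k, drop = FALSE]))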
My guess is that many would advocate for Model #1 above, as it imposes a hierarchical prior on the heterogeneity of treatment effects across wts to avoid overfitting while still allowing heterogeneity. In contrast, without an enormous amount of data, Models #2 and #3 would likely overfit with so many interaction terms and might not even converge.
My concern, however, is that I simply lack the data necessary to get reasonably precise estimates from such a model. Data collection was based on the simple linear interaction model at the start of this post, and adding parameters will likely inflate uncertainty. Of course, this variance inflation is the price to pay for a less biased estimate of the population ATE.
In light of this, I’m wondering if anyone has advice for reducing variance while post-stratifying. Reducing the number of post-stratification cells seems appealing, but in the case I described, where I only have sampling weights, I’m not aware of a principled way to do it. Or perhaps I am mistaken that the post-stratified estimate will strictly have greater variance? Thanks.