Do people have any good examples of instrumental variables analyses in Stan? Someone was asking me, and a quick google search found something by Jim Savage from 2017, but I thought maybe there was something more recent out there?
In particular, this person is looking for a latent instrumental variables model.
Chiming in here that @Stephen_Martin and I were able to successfully simulate and estimate a simple latent IV. As this was done as part of work, we cannot share that particular code.
A few notes. Our tests were on a continuous outcome with a binary LIV. We were able to recover unbiased estimates of the true effect when the two groups of the latent IV were sufficiently separated. All the usual mixture-model issues apply: group means that are close together, or variances large enough that the groups overlap, leave the latent IV unidentified.
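For concreteness, here is a minimal sketch of the kind of model being discussed: continuous treatment, continuous outcome, a binary latent instrument, and correlated errors standing in for the unobserved confounder, with the discrete group membership marginalized out. This is my own illustrative reconstruction, not the code from that project; all names, priors, and the direct `rho` parameterization of the error correlation are placeholders.

```stan
data {
  int<lower=1> N;
  vector[N] t;   // continuous treatment
  vector[N] y;   // continuous outcome
}
parameters {
  real<lower=0, upper=1> theta;     // P(latent group 2)
  ordered[2] mu_t;                  // treatment means by latent group (ordered to break label switching)
  real alpha;                       // outcome intercept; the source of the non-identifiability discussed below
  real beta;                        // treatment effect of interest
  real<lower=-1, upper=1> rho;      // error correlation = unobserved confounding
  real<lower=0> sigma_t;
  real<lower=0> sigma_y;
}
model {
  theta ~ beta(2, 2);
  mu_t ~ normal(0, 2);
  alpha ~ normal(0, 1);
  beta ~ normal(0, 1);
  sigma_t ~ normal(0, 1);
  sigma_y ~ normal(0, 1);
  for (n in 1:N) {
    vector[2] lp;
    for (k in 1:2) {
      // t | z=k is normal; y | t, z=k follows from the correlated-errors factorization
      real e_t = t[n] - mu_t[k];
      real mu_y = alpha + beta * t[n] + rho * sigma_y / sigma_t * e_t;
      lp[k] = bernoulli_lpmf(k - 1 | theta)
              + normal_lpdf(t[n] | mu_t[k], sigma_t)
              + normal_lpdf(y[n] | mu_y, sigma_y * sqrt(1 - square(rho)));
    }
    target += log_sum_exp(lp);   // marginalize over the binary latent instrument
  }
}
```

The `p(t | z) p(y | t, z)` factorization used here is algebraically equivalent to a bivariate normal on the errors; it just avoids writing the joint covariance matrix by hand.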
The math comes from
One other thing: the paper's formula for the variance matrix didn't work for us. The model sampled just fine and recovered the matrix when we instead estimated it with a cholesky_factor_corr. We double-checked the derivation and it seemed fine; @Stephen_Martin may remember more details here.
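For anyone unfamiliar with that approach: rather than plugging in a closed-form covariance matrix, you estimate a Cholesky-factored correlation plus a vector of scales, and reconstruct the full covariance afterwards. A generic fragment (illustrative names; the likelihood is elided):

```stan
parameters {
  vector<lower=0>[2] sigma;     // error scales for treatment and outcome
  cholesky_factor_corr[2] L;    // Cholesky factor of the error correlation
}
model {
  sigma ~ normal(0, 1);
  L ~ lkj_corr_cholesky(2);
  // ... likelihood uses diag_pre_multiply(sigma, L) as the Cholesky
  // factor of the covariance, e.g. with multi_normal_cholesky ...
}
generated quantities {
  // recover the full covariance matrix from the Cholesky parameterization
  matrix[2, 2] Sigma
    = multiply_lower_tri_self_transpose(diag_pre_multiply(sigma, L));
}
```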
I’d have to dig into my notes to recall more precisely.
But using the matrix in the paper (which I also derived as a sanity check), it could sample fine, but it was far less reliable. If you set the intercept of the outcome model to zero, it worked consistently. This is indicative of a location non-identifiability (which makes sense if you actually write it out: you can effectively have two constants in the outcome model unless the separation between latent groups is huge). We also tried a few different formulations (a dummy-coded IV model vs. a one-hot-encoded IV model; a marginalized covariance matrix vs. estimating the covariance as @spinkney mentioned), Dirichlet processes, etc.
In the end, they all could work; the issue is mainly that when the model fails due to insufficient separation, there's no good way to detect it. We were generating the data, so we knew what the 'true' causal treatment effect was. But sometimes the latent IV and the confounders effectively switch roles, and the estimated treatment effect is quite literally the opposite of the effect of interest. Or it'd be some odd average of the true treatment and confounder effects due to within-chain oscillation. Or it'd just be flatly wrong. But it never produced obviously pathological estimates or major sampling issues that would red-flag a bad solution. It would more consistently estimate the correct treatment effect if the true intercept were truly zero and the model's intercept were fixed to zero (if I recall correctly), but that's an insane assumption in reality, so the point was moot (side note: transforming the data to 'make' this assumption valid did not work at all).
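A simulation setup like the one described, where the true treatment effect is known by construction, can be sketched as a standalone Stan program run with `algorithm=fixed_param`. This is an illustrative sketch, not the original experiment; all parameter values are placeholders.

```stan
// Generate data from a latent-IV process with a known treatment effect.
data {
  int<lower=1> N;
}
transformed data {
  real beta = 0.5;             // true treatment effect
  real alpha = 1.0;            // true outcome intercept (nonzero on purpose)
  real rho = 0.6;              // error correlation = unobserved confounding
  vector[2] mu_t = [-2, 2]';   // latent-group treatment means (separation matters!)
}
generated quantities {
  vector[N] t;
  vector[N] y;
  for (n in 1:N) {
    int z = bernoulli_rng(0.5);                            // latent instrument
    real e_t = normal_rng(0, 1);                           // treatment error
    real e_y = normal_rng(rho * e_t, sqrt(1 - square(rho)));  // correlated outcome error
    t[n] = mu_t[z + 1] + e_t;
    y[n] = alpha + beta * t[n] + e_y;
  }
}
```

Shrinking the gap between `mu_t[1]` and `mu_t[2]` should reproduce the failure mode described above, where the latent instrument and the confounder trade roles without any obvious sampling diagnostics firing.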
TLDR: Lots of variations of latent discrete IVs worked, but none were particularly trustworthy. When it failed to provide the correct treatment effect (i.e., when it was actually estimating an omnibus confounder effect, or some middle ground between the two), there was no way of knowing that it had failed. Meaning that in practice, it would be very, very hard to deploy without some fairly strong prior information about the true treatment effect; and that prior information would need to be so strong that it'd effectively be your answer, and the LIV model would not be necessary (in my opinion).
Assuming you had a vigorous prior, would running a LIV have an edge in terms of external validity?
I’m not sure I understand what you mean (an edge over what?). But I think my fear can basically be summarized as:
It works fine when you remove the unidentifiability; one method is to set the outcome model’s intercept to zero. This only works if, in truth, the expected outcome is zero when the LIV is zero (this is never true). Alternatively, you can just hope you have something very well separated (you have no control over this). Finally, you can have strong prior info on the LIV or treatment effect.
If you have such strong prior info on the LIV or treatment effect that this would be consistently identified, you probably have so much information that there’s no need to even fit the model on new data. And this would need to be quite strong, based on my experiments, especially so if the treatment and omnibus confounder effects are similarly signed, or of similar magnitude.
I’m not sure what that would mean for external validity, tbh.
My general take on instrumental variables models is they can blow up, and it’s good to include strong priors if you can.