Addressing endogenous variables using a control function in brms

Hello friends,

My problem: I want to estimate a model of Y versus an explanatory variable (X) in brms, where X is endogenous. Addressing endogeneity using instrumental variables (IV) in brms is covered in this blog post.

Rather than using IV, however, I would like to use a control function (CF). A control function approach is similar to IV, in that you regress your endogenous variable (X) on your instrument (Z) and any other exogenous variables. Where CF differs is that the residuals from this “first stage” (R) enter the “second stage” as an additional predictor when you regress Y on X. (NB: in the brms IV setup, the two stages are estimated simultaneously as a multivariate model.)

Specifically, a simple control function might look like:

X = a1*Z + R ("First stage", where R is the residual)
Y = b1*X + b2*R + e ("Second stage", where e is the error term)
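
For reference, the two-step version of the model above can be sketched in base R. This is only an illustration on simulated data: the variable names, coefficients, and sample size are assumptions made up for the example, not anything from a real data set. It runs the first stage, extracts the residuals R, and adds them as a predictor in the second stage.

```r
# Two-step control function on simulated data (all values are illustrative).
set.seed(1)
n <- 5000
Z <- rnorm(n)                      # instrument
U <- rnorm(n)                      # unobserved confounder
X <- 0.8 * Z + U + rnorm(n)        # endogenous regressor (depends on U)
Y <- 1.5 * X - 2 * U + rnorm(n)    # outcome; the true effect of X is 1.5

# First stage: regress X on the instrument and keep the residuals.
first_stage <- lm(X ~ Z)
R <- resid(first_stage)

# Second stage: regress Y on X and the first-stage residuals.
second_stage <- lm(Y ~ X + R)

# Naive regression that ignores the endogeneity, for comparison.
naive <- lm(Y ~ X)

print(coef(naive)["X"])         # biased, because X is correlated with U
print(coef(second_stage)["X"])  # control-function estimate of b1
```

Note that the standard errors reported by the second `lm()` call ignore the fact that R is itself estimated, which is one reason to want both stages in a single model.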

One of the nice things about control functions is that they work for non-linear models and allow for non-linear endogenous relationships via R.

So, it’d be cool if there were a way to estimate them in brms without having to go to Stan! Any advice is most appreciated, and apologies if I’ve missed something obvious.

P.S. @paul.buerkner, I was wondering whether you’d seen this post and/or had any thoughts? Is there an internal variable for the residuals within brms that could be used to estimate a control function like this simultaneously, rather than in two separate steps as I am doing now?

Apologies in advance if my terminology is unclear and/or I’ve missed something obvious.

I don’t think this would be possible natively in brms. I recommend that you build a model in brms that is as close as possible to your goal (e.g., using X instead of its residual as predictor) and then edit the generated Stan code according to your needs. You can generate the brms Stan code via make_stancode().
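
A minimal sketch of that workflow, assuming a data frame with columns Y, X and Z (the data here are simulated placeholders, and the two-formula setup is just one possible starting point):

```r
library(brms)

# Placeholder data; replace with your own data frame.
set.seed(1)
d <- data.frame(Z = rnorm(100))
d$X <- 0.8 * d$Z + rnorm(100)
d$Y <- 1.5 * d$X + rnorm(100)

# Closest native model: both stages as one multivariate model,
# with X itself (not its residual) as the predictor of Y.
f <- bf(X ~ Z) + bf(Y ~ X) + set_rescor(FALSE)

# Generate the Stan code without fitting the model.
stan_code <- make_stancode(f, data = d)
cat(stan_code)

# Optionally save it for manual editing (hypothetical file name):
# writeLines(stan_code, "control_function.stan")
```

From there, the manual edit would be to add a new coefficient that multiplies the first-stage residual (X minus its linear predictor) inside the linear predictor for Y. The exact variable names in the generated code depend on the brms version, so check the output rather than relying on any particular naming.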