Addressing endogenous variables using a control function in brms

Hello friends,

My problem: I want to estimate a model of Y versus an explanatory variable (X) in brms, where X is endogenous. Addressing endogeneity using instrumental variables (IV) in brms is covered in this blog post.

Rather than using IV, however, I would like to use a control function (CF). A control function approach is similar to IV in that you regress your endogenous variable (X) on your instrument (Z) and any other exogenous variables. Where CF differs is that the residuals from the “first stage” (R) enter the “second stage”, where you regress Y on X and R (NB: of course, as a multi-level model in brms, the two stages are estimated simultaneously).

Specifically, a simple control function might look like:

X = a1*Z + R ("First stage", with R the residual)
Y = b1*X + b2*R + e ("Second stage", with e the error term)
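
To make the two stages concrete, here is a minimal sketch of the plain two-step estimator on simulated data, written in Python/NumPy rather than brms. Everything in it is an assumption made for illustration: the data-generating process, the unobserved confounder `u`, the true coefficients (0.8 and 1.5), and the variable names. It is not a brms solution and it does not estimate the stages jointly; it only shows that adding the first-stage residual R to the second stage removes the bias in b1 for the linear case.

```python
import numpy as np

# Simulated data (all values here are illustrative assumptions).
rng = np.random.default_rng(0)
n = 20_000

z = rng.normal(size=n)                       # instrument Z
u = rng.normal(size=n)                       # unobserved confounder
x = 0.8 * z + u + 0.5 * rng.normal(size=n)   # endogenous X (depends on u)
y = 1.5 * x + 2.0 * u + rng.normal(size=n)   # outcome Y, true b1 = 1.5

# Naive regression of Y on X (no intercept, variables are mean zero).
naive_b1 = (x @ y) / (x @ x)

# First stage: regress X on Z and keep the residual R.
a1 = (z @ x) / (z @ z)
r = x - a1 * z

# Second stage: regress Y on X and R.
design = np.column_stack([x, r])
b1, b2 = np.linalg.lstsq(design, y, rcond=None)[0]

print(f"naive b1:            {naive_b1:.3f}")
print(f"control-function b1: {b1:.3f}")
print(f"coefficient on R:    {b2:.3f}")
```

In this linear setup the control-function estimate of b1 coincides with the usual IV estimate, z'y / z'x. Note that the standard errors from a plain second-stage regression ignore the uncertainty in R, which is one reason a joint (e.g. Bayesian) fit of both stages is attractive.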

One of the nice things about control functions is that they work for non-linear models and allow for non-linear endogenous relationships via R.

So, it’d be cool if there were a way to estimate them in brms without having to go to Stan! Any advice is most appreciated, and apologies if I’ve missed something obvious.