I have data and functional forms for the following relationships (using brms syntax):
Model 1: y ~ x + (1 + x|Group)
Model 2: z ~ x + (1|Group)
Model 3: y ~ z + I(z^2) + (1 + z|Group)
My objective is to predict y from x. However, I have very few paired (x, y) observations and need to predict outside the range of the available x values. I do have more (x, z) and (z, y) pairs, and these cover the required range. So far I have been using Model 2 to predict z over a range of x, then plugging zhat into Model 3 to get the corresponding yhat. This introduces a lot of model error, however (far beyond what would be expected from the true y = f(x) relationship).
How can I define a brms or Stan model that incorporates all three models and all the data at once?
I suggest a bit of caution here, because Model 1 says that y should be linear in x, but Models 2 and 3 together imply that y should be quadratic in x. I think your inference will be prone to doing weird things if you try to insist on a posterior that incorporates all of these models at once.
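To make the mismatch concrete, here is a rough sketch that looks only at the population-level means and ignores the group-level terms. The coefficient names a_0, a_1, b_0, b_1, b_2 are my own notation for illustration, not anything from the fitted models. Writing Model 2 as a line in x and Model 3 as a quadratic in z, then substituting:

$$
\begin{aligned}
z &= a_0 + a_1 x \\
y &= b_0 + b_1 z + b_2 z^2 \\
  &= b_0 + b_1 (a_0 + a_1 x) + b_2 (a_0 + a_1 x)^2 \\
  &= \left(b_0 + b_1 a_0 + b_2 a_0^2\right) + \left(b_1 a_1 + 2 b_2 a_0 a_1\right) x + b_2 a_1^2 \, x^2
\end{aligned}
$$

So the implied x → y curve has curvature b_2 a_1^2, which is zero only if b_2 = 0 or a_1 = 0. If, as a further assumption, z has additive residual noise with constant variance around its mean, that contributes an extra b_2 times that variance to the expected y, which shifts the intercept but does not remove the quadratic term.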
If z is noisy, or if its causal link to at least one of {x, y} is weaker than the causal link between x and y, then predicting x → z → y may well yield noisy predictions. And if you need this pathway to inform the location of the x → y relationship over big parts of the domain, then a noisy/uncertain answer might be the right one.
Thanks for the advice so far. The causal DAG is z → x and z → y. For some more context, x and y are just two different ways of measuring the value of z, which will be unknown in practice (outside of the experimental data I have here). Basically, we need a calibration equation that predicts y from x so we can adjust historical data for which we only have x and did not measure y.