# Incorporating "latent" variable data into multiple models

I have data and functional forms for the following relationships (using brms syntax):

- Model 1: `y ~ x + (1 + x | Group)`
- Model 2: `z ~ x + (1 | Group)`
- Model 3: `y ~ z + I(z^2) + (1 + z | Group)`

My objective is to predict y from x; however, I only have very limited paired (x, y) data and need to predict outside the range of the available x values. On the other hand, I have more (x, z) and (z, y) data pairs that cover the required range. What I have been doing so far is using Model 2 to predict z for a range of x, then plugging zhat into Model 3 to get the corresponding range of yhat. This introduces a lot of model error, however (far beyond what would be expected of the true y = f(x) relationship).
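One thing to watch in the two-stage approach is that plugging ẑ into Model 3 treats Model 2's prediction as exact. The minimal sketch below is a plain NumPy simulation, not brms output. It assumes hypothetical fixed-effect values for a single group (no random effects), a quadratic Model 3 mean `g(z) = c + d*z + e*z**2`, and a Model 2 predictive distribution at one x value of Normal(`m`, `s`). It compares the plug-in mean `g(m)` with the mean averaged over draws of z.

```python
import numpy as np

# Hypothetical Model 3 fixed effects (single group, no random effects):
# y = c + d*z + e*z**2
c, d, e = 0.5, 1.0, 0.5

def g(z):
    """Mean of y given z under the hypothetical quadratic Model 3."""
    return c + d * z + e * z**2

# Hypothetical Model 2 prediction at one x value: z | x ~ Normal(m, s)
m, s = 2.0, 1.0

rng = np.random.default_rng(0)
z_draws = rng.normal(m, s, size=200_000)

plug_in = g(m)                  # plug the point estimate zhat = m into Model 3
propagated = g(z_draws).mean()  # average Model 3 over the uncertainty in z
exact = g(m) + e * s**2         # exact propagated mean for Gaussian z

print(f"plug-in:    {plug_in:.3f}")
print(f"propagated: {propagated:.3f} (exact {exact:.3f})")
```

With these values the plug-in mean is 4.5, while the propagated mean is about 5.0 (exactly 5.0 from the Gaussian formula). So with a curved Model 3, carrying the z uncertainty forward shifts the prediction as well as widening it.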

How can I define a brms or Stan model that incorporates all three models and all the data at once?

[Figures: Model 1 prediction (y vs. x); Model 2 prediction (z vs. x); Model 3 prediction (y vs. z); Model 2 → 3 prediction]

Thank you!


Hi :)

I think missing-value imputation is the way to go:

Handle Missing Values with brms • brms (paul-buerkner.github.io)

It would look something like this:

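```r
library(brms)
# x is partially missing and imputed from z; y is modeled on the imputed x
bform <- bf(y ~ mi(x)) + bf(x | mi() ~ z) + set_rescor(FALSE)
```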

I am not sure, however, whether the random-effect terms will accept `mi()`.

Hope that helps!
Lucas


I suggest a bit of caution here, because Model 1 says that y should be linear in x, but Models 2 and 3 together imply that y should be quadratic in x. I think your inference will be prone to doing weird things if you insist on a posterior that incorporates all of these models at once.
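The quadratic claim follows from substituting Model 2 into Model 3. A minimal sketch, assuming fixed effects only for a single group, with hypothetical coefficient names `a, b` for Model 2 (`z = a + b*x`) and `c, d, e` for Model 3 (`y = c + d*z + e*z**2`):

```python
def compose_quadratic(a, b, c, d, e):
    """Coefficients (q0, q1, q2) of y(x) when z = a + b*x and y = c + d*z + e*z**2.

    Substituting gives y = q0 + q1*x + q2*x**2, so the chained models imply
    a quadratic in x whenever e != 0 and b != 0.
    """
    q0 = c + d * a + e * a**2
    q1 = d * b + 2 * e * a * b
    q2 = e * b**2
    return q0, q1, q2

# Hypothetical fixed-effect values for one group
a, b = 1.0, 2.0          # Model 2: z = a + b*x
c, d, e = 0.5, 1.0, 0.3  # Model 3: y = c + d*z + e*z**2
q0, q1, q2 = compose_quadratic(a, b, c, d, e)

# Check the algebra against direct substitution at a few x values
for x in [-1.0, 0.0, 2.5]:
    z = a + b * x
    direct = c + d * z + e * z**2
    assert abs(direct - (q0 + q1 * x + q2 * x**2)) < 1e-9

print(q0, q1, q2)
```

With the random slope on z in Model 3, the curvature `e * b**2` would differ by group, which Model 1's straight lines cannot reproduce.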

If `z` is noisy, or is less tightly causally linked to `x` or `y` than `x` and `y` are to each other, then predicting x → z → y may well yield noisy predictions. And if you need this pathway to locate the x → y relationship over large parts of the domain, then a noisy, uncertain answer might be the right one.
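To make the noise point concrete, here is a worked calculation under stated assumptions: a single group with no random effects, Model 2 giving $z \mid x \sim \mathcal{N}(m, s^2)$ (with $m$ and $s$ depending on $x$), and Model 3 giving $y \mid z \sim \mathcal{N}(c + d z + e z^2, \sigma_y^2)$. Writing $z = m + s u$ with $u \sim \mathcal{N}(0, 1)$ and using $\mathbb{E}[u^3] = 0$ and $\operatorname{Var}(u^2) = 2$, integrating over $z$ gives exactly:

$$
\begin{aligned}
\mathbb{E}[y \mid x] &= c + d\,m + e\,(m^2 + s^2),\\
\operatorname{Var}[y \mid x] &= \sigma_y^2 + (d + 2e\,m)^2 s^2 + 2e^2 s^4.
\end{aligned}
$$

The uncertainty $s^2$ from the x → z link enters scaled by the squared local slope $(d + 2em)^2$, so a loose x → z relationship can dominate the predictive spread of y even when Model 3 itself fits tightly.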

Thanks for the advice so far. The causal DAG is z → x and z → y. For some more context, `x` and `y` are just two different ways of measuring the value of `z`, which will be unknown in practice (outside of the experimental data I have here). Basically, we need a calibration equation that predicts `y` from `x` so we can adjust historical data for which we only have `x` and did not measure `y`.
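With the DAG z → x and z → y, one way to use all the data at once is a single generative model in which x and y are both measurements of z, and z is treated as an unknown parameter wherever it was not recorded. A minimal sketch of that joint log density in plain Python follows; it is not brms or Stan code. It assumes, hypothetically, a linear link for x, the quadratic link from Model 3 for y, a Normal population model for z, no group effects, and made-up parameter names and data.

```python
import math

def normal_logpdf(v, mu, sigma):
    """Log density of Normal(mu, sigma) at v."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((v - mu) / sigma) ** 2

def joint_logp(rows, z_latent, theta):
    """Joint log density for the DAG z -> x, z -> y (no group effects).

    rows:     list of (x, z, y) tuples; None marks an unobserved value.
    z_latent: values used for z in rows where z is None (one per such row);
              these play the role of parameters, as missing values do in a Stan model.
    theta:    dict of hypothetical parameters a, b, sx, c, d, e, sy, mu_z, sd_z.
    """
    lp = 0.0
    latent = iter(z_latent)
    for x, z, y in rows:
        if z is None:
            z = next(latent)
        # population model for the true value z
        lp += normal_logpdf(z, theta["mu_z"], theta["sd_z"])
        if x is not None:  # measurement x | z (linear)
            lp += normal_logpdf(x, theta["a"] + theta["b"] * z, theta["sx"])
        if y is not None:  # measurement y | z (quadratic, as in Model 3)
            mu_y = theta["c"] + theta["d"] * z + theta["e"] * z ** 2
            lp += normal_logpdf(y, mu_y, theta["sy"])
    return lp

theta = dict(a=0.0, b=1.0, sx=0.5, c=0.5, d=1.0, e=0.3, sy=0.5, mu_z=0.0, sd_z=1.0)
rows = [
    (1.0, 0.8, None),   # an (x, z) pair
    (None, 1.2, 2.4),   # a (z, y) pair
    (0.4, None, 1.1),   # a scarce (x, y) pair: z is latent
]
print(joint_logp(rows, z_latent=[0.3], theta=theta))
```

A sampler would explore `z_latent` jointly with `theta`. The calibration for historical data then comes from the posterior of y given a new x, averaged over the latent z, so the (x, z) and (z, y) pairs inform the range where paired (x, y) data are scarce.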