Incorporating "latent" variable data into multiple models

Aorus · December 12, 2022, 4:29pm

I have data and functional forms for the following relationships (using brms syntax):

Model 1: y ~ x + (1 + x|Group)
Model 2: z ~ x + (1|Group)
Model 3: y ~ z + I(z^2) + (1 + z|Group)

My objective is to predict y from x; however, I have only very limited paired (x, y) data and need to predict outside the range of the available x values. On the other hand, I have more (x, z) and (z, y) data pairs that cover the required range. So far, I have been using Model 2 to predict z over a range of x, then plugging zhat into Model 3 to get the corresponding range of yhat. However, this introduces a lot of model error (far beyond what would be expected from the true y = f(x) relationship).
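A minimal sketch of why the chained prediction can be so wide. It ignores the group-level terms and parameter uncertainty, and every coefficient and noise level below is a hypothetical value chosen for illustration, not an estimate from the data in this thread. The function name `chain_predict` is likewise made up.

```python
import random
import statistics


def chain_predict(x, n_draws=4000, seed=1):
    """Propagate Model 2 residual noise through Model 3 by Monte Carlo.

    All numbers are hypothetical; group-level effects are ignored.
    """
    rng = random.Random(seed)
    # Model 2 (assumed values): z = a0 + a1 * x + Normal(0, sigma_z)
    a0, a1, sigma_z = 1.0, 0.5, 0.4
    # Model 3 (assumed values): y = b0 + b1 * z + b2 * z^2 + Normal(0, sigma_y)
    b0, b1, b2, sigma_y = 0.0, 2.0, 0.3, 0.2

    draws = []
    for _ in range(n_draws):
        z = rng.gauss(a0 + a1 * x, sigma_z)                 # noisy z given x
        y = rng.gauss(b0 + b1 * z + b2 * z * z, sigma_y)    # noisy y given z
        draws.append(y)
    return statistics.mean(draws), statistics.stdev(draws)


mean_y, sd_y = chain_predict(4.0)
print(f"chained prediction at x = 4: mean {mean_y:.2f}, sd {sd_y:.2f}")
```

With these made-up numbers, most of the spread in y comes from the residual noise in z being multiplied by the local slope of Model 3, not from Model 3's own residual noise.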

How can I define a brms or Stan model that incorporates all the models and all the data at once?

[Figure: Model 1 prediction (y vs. x)]

[Figure: Model 2 prediction (z vs. x)]

[Figure: Model 3 prediction (y vs. z)]

[Figure: Model 2 → 3 prediction]

Thank you!

ldeschamps · December 13, 2022, 7:18pm

Hi :)

I think missing value imputation is the way to go; see the brms vignette:

Handle Missing Values with brms • brms (paul-buerkner.github.io)

It would lead to something like:

bf(y ~ mi(x)) + 
bf(x | mi() ~ z)

I am not sure, however, whether the random effects will accept mi().

Hope that helps!
Lucas

jsocolar · December 14, 2022, 4:28am

I suggest a bit of caution here, because Model 1 says that y should be linear in x, but Models 2 and 3 together imply that y should be quadratic in x. I think your inference is likely to behave strangely if you insist on a posterior that incorporates all of these models at once.
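To spell out the functional-form point: ignoring group-level terms and noise, and writing hypothetical coefficients $a_0, a_1$ for Model 2 and $b_0, b_1, b_2$ for Model 3 (these symbols are introduced here for illustration only), substituting Model 2 into Model 3 gives

```latex
\begin{aligned}
z &= a_0 + a_1 x, \\
y &= b_0 + b_1 z + b_2 z^2 \\
  &= \left(b_0 + b_1 a_0 + b_2 a_0^2\right)
   + \left(b_1 a_1 + 2 b_2 a_0 a_1\right) x
   + b_2 a_1^2 \, x^2 .
\end{aligned}
```

So unless $b_2 = 0$ or $a_1 = 0$, the chained models give a y that is quadratic in x, which the linear form of Model 1 cannot match.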

If z is noisy, or is less tightly linked causally to x or to y than x and y are to each other, then predicting x → z → y may well yield noisy predictions. And if you need this pathway to inform the location of the x → y relationship over big parts of the domain, then a noisy, uncertain answer might be the right one.

Aorus · December 14, 2022, 1:48pm

Thanks for the advice so far. The causal DAG is z → x and z → y. For some more context, x and y are just two different ways of measuring the value of z, which will be unknown in practice (outside of the experimental data I have here). Basically, we need a calibration equation that predicts y from x so we can adjust historical data for which we only have x and did not measure y.
