Hi everyone,

I would highly appreciate feedback for my model. Do you think there are better ways to model this in stan?

We measured our predictor *hc_obs*, which we want to use to predict *y*. *hc_obs* is an attempt to measure *hc_true*, which unfortunately does not work without measurement error. Subjects who have higher *hw* will have reduced reduced *hc_obs* as a technical artifact while when *hw* == 0, there is no measurement error (no person has hw == 0). See the DAG corresponding to our data (grey variables are unobserved):

I thought just including hw in a regression like this will be sufficient:

y \sim bmi + hc_obs + hw

This way the model would answer: Keeping hw and bmi constant, what effect would increasing hc_obs have on y. Therefore, the error is controlled for. I tried to simulate this in R:

```
library(tidyverse)
library(brms)
N <- 1e4
bmi <- rnorm(n = N, mean = 0, sd = 2)
hc <- rnorm(mean = 0, sd = 2, n = N)
hw <- rnorm(n = N, mean = 0, sd = 1)
bHW <- -0.5
bHC <- -1
bBMI <- 0.5
hc_obs <- rnorm(mean = hc + bHW * hw, sd = 1, n = N)
y <- rnorm(n = N, mean = bHC * hc + bBMI * bmi, sd = 1)
tb <- tibble(bmi, hc, hw, hc_obs, y)
m1 <- brm(
y ~ bmi + hc_obs + hw,
data = tb
)
summary(m1)
```

I have an unrealistically high sample size here but it illustrates that this model cannot restore the correct effect estimates in the simulation example. It does not really make a difference whether I include *hw* or not. The estimates are off in a similar way.

How would you model this differently (I would also welcome stan code). Given my DAG, I am thinking of some kind of measurement error model but all my attempts so far did not work out. For example I tried the following stan model:

```
data{
int<lower=1> N_subj;
vector[N_subj] y;
vector[N_subj] bmi;
vector[N_subj] hw;
vector[N_subj] hc_obs;
}
parameters{
real a_y;
real bBMI;
real<lower=0> sigma_y;
real bHC;
real a_hc_obs;
real bHW;
real<lower=0> sigma_hc_obs;
}
model{
vector[N_subj] mu_hc_obs;
sigma_hc_obs ~ exponential(1);
bHW ~ normal(0, 1);
a_hc_obs ~ normal(0, 1);
for (i in 1:N_subj) {
mu_hc_obs[i] = a_hc_obs + hw[i] * bHW;
}
hc_obs ~ normal(mu_hc_obs ,sigma_hc_obs);
vector[N_subj] mu_y;
sigma_y ~ exponential(1);
bBMI ~ normal(0, 1);
a_y ~ normal(0, 1);
for (i in 1:N_subj) {
mu_y[i] = a_y + bmi[i] * bBMI + (hc_obs[i] + bHW * hw[i]) * bHC;
}
y ~ normal(mu_y ,sigma_y);
}
generated quantities{
vector[N_subj] log_lik;
vector[N_subj] mu_y;
for (i in 1:N_subj) {
mu_y[i] = a_y + bmi[i] * bBMI + (hc_obs[i] + bHW * hw[i]) * bHC;
}
for (i in 1:N_subj) log_lik[i] = normal_lpdf(y[i] | mu_y[i], sigma_y);
}
```

That model is similarly off regarding my main predictor of interest. The estimate for bHW is completely off there. I tried also a model where I specificy

```
for (i in 1:N_subj) {
sigma_hc_obs[i] = hw[i] * bHW;
}
hc_obs ~ normal(mu_hc_true, sigma_hc_obs);
```

My attempt here was that for each individual I get an estimate for *hc_true*.

Any help how to model this would be greatly appreciated. Also feel free to point me to educational resources where I can learn to model this myself. In Statistical Rethinking, measurement error was covered but not for my specific case where I actually have variable(s) available that can explain this error.

Thanks,

Carlos