How to model this measurement error problem best?

Hi everyone,

I would highly appreciate feedback for my model. Do you think there are better ways to model this in stan?

We measured our predictor hc_obs, which we want to use to predict y. hc_obs is an attempt to measure hc_true, which unfortunately does not work without measurement error. Subjects who have higher hw will have reduced reduced hc_obs as a technical artifact while when hw == 0, there is no measurement error (no person has hw == 0). See the DAG corresponding to our data (grey variables are unobserved):

Screenshot 2021-05-05 at 20.18.08

I thought just including hw in a regression like this will be sufficient:

y \sim bmi + hc_obs + hw

This way the model would answer: Keeping hw and bmi constant, what effect would increasing hc_obs have on y. Therefore, the error is controlled for. I tried to simulate this in R:

library(tidyverse)
library(brms)

N <- 1e4
bmi <- rnorm(n = N, mean = 0, sd = 2)
hc <- rnorm(mean = 0, sd = 2, n = N)
hw <- rnorm(n = N, mean = 0, sd = 1)

bHW <- -0.5
bHC <- -1
bBMI <- 0.5

hc_obs <- rnorm(mean = hc + bHW * hw, sd = 1, n = N)
y <- rnorm(n = N, mean = bHC * hc + bBMI * bmi, sd = 1)
tb <- tibble(bmi, hc, hw, hc_obs, y)

m1 <- brm(
  y ~ bmi + hc_obs + hw,
  data = tb
)
summary(m1)

I have an unrealistically high sample size here but it illustrates that this model cannot restore the correct effect estimates in the simulation example. It does not really make a difference whether I include hw or not. The estimates are off in a similar way.

How would you model this differently (I would also welcome stan code). Given my DAG, I am thinking of some kind of measurement error model but all my attempts so far did not work out. For example I tried the following stan model:

data{
    int<lower=1> N_subj;
    vector[N_subj] y;
    vector[N_subj] bmi;
    
    vector[N_subj] hw;
    vector[N_subj] hc_obs;
}
parameters{
    real a_y;
    real bBMI;
    real<lower=0> sigma_y;
    real bHC;
    
    real a_hc_obs;
    real bHW;
    real<lower=0> sigma_hc_obs;
}
model{
    vector[N_subj] mu_hc_obs;
    sigma_hc_obs ~ exponential(1);
    bHW ~ normal(0, 1);
    a_hc_obs ~ normal(0, 1);
    for (i in 1:N_subj) {
        mu_hc_obs[i] = a_hc_obs + hw[i] * bHW;
    }
    hc_obs ~ normal(mu_hc_obs ,sigma_hc_obs);
  
    vector[N_subj] mu_y;
    sigma_y ~ exponential(1);
    bBMI ~ normal(0, 1);
    a_y ~ normal(0, 1);
    for (i in 1:N_subj) {
        mu_y[i] = a_y + bmi[i] * bBMI + (hc_obs[i] + bHW * hw[i]) * bHC;
    }
    y ~ normal(mu_y ,sigma_y);
}
generated quantities{
    vector[N_subj] log_lik;
    vector[N_subj] mu_y;
    for (i in 1:N_subj) {
        mu_y[i] = a_y + bmi[i] * bBMI + (hc_obs[i] + bHW * hw[i]) * bHC;
    }
    for (i in 1:N_subj) log_lik[i] = normal_lpdf(y[i] | mu_y[i], sigma_y);
}

That model is similarly off regarding my main predictor of interest. The estimate for bHW is completely off there. I tried also a model where I specificy

    for (i in 1:N_subj) {
        sigma_hc_obs[i] = hw[i] * bHW;
    }
    
    hc_obs ~ normal(mu_hc_true, sigma_hc_obs);

My attempt here was that for each individual I get an estimate for hc_true.

Any help how to model this would be greatly appreciated. Also feel free to point me to educational resources where I can learn to model this myself. In Statistical Rethinking, measurement error was covered but not for my specific case where I actually have variable(s) available that can explain this error.

Thanks,
Carlos

Hi Carlos,

The Stan User Guide has some examples of Bayesian measurement error models (6.1 Bayesian measurement error model | Stan User’s Guide). The references might also be helpful for more in depth reading. In general, to answer your question it would be useful to know a bit more about the relationship between hc_{obs} and hw. In particular, how is the measurement error introduced – e.g. instrument resolution, noisy proxies, etc., and why does a higher hw always result in smaller observed value?

1 Like

Hi @js592,

thanks for your response. I am right now reading the reference manual regarding measurement error models. To answer your question:

hc I used as the abbreviation for hair cortisol. So, the cortisol content has been measured in hair. Frequent hair washing decreases the hair cortisol content. Thus, a person who washes hair more frequently, will have lower cortisol in the hair than was actually present naturally. The hair cortisol reflects cortisol exposure in the body for several months. Hair washing behavior was accessed as the average number of washings per week in the last three months.

If you regress measured hair cortisol on hairwashing, then you find a negative slope with high confidence (whole posterior is negative). So, that relationship is negative on average and in theory hairwashing can not increase but only decrease hc, or leave it unaffected (e.g. possibly if washed without shampoo).

In this post here, I only mention hw to keep it simple. There are more such variables that can explain variation in observed hair cortisol such as sunlight exposure and hair coloring. This variance they explain can be interpreted as measurement error as we try to measure hair cortisol as it would be without the influences of these. My idea was I must be able to correct for these environmental factors. While hair washing is the one we (also based on previous literature) we are most confident in, we cannot be sure that the other factors have an effect on it (even though some literature suggests that).

Let me know if you need more information!

Thank you for thinking along!
Carlos

Generally with measurement error models, you either need multiple variables from which to infer the ‘true’ variable (like factor analysis) or you need a single variable and some quantification of measurement error (like in meta-analysis).

As far as I know, you can’t have a single variable and estimate the measurement error (or ‘true’ score), because there isn’t enough information to distinguish ‘true’ variance from error variance.

1 Like

@andrjohns : I would be surprised if what I have in mind is not possible. Because I think that the model I have in mind requires not much less information than the simplest measurement error model in the stan manual. The measured variable is a function of the true variable + the influence of environmental factors that we actually measured.

It must be something like this:

data {
  int<lower=0> N;       // number of cases
  vector[N] y;          // outcome (variable)
  vector[N] hc_obs;     // measurement of hc
  vector[N] hw;     // measurement of covariate of measured hc
}
parameters {
  real alpha;           // intercept
  real beta;            // slope
  real<lower=0> sigma;  // outcome noise
  vector[N] hc;          // unknown true value
  real mu_hc;          // prior location
  real<lower=0> sigma_hc;       // prior scale
  real bHW;            // slope of covariate of measured hc
  real<lower=0> tau;   // noise of measured hc
}
model {
  hc ~ normal(mu_hc, sigma_hc);  // prior
  hc_obs ~ normal(hc + bHW * hw, tau);    // measurement model
  y ~ normal(alpha + beta * hc, sigma);
  alpha ~ normal(0, 1);
  beta ~ normal(0, 1);
  sigma ~ exponential(1);
  mu_hc ~ normal(0, 1);
  sigma_hc ~ exponential(1);
  tau ~ exponential(1);
  bHW ~ normal(0, 1);
}

Above model works very well so far in restoring all parameters of the simulation. However, I get relatively often scale parameter warnings in the stan output:

Chain 3 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 3 Exception: normal_lpdf: Scale parameter is 0, but must be positive!
Chain 3 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 3 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

I would highly appreciate what you guys think of this model. I would need to add more predictors as my real example is a bit more complicated. But with high sample size, the simulation parameters are perfectly restored.

EDIT:
Trying it with lower sample sizes that are more realistic for our data (50-100), the models performs still good but now I get issues with many divergent transitions that I cannot fix only by adjusting adapt_delta. I hope I will be able to fix this by setting narrower priors…

Oh I see what you mean, this isn’t really a measurement error model in the sense that I’m used to. What you’re doing entering hw as a predictor of hc_{obs} and using the residual from that regression as a predictor of y. So it’s not that you’re estimating the measurement error of hc_{obs}, it’s that you’re estimating the ‘portion’ of hc_{obs} that can’t be explained by hw.

Be aware that this isn’t necessarily the ‘true’ value of hc, it’s just the measured response of hc that’s distinct from hw (and whichever other covariates you include).

As for your model, something that might improve divergences is the use of a non-centered parameterisation. There’s more background in this section of the User’s Manual. You can do this simply using the offset and multiplier syntax. Additionally, you can use std_normal() instead of normal(0,1):

data {
  int<lower=0> N;       // number of cases
  vector[N] y;          // outcome (variable)
  vector[N] hc_obs;     // measurement of hc
  vector[N] hw;     // measurement of covariate of measured hc
}
parameters {
  real alpha;           // intercept
  real beta;            // slope
  real<lower=0> sigma;  // outcome noise
  real mu_hc;          // prior location
  real<lower=0> sigma_hc;       // prior scale
  real bHW;            // slope of covariate of measured hc
  real<lower=0> tau;   // noise of measured hc
  vector<offset = mu_hc,
         multiplier = sigma_hc>[N] hc;          // unknown true value
}
model {
  hc ~ normal(mu_hc, sigma_hc);  // prior
  hc_obs ~ normal(hc + bHW * hw, tau);    // measurement model
  y ~ normal(alpha + beta * hc, sigma);
  alpha ~ std_normal();
  beta ~ std_normal();
  sigma ~ exponential(1);
  mu_hc ~ std_normal();
  sigma_hc ~ exponential(1);
  tau ~ exponential(1);
  bHW ~ std_normal();
}
1 Like

Thanks @andrjohns. Yes that is what I meant. I was already suspecting that I was using inappropriate terminology here. I remember that McElreath wrote in his book that using residuals as predictors in another regression is not good practice. But I think he wrote about the classical approach where residuals are then treated as known values rather than parameters as here in the Bayesian approach. So I think that should be fine.

I never could wrap my head around the non-centered reparamerization but I will give it another read. The std_normal() is equivalent to normal(0, 1) I guess? So I only have to go to normal once I change the SD parameter? Is using std_normal() also expected to improve divergent transitions?

Using your model, I need to increase max_treedepth, but then divergent transitions are indeed lower (now only 5 out of 4000).

Thanks a lot. This is all very helpful.

Hi @andrjohns and everyone else,

I tried to fit this model now to the real data and I run into issues when it comes to estimating the slope parameter (above denoted as beta but from now on I call it b_hc_y to make clear what exactly that beta refers to. What happens is that the 4 chains do not align in how they estimate that parameter and also any single chain does not a good job. The shape (I think) should be gaussian centered around 0.5.

But instead I get something like this:

I attached a reproducible example with R and Stan using cmdstanr. I tried the non-reparameterized stan model and the reparameterized one. I tried playing with max_treedepth and adapt_delta and it does change things but solve the issue.

Here would be a reproducible example in R:

library(tidyverse)
library(cmdstanr)

set.seed(1)
N <- 1000
b_hc_y <- -0.5
b_hw_hcobs <- -0.5

hc <- rnorm(mean = 0, sd = 2, n = N)
hw <- rnorm(n = N, mean = 0, sd = 1)
hc_obs <- rnorm(mean = hc + b_hw_hcobs * hw, sd = 1, n = N)
y <- rnorm(n = N, mean = b_hc_y * hc, sd = 1)
tb <- tibble(hc, hw, hc_obs, y)


file <- here::here("stan/m1_y_hc_test.stan")
m1_y_hc_test <- cmdstan_model(file)

dat <- list(
  hc_obs = tb[["hc_obs"]],
  hw = tb[["hw"]],
  y = tb[["y"]],
  N = dim(tb)[1]
)
fit <- m1_y_hc_test$sample(
  data = dat,
  seed = 123,
  chains = 4,
  parallel_chains = 4,
  refresh = 500,
  #adapt_delta = 0.9,
  #max_treedepth = 20
)

# show the posterior distribution of b_hc_y for each chain 
temp <- fit$draws() %>% as.data.frame() %>%      
  select(contains("b_hc_y")) %>%
  pivot_longer(everything(), names_to = "parameter")

ggplot(temp, aes(parameter, value)) +
    geom_violin(draw_quantiles = c(0.025, 0.9725)) +
    geom_hline(aes(yintercept = 0), linetype = "dashed") +
    coord_flip() +
    theme_bw(base_size = 30)

# show how well true hc correlates with estimated hc  
temp <- fit$summary() %>% as.data.frame() %>%      
  filter(str_detect(variable, "hc\\[\\d+\\]")) %>%
  add_column(hc_true = hc)

ggplot(temp, aes(hc_true, median)) +
    geom_errorbar(aes(ymin = q5, ymax = q95)) +
    geom_point(color = "red") 

However, the model seems to work well when it comes to estimating the “true hc”. Here is a plot that shows correlation between the simulated hc and the estimated hc with 95% CI:

The corresponding stan code:

data {
  int<lower=0> N;       // number of cases
  vector[N] y;          // outcome variable 
  vector[N] hc_obs;     // measurement of hc
  vector[N] hw;     // measurement of covariate that makes hc_obs deviate from hc
}
parameters {
  real a_y;           // intercept 
  real b_hc_y;            // slope
  real<lower=0> sigma_y;  // outcome noise
  real mu_hc;          // prior location
  real<lower=0> sigma_hc;       // prior scale
  real b_hw_hcobs;            // slope of covariate of measured hc
  real<lower=0> sigma_hc_obs;   // noise of measured hc
  vector<offset = mu_hc,multiplier = sigma_hc>[N] hc;          // unknown true value
}
model {
  hc ~ normal(mu_hc, sigma_hc);  // prior
  hc_obs ~ normal(hc + b_hw_hcobs * hw, sigma_hc_obs);    // estimate hc
  y ~ normal(a_y + b_hc_y * hc, sigma_y);
  a_y ~ std_normal();
  b_hc_y ~ std_normal();
  sigma_y ~ exponential(1);
  mu_hc ~ std_normal();
  sigma_hc ~ exponential(1);
  sigma_hc_obs ~ exponential(1);
  b_hw_hcobs ~ std_normal();
}

Any help would be greatly appreciated!

1 Like

Just to make sure people can find this post. Hopefully someone has an idea and can help.

One more attempt in the hope that someone will help me out here. Please also let me know if you need more information or if something is unclear.