Understanding measurement error models with fake data (when the error is quadratically related to the measure)

I’m trying to understand the impact of measurement error on inferred parameters (reproducible example below).
In a study we are building a multilevel model of the performance of parents, and another of the performance of their children. The two models are separate because the tasks are not exactly the same and we need slightly different model structures.
We then extract the individual-level estimates of performance on a log-odds scale (the models have a binomial likelihood) and produce summary stats (mean and sd) per participant.
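
For concreteness, here is a minimal sketch of that extraction step (the fit m_parent, the grouping variable ID, and the column names are assumptions, not the actual study code):

# Hypothetical sketch: per-participant posterior mean and sd on the
# log-odds scale from a multilevel binomial brms fit with a (1 | ID) term.
# coef() returns posterior summaries of the participant-level intercepts.
est <- coef(m_parent)$ID[, , "Intercept"]
d_parent <- tibble(
  ID = rownames(est),
  ParentMean = est[, "Estimate"],  # posterior mean per participant
  ParentSD = est[, "Est.Error"]    # posterior sd per participant
)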

We are interested in how the performance of child and parent relate, so we build a Gaussian model where the performance of the child is predicted by that of the parent, adding measurement error (the sd of the individual-level posterior samples) to both outcome and predictor. With measurement error only in the outcome there are no issues and we see reasonable coefficients (about 0.3). With measurement error only in the predictor there are no issues either and we see roughly the same coefficients. However, when we include both, things go crazy: the model infers coefficients of about 2 (vs. 0.3 before) and is quite confident about that.

Digging into the data, I can see that a possible issue is that the sd of the posterior samples is quadratically related to the mean (a beautiful parabola!). The further a participant is from the population mean, the bigger the uncertainty in that participant's estimate. This is not due to the scale: it also happens if I convert the samples to probabilities (0-1) before summarizing them.
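
A quick way to quantify that relation, reusing the ParentMean/ParentSD summaries (just a sketch of the check, not part of the model):

# Fit a quadratic curve to the per-participant sd as a function of the mean
quad_fit <- lm(ParentSD ~ ParentMean + I(ParentMean^2))
summary(quad_fit)  # check the coefficient on the squared term
plot(ParentMean, ParentSD)
curve(predict(quad_fit, data.frame(ParentMean = x)), add = TRUE)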

So:

  • what might be causing the strong quadratic relation between mean and sd? Is it the partial pooling? Participants further away are dragged towards the population mean, but their posterior estimates are stretched by the dragging? The priors on the by-participant variance are broad enough that regularization does not seem huge.

  • how should I deal with that sort of measurement error in the model?

Reproducible example (of the issues emerging from using measurement error):

pacman::p_load(tidyverse, MASS, brms)

# Simulating the input variables 
mu <- rep(0, 2)
Sigma <- matrix(.3, nrow = 2, ncol = 2) + diag(2) * .7  # unit variances, correlation .3
rawvars <- mvrnorm(n = 500, mu = mu, Sigma = Sigma)

ParentMean <- rawvars[,1]
ChildMean <- rawvars[,2]

# SDs depend quadratically on the means, mimicking the pattern in the real data
ParentSD <- 0.79 + 0.06 * ParentMean + 0.04 * ParentMean^2 + rnorm(length(ParentMean), 0, 0.1)
ChildSD <- 0.54 + 0.04 * ChildMean + 0.03 * ChildMean^2 + rnorm(length(ChildMean), 0, 0.05)

# Sanity check plots
plot(ParentMean, ChildMean)
plot(ParentMean, ParentSD)
plot(ChildMean, ChildSD)

d <- tibble(
  ParentMean, ParentSD, ChildMean, ChildSD
)


f0 <- bf(ChildMean ~ ParentMean)                             # no measurement error
f1_out <- bf(ChildMean | se(ChildSD) ~ ParentMean)           # error in the outcome only
f1_pred <- bf(ChildMean ~ me(ParentMean, ParentSD))          # error in the predictor only
f2 <- bf(ChildMean | se(ChildSD) ~ me(ParentMean, ParentSD)) # error in both
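
One detail worth flagging here: by default se() fixes the residual standard deviation sigma to zero, which is why p1_out and p2 below contain no sigma prior. brms also offers a variant that keeps a free residual sd on top of the known measurement error, if you want it:

# Variant of f2 that estimates a residual sigma in addition to the
# known per-observation error sds
f2_sigma <- bf(ChildMean | se(ChildSD, sigma = TRUE) ~ me(ParentMean, ParentSD))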

p0 <- c(
  prior(normal(0, 1), class=Intercept),
  prior(normal(0, 1), class=b),
  prior(normal(1, 0.5), class=sigma)
)

p1_out <- c(
  prior(normal(0, 1), class=Intercept),
  prior(normal(0, 1), class=b)
)

p1_pred <- c(
  prior(normal(0, 1), class=Intercept),
  prior(normal(0, 1), class=b),
  prior(normal(1, 0.5), class=sigma)
)

p2 <- c(
  prior(normal(0, 1), class=Intercept),
  prior(normal(0, 1), class=b)
)

m0 <- brm(
  f0,
  data = d,
  family = gaussian,
  prior = p0,
  sample_prior = TRUE,
  chains = 2,
  cores = 2,
  save_mevars = TRUE
)

m1_out <- brm(
  f1_out,
  data = d,
  family = gaussian,
  prior = p1_out,
  sample_prior = TRUE,
  chains = 2,
  cores = 2,
  save_mevars = TRUE
)

m1_pred <- brm(
  f1_pred,
  data = d,
  family = gaussian,
  prior = p1_pred,
  sample_prior = TRUE,
  chains = 2,
  cores = 2,
  save_mevars = TRUE
)

m2 <- brm(
  f2,
  data = d,
  family = gaussian,
  prior = p2,
  sample_prior = TRUE,
  chains = 2,
  cores = 2,
  save_mevars = TRUE
)

This gives the following betas (estimate, est. error) for the relation between child and parent:

  • m0: 0.29, 0.05
  • m1_out: 0.25, 0.03
  • m1_pred: 1.67, 0.38
  • m2: 2.20, 0.27
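
For reference, a quick way to pull these population-level summaries from the fits:

# Estimate, Est.Error and credible intervals of the regression coefficients
fixef(m0)
fixef(m1_out)
fixef(m1_pred)
fixef(m2)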

You should be able to do inference on the correlation of interest using all the raw data in a single model (rather than the three you describe), even when the data for the children and the parents require different model structures. See here, in particular the “(… | p | id)” syntax, which lets you link quantities that you want to correlate.
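
For example, a rough sketch with made-up column names (assuming both responses can be stacked into one data frame): the shared p in the two grouping terms tells brms to estimate the correlation between the parent and child person-level intercepts.

# Hypothetical sketch: one joint binomial model for parents and children,
# with correlated person-level intercepts linked via the shared |p| id
f_parent <- bf(ParentCorrect | trials(NTrials) ~ 1 + (1 | p | FamilyID))
f_child <- bf(ChildCorrect | trials(NTrials) ~ 1 + (1 | p | FamilyID))
m_joint <- brm(f_parent + f_child, data = d_raw, family = binomial)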

Thanks. I have some modeling in progress along those lines, but as far as I can see, to use two different datasets I would have to go straight to Stan, right?

@paul.buerkner can one use multiple datasets as input to brms to achieve this inference scenario?

I am not sure this is possible in this way. One could use multiple data subsets via the resp_subset addition term, and one could pool multiple datasets via brm_multiple, but I think neither of these is what you want, as far as I understand.


Yes, exactly :-)


So, resp_subset() seems to be the solution. But can I combine two addition terms, | trials() and | resp_subset(), in the same formula?

Well, apparently I can. Now diagnosing to see whether that messes anything up.

Yes, as y | trials() + subset() ~ …
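
i.e., something like this sketch (all column names made up; IsParent/IsChild are logical indicator columns in a stacked data frame d_all holding both datasets):

# Hypothetical sketch: each submodel only uses its own rows via subset(),
# while the shared |p| id still links the person-level intercepts
f_parent <- bf(ParentCorrect | trials(NTrials) + subset(IsParent) ~ 1 + (1 | p | FamilyID))
f_child <- bf(ChildCorrect | trials(NTrials) + subset(IsChild) ~ 1 + (1 | p | FamilyID))
m_joint <- brm(f_parent + f_child, data = d_all, family = binomial)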

awesome, thanks!