Sampling issues when modeling a proxy variable

swood-ecology · February 18, 2022, 5:44pm

I am trying to model the difference between two variables, y1 and y2:

y1 - y2 \sim N(\mu, \sigma)

What makes this tricky for me is that both y1 and y2 are measured variables, but the measurements are proxies. I don’t have the “true” measurements, but I do have data from a previous study that tells me how the proxy relates to the “true” measurement:

proxy = \alpha + \beta * measured + \rho

I wrote a model (below) that treats the “true” measurement as a parameter whose mean is the proxy value and whose standard deviation is \rho from above.
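
Written out for each observation n = 1, ..., N, this is what the model is meant to express, with \rho passed to Stan as the data value meas_error. Note that \alpha and \beta do not appear anywhere in it, which may or may not be appropriate:

y1_true_n \sim N(y1_obs_n, \rho)
y2_true_n \sim N(y2_obs_n, \rho)
y1_true_n - y2_true_n \sim N(\mu, \sigma)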

The model compiles and samples, but I get several warnings about sampling efficiency, which makes me think I specified the model incorrectly. Here’s the model:

data {
  int<lower=0> N;     // length of both y1_obs and y2_obs
  vector[N] y1_obs;
  vector[N] y2_obs;
  // rho from the above model relating proxy measure to a true measure
  real<lower=0> meas_error; 
}

parameters {
  vector[N] y1_true;  // the unobserved true measure of y1
  vector[N] y2_true;  // the unobserved true measure of y2
  real mu;              // mean of the difference y1_true - y2_true
  real<lower=0> sigma;  // standard deviation of that difference
}

transformed parameters {
  vector[N] y_diff = y1_true - y2_true;
}

model {
  for (n in 1:N) {
    y1_true[n] ~ normal(y1_obs[n], meas_error);
  }
  for (n in 1:N) {
    y2_true[n] ~ normal(y2_obs[n], meas_error);
  }
  y_diff ~ normal(mu, sigma);
}

These are the warnings:

Warning messages:
1: There were 3 chains where the estimated Bayesian Fraction of Missing Information was low. See
https://mc-stan.org/misc/warnings.html#bfmi-low 
2: Examine the pairs() plot to diagnose sampling problems
 
3: The largest R-hat is 1.06, indicating chains have not mixed.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#r-hat 
4: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#bulk-ess 
5: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#tail-ess

And this is the R code I used with some reproducible data:

library(rstan)

# Arbitrary seed (not in my original run) so the simulated data are reproducible
set.seed(1)

data <- list(
  N = 200,
  meas_error = 2.58,
  y1_obs = rnorm(200, 8.88, 7.21),
  y2_obs = rnorm(200, 10.33, 8.32)
)

fit <- stan(file = "diff-model-meas-err.stan",
            data = data,
            iter = 5000,
            warmup = 2000,
            chains = 3,
            control = list(adapt_delta = 0.99,
                           max_treedepth = 15))

Does anything stand out as a clear error in how I’m thinking about this? My goal is a “true” variable in which each value has a distribution determined by the measured proxy value and rho.

Thank you.
