Poor predictions from MRP model with measurement error

doug_johnson · May 8, 2020, 12:10pm

I am trying to create an MRP model which allows for measurement error. My goal is to create a model which allows you to generate useful predictions (and uncertainty bands) of seroprevalence using antibody tests administered to a convenience sample. (To be clear, I am not an epidemiologist and this is not going to be used for any estimates at this stage. It is more out of curiosity.)

Everything seems to be working Ok with the model (in terms of convergence) but the predictions from the model are pretty poor. I am running the model on simulated data using a data generating function very close to that of my model (using the simulate function from the MRP with rstanarm vignette) and the posterior mean for the true prevalance generally does a worse job of predicting the true population parameter than a really crude approach of calculating the weighted mean of the apparent prevalence (with measurement error) and then adjusting the overall estimate for measurement error. In addition, the rank statistic of the true population value in terms of the posterior is usually close to 0 or 1.

I have included the model code below and a quick summary of my model in case anyone has any suggestions or ideas on additional things to try out. Full code here

Quick model summary

I took the Stan MRP model from the user guide and modified it slightly to account for measurement error. Simplifying somewhat, in standard MRP, you assume:

\theta_j =logit^{-1}(X_j\beta); \beta\sim MVN(0,\sigma I); y_i \sim bern(\theta_{j[i]})

In my model with measurement error, I instead assume:

\theta_j =logit^{-1}(X_j\beta); \beta\sim MVN(0,\sigma I)
pa_j = sensitivity*\theta_j+(1-specificity)*(1-\theta_j)
y_i \sim bern(pa_{j[i]})

Where sensitivity is the true positive rate and specificity is the true negative rate.

Stan code


data {
  int<lower = 0> N;
  
  int<lower = 0> ng_age;
  int<lower = 0> ng_income;
  int<lower = 0> ng_state;
  int<lower = 0> ng_eth;
  
  int<lower = 1, upper = 2> sex[N];
  int<lower = 1, upper = ng_age> age[N];
  int<lower = 1, upper = ng_income> income[N];
  int<lower = 1, upper = ng_state> state[N];
  int<lower = 1, upper = ng_eth> eth[N];
  int<lower = 0> y[N];

  int<lower = 0> P[2, ng_age, ng_income, ng_state, ng_eth];
  
  int<lower=0> tp;  
  int<lower=0> fn;  
  int<lower=0> tn;  
  int<lower=0> fp; 
}

parameters {
  real alpha;
  real<lower = 0> sigma_age;
  vector<multiplier = sigma_age>[ng_age] beta_age;
  real<lower = 0> sigma_income;
  vector<multiplier = sigma_income>[ng_income] beta_income;
  real<lower = 0> sigma_state;
  vector<multiplier = sigma_state>[ng_state] beta_state;
  real<lower = 0> sigma_eth;
  vector<multiplier = sigma_eth>[ng_eth] beta_eth;
  real epsilon;
  
  real <lower=0,upper=1> se;
  real <lower=0,upper=1> sp;
}

transformed parameters {
  vector[N] eps;
  vector <lower=0,upper=1>[N] pt;
  vector <lower=0,upper=1>[N] pa;
  
  for (i in 1:N) {
    eps[i] = {epsilon, -epsilon}[sex[i]];
  }

  pt = inv_logit(alpha + beta_age[age] + beta_income[income] + beta_state[state] + beta_eth[eth]+eps);  
  pa = se*pt+(1-sp)*(1-pt);
}

model {
  // priors
  alpha ~ normal(0, 2);
  beta_age ~ normal(0, sigma_age);
  beta_income ~ normal(0, sigma_income);
  beta_state ~ normal(0, sigma_state);
  beta_eth ~ normal(0, sigma_eth);
  
  sigma_age ~ normal(0, 3);
  sigma_income ~ normal(0, 3);
  sigma_state  ~ normal(0, 3);
  sigma_eth  ~ normal(0, 3);
  epsilon ~ normal(0, 3);
  
  // likelihood
  // tp ~ binomial(tp+fn, se);
  // tn ~ binomial(tn+fp, sp);
  // y ~ bernoulli(pa);
  
  target += binomial_lpmf(tp | tp+fn, se);
  target += binomial_lpmf(tn | tn+fp, sp);
  target += bernoulli_lpmf(y | pa);
}
generated quantities {
  real<lower = 0, upper = 1> phi;
  real eps_i;
  real expect_pos = 0;
  int total = 0;
  for (sex_i in 1:2)
    for (age_i in 1:ng_age)
      for (inc_i in 1:ng_income) {
        for (state_i in 1:ng_state) {
          for (eth_i in 1:ng_eth) {
            total += P[sex_i, age_i, inc_i, state_i, eth_i];
            eps_i= {epsilon, -epsilon}[sex[sex_i]];
            expect_pos += P[sex_i, age_i, inc_i, state_i, eth_i]* inv_logit(alpha + eps_i+ beta_age[age_i] + beta_income[inc_i] + beta_state[state_i] + beta_eth[eth_i]);
            //expect_pos += binomial_rng(P[sex_i, age_i, inc_i, state_i, eth_i], inv_logit(alpha + eps_i+ beta_age[age_i] + beta_income[inc_i] + beta_state[state_i] + beta_eth[eth_i]));
          }
        }
      }
  phi = expect_pos / total;
}

martinmodrak · May 12, 2020, 2:53pm

I don’t really understand those models well, but @lauren is the resident expert on MRP so hope she’s not busy and can look into this…

doug_johnson · May 13, 2020, 5:25am

Martin – Thanks a lot.

@lauren – First, quick update that I wrote up the results from this in a blog post. The post is here.

If you have any suggestions on how to improve the model, I would greatly appreciate it. Somewhat surprisingly (at least for me), using point estimates for sensitivity and specificity rather than modeling those parameters seems to help. Overall, the performance of the estimators is pretty similar so perhaps I just had overly high expectations for the model.

Topic		Replies	Views
Measurement Error Modeling Modeling specification	4	2286	January 9, 2018
Measurement error in nonlinear mixed effect models brms techniques , specification	13	1231	August 30, 2020
Convergence problems / high Rhat for measurement error model Modeling rstan , fitting-issues , performance	12	1490	April 30, 2020
Categorical Predictor with Measurement Error Regression Modeling specification	2	410	January 15, 2021
How to model this measurement error problem best? Modeling techniques , specification	9	780	September 20, 2021

Poor predictions from MRP model with measurement error

Quick model summary

Stan code

Related topics