Poor predictions from MRP model with measurement error

I am trying to create an MRP model which allows for measurement error. My goal is to create a model which allows you to generate useful predictions (and uncertainty bands) of seroprevalence using antibody tests administered to a convenience sample. (To be clear, I am not an epidemiologist and this is not going to be used for any estimates at this stage. It is more out of curiosity.)

Everything seems to be working Ok with the model (in terms of convergence) but the predictions from the model are pretty poor. I am running the model on simulated data using a data generating function very close to that of my model (using the simulate function from the MRP with rstanarm vignette) and the posterior mean for the true prevalance generally does a worse job of predicting the true population parameter than a really crude approach of calculating the weighted mean of the apparent prevalence (with measurement error) and then adjusting the overall estimate for measurement error. In addition, the rank statistic of the true population value in terms of the posterior is usually close to 0 or 1.

I have included the model code below and a quick summary of my model in case anyone has any suggestions or ideas on additional things to try out. Full code here

Quick model summary

I took the Stan MRP model from the user guide and modified it slightly to account for measurement error. Simplifying somewhat, in standard MRP, you assume:

\theta_j =logit^{-1}(X_j\beta); \beta\sim MVN(0,\sigma I); y_i \sim bern(\theta_{j[i]})

In my model with measurement error, I instead assume:

\theta_j =logit^{-1}(X_j\beta); \beta\sim MVN(0,\sigma I)
pa_j = sensitivity*\theta_j+(1-specificity)*(1-\theta_j)
y_i \sim bern(pa_{j[i]})

Where sensitivity is the true positive rate and specificity is the true negative rate.

Stan code

data {
  int<lower = 0> N;
  int<lower = 0> ng_age;
  int<lower = 0> ng_income;
  int<lower = 0> ng_state;
  int<lower = 0> ng_eth;
  int<lower = 1, upper = 2> sex[N];
  int<lower = 1, upper = ng_age> age[N];
  int<lower = 1, upper = ng_income> income[N];
  int<lower = 1, upper = ng_state> state[N];
  int<lower = 1, upper = ng_eth> eth[N];
  int<lower = 0> y[N];

  int<lower = 0> P[2, ng_age, ng_income, ng_state, ng_eth];
  int<lower=0> tp;  
  int<lower=0> fn;  
  int<lower=0> tn;  
  int<lower=0> fp; 

parameters {
  real alpha;
  real<lower = 0> sigma_age;
  vector<multiplier = sigma_age>[ng_age] beta_age;
  real<lower = 0> sigma_income;
  vector<multiplier = sigma_income>[ng_income] beta_income;
  real<lower = 0> sigma_state;
  vector<multiplier = sigma_state>[ng_state] beta_state;
  real<lower = 0> sigma_eth;
  vector<multiplier = sigma_eth>[ng_eth] beta_eth;
  real epsilon;
  real <lower=0,upper=1> se;
  real <lower=0,upper=1> sp;

transformed parameters {
  vector[N] eps;
  vector <lower=0,upper=1>[N] pt;
  vector <lower=0,upper=1>[N] pa;
  for (i in 1:N) {
    eps[i] = {epsilon, -epsilon}[sex[i]];

  pt = inv_logit(alpha + beta_age[age] + beta_income[income] + beta_state[state] + beta_eth[eth]+eps);  
  pa = se*pt+(1-sp)*(1-pt);

model {
  // priors
  alpha ~ normal(0, 2);
  beta_age ~ normal(0, sigma_age);
  beta_income ~ normal(0, sigma_income);
  beta_state ~ normal(0, sigma_state);
  beta_eth ~ normal(0, sigma_eth);
  sigma_age ~ normal(0, 3);
  sigma_income ~ normal(0, 3);
  sigma_state  ~ normal(0, 3);
  sigma_eth  ~ normal(0, 3);
  epsilon ~ normal(0, 3);
  // likelihood
  // tp ~ binomial(tp+fn, se);
  // tn ~ binomial(tn+fp, sp);
  // y ~ bernoulli(pa);
  target += binomial_lpmf(tp | tp+fn, se);
  target += binomial_lpmf(tn | tn+fp, sp);
  target += bernoulli_lpmf(y | pa);
generated quantities {
  real<lower = 0, upper = 1> phi;
  real eps_i;
  real expect_pos = 0;
  int total = 0;
  for (sex_i in 1:2)
    for (age_i in 1:ng_age)
      for (inc_i in 1:ng_income) {
        for (state_i in 1:ng_state) {
          for (eth_i in 1:ng_eth) {
            total += P[sex_i, age_i, inc_i, state_i, eth_i];
            eps_i= {epsilon, -epsilon}[sex[sex_i]];
            expect_pos += P[sex_i, age_i, inc_i, state_i, eth_i]* inv_logit(alpha + eps_i+ beta_age[age_i] + beta_income[inc_i] + beta_state[state_i] + beta_eth[eth_i]);
            //expect_pos += binomial_rng(P[sex_i, age_i, inc_i, state_i, eth_i], inv_logit(alpha + eps_i+ beta_age[age_i] + beta_income[inc_i] + beta_state[state_i] + beta_eth[eth_i]));
  phi = expect_pos / total;

I don’t really understand those models well, but @lauren is the resident expert on MRP so hope she’s not busy and can look into this…

Martin – Thanks a lot.

@lauren – First, quick update that I wrote up the results from this in a blog post. The post is here.

If you have any suggestions on how to improve the model, I would greatly appreciate it. Somewhat surprisingly (at least for me), using point estimates for sensitivity and specificity rather than modeling those parameters seems to help. Overall, the performance of the estimators is pretty similar so perhaps I just had overly high expectations for the model.

1 Like