Logistic Regression without a Gold Standard/Error in Observed Classification

For anyone else who may be interested in this problem, I want to share what I think ultimately turned out to be the solution. Looking back over the material I was reading, I realized that the identifiability constraint is placed on the classification errors themselves, not on sensitivity and specificity directly.

To put this more concretely, I can borrow the notation from Evans et al. (1996):

p_{sick} = P(\eta = 1)
\theta = P(X = 0 | \eta = 1)
\phi = P(X = 1 | \eta = 0)
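
In other words, \theta is the false-negative rate and \phi is the false-positive rate, so sensitivity and specificity are simply:

sens = P(X = 1 | \eta = 1) = 1 - \theta
spec = P(X = 0 | \eta = 0) = 1 - \phi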

My constraint would be written as (1 - \theta) + (1 - \phi) > 1, i.e., sensitivity plus specificity greater than one, but Evans et al. state the equivalent constraint on the errors: \theta + \phi < 1. Switching to the error framing, where the errors must sum to less than one, is what let me use a Dirichlet prior to identify the model. Since \theta + \phi needs to be less than one, declaring a simplex[3] variable means I can take two of its elements as the error rates and they are guaranteed to sum to less than one. Putting a Dirichlet prior on that simplex fully resolved my estimation problems.
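
As a minimal sketch of just that piece (the names mirror the full model further down; the fixed concentration of 5 is only a placeholder for whatever prior you prefer):

parameters {
  simplex[3] accuracy;                             // accuracy[1] = theta, accuracy[2] = phi, accuracy[3] = remainder
}

transformed parameters {
  real<lower=0, upper=1> sens = 1 - accuracy[1];   // sensitivity = 1 - P(X = 0 | eta = 1)
  real<lower=0, upper=1> spec = 1 - accuracy[2];   // specificity = 1 - P(X = 1 | eta = 0)
}

model {
  accuracy ~ dirichlet(rep_vector(5, 3));          // any proper Dirichlet keeps theta + phi < 1 by construction
}

Running this fragment on its own just samples from the prior, which is a quick way to confirm that sens + spec > 1 always holds.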

For the sake of time, I still used variational estimation, since those fits run much faster (under 1 second in my simulations versus 14-17 seconds for full sampling). No errors were thrown across the 48,000 simulations I ran, the model shows minimal bias and no other estimation issues, and it is overall more accurate than a standard logistic regression. So it seems like these changes worked as desired.

The model is pasted below. If anyone sees areas for improvement, please let me know. I also still want to extend the model to multinomial/ordered logistic outcomes rather than just binary ones.

data {

  int<lower=0> N;                          // number of observations
  int<lower=0> P;                          // number of predictors
  array[N] int<lower=0, upper=1> y;        // class observations
  matrix[N, P] x;                          // predictor matrix

}

transformed data {

  vector[P] x_m;                           // predictor means
  matrix[N, P] x_c;                        // centered predictors

  // mean center predictors
  for ( p in 1 : P ) {
    x_m[p] = mean(x[, p]);
    x_c[, p] = x[, p] - x_m[p];
  }

  // perform QR decomposition on centered predictors
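  // b_x_r (scaled thin Q) replaces x_c in the likelihood;
  // b_x_a maps the QR-scale coefficients t_x back to the centered-predictor scale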
  matrix[N, P] b_x_r = qr_thin_Q(x_c) * sqrt(N - 1);
  matrix[P, P] b_x_a = inverse(qr_thin_R(x_c) / sqrt(N - 1));

}

parameters {

  vector[P] t_x;                           // QR decomposed coefficient parameter(s)
  real b_0_c;                              // intercept parameter
  vector<lower=1>[3] theta;                // Dirichlet concentration (hyper)parameters
  simplex[3] accuracy;                     // misclassification errors: [1] false negative, [2] false positive, [3] remainder

}

transformed parameters {

  // convert to sensitivity and specificity (assuming model is better than chance)
  real<lower=0, upper=1> sens = 1 - accuracy[1];
  real<lower=0, upper=1> spec = 1 - accuracy[2];

  // linear component of logistic regression (uses the scaled Q matrix, not x_c, since t_x is on the QR scale)
  vector[N] logit_z_hat = b_0_c + b_x_r * t_x;

}

model {

  // (hyper)priors
  theta ~ chi_square(5);
  accuracy ~ dirichlet(theta);
  b_0_c ~ std_normal();
  t_x   ~ std_normal();

  // likelihood
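  // marginalize over the latent true class eta:
  //   first term  = log P(y[n] | eta = 1) + log P(eta = 1 | x[n])  (uses sensitivity)
  //   second term = log P(y[n] | eta = 0) + log P(eta = 0 | x[n])  (uses specificity)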
  for ( n in 1 : N ) {
    target += log_sum_exp(bernoulli_lpmf(    y[n] | sens) + bernoulli_logit_lpmf(1 | logit_z_hat[n]),
                          bernoulli_lpmf(1 - y[n] | spec) + bernoulli_logit_lpmf(0 | logit_z_hat[n]));
  }
}

generated quantities {

  // posterior predictions for the latent true class
  vector<lower=0, upper=1>[N] prob;        // posterior probability that the true class is 1
  array[N] int<lower=0, upper=1> pop;      // realized draw of the true diagnostic class

  for ( n in 1 : N ) {
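    // softmax of the two log joint probabilities gives P(eta = 1 | y[n], x[n])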
    prob[n] = softmax([bernoulli_lpmf(    y[n] | sens) + bernoulli_logit_lpmf(1 | logit_z_hat[n]),
                       bernoulli_lpmf(1 - y[n] | spec) + bernoulli_logit_lpmf(0 | logit_z_hat[n])]')[1];
  }

  // compute class predictions
  pop = bernoulli_rng(prob);

  // compute rescaled coefficients
  vector[P] b_x = b_x_a * t_x;
  real b_0 = b_0_c - dot_product(x_m, b_x);

}