Logistic regression with imperfect measurement of the outcome

Felix_Fischer · September 4, 2019, 10:48pm

Hi,

i want to use Stan to fit a logistic regression which accounts for imperfect measurement of the outcome. The paper describing the Bayesian approach (McInturff 2004) uses discrete latent parameters in bugs. In order to perform marginalization i turned to an earlier paper describing an EM algorithm to estimate this kind of regression (Magder & Hughes, 1997). This paper gives the conditional probabilities, but my model seems to be not correct as i can’t get reasonable regression estimates even when i impose high sensitivity and specificity.

I think my error could be where I sum up the target density, but frankly i am lost. Anyone who spots my misdoing here?

best wishes, felix


    data{
      int n;
      int outcome[n]; //outcome measured with error
      int gender[n]; //a predictor
    }
      
    parameters {
      real<lower=0,upper=1> sensitivity;
      real<lower=0,upper=1> specificity;
    
      real slope;
      real intercept;
    }
    
    transformed parameters {
      real linear_predictor[n];
      real pie[n];
      real likelihood[n];
      
      for (i in 1:n){
        linear_predictor[i] = slope*gender[i]+intercept;
         pie[i] = exp(linear_predictor[i])/(1+exp(linear_predictor[i])); //probability 
      
        //marginalization --> different probabilities for true disease state given observed outcome and linear predictor
        if (outcome[i] == 0){          
          likelihood[i] = log(pie[i]*sensitivity/(pie[i]*sensitivity+pie[i]*(1-specificity)));
        } else if (outcome[i] == 1){
          likelihood[i] = log(pie[i]*(1-sensitivity)/(pie[i]*(1-sensitivity)+pie[i]*specificity));
        }
      }
    }         

model {
      sensitivity ~ beta(200,2); 
      specificity ~ beta(200,2);
      
      intercept ~ normal(0,1);
      slope ~ normal(0,1);
      
      target += log_sum_exp(likelihood);
    }

mike-lawrence · September 5, 2019, 2:34pm

Here’s the model description from McInturff 2004:

So, I think you have the two likelihoods flipped. Plus, possibly the algebra works out the same, but copying directly from McInturff I get:

        if (outcome[i] == 1){          
          likelihood[i] = log( pie[i]*sensitivity + (1-pie[i]) * (1-specificity) );
        } else if (outcome[i] == 0){
          likelihood[i] = log( pie[i]*(1-sensitivity) + (1-pie[i])*specificity );
        }

Felix_Fischer · September 5, 2019, 6:10pm

Thanks for looking into this. Pasting your code unfortunately still does not give reasonable parameters.

What I am calculating there (according to Magder / Hughes) is the probability of true disease state given the observed test result and the predictors (i should have not named that variable likelihood). I think this information is needed to weight the likelihood contributions.

I ultimately want to model is the relation between the predictors and the true disease state and i feel that part is still missing? Should I calculate the likelihood now for both states (true diseases yes/no) and then sum it up weighted by the probabilities?

LucC · September 5, 2019, 6:47pm

I can’t readily follow the math in your likelihood definition. But I do have some remarks that may help.

The expected probability p of observing outcome ==1 is: p = sens x p_true + (1-spec) x (1-p_true). From there it should be straightforward to work out the log-likelihood. Maybe your definition is right, but I don’t understand how you end up with a multiplication of the specificity and true prevalence in your likelihood…

General tip: define the inverse logit transformation as 1 / (1+exp(-(linear predictor))). Thats fewer terms. Not sure if it makes a difference for the autodiff cost though.

LucC · September 5, 2019, 6:52pm

Also, if you work out the likelihood based on the expected probability that I defined above, you have already marginalised out all the possible latent states and dot’t need to do the log_sum_exp thing. I don’t think you needed that in the firste place…

mcol · September 5, 2019, 6:59pm

Wouldn’t be better to use inv_logit()?

mike-lawrence · September 5, 2019, 6:59pm

Good catch; I didn’t look closely enough at the model section. Since likelihood is a vector filled with log-likeilhoods (might consider renaming it to log_lik for clarity), you don’t need to exponentiate then sum then log to get the sum log likelihood, just sum: target += sum(likelihood)

Felix_Fischer · September 5, 2019, 7:12pm

Thanks so much, i think i got it now!

   data{
     int n;
     int outcome[n];
     int gender[n];
   }
     
   parameters {
     real<lower=0,upper=1> sensitivity;
     real<lower=0,upper=1> specificity;
   
     real slope;
     real intercept;
   }
   
   transformed parameters {
     real linear_predictor[n];
     real pie[n];
     real loglik[n];

     
     for (i in 1:n){
       linear_predictor[i] = slope*gender[i]+intercept;
       pie[i] = 1/(1+exp(-(linear_predictor[i])));
     
       if (outcome[i] == 1){          
         loglik[i] = log( pie[i]*sensitivity + (1-pie[i]) * (1-specificity) );
       } else if (outcome[i] == 0){
         loglik[i] = log( pie[i]*(1-sensitivity) + (1-pie[i])*specificity );
       }
}
}

    
   
   model {
     sensitivity ~ beta(200,2);
     specificity ~ beta(200,2);
     
     intercept ~ normal(0,1);
     slope ~ normal(0,1);
     
    target += sum(loglik);
   }

mike-lawrence · September 5, 2019, 7:17pm

Be sure to double check by simulating some fake data and ensuring the simulation parameters are recovered reasonably.

Felix_Fischer · September 5, 2019, 7:19pm

Already on it :D

LucC · September 6, 2019, 7:49am

Sure yes, if that’s an existing function in Stan, that’s probably even more stable and efficient

mcol · September 6, 2019, 10:17am

Yes, one could write

pie[i] = inv_logit(linear_predictor[i]);

I don’t think it’s vectorized, so the loop over elements will have to stay.

Topic		Replies	Views
Alternatives to marginalization Modeling techniques , specification	7	1287	November 5, 2017
Model for binary outcome with misclassification Modeling	1	783	April 5, 2018
Categorical Predictor with Measurement Error Regression Modeling specification	2	412	January 15, 2021
Marginalize missing binary outcome variable for GLM Modeling	11	1081	January 31, 2020
Fitting issues for model with measurement error Modeling	4	76	April 10, 2025

Logistic regression with imperfect measurement of the outcome

Related topics