Getting different results for logistic regression model

This question may be more theoretical in nature, but I am not sure whether I am making a mistake in my code either.

I have fitted two separate logistic regression models: one using the classical approach and the other using a Bayesian approach. Prediction accuracy was evaluated using 5-fold cross-validation.

I used the caret package in R to fit the model with the classical approach. The results are as follows:

    Accuracy    : 0.6964
    Sensitivity : 0.27255
    Specificity : 0.91150
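
Roughly, the caret fit looked like this (a minimal sketch; dat and y are placeholder names for my data frame and binary response):

library(caret)

# Minimal sketch of the classical fit (placeholder names: data frame 'dat',
# binary response 'y')
dat$y <- factor(dat$y)

set.seed(123)
ctrl <- trainControl(method = "cv", number = 5, savePredictions = "final")
fit  <- train(y ~ ., data = dat, method = "glm", family = binomial,
              trControl = ctrl)

# Confusion matrix built from the held-out (cross-validated) predictions
confusionMatrix(fit$pred$pred, fit$pred$obs)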

To evaluate the model with 5-fold cross-validation using the Bayesian approach, I first separated the data into folds in R, so that there are 5 sets of data (each with 4 training folds and the corresponding test fold), roughly as in the sketch below. Then the following Stan model was fitted separately to the 5 sets of data.
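
A minimal sketch of the fold construction (dat and y are the same placeholder names as above):

library(caret)

# Split row indices into 5 held-out folds, stratified on the response
set.seed(123)
folds <- createFolds(dat$y, k = 5)   # list of 5 test-index vectors

# Each element yields one training/test split
splits <- lapply(folds, function(idx) {
  list(train = dat[-idx, ], test = dat[idx, ])
})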

data {
  int<lower=1> N1;                // number of training observations
  int<lower=1> N2;                // number of test observations
  int<lower=1> K1;                // number of predictors
  int<lower=0,upper=1> yt[N1];    // response of training data
  matrix[N1, K1] x1;              // training data matrix
  matrix[N2, K1] x1h;             // test data matrix
}

parameters {
  real alpha1;
  vector[K1] beta1;
}

model {
  beta1 ~ normal(0, 100);
  alpha1 ~ normal(0, 100);
  yt ~ bernoulli_logit_glm(x1, alpha1, beta1);
}

generated quantities {
  vector[N2] p_new;
  p_new = inv_logit(alpha1 + x1h * beta1); // inverse logit transformation to get test-fold predicted probabilities
}
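
Each set of data was passed to the model from R roughly as follows (a minimal sketch using rstan; train and test are placeholder objects for one split, with the response y coded 0/1):

library(rstan)

# Assemble the data list for one train/test split (placeholder objects
# 'train' and 'test'; 'y' is the 0/1 response, remaining columns are predictors)
x1  <- as.matrix(train[, setdiff(names(train), "y")])
x1h <- as.matrix(test[,  setdiff(names(test),  "y")])

stan_data <- list(N1 = nrow(x1), N2 = nrow(x1h), K1 = ncol(x1),
                  yt = train$y, x1 = x1, x1h = x1h)

fit <- stan(file = "logistic_modelCV.stan", data = stan_data)

# Posterior draws of the predicted probabilities for the held-out fold
p_new <- extract(fit, pars = "p_new")$p_new   # iterations x N2 matrix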

I created a data frame by merging the predicted probabilities from the 5 folds.
Using that, I obtained the following measures:

    Accuracy    : 0.6052275
    Sensitivity : 0.3505236
    Specificity : 0.7155323

Based on the comparison, the accuracy and specificity measures from the Bayesian model are lower than those from the classical approach. What may be the reason for the difference in the results?

I understand that the results from the classical approach are based on a certain threshold (a 0.5 cutoff) while the Bayesian results are not. Also, the Bayesian model has accounted for the uncertainty of the estimators as well.

But why are these performance measures so low for the Bayesian model?
I am not sure whether I am doing anything wrong.

Any suggestion would be highly appreciated.

Just some wild guessing:

1.) Are you using the same splits in both cases?
If not, that could explain the difference.

2.) How do you aggregate your CV estimates in the second case? Your posterior distribution also implies a distribution of predicted values, and therefore also a distribution over the classification accuracy, etc.

In any case, assuming your data is on a reasonable scale and your model has no sampling issues, both models should result in almost identical estimates, because your prior is super broad.
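
For example, with such a broad prior the posterior means should land very close to the maximum-likelihood estimates. A quick check along these lines (hypothetical objects: a glm-ready data frame dat with response y, and an rstan fit object fit from the model above):

library(rstan)

# Compare maximum-likelihood estimates with posterior means
ml_fit <- glm(y ~ ., data = dat, family = binomial)

draws <- extract(fit)
cbind(mle       = coef(ml_fit),
      posterior = c(mean(draws$alpha1), colMeans(draws$beta1)))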


@daniel_h Thank you for your reply.

Let me answer your questions here.

  1. The splits are different. But for the classical case, I compared the results based on 5-fold, 10-fold, and leave-one-out cross-validation, and the results are essentially the same: in all 3 situations the accuracy is close to 0.69, sensitivity is 0.27, and specificity is 0.91, which differs from the Bayesian results. So I believe the split has no effect.

  2. Let's say my dataset has 100 observations and I ran 5000 MCMC samples in total. That means for each observation I have 5000 predicted probabilities, so based on CV I obtained a 100 x 5000 matrix of predicted probabilities. For each predicted probability, I ran a Bernoulli trial to get the predicted y value, so I obtained 5000 predicted y values for each observation. Then for each row, I compared the actual y value with the predicted y values and calculated the average correct classification rate. Likewise, I got 100 average correct classification rates, and the overall accuracy is the average of all these 100 values. A sketch of this calculation is given below.
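
Here is a minimal sketch of that calculation (placeholder objects: p_mat is the n_obs x n_draws matrix of merged CV predicted probabilities, y is the vector of observed 0/1 responses):

# One Bernoulli draw per predicted probability
set.seed(123)
y_rep <- matrix(rbinom(length(p_mat), size = 1, prob = p_mat),
                nrow = nrow(p_mat))

row_acc  <- rowMeans(y_rep == y)   # per-observation correct classification rate
accuracy <- mean(row_acc)          # overall accuracy (point estimate)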

You could simplify this process a little by generating the predicted y values in Stan, rather than in a separate step in R. You just need to change your generated quantities block to:

generated quantities {
  int y_new[N2];
  y_new = bernoulli_logit_rng(alpha1 + x1h * beta1); // posterior predictive draws of y for the test fold
}

@andrjohns Thank you. I didn’t know this before. This will ease my calculation. But still, the results based on the classical method and the Bayesian method are very different.

I have included a reproducible example with the R code, the Stan code, and the data for 2-fold cross-validation:

test_code.R (1.4 KB)

logistic_modelCV.stan (598 Bytes)
diabetes.csv (23.3 KB)

The results are different in this case as well.
Please have a look if you have time.
Thank you.

That doesn’t sound right; you want to calculate the classification accuracy across different realizations of your data (by column), not across rows.

Edit: Sorry, I did not think this through. If you are just interested in a point estimate of your classification accuracy, then it doesn't matter what you average; but it does matter if you want a distribution of classification accuracies that are consistent with your data.
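
For example (reusing the hypothetical y_rep matrix and response vector y from your calculation above):

# Column-wise accuracies: one value per posterior draw, giving a
# distribution of classification accuracies consistent with the data
acc_draws <- colMeans(y_rep == y)

quantile(acc_draws, c(0.025, 0.5, 0.975))   # point estimate plus interval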

The difference you are observing comes from the Bernoulli trial. The results should match if you use the theta > 0.5 decision threshold for the Bayesian version as well.
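
For example, a minimal sketch of applying that cutoff to the posterior mean probabilities (reusing the hypothetical p_mat matrix and response vector y from above):

library(caret)

# Classify via the posterior mean probability and a 0.5 cutoff,
# mirroring the classical decision rule
p_hat  <- rowMeans(p_mat)
y_pred <- as.integer(p_hat > 0.5)

confusionMatrix(factor(y_pred, levels = c(0, 1)),
                factor(y,      levels = c(0, 1)))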
