This question may be more theoretical in nature, but I am also not sure whether I am making a mistake in my code.
I have fitted two separate logistic regression models: one using the classical approach and the other using a Bayesian approach. Prediction accuracy was evaluated using 5-fold cross-validation.
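5-fold cross-validation shuffles the data once and splits it into five folds. Each fold then serves once as the test set while the other four form the training set. The sketch below illustrates this under stated assumptions. The helper name `five_fold_splits` and the fixed seed are hypothetical, and the split is not stratified, whereas caret's createFolds samples within the levels of a factor outcome. Reusing one such fold assignment for both the caret fit and the Stan fits keeps the two sets of metrics computed on identical test folds.

```python
import random


def five_fold_splits(n, k=5, seed=1):
    """Return k (train_idx, test_idx) pairs; every index lands in exactly one test fold.

    Hypothetical helper: an unstratified, shuffled split for illustration only.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)           # shuffle once, reproducibly
    folds = [idx[i::k] for i in range(k)]      # round-robin assignment into k folds
    splits = []
    for i in range(k):
        test = sorted(folds[i])                # fold i is the test fold
        train = sorted(j for f in range(k) if f != i for j in folds[f])  # the other k-1 folds
        splits.append((train, test))
    return splits


splits = five_fold_splits(23)
for train, test in splits:
    print(len(train), len(test))
```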
I used the caret package in R to fit the model using the classical approach. The results are as follows:
- Accuracy: 0.6964
- Sensitivity: 0.27255
- Specificity: 0.91150
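All three measures come from the 2×2 confusion matrix obtained after labelling each predicted probability at or above the cutoff as the positive class. Below is a minimal sketch, assuming outcomes coded 0/1 with 1 as the positive class and a 0.5 cutoff. The helper name and the toy data are illustrative. By default caret's confusionMatrix treats the first factor level as the positive class when `positive` is not given, so the same class should count as "positive" for both models.

```python
def classification_metrics(y_true, p_hat, cutoff=0.5):
    """Accuracy, sensitivity and specificity, treating 1 as the positive class.

    Assumes both classes occur in y_true (otherwise a denominator is zero).
    """
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, p_hat):
        pred = 1 if p >= cutoff else 0  # threshold the probability
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity


# Hypothetical toy outcomes and predicted probabilities.
y = [1, 0, 0, 1, 0, 1, 0, 0]
p = [0.7, 0.2, 0.6, 0.4, 0.1, 0.55, 0.3, 0.45]
print(classification_metrics(y, p))  # (0.75, 0.6666666666666666, 0.8)
```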
To evaluate the Bayesian model with 5-fold cross-validation, I first split the data into folds in R, giving 5 data sets, each consisting of 4 training folds and the corresponding test fold. I then fitted the following Stan model separately to each of the 5 data sets:
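```stan
data {
  int<lower=1> N1;                      // number of training observations
  int<lower=1> N2;                      // number of test observations
  int<lower=1> K1;                      // number of predictors
  array[N1] int<lower=0, upper=1> yt;   // response of training data
  matrix[N1, K1] x1;                    // training data matrix
  matrix[N2, K1] x1h;                   // test data matrix
}
parameters {
  real alpha1;                          // intercept
  vector[K1] beta1;                     // regression coefficients
}
model {
  beta1 ~ normal(0, 100);
  alpha1 ~ normal(0, 100);
  yt ~ bernoulli_logit_glm(x1, alpha1, beta1);
}
generated quantities {
  // predicted probability for each test observation (one vector per posterior draw)
  vector[N2] p_new;
  p_new = inv_logit(alpha1 + x1h * beta1);  // inverse logit transformation to get predictions
}
```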
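For reference, the Stan model corresponds to the specification below. The notation is an assumption for illustration: $x_i$ is row $i$ of x1, $\tilde{x}_j$ is row $j$ of x1h, and $s$ indexes posterior draws. Stan's normal(0, 100) is parameterised by the standard deviation, so these priors have variance $100^2$, which is very diffuse on the logit scale. Each draw $s$ yields its own vector of test probabilities, so each fit returns a draws-by-$N_2$ matrix of p_new rather than a single probability per test observation.

$$
\begin{aligned}
\text{yt}_i &\sim \operatorname{Bernoulli}\!\left(\operatorname{logit}^{-1}(\alpha_1 + x_i^\top \beta_1)\right), & i &= 1,\dots,N_1,\\
\alpha_1 &\sim \mathcal{N}(0,\,100^2), \quad \beta_{1k} \sim \mathcal{N}(0,\,100^2), & k &= 1,\dots,K_1,\\
p^{\text{new}(s)}_j &= \operatorname{logit}^{-1}\!\left(\alpha_1^{(s)} + \tilde{x}_j^\top \beta_1^{(s)}\right), & j &= 1,\dots,N_2.
\end{aligned}
$$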
I created a data frame by merging the predicted probabilities from the fits above and used it to obtain the following measures:
- Accuracy: 0.6052275
- Sensitivity: 0.3505236
- Specificity: 0.7155323
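Merging the Stan output requires collapsing the draws into one value per test observation. One common choice, shown here as a minimal sketch, is to average p_new over the posterior draws (in R, for example, colMeans() over the draws-by-observations matrix). Turning those averages into class labels, and hence into accuracy, sensitivity and specificity, still requires a cutoff. The sketch uses 0.5 to match the classical fit. The helper names, the draws layout `draws[s][j]`, and the toy numbers are all hypothetical.

```python
def posterior_mean_probs(draws):
    """Average p_new over posterior draws.

    Assumed layout: draws[s][j] is the predicted probability for
    test observation j in posterior draw s.
    """
    n_draws = len(draws)
    n_obs = len(draws[0])
    return [sum(draw[j] for draw in draws) / n_draws for j in range(n_obs)]


def classify(probs, cutoff=0.5):
    """Label an observation 1 when its probability is at least `cutoff` (0.5 assumed)."""
    return [1 if p >= cutoff else 0 for p in probs]


# Three hypothetical posterior draws for three test observations.
draws = [
    [0.9, 0.2, 0.6],
    [0.7, 0.4, 0.3],
    [0.8, 0.3, 0.45],
]
mean_p = posterior_mean_probs(draws)
print(mean_p)
print(classify(mean_p))  # [1, 0, 0]
```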
Comparing the two, the accuracy and specificity of the Bayesian model are lower than those of the classical model, although its sensitivity is higher. What could be the reason for this difference?
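Here is a quick consistency check on the reported numbers. Accuracy is a prevalence-weighted average of sensitivity and specificity, accuracy = π·sensitivity + (1 − π)·specificity, where π is the share of positive-class observations among the test predictions. Solving for π from each set of metrics gives the sketch below, where the helper name is illustrative. If both sets were computed from pooled predictions on the same observations with the same positive class, the two values of π should agree. Averaging metrics across folds, or rounding, could account for small gaps.

```python
def implied_prevalence(accuracy, sensitivity, specificity):
    """Solve accuracy = pi * sensitivity + (1 - pi) * specificity for pi."""
    return (specificity - accuracy) / (specificity - sensitivity)


# Metrics as reported in the question.
print(round(implied_prevalence(0.6964, 0.27255, 0.91150), 3))        # 0.337 (classical)
print(round(implied_prevalence(0.6052275, 0.3505236, 0.7155323), 3))  # 0.302 (Bayesian)
```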
I understand that the classical results are based on a particular threshold (a 0.5 cutoff), whereas the Bayesian results are not. The Bayesian model also accounts for the uncertainty in the parameter estimates.
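Sensitivity and specificity are only defined once predicted probabilities have been turned into class labels. The cutoff used for that step therefore shapes them, whether the probabilities come from a classical or a Bayesian fit. The minimal sketch below uses hypothetical toy data and an illustrative helper `sens_spec`. It shows that raising the cutoff never increases sensitivity and never decreases specificity.

```python
def sens_spec(y_true, p_hat, cutoff):
    """Sensitivity and specificity at a given cutoff (1 = positive class)."""
    pairs = list(zip(y_true, p_hat))
    tp = sum(1 for y, p in pairs if y == 1 and p >= cutoff)
    fn = sum(1 for y, p in pairs if y == 1 and p < cutoff)
    tn = sum(1 for y, p in pairs if y == 0 and p < cutoff)
    fp = sum(1 for y, p in pairs if y == 0 and p >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy outcomes and predicted probabilities.
y = [1, 0, 0, 1, 0, 1, 0, 0]
p = [0.7, 0.2, 0.6, 0.4, 0.1, 0.55, 0.3, 0.45]
for c in (0.3, 0.5, 0.7):
    print(c, sens_spec(y, p, c))
```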
But why are these performance measures so much lower for the Bayesian model?
I am not sure whether I am doing something wrong.
Any suggestions would be very helpful.