Hello everyone! I am a Ph.D. student developing a Bayesian model for the identification of pathogenic mutations (a classification problem) in the context of human genetic diseases. I decided to use a Bayesian model instead of more popular approaches (random forest, xgboost…) because I was very interested in the possibility of measuring the uncertainty of each prediction. In other words, I wanted to create a model that reports not only the prediction score but also how confident the model is about that prediction. This is especially relevant when the model is going to be used in a clinical setting.
Regarding the performance of the model, the accuracy is slightly lower than that obtained with random forest or xgboost. For the uncertainty quantification, I calculated both the standard deviation (sd) and the median absolute deviation (mad).
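For concreteness, here is a minimal sketch of these two summaries. It assumes the posterior draws of the predicted score are available as an array of shape (n_draws, n_obs); the function name `posterior_spread` and the synthetic Beta draws are hypothetical and only for illustration, not my actual model output.

```python
import numpy as np

def posterior_spread(draws):
    """Per-observation sd and MAD of posterior score draws.

    draws: array of shape (n_draws, n_obs); each column holds the
    posterior draws of the predicted score for one observation.
    """
    draws = np.asarray(draws, dtype=float)
    sd = draws.std(axis=0, ddof=1)                # sample standard deviation
    med = np.median(draws, axis=0)                # per-observation median
    mad = np.median(np.abs(draws - med), axis=0)  # unscaled median absolute deviation
    return sd, mad

# Hypothetical draws for 3 observations (synthetic, for illustration only):
# a diffuse posterior, one concentrated near 1, and one centred on 0.5.
rng = np.random.default_rng(0)
draws = rng.beta(a=[2.0, 20.0, 5.0], b=[2.0, 2.0, 5.0], size=(4000, 3))
sd, mad = posterior_spread(draws)
```

Note that the MAD here is the raw, unscaled version; some libraries multiply it by a constant to make it comparable to the sd under normality.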
My problem is the following: when I calculate the distance between the predicted score (mean, median, or MAP) and the predicted label, i.e. 0 (if score < 0.5) or 1 (if score >= 0.5), this new metric is better than the sd (or mad) at identifying wrong predictions.
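A minimal sketch of this distance metric, and of one possible way to measure how well an uncertainty score flags wrong predictions. The helper names (`distance_to_label`, `error_auroc`) and the toy data are hypothetical, and the rank-based AUROC is just one reasonable choice of comparison, not necessarily the one I used.

```python
import numpy as np

def distance_to_label(score):
    """Distance from the score to the predicted label (0 if score < 0.5, else 1)."""
    score = np.asarray(score, dtype=float)
    return np.where(score < 0.5, score, 1.0 - score)

def error_auroc(uncertainty, is_wrong):
    """Probability that a wrong prediction gets a higher uncertainty
    than a correct one (ties count as 0.5). Needs at least one wrong
    and one correct prediction."""
    u = np.asarray(uncertainty, dtype=float)
    w = np.asarray(is_wrong, dtype=bool)
    diff = u[w][:, None] - u[~w][None, :]  # all wrong-vs-correct pairs
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

# Toy data (made up for illustration only).
score = np.array([0.05, 0.45, 0.60, 0.95, 0.70])
y_true = np.array([0, 1, 0, 1, 1])
is_wrong = (score >= 0.5).astype(int) != y_true

dist = distance_to_label(score)
auroc_dist = error_auroc(dist, is_wrong)
```

Passing the sd or mad of each prediction to `error_auroc` instead of `dist` would put the three metrics on the same footing. The distance is also just 1 - max(score, 1 - score), so it carries no information beyond the point estimate itself.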
Therefore, is it possible to use the predicted score itself as a measure of uncertainty? I have read that the predicted scores [0-1] of ML models such as random forest cannot be read as the probability of the prediction, and that doing so can be misleading, especially when the observation lies between the feature-space regions of the two labels or is Out-Of-Distribution (OOD). Does this apply to Bayesian models too? I am not an expert on Bayesian models, so I may be overlooking something. My goal is “simple”: to provide the user with a score for the confidence of each prediction.
Thank you very much!