How to measure uncertainty: standard deviation vs. predicted score

Hello everyone! I am a Ph.D. student developing a Bayesian model for the identification of pathogenic mutations (a classification problem) in the context of human genetic diseases. I decided to use a Bayesian model instead of more popular approaches (random forest, xgboost…) because I was drawn to the possibility of measuring the uncertainty of each prediction. In other words, I wanted to create a model that reports not only the prediction score but also how confident the model is about that prediction. This is especially relevant when the model is going to be used in a clinical setting.

Regarding the performance of the model, the accuracy is slightly lower than that obtained with random forest or xgboost. For the uncertainty quantification, I calculated both the standard deviation (sd) and the median absolute deviation (mad).

My problem is this: when I calculate the distance between the predicted score (mean, median, or MAP) and the nearest label, that is, 0 if the score is < 0.5 or 1 if the score is >= 0.5, this new metric is better than the sd (or mad) at identifying wrong predictions.
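
As a minimal sketch of the comparison described above (not the actual pipeline): the function name `uncertainty_summaries` and the `draws` array are assumptions for illustration, where `draws` is taken to hold one posterior draw of the predicted probability per row and one observation per column. The posterior mean is used as the score here; the median or MAP could be substituted.

```python
import numpy as np

def uncertainty_summaries(draws):
    """Summarise posterior draws of a predicted probability.

    draws: array of shape (n_draws, n_obs), values in [0, 1] (hypothetical input).
    """
    draws = np.asarray(draws, dtype=float)

    # Point prediction: posterior mean of the probability for each observation.
    score = draws.mean(axis=0)

    # Predicted label and distance from the score to that label
    # (0 if score < 0.5, 1 if score >= 0.5).
    label = (score >= 0.5).astype(int)
    distance = np.abs(score - label)

    # Spread of the posterior draws: sample sd and (unscaled) mad.
    sd = draws.std(axis=0, ddof=1)
    median = np.median(draws, axis=0)
    mad = np.median(np.abs(draws - median), axis=0)

    return {"score": score, "label": label,
            "distance": distance, "sd": sd, "mad": mad}

# Toy draws: 3 posterior draws for 2 observations.
toy = np.array([[0.9, 0.4],
                [0.8, 0.6],
                [1.0, 0.5]])
summary = uncertainty_summaries(toy)
```

In this toy input both observations have the same sd and mad, yet the second one sits exactly on the decision boundary, so only `distance` separates them.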

Therefore, is it possible to use the predicted score itself as a measure of uncertainty? I have read that the predicted scores [0-1] of ML models such as random forest cannot be read as the probability of the prediction, and that doing so can be misleading, especially when the observation lies between the regions of feature space occupied by the two labels or is Out-Of-Distribution (OOD). Does this apply to Bayesian models too? I am not an expert on Bayesian models, so I may be overlooking something. My goal is “simple”: to give the user a score that reflects the confidence of the prediction.

Thank you very much!


It depends on what “predicted score” means here. If you mean the posterior probability of each class given the predictors (i.e. the average of the probabilities computed from each posterior sample), then this involves both the sampling uncertainty and the uncertainty about model parameters, and it can be interpreted as a measure of uncertainty about the outcome, assuming your model is correct (which is a big assumption).
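
To illustrate why the averaged probability already carries both sources of uncertainty, here is a rough sketch. It assumes a plain logistic regression, which may not match your model, and the names `posterior_predictive_summary`, `X` and `beta_draws` are made up for the example. For a binary outcome, the total predictive variance p̄(1 − p̄) splits exactly into the average Bernoulli variance (sampling uncertainty) plus the variance of the probability across posterior draws (parameter uncertainty).

```python
import numpy as np

def posterior_predictive_summary(X, beta_draws):
    """Average per-draw probabilities and split the predictive variance.

    X:          (n_obs, n_features) design matrix (hypothetical).
    beta_draws: (n_draws, n_features) posterior draws of the coefficients
                of an assumed logistic regression (hypothetical).
    """
    X = np.asarray(X, dtype=float)
    beta_draws = np.asarray(beta_draws, dtype=float)

    # One probability per posterior draw and observation: (n_draws, n_obs).
    logits = beta_draws @ X.T
    p = 1.0 / (1.0 + np.exp(-logits))

    # Posterior predictive probability of class 1: average over draws.
    p_mean = p.mean(axis=0)

    # Law of total variance for a Bernoulli outcome:
    #   p_mean * (1 - p_mean) = E[p (1 - p)] + Var(p)
    sampling = (p * (1.0 - p)).mean(axis=0)   # outcome noise given parameters
    parameter = p.var(axis=0)                 # spread of p across draws
    total = p_mean * (1.0 - p_mean)

    return p_mean, sampling, parameter, total

# Toy example: 2 observations, 2 coefficient draws.
X = np.array([[1.0, 0.0],
              [1.0, 2.0]])
beta_draws = np.array([[0.0, 0.0],
                       [0.0, 1.0]])
p_mean, sampling, parameter, total = posterior_predictive_summary(X, beta_draws)
```

The sd of the draws corresponds only to the `parameter` term, whereas the distance of `p_mean` from 0 or 1 reflects `total`, which may be part of why it tracks wrong predictions better.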

You can use the loo package to get an estimate of how well your model will predict on out-of-sample data (which assumes there is no dataset shift between your training data and the data that will later be fed into your model, which is again a big assumption).

Best of luck with your model!