Hello! I wanted to share our new preprint:
CNVscore calculates pathogenicity scores for copy number variants together with uncertainty estimates accounting for learning biases in reference Mendelian disorder datasets
We used CNVscore, a supervised learning model that combines gradient boosting with Bayesian logistic regression under a generalized horseshoe prior, to classify pathogenic and benign copy number variants (CNVs). Unlike other supervised-learning approaches, CNVscore pairs each pathogenicity score with an uncertainty estimate, which makes it possible to assess how suitable the training set is for the query variants.
CNVscore pathogenicity scores reached classification performance similar to that of state-of-the-art methods in comparative benchmark tests across independent sets. Furthermore, CNVscore identified low-uncertainty CNV subsets for which supervised-learning approaches achieved higher classification accuracy.
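To give a feel for what "a pathogenicity score together with an uncertainty estimate" can look like, here is a minimal Python sketch. It is not the CNVscore implementation: it assumes a plain Bayesian logistic regression for which posterior draws of the coefficients are already available, and it leaves out the gradient boosting step and the horseshoe prior entirely. The function name, the coefficient draws, and the two example variants are all hypothetical and chosen only for illustration.

```python
import math
import statistics


def sigmoid(z):
    """Map a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def score_with_uncertainty(features, coefficient_draws):
    """Return (mean, standard deviation) of the predicted probability.

    Each element of coefficient_draws is one hypothetical posterior draw of
    the regression coefficients. The mean plays the role of the score and
    the standard deviation the role of the uncertainty.
    """
    probabilities = [
        sigmoid(sum(w * x for w, x in zip(draw, features)))
        for draw in coefficient_draws
    ]
    return statistics.fmean(probabilities), statistics.pstdev(probabilities)


# Hypothetical posterior draws of [intercept, weight_1, weight_2].
# The draws agree on weight_1 but disagree strongly on weight_2, as could
# happen when the second feature is poorly represented in the training data.
draws = [
    [0.0, 1.0, 2.0],
    [0.0, 1.0, -2.0],
    [0.0, 1.0, 0.0],
]

# Two made-up query variants (the leading 1.0 multiplies the intercept).
variant_a = [1.0, 3.0, 0.0]  # relies only on the well-determined feature
variant_b = [1.0, 0.0, 1.0]  # relies only on the poorly determined feature

score_a, uncertainty_a = score_with_uncertainty(variant_a, draws)
score_b, uncertainty_b = score_with_uncertainty(variant_b, draws)

print(f"variant_a: score={score_a:.3f}, uncertainty={uncertainty_a:.3f}")
print(f"variant_b: score={score_b:.3f}, uncertainty={uncertainty_b:.3f}")
```

In this toy setup, all draws agree on the prediction for variant_a, so its uncertainty is essentially zero, while the draws disagree for variant_b and its uncertainty is large. Keeping only predictions below some uncertainty threshold is one simple way to form a low-uncertainty subset like the ones mentioned above.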
Comments and feedback are more than welcome!
Francisco Requena