New preprint: Biomedical application + Bayesian approach + Uncertainty estimation

frequena · July 6, 2022, 4:13pm

Hello! I wanted to share our new preprint:

CNVscore calculates pathogenicity scores for copy number variants together with uncertainty estimates accounting for learning biases in reference Mendelian disorder datasets

CNVscore is a supervised learning model that combines gradient boosting with Bayesian logistic regression under a generalized horseshoe prior to classify pathogenic and benign copy number variants (CNVs). Unlike alternative supervised-learning approaches, CNVscore pairs each pathogenicity score with an uncertainty estimate, which makes it possible to evaluate how suitable the training set is for a given query variant.
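For readers less familiar with this kind of prior, here is a rough sketch of a logistic regression with the standard horseshoe prior. This is only a reference point: the generalized horseshoe variant used in the preprint is not spelled out in this post, and the symbols below ($\alpha$, $\beta_j$, $\lambda_j$, $\tau$, $\tau_0$) are notation assumed for illustration, not taken from the paper.

```latex
\begin{align}
y_i \mid x_i, \alpha, \beta &\sim \mathrm{Bernoulli}\!\left(\mathrm{logit}^{-1}\!\left(\alpha + x_i^\top \beta\right)\right), \\
\beta_j \mid \lambda_j, \tau &\sim \mathcal{N}\!\left(0,\; \tau^2 \lambda_j^2\right), \\
\lambda_j &\sim \mathrm{C}^{+}(0, 1), \\
\tau &\sim \mathrm{C}^{+}(0, \tau_0), \\
p(\alpha, \beta, \lambda, \tau \mid y) &\propto p(y \mid \alpha, \beta)\, p(\beta \mid \lambda, \tau)\, p(\lambda)\, p(\tau)\, p(\alpha).
\end{align}
```

Here $x_i$ would be the binary rule vector for CNV $i$, $\lambda_j$ is a local scale that lets individual coefficients escape shrinkage, and $\tau$ is a global scale that pulls most coefficients toward zero.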

CNVscore pathogenicity scores reached classification performance similar to that of state-of-the-art techniques in comparative benchmark tests across independent sets. Furthermore, CNVscore identified low-uncertainty CNV subsets on which supervised-learning approaches achieved higher classification accuracy.

Comments and feedback are more than welcome!

Francisco Requena

yuling · July 6, 2022, 10:03pm

I skimmed your figure 3:

A gradient-boosting model was first trained to classify pathogenic and benign CNVs on 38 genome-wide features. Each of the resulting trees was decoupled into a set of independent decision rules, which were used to annotate CNVs in a binary manner. Such vectors were used as input, to train a Bayesian generalized linear regression model on the same CNV sets. The likelihoods of the model parameters were combined with priors to generate their posterior probability.
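To check my reading of that caption, here is a minimal sketch of the "decouple trees into rules, then annotate in a binary manner" step. It is an assumption about how such a step could work, not the authors' implementation: the toy tree, the feature names `n_genes` and `length_kb`, and the helpers `extract_rules` and `annotate` are all hypothetical.

```python
from typing import Dict, List, Tuple

# A condition is (feature, operator, threshold); a rule is a conjunction of conditions.
Condition = Tuple[str, str, float]
Rule = List[Condition]

# Hypothetical toy tree standing in for one tree of a fitted gradient-boosting model.
toy_tree = {
    "feature": "n_genes", "threshold": 3,
    "left": {"leaf": True},
    "right": {
        "feature": "length_kb", "threshold": 500,
        "left": {"leaf": True},
        "right": {"leaf": True},
    },
}


def extract_rules(node: dict, path: Rule = ()) -> List[Rule]:
    """Return one rule per non-root node: the conjunction of splits leading to it."""
    if "feature" not in node:  # leaves contribute no further splits
        return []
    rules: List[Rule] = []
    for op, child in (("<=", node["left"]), (">", node["right"])):
        rule = list(path) + [(node["feature"], op, node["threshold"])]
        rules.append(rule)
        rules.extend(extract_rules(child, rule))
    return rules


def rule_fires(rule: Rule, sample: Dict[str, float]) -> int:
    """Return 1 if the sample satisfies every condition in the rule, else 0."""
    for feature, op, threshold in rule:
        value = sample[feature]
        satisfied = value <= threshold if op == "<=" else value > threshold
        if not satisfied:
            return 0
    return 1


def annotate(rules: List[Rule], sample: Dict[str, float]) -> List[int]:
    """Binary vector with one entry per rule; this would be a regression input row."""
    return [rule_fires(rule, sample) for rule in rules]


rules = extract_rules(toy_tree)
print(annotate(rules, {"n_genes": 5, "length_kb": 200}))  # [0, 1, 1, 0]
```

In this sketch the rules are learned from the training data before the regression ever sees them, which is what prompts the second question below.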

Is Stan only used for the logistic regression?
Also, if the features/rules that feed your logistic regression were already trained and selected on the same data, do you have to worry about feature-selection bias in your Bayesian uncertainty estimates?
