Genetic association studies in Stan


Dear Stan community,

I’m looking for papers and/or case studies/tutorials that cover how genetic association studies can be done in a Bayesian framework (in particular focusing on Stan). Can you share some related material?

Many thanks!


So basically, are you looking for a logistic regression with sparsity-inducing priors, or something more complicated? Aki Vehtari (posts here) has a lot on that and on how to do model checking with it. Obviously there's more complicated data out there, so it's hard to give generic advice.


That would already be a good start. If I understand correctly, this would work directly with the whole genotype data instead of "engineered features" like principal components (PCA)?

What do you have in mind when you refer to more complicated settings? Multiphenotype scenarios?


Yeah, @avehtari (and others) know the range of scales where these are applicable better than I do. The related key words are things like the horseshoe prior, horseshoe+, and some more recent and sampling-friendly alternatives.
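To make those keywords concrete, here is a minimal sketch of a logistic regression with a plain horseshoe prior, written in a non-centered form. It is an illustration only, not a model taken from any of the references in this thread. The data names (`N`, `M`, `X`, `y`, `tau0`) and the `normal(0, 5)` intercept prior are assumptions, and `X` is assumed to hold genotype dosages that have already been imputed and centered.

```stan
data {
  int<lower=1> N;                     // number of subjects (assumed name)
  int<lower=1> M;                     // number of markers (assumed name)
  matrix[N, M] X;                     // genotype dosages, assumed imputed and centered
  array[N] int<lower=0, upper=1> y;   // binary phenotype (case = 1, control = 0)
  real<lower=0> tau0;                 // scale of the global shrinkage prior, must be positive (assumed input)
}
parameters {
  real alpha;                         // intercept
  vector[M] z;                        // standardized marker effects
  vector<lower=0>[M] lambda;          // local shrinkage, one per marker
  real<lower=0> tau;                  // global shrinkage
}
transformed parameters {
  // Non-centered horseshoe: beta[m] ~ normal(0, tau * lambda[m])
  vector[M] beta = tau * (lambda .* z);
}
model {
  alpha ~ normal(0, 5);               // weakly informative intercept prior (assumption)
  z ~ std_normal();
  lambda ~ cauchy(0, 1);              // half-Cauchy because lambda is constrained positive
  tau ~ cauchy(0, tau0);              // half-Cauchy because tau is constrained positive
  y ~ bernoulli_logit(alpha + X * beta);
}
```

The half-Cauchy tails in the plain horseshoe often produce divergent transitions in practice. The regularized horseshoe of Piironen and Vehtari is one variant designed to sample better, and it would replace the `lambda` and `tau` block above.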

Yeah, multiphenotype, or time-varying phenotype (e.g., in Atlantic salmon restoration there was a lot of interest in what drives the timing of smolting, and there are obvious environmental and size/growth cues at play in addition to genetics). It gets really messy with field measurements.


Things that affect scalability in genetic association studies: Do you want to do the imputation, too? If the imputation has already been done, are you using the imputation uncertainties? Multilocus? Gene x gene interactions? Gene x environment interactions? Multiple phenotypes? Logistic regression or something more complex? A hierarchical model for male/female, etc.?

With Stan it's easy to write a model with all of these, but it's likely that it won't compute well if you have 1M+ genetic markers and 100k+ subjects, unless you parallelize. With 1M+ markers you would probably need to parallelize over markers as well, but that's doable. For simpler models it is possible to write model-specific code that can be faster than Stan, but then it's not so easy to modify your model.
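As one possible way to parallelize within a single Stan model, the sketch below uses Stan's `reduce_sum` to split the log-likelihood across threads. Note that it slices over subjects, not over markers, so it covers only the first kind of parallelism mentioned above. The function name `partial_log_lik`, the data names, the `grainsize` input, and the simple `normal` priors are all assumptions made for illustration.

```stan
functions {
  // Log-likelihood for the subjects in rows start..end (hypothetical helper name).
  real partial_log_lik(array[] int y_slice, int start, int end,
                       matrix X, real alpha, vector beta) {
    return bernoulli_logit_lpmf(y_slice | alpha + X[start:end] * beta);
  }
}
data {
  int<lower=1> N;                     // number of subjects (assumed name)
  int<lower=1> M;                     // number of markers (assumed name)
  matrix[N, M] X;                     // genotype dosages, assumed imputed and centered
  array[N] int<lower=0, upper=1> y;   // binary phenotype
  int<lower=1> grainsize;             // target slice size per task (tuning input)
}
parameters {
  real alpha;                         // intercept
  vector[M] beta;                     // marker effects
}
model {
  alpha ~ normal(0, 5);               // placeholder priors (assumption)
  beta ~ normal(0, 1);
  // y is sliced across threads; X, alpha and beta are passed whole to each slice.
  target += reduce_sum(partial_log_lik, y, grainsize, X, alpha, beta);
}
```

This only runs in parallel if the model is compiled with threading enabled (the `STAN_THREADS` makefile flag) and more than one thread per chain is requested at run time. Whether it helps at the 1M-marker scale is not something this sketch demonstrates.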


Thanks. Any (Stan-related) references describing some of the approaches?


This one just came out and is a pretty straightforward regression in Stan.