Optimizing Stan Performance for Single-Cell RNA-seq Mixed Effects Model (10K features, 50K cells)

andrewgelman · September 4, 2025, 8:38pm

1. Yes, I often recommend pinning the group-level variance parameter or covariance matrix to a pre-chosen value based on subject-matter information. Often the inference isn't super-sensitive to this group-level variance, as long as it's not so small that all the estimates get pulled to zero and not so large that the estimates are wildly noisy.

I’ve toyed with the idea of making this a more formal procedure, for example drawing 10 values of the set of variance parameters from a prior, then using these to run 10 fast inferences (could be MCMC or even just plain old optimization and Laplace approx), then averaging over them using stacking. I think this could work, but I’ve never actually tried it, let alone evaluated the idea. It’s a research idea!

2. Sometimes we do use gamma priors for group-level variance parameters. The gamma prior with 1 or more degrees of freedom has the pleasant property of being zero-avoiding, which is especially helpful when doing marginal maximum likelihood (see our 2013 paper: https://sites.stat.columbia.edu/gelman/research/published/chung_etal_Pmetrika2013.pdf). For covariance matrices, the analogous choice is a Wishart (_not_ inverse-Wishart) prior (see our 2014 paper: https://sites.stat.columbia.edu/gelman/research/published/chung_cov_matrices.pdf).

3. Another thing that’s worked well for me is to use Pathfinder to get starting values. It varies, but sometimes Pathfinder runs very fast and then we can jointly estimate all the parameters and not worry so much about the funnel.
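For point 1, here is a minimal sketch of what pinning the group-level scale does, under assumptions that are not from the post. It uses a normal-normal model with known sigma_j and mu fixed at 0, and the classic eight-schools numbers stand in for the real data. The function name `partial_pool` is hypothetical. With tau fixed, each group estimate is a precision-weighted average of its raw value and mu, so you can see the two failure modes directly.

```python
# Hypothetical illustration, not from the post: normal-normal model
#   y_j ~ normal(theta_j, sigma_j),  theta_j ~ normal(mu, tau),
# with sigma_j known, mu fixed at 0, and tau pinned to a chosen value.
# With tau fixed, each conditional posterior mean of theta_j is a
# precision-weighted average of the raw estimate y_j and mu.

def partial_pool(y, sigma, tau, mu=0.0):
    """Conditional posterior means of theta_j for a fixed tau."""
    means = []
    for y_j, s_j in zip(y, sigma):
        w_data = 1.0 / s_j**2   # precision of the raw group estimate
        w_prior = 1.0 / tau**2  # precision implied by the pinned tau
        means.append((w_data * y_j + w_prior * mu) / (w_data + w_prior))
    return means

# Classic eight-schools data, used only as a small stand-in example.
y = [28, 8, -3, 7, -1, 1, 18, 12]
sigma = [15, 10, 16, 11, 9, 11, 10, 18]

for tau in (0.1, 10.0, 1000.0):
    est = partial_pool(y, sigma, tau)
    print(f"tau={tau:>7}: " + ", ".join(f"{e:6.2f}" for e in est))
```

With tau = 0.1 every estimate collapses to about 0. With tau = 1000 the estimates essentially reproduce the noisy raw y_j. Values in between give partial pooling, which is the range where results tend to be insensitive to the exact choice.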
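The "draw variance parameters from the prior, run a fast fit for each, then stack" idea in point 1 is explicitly untried, so the following is only a toy sketch under stated assumptions. It uses a normal-normal model with known sigma_j and a flat prior on mu. It assumes a half-normal(0, 10) prior on tau and uses exact leave-one-out predictive densities in place of PSIS-LOO. Because this toy model is conjugate, each "fast fit" is closed-form. In a real model it would be an MCMC or optimization-plus-Laplace run. The stacking step maximizes sum_i log sum_k w_k p(y_i | y_-i, tau_k) over the simplex. The EM-style fixed-point update used here is one simple way to solve that problem, not the only one. All function names are hypothetical.

```python
import math
import random

# Toy sketch of "draw tau from the prior, fit, then stack". Assumptions (not
# from the post): normal-normal model, known sigma_j, flat prior on mu,
# half-normal(0, 10) prior on tau, exact LOO densities instead of PSIS-LOO.
y = [28, 8, -3, 7, -1, 1, 18, 12]
sigma = [15, 10, 16, 11, 9, 11, 10, 18]

def normal_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def loo_logdens(tau):
    """Exact log p(y_i | y_-i, tau) for each i, with mu integrated out.

    Marginally y_j ~ normal(mu, sigma_j^2 + tau^2); a flat prior on mu gives
    a normal posterior for mu from the other points, hence a normal predictive.
    """
    V = [s**2 + tau**2 for s in sigma]
    out = []
    for i in range(len(y)):
        prec = sum(1 / V[j] for j in range(len(y)) if j != i)
        mean = sum(y[j] / V[j] for j in range(len(y)) if j != i) / prec
        out.append(normal_logpdf(y[i], mean, V[i] + 1 / prec))
    return out

def stacking_weights(logdens, iters=2000):
    """Maximize sum_i log sum_k w_k p_ik over the simplex via EM updates."""
    K, n = len(logdens), len(logdens[0])
    # Exponentiate with a per-point shift for stability; the shift cancels
    # in the ratio used by the update below.
    p = [[0.0] * n for _ in range(K)]
    for i in range(n):
        top = max(logdens[k][i] for k in range(K))
        for k in range(K):
            p[k][i] = math.exp(logdens[k][i] - top)
    w = [1.0 / K] * K
    for _ in range(iters):
        mix = [sum(w[k] * p[k][i] for k in range(K)) for i in range(n)]
        w = [sum(w[k] * p[k][i] / mix[i] for i in range(n)) / n
             for k in range(K)]
    return w

rng = random.Random(2025)
taus = [abs(rng.gauss(0, 10)) for _ in range(10)]  # 10 half-normal prior draws
weights = stacking_weights([loo_logdens(t) for t in taus])
for t, w in sorted(zip(taus, weights)):
    print(f"tau={t:6.2f}  weight={w:.3f}")
```

The final inference would then be the weight-averaged mixture of the ten conditional fits. Whether this behaves well for a 10K-feature model is exactly the open research question.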
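For point 2, a small sketch can show why a zero-avoiding gamma prior helps a mode-based estimate of a group-level scale tau. The following are assumptions of this example, not statements from the post. It uses eight-schools data with known sigma_j and integrates mu out under a flat prior. The prior on tau is a gamma with shape 2 and rate tending to 0, so its log density is log(tau) plus a constant and goes to minus infinity as tau approaches 0. The helper names are hypothetical.

```python
import math

# Assumptions (not from the post): eight-schools data, known sigma_j, mu
# integrated out under a flat prior, and a gamma(shape=2, rate -> 0) prior
# on tau whose log density is log(tau) + const.
y = [28, 8, -3, 7, -1, 1, 18, 12]
sigma = [15, 10, 16, 11, 9, 11, 10, 18]

def log_marginal(tau):
    """log p(y | tau) up to a constant, with mu integrated out."""
    V = [s**2 + tau**2 for s in sigma]
    prec = sum(1 / v for v in V)
    mu_hat = sum(yj / v for yj, v in zip(y, V)) / prec
    return (-0.5 * sum(math.log(v) for v in V)
            - 0.5 * math.log(prec)
            - 0.5 * sum((yj - mu_hat) ** 2 / v for yj, v in zip(y, V)))

def log_gamma2_prior(tau):
    """gamma(2, rate -> 0) log density up to a constant; -inf as tau -> 0."""
    return math.log(tau)

grid = [0.01 * k for k in range(1, 3001)]  # tau in (0, 30]
tau_flat = max(grid, key=log_marginal)
tau_gamma = max(grid, key=lambda t: log_marginal(t) + log_gamma2_prior(t))
print(f"marginal-likelihood mode: {tau_flat:.2f}")
print(f"mode with gamma(2) prior: {tau_gamma:.2f}")
```

With these data the marginal likelihood is largest near tau = 0, which is the degenerate boundary estimate that causes trouble. The log(tau) term can only push the maximum to the right, and here it moves it well away from zero.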
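For point 3, here is a hedged sketch of using Pathfinder for starting values from Python. It assumes CmdStanPy 1.2 or later with a working CmdStan install, where `CmdStanModel.pathfinder` and `CmdStanPathfinder.create_inits` are available. Check the documentation for your installed version. The helper name, Stan file path, and data dict are placeholders, and `cmdstanpy` is imported inside the function so that the definition loads even without CmdStan.

```python
# Hypothetical helper: start NUTS chains from Pathfinder draws.
# Assumes CmdStanPy >= 1.2 and a working CmdStan install; stan_file and
# data are placeholders you supply.

def sample_with_pathfinder_inits(stan_file, data, chains=4, seed=1234):
    """Run Pathfinder, then start MCMC chains from its approximate draws."""
    from cmdstanpy import CmdStanModel  # lazy import; needs CmdStan

    model = CmdStanModel(stan_file=stan_file)
    # Pathfinder is a fast variational approximation; here its draws are
    # only used to pick reasonable starting points, not as the final fit.
    pf = model.pathfinder(data=data, seed=seed)
    inits = pf.create_inits(seed=seed, chains=chains)
    return model.sample(data=data, chains=chains, inits=inits, seed=seed)
```

Starting every chain from a draw out of the Pathfinder approximation, rather than from Stan's default random inits, is what lets the joint fit skip much of the warmup struggle. It does not remove the funnel geometry itself.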
