Hi everyone,
Many thanks in advance -
I’m looking for advice to improve the speed of my current model.
I have single-cell RNA sequencing data from cancer patients(SAMPLE)with different mutations (FUSION).Each cell has an annotated CELLTYPE.
I have a matrix X that contains gene activity scores across single-cells, and I want to model how different mutations impact gene activities. All the gene activities are centered at zero.
Below is my current model:
mu = alpha + b_fusion + b_celltype
X ~ Normal(mu, sigma)
- I am modeling each patient as a random effect (varying intercepts alpha).
- Then I estimate b_fusion while controlling for cell types.
- There are ~10,000
FEATURES, so I am essentially fitting 10K linear models each with 50K (number of cells) observations.
The current model with data downsampled to 5 features and 2,000 cells take ~5 minutes to fit with num_warmup=500; num_samples=2000. But the entire dataset takes 2 days to draw ~100 samples.
I’ve implemented the following to improve speed:
-
Prior predictive check to make my priors more narrow
-
Non-centered parameterization for alpha
-
Pre compute mu in the transformed parameter block instead of looping over cells during likelihood calculation
But it’s still taking way too long. My final dataset will likely grow another ~5 fold.
I am doing this on the HPC; memory and compute power is not rate limiting.
data {
int<lower=0> NCELL;
int<lower=0> NFEAT;
int<lower=0> NSAMPLE;
int<lower=0> NFUSION;
int<lower=0> NCELLTYPE;
matrix[NFEAT, NCELL] X; //activity matrix
array[NCELL] int<lower=1> SAMPLE;
array[NCELL] int<lower=1> FUSION;
array[NCELL] int<lower=1> CELLTYPE;
}
parameters {
matrix[NFEAT, NFUSION] b_fusion;
matrix[NFEAT, NCELLTYPE] b_celltype;
// Non-centered random effects
matrix[NFEAT, NSAMPLE] alpha_std;
real alpha0; // Mean of the random effects
real<lower=0> sigma_alpha; // SD of the random effects
// residual sd
vector<lower=0>[NFEAT] sigma;
}
transformed parameters {
// Center and scale the random effects
matrix[NFEAT, NSAMPLE] alpha;
for (i in 1:NFEAT) {
alpha[i, ] = alpha0 + alpha_std[i, ] * sigma_alpha;
}
// Pre-calculate the mean matrix for the likelihood
matrix[NFEAT, NCELL] mu;
for (i in 1:NFEAT) {
for (j in 1:NCELL) {
mu[i, j] = alpha[i, SAMPLE[j]] + b_fusion[i, FUSION[j]] + b_celltype[i, CELLTYPE[j]];
}
}
}
model {
alpha0 ~ normal(0, .4);
sigma_alpha ~ lognormal(-1.5, 1);
// Prior for standardized random effects
to_vector(alpha_std) ~ normal(0, 1);
sigma ~ gamma(.1, .1);
to_vector(b_fusion) ~ normal(0, .4);
to_vector(b_celltype) ~ normal(0, .4);
for(i in 1:NFEAT){
X[i, ] ~ normal(mu[i, ], sigma[i]);
}
}