Hi Stan users!
I’m modeling hierarchical tumor size measurements with the structure: measurement over time, in tumor, in organ, in patient, in study. I have about 150,000 measurements from 30,000 tumors across 22 studies. I’m trying to optimize runtime using within-chain parallelization via reduce_sum on an HPC cluster (up to ~80 cores per node).
Right now my partial_sum function is sliced by study; inside it I loop over studies, patients, and tumors. For each tumor I extract its measurement segment and compute the longitudinal mean. Something like:
for (ss in 1:nstudy_sub) {
  int s = study[ss];
  for (j in 1:nb_patient_per_study[s]) {
    for (l in 1:nb_lesions_per_patient_per_study[j, s]) {
      // length of this tumor's measurement segment
      int len = end_lesion[s, j, l] - start_lesion[s, j, l] + 1;
      // chained slices: the patient indices are relative to the study slice,
      // and the lesion indices are relative to the patient slice
      vector[len] Yobs = Y[start_etude[s]:end_etude[s]]
                          [start_patient_etude[s, j]:end_patient_etude[s, j]]
                          [start_lesion[s, j, l]:end_lesion[s, j, l]];
      vector[len] Xobs = X[start_etude[s]:end_etude[s]]
                          [start_patient_etude[s, j]:end_patient_etude[s, j]]
                          [start_lesion[s, j, l]:end_lesion[s, j, l]];
      // ... compute Ypred and likelihood ...
    }
  }
}
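For comparison, here is a flattened variant I've been considering (array names here are hypothetical, not my actual code): precompute one global start index and length per tumor in the transformed data block, so each extraction becomes a single `segment()` call instead of three chained relative slices.

```stan
// Hypothetical flat bookkeeping: tumor t's measurements occupy
// Y[tumor_start[t] : tumor_start[t] + tumor_len[t] - 1] in the global data vector,
// with tumors of study s numbered contiguously from tumor_first[s] to tumor_last[s].
for (ss in 1:nstudy_sub) {
  int s = study[ss];
  for (t in tumor_first[s]:tumor_last[s]) {
    vector[tumor_len[t]] Yobs = segment(Y, tumor_start[t], tumor_len[t]);
    vector[tumor_len[t]] Xobs = segment(X, tumor_start[t], tumor_len[t]);
    // ... compute Ypred and likelihood using study/patient/tumor effects ...
  }
}
```

This trades three levels of ragged start/end arrays for one, at the cost of a bit of index precomputation outside the model block.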
The data are highly imbalanced at every level: some studies have many more patients and measurements than others, and within a study some patients have many more tumors and/or measurements than others.
My questions:
- Since I have study-, patient-, and tumor-level parameters (random effects at each level), is there a recommended way to structure the computation so that data slicing is efficient?
- Is slicing by study still a good strategy for reduce_sum, or should I rewrite to slice at a finer level (e.g., tumor or measurement level) to improve load balancing across many cores?
- Any best practices for efficiently indexing/extracting tumor-level measurements in Stan would be very welcome.
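To make the second bullet concrete, here is a rough sketch of what tumor-level slicing might look like (again with hypothetical array names: one global start index, length, and parent study/patient id per tumor), so each slice element is a single tumor and grainsize controls how many tumors go to each task:

```stan
functions {
  real partial_sum(array[] int tumor_slice, int start, int end,
                   vector Y, vector X,
                   array[] int tumor_start, array[] int tumor_len,
                   array[] int tumor_patient, array[] int tumor_study) {
    real lp = 0;
    for (i in 1:size(tumor_slice)) {
      int t = tumor_slice[i];  // global tumor index
      vector[tumor_len[t]] Yobs = segment(Y, tumor_start[t], tumor_len[t]);
      vector[tumor_len[t]] Xobs = segment(X, tumor_start[t], tumor_len[t]);
      // ... build Ypred from the study-level effect (tumor_study[t]),
      // patient-level effect (tumor_patient[t]), and tumor-level effect (t),
      // then accumulate the log likelihood into lp ...
    }
    return lp;
  }
}
// In the model block, with tumor_id = {1, ..., n_tumor}:
// target += reduce_sum(partial_sum, tumor_id, grainsize, Y, X,
//                      tumor_start, tumor_len, tumor_patient, tumor_study);
```

Would something like this be expected to balance better than study-level slicing given the imbalance described above, and is there guidance on choosing grainsize in that setting?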
Thanks a lot for any guidance!
-S