Modelling biological data with few subjects but many samples per subject


We are modeling a cellular data experiment with Stan and BRMS. This pilot data set has fewer than 10 subjects, each measured in ~3 conditions, but each measurement produces around 10,000 data points that follow a lognormal distribution. We cannot get the raw data, but we can get whatever quantiles we want from this distribution.

We would like to take advantage of these full distributions and use them to inform the posterior estimates (as opposed to our esteemed colleagues, who take the means and run pairwise t tests). So we are running more or less the following standard-error model (log transformations of various quantities omitted for simplicity):

brm(formula = Mean | se(SE) ~ 1 + Condition + (1 | Subject), ...)

where Mean is the log mean signal from the cells of each Subject under each Condition, and SE is a conservative estimate of the standard error, obtained by taking the larger of the two distances between the log-transformed 5th or 95th percentile and the log mean (the two distances are usually quite similar), divided by the number of samples.
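For concreteness, the quantile-based summary could be computed along these lines. This is only a sketch with simulated cells standing in for the real data (in practice you would request just the quantiles), and it assumes the usual sqrt(n) scaling when converting the spread into a standard error of the mean:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one subject/condition measurement: ~10,000 lognormal cells.
# In practice only the quantiles would be available, not the raw draws.
cells = rng.lognormal(mean=2.0, sigma=0.5, size=10_000)
n = cells.size

log_cells = np.log(cells)
log_mean = log_cells.mean()
q05, q95 = np.quantile(log_cells, [0.05, 0.95])

# Conservative half-width: the larger of the two distances from the log mean
# to the log-scale 5th/95th percentile. For a normal, this half-width is about
# 1.645 SD, so using it in place of the SD overstates the SE (conservative).
half_width = max(log_mean - q05, q95 - log_mean)
se = half_width / np.sqrt(n)
print(round(float(se), 4))
```

Note the interval half-width substitutes for the SD here, which is what makes the resulting SE conservative rather than a direct estimate.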

I have grown suspicious of our approach for the following reason: our colleagues have noticed that every pairwise t test they run is “significant” due to the 10K sample sizes. So they have switched to a likelihood ratio test between a model with the hypothesized effect of Condition and a null model, I guess because it strikes them as more believable (they are not statisticians).

With our BRMS standard-error approach I am seeing something similar, and it makes me skeptical. When the SE is not incorporated, the model shows no strong evidence for an effect of Condition. When the SE is added, the HDIs for Condition are very tight and well above zero, so the effect is purely due to the inclusion of a tight standard error. Has this model sufficiently accounted for having a great many measurements at one level but sparse measurements at another? Is model comparison a better choice here for some reason?
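To make the worry concrete: with ~10,000 cells per measurement, the se() term is tiny, but the uncertainty in a Condition effect should be dominated by between-subject spread, which only a handful of subjects can pin down. A back-of-the-envelope sketch (all numbers hypothetical, not from the post):

```python
import math

# Hypothetical magnitudes for illustration only:
sd_within = 1.0      # log-scale SD across the ~10,000 cells in one measurement
n_cells = 10_000
sd_between = 0.3     # plausible log-scale SD of true subject-level means
n_subjects = 8

# Standard error of one subject/condition mean from cell-level noise alone
se_measurement = sd_within / math.sqrt(n_cells)

# Uncertainty of a mean across subjects combines both sources; with these
# numbers it is dominated almost entirely by the between-subject spread
se_condition = math.sqrt(sd_between**2 + se_measurement**2) / math.sqrt(n_subjects)

print(f"measurement SE:      {se_measurement:.4f}")
print(f"condition-level SE:  {se_condition:.4f}")
```

If the model has no term that absorbs the between-subject (or between-measurement) spread, the tiny measurement SE is all that propagates into the Condition posterior, which would produce exactly the implausibly tight HDIs described.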


Not an expert here, but I think it could be helpful to look at the distribution of the per-subject intercepts. I would assume that tightening the SE moves the variability into the per-subject intercepts - is that the case? If so, it might help you interpret the magnitude of the Condition effect relative to the between-subject variance.

Also, having 10K samples gives you a very precise measurement, but of the value at the moment the measurement was taken (is that cytometry data or something?). Is the measured quantity something you would expect to vary on its own in the absence of any intervention? E.g. even a very precise measurement of blood pressure is not very predictive of blood pressure at a later time, as there are sources of variability that are not measurement error (e.g. acute stress). If the underlying quantity could vary in a similar way to blood pressure, it might make sense to add a (1 | Condition:Subject) term with a hierarchical prior to model this additional variability - basically you want to model how different the measurements would be if you ran multiple measurements per subject and condition.
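The point about non-measurement variability can be illustrated with a quick simulation (all numbers hypothetical): even if each measurement pins down its own mean almost exactly, repeated measurements of the same subject and condition on different occasions would still scatter far more than the nominal SE suggests.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: a subject's true log-level drifts between measurement
# occasions, on top of the cell-level noise within each measurement.
sd_occasion = 0.2     # occasion-to-occasion drift of the true level
sd_cells = 0.5        # log-scale SD across cells within one measurement
n_cells = 10_000

# Repeat the "same" measurement on 5 occasions for one subject/condition
true_levels = 1.0 + rng.normal(0, sd_occasion, size=5)
observed_means = true_levels + rng.normal(0, sd_cells / np.sqrt(n_cells), size=5)

# The within-measurement SE is tiny, but the replicate means still scatter
print("nominal SE of one mean:", sd_cells / np.sqrt(n_cells))  # 0.005
print("SD across replicate means:", observed_means.std(ddof=1))
```

In this toy setup the scatter across occasions is on the order of sd_occasion, dwarfing the 0.005 nominal SE - which is the gap a (1 | Condition:Subject) term with a hierarchical prior would be absorbing.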