Learning from small data?

I’m looking for examples, in any field, of papers in which the author learned something using a good model, Stan, and a small data set. I’m waving my hands about what “small” is, but my hope is that people from different fields can point me to different examples. Can you point me in the right direction?

Thanks!


Very unclear what you are looking for, but maybe:

https://cran.r-project.org/web/packages/RBesT/vignettes/introduction.html

Sorry about that. Basically, I’m trying to find studies where the sample size is less than 100 but you can still learn something useful by using the right methods. In the world I live in, I hear a lot of “unless you have a huge sample, your study will be underpowered and there is nothing you can learn from a small sample.” My hope is to find examples in which that is not the case.

Thanks. Meta-analyses were the first thing I thought of, but I’m hoping to find studies with small samples that are not meta-analyses. For example, studies in fields where collecting an additional data point is really expensive.

Uhm… the point of a meta-analysis is to reduce the sample size needed for a future study. Any data point in a new study is expensive (a human is administered a drug), which is why I pointed this out. Meta-analyses are what make the difference here: you end up with good inferences by combining the existing data with the new data.

Yes, I’m just hoping to find examples other than meta-analysis.

Thanks!

A lot of the time in medicine, collecting another data point is expensive or even impossible. For example, if we’re trying to diagnose Alzheimer’s in a patient, it would be really useful to get as many MRIs of the patient’s brain over time as we can. Unfortunately, MRIs are very expensive to run. And in some cases, if a patient has a pacemaker, we can’t do an MRI at all!

So in that case we need to either

  1. pool together information across patients and be explicit in how we’re modeling the pooling relationship
  2. incorporate biological knowledge about the system/data we’re modeling

Stan and Bayesian modeling are really good at doing these things. In my old StanCon notebook I used the first approach.
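
To make the first approach concrete, here is a minimal partial-pooling sketch in Stan (an illustration only, not the model from the notebook; the variable names and priors are assumptions for the example): each patient gets their own mean, which is shrunk toward a shared population mean, so patients with only a few observations borrow strength from the rest.

```stan
// Minimal partial-pooling sketch (illustrative only).
data {
  int<lower=1> N;                          // total number of observations
  int<lower=1> J;                          // number of patients
  array[N] int<lower=1, upper=J> patient;  // patient index for each observation
  vector[N] y;                             // observed measurements
}
parameters {
  real mu;                // population mean
  real<lower=0> tau;      // between-patient standard deviation
  real<lower=0> sigma;    // within-patient measurement noise
  vector[J] theta;        // patient-level means
}
model {
  mu ~ normal(0, 5);
  tau ~ normal(0, 1);
  sigma ~ normal(0, 1);
  theta ~ normal(mu, tau);            // partial pooling across patients
  y ~ normal(theta[patient], sigma);  // each observation uses its patient's mean
}
```

The nice part is that the amount of pooling (tau) is itself estimated from the data rather than fixed in advance, which is what lets a patient with two scans borrow strength from the rest of the cohort.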

Another example is in traumatic injury. A lot of the time, if a patient comes into the hospital after being injured in a car accident, we may not have time to take a blood sample because we have to act quickly, we can’t take samples as often as we’d like, or we can’t run all the tests we want because they’re slow or expensive. I work with trauma data and had another recent StanCon notebook where I used the second approach (link here).
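
As a sketch of the second approach (again just an illustration, not the trauma model itself; the exponential-decay form, the 0.1–0.5 per hour range, and all names are assumptions), biological knowledge can enter both through the model structure and through an informative prior on a physiologically meaningful parameter:

```stan
// Illustrative sketch: biological knowledge enters as model structure
// (an exponential-decay mean) and as an informative prior on the decay
// rate k, assumed known from the literature to lie roughly in 0.1-0.5 / hour.
data {
  int<lower=1> N;
  vector[N] t;            // measurement times (hours)
  vector[N] y;            // measured concentrations
}
parameters {
  real<lower=0> c0;       // initial concentration
  real<lower=0> k;        // decay rate (1/hour)
  real<lower=0> sigma;    // measurement noise
}
model {
  c0 ~ lognormal(log(10), 0.5);    // weakly informative guess at the starting level
  k ~ lognormal(log(0.22), 0.4);   // informative: most prior mass roughly in 0.1-0.5
  sigma ~ normal(0, 1);
  y ~ normal(c0 * exp(-k * t), sigma);  // exponential-decay mean
}
```

With only a handful of measurements per patient, that prior does a lot of the work that extra data would otherwise have to do.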


Thanks @arya, this is exactly the type of research I was trying to find. In addition to the notebooks, is there a published paper you can point me to?

Hm, not that I can think of off the top of my head, but you can cite Stan case studies.

I’ll also say that data from clinical trials is expensive to get, because running a clinical trial is expensive.

In https://www.ncbi.nlm.nih.gov/pubmed/28376897 and https://elifesciences.org/articles/35213 we built a complex model of malaria propagation through mouse and mosquito populations that was fit to around 100 total observations and generated precise inferences about different vaccine efficacies.

Power is important for calibrating discovery claims, and more replicate observations mean higher power. In practice, however, additional data are not pure replications; they bring with them systematic differences that must be modeled lest your inferences be biased. But increasing the model complexity to account for those differences reduces your power, and you end up in this weird loop. Ultimately you have to accept the limitations of your experiment and hone your questions to those that can reasonably be answered with the available data.
