I’m looking for examples, in any field, of papers in which the author learned something using a good model, Stan, and a small data set. I’m waving my hands about what small is, but my hope is that people from different fields can point me to different examples. Can you point me in the right direction?
Sorry about that. Basically, I’m trying to find studies where the sample size is less than 100 but you can learn something useful by using the right methods. In the world I live in, I hear a lot of “unless you have a huge sample, your study will be underpowered and there is nothing you can learn from a small sample.” My hope is to find examples in which that is not the case.
Thanks. Meta-analyses are the first thing that I thought about, but I’m hoping to find studies with small samples that are not meta-analyses. For example, studies in fields where collecting an additional data point is really expensive.
Uhm… the point of the meta-analyses is to reduce the sample size needs of a future study. Any data point in a new study is expensive (a human is administered a drug). That’s why I pointed this out. Meta-analyses make the difference here: they yield good inferences by combining the existing data with the new data.
A lot of the times in medicine collecting another data point is expensive or even impossible. For example, if we’re trying to diagnose Alzheimer’s in a patient, it would be really useful to get as many MRIs over time of the patient’s brain as we can. Unfortunately, MRIs are very expensive to run. In some cases, if a patient has a pacemaker, then we can’t even do an MRI!
So in that case we either need to:

1. pool together information across patients and be explicit about how we’re modeling the pooling relationship, or
2. incorporate biological knowledge about the system/data we’re modeling.
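The first approach (partial pooling) can be sketched outside Stan with a toy precision-weighted shrinkage estimator. Everything here is made up for illustration: the patient names, the population mean, and the between/within-patient SDs are all assumptions, and a real Stan model would estimate those variance components rather than treat them as known — but the shrinkage logic is the same.

```python
import random
import statistics

random.seed(1)

# Toy setup: a handful of patients, each with only a few noisy measurements.
mu_pop, tau, sigma, n_obs = 50.0, 5.0, 10.0, 3  # assumed population mean, between/within SDs
true_means = {f"patient_{i}": random.gauss(mu_pop, tau) for i in range(5)}
data = {pid: [random.gauss(m, sigma) for _ in range(n_obs)]
        for pid, m in true_means.items()}

# No pooling: each patient's estimate is just their own (very noisy) sample mean.
no_pool = {pid: statistics.mean(ys) for pid, ys in data.items()}
grand_mean = statistics.mean(no_pool.values())

# Partial pooling with (assumed known) variances: shrink each patient's mean
# toward the grand mean, weighting by how precise their own data are.
w = (n_obs / sigma**2) / (n_obs / sigma**2 + 1 / tau**2)
partial = {pid: w * m + (1 - w) * grand_mean for pid, m in no_pool.items()}

for pid in data:
    print(pid, round(no_pool[pid], 1), "->", round(partial[pid], 1))
```

With only three observations per patient, the partially pooled estimates sit between each patient’s own mean and the grand mean — exactly the kind of cross-patient borrowing a hierarchical Stan model formalizes.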
Another example is in traumatic injury. A lot of the times if a patient comes into the hospital after being injured in a car accident, we may not have time to take a blood sample, because we have to act quickly, or we can’t take samples as often as we like, or we can’t do all the tests we want to do because they’re slow or expensive. I work with trauma data and had another recent StanCon notebook where I used the second approach (link here).
Thanks @arya this is exactly the type of research that I was trying to find. In addition to the notebooks, is there any published paper that you can point me to?
Power is important for calibrating discovery claims, and more replicate observations mean higher power. In practice, however, additional data are not pure replications; they bring with them systematic differences that must be modeled lest your inferences be biased. But then increasing the model complexity… reduces your power, and you end up in this weird loop. Ultimately you have to accept the limitations of your experiment and hone your questions to those that can reasonably be answered with the available data.
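The replication point can be made concrete with the usual standard-error arithmetic (a toy sketch; the SD value is made up): the standard error of a mean shrinks as 1/sqrt(n), which is why pure replicates buy power — while every extra parameter spent modeling systematic differences gives some of it back.

```python
import math

sigma = 10.0  # assumed within-group SD (illustrative only)
for n in [4, 16, 64]:
    se = sigma / math.sqrt(n)  # standard error of the mean
    print(f"n={n:3d}  SE={se:.2f}")
```

Quadrupling the sample size only halves the standard error, which is part of why small-sample studies lean so heavily on good models and prior information instead.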