Blog article on Bayesian thinking in the use of historical control data

I welcome comments and suggestions on a new blog article showing how Bayesian joint modeling can be used to directly model bias in observational data that are pooled with randomized trial data. I suggest that explicit modeling of bias is a more direct solution than summarizing historical data into a discounting prior. It also allows much more flexibility, e.g., historical data may consist of a series of exchangeable trials (as summarized in a meta-analysis) plus a non-exchangeable trial. Another example: allowance for covariate adjustment.

Statistical Thinking - Incorporating Historical Control Data Into an RCT
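As a rough illustration of what explicit bias modeling can look like (a toy sketch, not the code or the model from the article), the example below pools raw binary control data from an RCT and from historical data (HD). It assumes the HD event probability differs from the RCT control probability by a bias `b` on the logit scale, with `b ~ Normal(0, tau)`. The function names, the grid approximation, the `Normal(0, 2.5)` prior on the control logit, and the example counts are all assumptions made for illustration.

```python
import numpy as np

def log_binom(events, n, logit):
    """Binomial log-likelihood (up to a constant) as a function of the logit."""
    return events * logit - n * np.logaddexp(0.0, logit)

def control_posterior(y_rct, n_rct, y_hd, n_hd, tau, grid=801):
    """Grid posterior for the RCT control event probability.

    Assumed toy model:
      RCT controls:        y_rct ~ Binomial(n_rct, expit(theta))
      historical controls: y_hd  ~ Binomial(n_hd,  expit(theta + b))
      bias:                b     ~ Normal(0, tau)
      control logit:       theta ~ Normal(0, 2.5)  (weakly informative, assumed)
    """
    theta = np.linspace(-6.0, 6.0, grid)
    b = np.linspace(-5.0 * tau, 5.0 * tau, grid)
    T, B = np.meshgrid(theta, b, indexing="ij")
    logp = (log_binom(y_rct, n_rct, T)
            + log_binom(y_hd, n_hd, T + B)
            - 0.5 * (B / tau) ** 2
            - 0.5 * (T / 2.5) ** 2)
    w = np.exp(logp - logp.max())
    post_theta = w.sum(axis=1)          # marginalize over the bias b
    post_theta /= post_theta.sum()
    p = 1.0 / (1.0 + np.exp(-theta))    # back to the probability scale
    mean = float(np.sum(post_theta * p))
    sd = float(np.sqrt(np.sum(post_theta * (p - mean) ** 2)))
    return mean, sd

# Hypothetical counts: 10/50 events in RCT controls, 200/500 in historical controls.
tight = control_posterior(10, 50, 200, 500, tau=0.05)  # HD trusted almost fully
loose = control_posterior(10, 50, 200, 500, tau=2.0)   # large bias allowed
print("tau = 0.05 (mean, sd):", tight)
print("tau = 2.00 (mean, sd):", loose)
```

With a small `tau` the historical controls dominate the estimate; with a large `tau` the posterior moves back toward the RCT controls and widens, so the discounting comes from the bias parameter rather than from a reduced effective sample size.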


From the case study abstract:

incorporation of the raw HD (historical data) into the RCT analysis and discounting HD by explicitly modeling bias is perhaps a more direct approach than lowering the effective sample size of HD.

Thanks for sharing this important point in an open case study (+1 for CC-BY licensing). I hope you take this as a compliment, but this looks like the kind of thing @andrewgelman is always telling me to do: model everything instead of using ad hoc adjustments. This works as long as the data can sort out the adjustment, and that’s exactly what @harrelfe is doing in this case study. It’s also what Andrew and I did in our paper on Covid specificity (the link is to a non-paywalled preprint). More specifically, we had validation data for both specificity and sensitivity from reference samples sent to other PCR testing sites. So we modeled all the sites in a hierarchical model and just turned the crank to get a “prior” over the uncertainty in the site for which we analyzed the test data. The simpler alternative would’ve been to squint at the calibration data and use our “expertise” to set a beta prior or some such on the specificity and sensitivity at the site we cared about. By modeling it, we were sure everything was weighted properly by the data and discounted appropriately by the modeled heterogeneity.

Using the approach @harrelfe outlines, rather than dealing with summary statistics like SEs, we just get the full uncertainty (which makes a lot more sense for binary data, for example, where the posterior tends to be skewed for low counts and/or extreme probability estimates). That’s nice in that it’s an in-model way to adjust for all the sample sizes without worrying the prior data will swamp the posterior when there are mismatches. That plays out as increased hierarchical variance, which itself will get a data-driven prior from the multiple sites (but we use full Bayes, not some kind of empirical Bayes approximation). Andrew and I also used post-stratification to deal with covariate mismatch between the sample and the population (though we did not model nonrandom missingness, which is important in this kind of study, too). We didn’t produce pretty simulations like @harrelfe, though.

@andrewgelman has recently been looking into ways of characterizing arbitrary priors in terms of prior sample sizes, so maybe he could also help with formulating the cruder approach.


Bob I take that as a very big compliment. Thank you! Yes this is very much motivated by Andrew Gelman, David Spiegelhalter, and Richard McElreath. The Bayesian implementation in these examples is simple. The only really, really hard part is getting people trained only in traditional statistics to become interested in Bayesian thinking. Thanks for also putting this into a broader context.

One of my motivations was trying to keep up with the literature on discounting priors for incorporating historical data in RCTs. I started to sense that the priors were getting more complex over time, and perhaps too much work and thinking was required while still providing only an approximate analysis due to oversimplifying the historical data.

Thank you again Bob for your kind and informative response!



Yeah, we’re still stuck on this equivalent-sample-size thing. Christian Robert and I are partway thru a paper on the topic but are kinda stuck. My current plan is to try to write an exploratory or review article with Rob Trangucci on equivalent sample size of the prior.

I’d love to see the result Andrew. I think that being able to compute effective sample size is a useful descriptive tool but not one that is used to select the prior (as is being done in some drug trials with borrowing of historical information). You’ve undoubtedly thought this through better than I but I’ve found that effective sample size needs to be based on posterior probability equivalences for assertions of interest. E.g. Pr(blood pressure reduction > 5mmHg). Sometimes I’ve used equivalence of median posterior probabilities to make things simple when trying to avoid simulation.


Yes, the equivalent sample size of the prior will depend on your quantity of interest. Your prior can have a high equivalent sample size for some quantities and not for others.
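A small sketch of that point, under a simple variance-matching definition of equivalent sample size (an assumption for illustration, not the posterior-probability-based definition mentioned above): the prior is treated as worth as many observations as it would take for a sample estimate to reach the same variance. The function name `prior_ess` and all numbers are hypothetical.

```python
import math

def prior_ess(prior_sd, unit_sd):
    """Number of sampling units whose estimate has the same variance as the prior.

    unit_sd is the SD of the estimate contributed by a single sampling unit.
    """
    return (unit_sd / prior_sd) ** 2

sigma = 10.0  # assumed outcome SD

# Informative prior on the control mean (e.g., built from historical controls).
ess_control = prior_ess(prior_sd=1.0, unit_sd=sigma)

# Vague prior on the treatment difference; one unit = one patient per arm,
# so the unit SD of the difference is sqrt(2) * sigma.
ess_effect = prior_ess(prior_sd=20.0, unit_sd=math.sqrt(2.0) * sigma)

print("control mean ESS:", ess_control)
print("treatment effect ESS (pairs):", ess_effect)
```

Under these assumed numbers the same joint prior is worth about 100 patients for the control mean but only about half a patient pair for the treatment effect.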


@harrelfe, the idea of having a parameter describing what you call “bias” (or simply how the external data systematically differ from the new randomized study) makes a lot of sense to me and is my preferred way of approaching these situations. I’ve proposed that approach to borrowing from historical exponential hazard rates for time-to-event data, but with a more peaked and longer-tailed prior (e.g. double-exponential) than the normal distribution you propose. The rationale for the prior choice is as an approximation to model averaging between borrowing from the historical data and ignoring it (which is equivalent to a spike-and-slab prior; this is also in the spirit of this paper). From another perspective, this provides a smoother transition between borrowing and not borrowing than the robust meta-analytic approach, and it outperforms that approach in relatively extensive simulations.
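To make the contrast between a normal and a double-exponential prior on the bias concrete, here is a toy sketch (a normal-likelihood simplification, not the exponential-hazard model referred to above). It assumes an observed discrepancy `d` between historical and current controls with standard error `s`, and compares the posterior mean of the bias under a normal prior and under a Laplace prior matched to the same variance. All names and numbers are assumptions for illustration.

```python
import numpy as np

def posterior_mean_bias(d, s, log_prior):
    """Posterior mean of the bias b given d ~ Normal(b, s), by grid integration."""
    b = np.linspace(-30.0, 30.0, 60001)
    logp = -0.5 * ((d - b) / s) ** 2 + log_prior(b)
    w = np.exp(logp - logp.max())
    return float(np.sum(w * b) / np.sum(w))

tau = 1.0  # assumed prior SD of the bias
s = 1.0    # assumed standard error of the observed discrepancy

def log_normal_prior(b):
    return -0.5 * (b / tau) ** 2

def log_laplace_prior(b):
    scale = tau / np.sqrt(2.0)  # Laplace scale giving the same variance tau^2
    return -np.abs(b) / scale

# Fraction of the observed discrepancy that is attributed to bias.
shrink = {"normal": {}, "laplace": {}}
for d in (0.5, 5.0):
    shrink["normal"][d] = posterior_mean_bias(d, s, log_normal_prior) / d
    shrink["laplace"][d] = posterior_mean_bias(d, s, log_laplace_prior) / d
    print(d, shrink["normal"][d], shrink["laplace"][d])
```

In this setup the normal prior attributes the same fraction of any discrepancy to bias, whereas the Laplace prior attributes less for small discrepancies (more borrowing) and more for large ones (less borrowing), which is one way to see the smoother transition between borrowing and not borrowing.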

Last year, there was another paper with the same idea, where the authors used the horseshoe prior (I had tried that too as well as the regularized horseshoe, but it causes a lot of issues with divergent transitions - not entirely sure how the authors got that under control) and applied it to binomial (logistic regression) and exponential hazard rates. The authors also pointed out the parallel to the thinking in this older Pocock paper.


This is exceptionally helpful Björn and is giving me some of the key references I needed to add to the article. I had no idea that Pocock had proposed this and am glad to learn of your time-to-event approach.