Shrinkage in meta-regression?


Most of my work these days is running pairwise and mixed treatment meta-analyses, where the number of units of analysis (studies) available for any covariate adjustment is usually quite small (10-40). This means that we usually end up adding covariates to the model one at a time, which is less than ideal.

I’m wondering if anyone here is familiar with applications of ridge/lasso/horseshoe style shrinkage to coefficients in meta-regression, or if perhaps there is something about the meta-analysis paradigm that means this doesn’t make sense? I’ve been looking for a good topic to work out a simulation for, and thought this might be a good one since it fits well with the bulk of my work. A quick search of Google/PubMed didn’t turn up anything immediately relevant, so I thought I’d check here to see if the idea is already used in some work that this group is familiar with.



You mean testing each independently then going on to pairs, etc.? Or something like a greedy variable selection?

I’d think you’d use the same principles of Bayesian modeling and inference in a meta-analysis setting as elsewhere.


I’m not sure, but here’s what occurs to me off the top of my head: I think it could make sense in certain cases but I doubt there are too many examples. At least I haven’t seen hierarchical shrinkage priors like the horseshoe used in a meta-analysis, but maybe @avehtari has? The idea with these priors is that a few coefficients may be large but the others should be close to zero. Maybe that makes sense in some meta-analysis contexts, but it’s also possibly a strange assumption (sure, we expect some studies to yield much larger “effect” estimates than others, but is it reasonable to assume almost all are zero-ish and a few should be allowed to be quite big? Maybe in some particular cases, I suppose). But I’m really not a meta-analysis expert, so hopefully you’ll get some other responses.
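To make the "a few large, the rest near zero" idea concrete, here is a minimal sketch in Python/NumPy (not from any meta-analysis package) that draws coefficients from a basic horseshoe prior and from a standard normal, then compares how much mass each puts very close to zero and far out in the tails. The function name `horseshoe_draws`, the fixed global scale `tau = 1`, and the cutoffs 0.1 and 3 are illustrative assumptions.

```python
import numpy as np

def horseshoe_draws(n, tau=1.0, rng=None):
    """Draw n coefficients from a basic horseshoe prior.

    beta_j = tau * lambda_j * z_j, with lambda_j ~ half-Cauchy(0, 1)
    and z_j ~ Normal(0, 1). tau is held fixed here (an assumption for
    illustration; in a full model it would usually get its own prior).
    """
    rng = np.random.default_rng() if rng is None else rng
    local_scale = np.abs(rng.standard_cauchy(n))  # half-Cauchy local scales
    return tau * local_scale * rng.normal(size=n)

rng = np.random.default_rng(2024)
hs = horseshoe_draws(100_000, tau=1.0, rng=rng)
normal = rng.normal(size=100_000)

# Compare the share of draws near zero and far from zero under each prior.
for name, draws in [("horseshoe", hs), ("normal", normal)]:
    near_zero = np.mean(np.abs(draws) < 0.1)
    far_out = np.mean(np.abs(draws) > 3.0)
    print(f"{name:>9}: P(|beta| < 0.1) = {near_zero:.3f}, P(|beta| > 3) = {far_out:.3f}")
```

Relative to the normal, the horseshoe puts more mass both right next to zero and far away from it, which is exactly the "mostly zero-ish, a few big" assumption being questioned here.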

This is a bit of a digression from your actual question, but the thing that doesn’t make sense about the meta-analysis paradigm is treating uncertainties as known (e.g. sigma in the eight schools example). Often that’s just because the raw data from the studies aren’t available, only the point estimates, but because it has become the standard thing to do in meta-analyses I’ve even seen it done when the full data are available!
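For reference, the standard random-effects setup being described can be written as below. The notation ($y_i$ for the reported estimate from study $i$, $\theta_i$ for its true effect, $\mu$ and $\tau$ for the population mean and between-study standard deviation, $K$ for the number of studies) is the conventional eight-schools notation rather than anything defined in this thread.

```latex
\begin{aligned}
y_i \mid \theta_i &\sim \mathrm{Normal}(\theta_i,\ \sigma_i^2),
  \qquad \sigma_i \text{ fixed at the reported standard error},\\
\theta_i \mid \mu, \tau &\sim \mathrm{Normal}(\mu,\ \tau^2),
  \qquad i = 1, \dots, K.
\end{aligned}
```

Only $\mu$, $\tau$ and the $\theta_i$ are treated as unknown; each $\sigma_i$ is plugged in as if it were known exactly, even though it is itself an estimate from study $i$.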


Standard practice, at least in the reviews I have been involved in, has been to never progress past testing each independently. The goal, more or less, is to reduce unexplained heterogeneity and hopefully make our exchangeability assumption more believable. In some fields/questions this works better than others, but generally we could never dream of throwing in all the variables we think might be important in a single model. From what I can understand from @avehtari’s work, I thought we might have more luck throwing everything we think is important into the model in one step and identifying the one or two study level variables that might actually give us a more plausible estimate.

Hey Jonah, I wonder if I may have explained myself incorrectly. The goal here wouldn’t be to apply shrinkage directly to the treatment effect, but to coefficients associated with some study level variable (e.g. gestational age, birthweight). I think in my field at least, we often have a handful of study level variables that we think might interact with treatment effect but we don’t reasonably expect them all to have an influence. In any case, the actual focus is only ever on the treatment effect itself, with the meta-regression just being an effort to put all of the studies on equalish ground.
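A minimal sketch of that idea, under heavy simplifying assumptions: a common-effect (no between-study heterogeneity) meta-regression with within-study standard errors treated as known, fit by inverse-variance weighted least squares with a ridge penalty on the covariate slopes only. The treatment effect (the intercept) is left unpenalized. The function name `ridge_meta_regression`, the single shared `penalty`, and the simulated data are all hypothetical; ridge is used here as the simplest stand-in for the shrinkage priors discussed above and corresponds to the posterior mode under a Normal(0, 1/sqrt(penalty)) prior on each standardized slope.

```python
import numpy as np

def ridge_meta_regression(y, se, X, penalty):
    """Weighted least squares with a ridge penalty on covariate slopes.

    y: study-level treatment effect estimates, shape (k,)
    se: their standard errors, treated as known, shape (k,)
    X: study-level covariates, shape (k, p), no constant columns
    penalty: non-negative ridge penalty applied to the slopes only
    Returns (treatment_effect, slopes) on the standardized covariate scale.
    """
    y = np.asarray(y, dtype=float)
    se = np.asarray(se, dtype=float)
    X = np.asarray(X, dtype=float)
    # Standardize covariates so one penalty is comparable across them and the
    # intercept is the treatment effect at average covariate values.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    design = np.column_stack([np.ones(len(y)), Xs])
    w = 1.0 / se**2  # inverse-variance weights
    pen = penalty * np.eye(design.shape[1])
    pen[0, 0] = 0.0  # do not shrink the treatment effect itself
    lhs = design.T @ (w[:, None] * design) + pen
    rhs = design.T @ (w * y)
    coef = np.linalg.solve(lhs, rhs)
    return coef[0], coef[1:]

# Simulated example: 20 studies, 4 candidate covariates, only the first matters.
rng = np.random.default_rng(7)
k = 20
X = rng.normal(size=(k, 4))
se = rng.uniform(0.1, 0.4, size=k)
y = 0.3 + 0.5 * X[:, 0] + rng.normal(0.0, se)

for penalty in (0.0, 10.0, 1000.0):
    effect, slopes = ridge_meta_regression(y, se, X, penalty)
    print(f"penalty={penalty:>6}: effect={effect:.3f}, slopes={np.round(slopes, 3)}")
```

All covariates go in at once, and increasing `penalty` pulls every slope toward zero while leaving the treatment effect unpenalized. A random-effects version or a horseshoe-type prior would need a proper Bayesian fit rather than this closed-form solve.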



I think I also read through your original post too quickly. That’s a different scenario than what I thought I was commenting on.

If you have lots of variables but only suspect a few will have non-trivial effect sizes, then sure, you could use a hierarchical shrinkage prior. If you have prior information to suggest which variables you expect to play a larger role than others, then I would definitely also use that information in the prior.


Am I understanding correctly that you’re suggesting using say, a horseshoe prior over each coefficient but then flagging certain coefficients as being more likely to be important based on some additional prior information? We typically have clinicians rank covariates from most to least important based on plausibility/size of interaction with treatment so this would be a real nice way to formalize that if so.


It’s certainly fair game to include that kind of information in your prior if it’s available to you. But if there are some coefficients you really do expect to be much larger than the others, that can also be a reason to give them a different, more informative prior than the other coefficients.
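One hedged way to sketch this in Python/NumPy: turn the clinicians' ranking into a separate prior standard deviation for each covariate (wider for covariates ranked as more plausible), then compute the posterior mode under independent normal priors on the slopes, a flat prior on the treatment effect, and known within-study variances. The rank-to-scale mapping in `prior_sd_from_ranks`, its `widest` and `narrowest` defaults, and the function names are all hypothetical choices for illustration. Covariates are assumed to be already standardized.

```python
import numpy as np

def prior_sd_from_ranks(ranks, widest=1.0, narrowest=0.1):
    """Map ranks (1 = most plausible) to prior SDs on a log-spaced grid.

    This mapping is an arbitrary illustrative choice, not an established rule.
    """
    ranks = np.asarray(ranks, dtype=float)
    span = ranks.max() - ranks.min()
    if span == 0:
        return np.full(ranks.shape, widest)
    frac = (ranks - ranks.min()) / span  # 0 for top rank, 1 for bottom rank
    return widest * (narrowest / widest) ** frac

def map_meta_regression(y, se, X, prior_sd):
    """Posterior mode with Normal(0, prior_sd[j]) priors on each slope.

    The intercept (treatment effect) gets a flat prior; se is treated as known.
    """
    y, se, X, prior_sd = (np.asarray(a, dtype=float) for a in (y, se, X, prior_sd))
    design = np.column_stack([np.ones(len(y)), X])
    w = 1.0 / se**2  # inverse-variance weights
    # Each slope's penalty is its prior precision; the intercept has none.
    penalty = np.diag(np.concatenate([[0.0], 1.0 / prior_sd**2]))
    lhs = design.T @ (w[:, None] * design) + penalty
    rhs = design.T @ (w * y)
    coef = np.linalg.solve(lhs, rhs)
    return coef[0], coef[1:]

# Three hypothetical covariates ranked 1 (most plausible) to 3 (least).
prior_sd = prior_sd_from_ranks([1, 2, 3])
print("prior SDs:", np.round(prior_sd, 3))

rng = np.random.default_rng(11)
k = 25
X = rng.normal(size=(k, 3))
se = rng.uniform(0.1, 0.3, size=k)
y = 0.2 + 0.4 * X[:, 0] + rng.normal(0.0, se)
effect, slopes = map_meta_regression(y, se, X, prior_sd)
print("treatment effect:", round(effect, 3), "slopes:", np.round(slopes, 3))
```

Covariates ranked lower get tighter priors and so are shrunk harder toward zero, while the top-ranked covariate is left comparatively free to move.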


For this see and references in there. If you find this kind of work interesting, Pedram can help more.

For priors on any level, if your prior information is that some are small and some large then use that kind of prior. If you know that specific ones are large or small then use that kind of prior, too.


Thank you Aki. Giving this a read now and will contact Pedram if it looks like a good fit. You always seem to be involved in the coolest stuff.


He definitely is!