This might be a poorly placed question with no general solution, so I’m looking for general guidance and pointers.
Problem
I have a setting where I’m looking to obtain 1,000s of regression coefficients. The data naturally lends itself to some kind of hierarchical structure (I’m leaving the specifics unspecified because I’m interested in the general class of problem). However, for each regression I have 1,000s–10,000s of data points, so some of the typical arguments for the benefits of partial pooling don’t clearly apply: shrinkage should be minimal, because with this much data the hyperprior shouldn’t play a major role in any particular regression coefficient’s posterior. Fitting a hierarchical model in this case is also computationally intense, as I’m in the realm of millions of data points and thousands of coefficients. If I put them all into one model, HMC sampling could take arbitrarily long. (If you’re not satisfied that this is large enough to take a long time to fit, please just scale the number of data points and regression coefficients up until you’re satisfied that it would take an uncomfortably long time.)
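For concreteness, here’s a minimal sketch of the kind of hierarchical model I mean, written in PyMC with made-up dimensions, simulated data, and a single predictor per group; the real model is deliberately unspecified, so everything named here is illustrative only:

```python
import numpy as np
import pymc as pm

# Illustrative dimensions only; the real problem has thousands of groups
# and millions of rows in total.
G, n_per_group = 50, 2000
rng = np.random.default_rng(0)
group = np.repeat(np.arange(G), n_per_group)      # group index for each row
x = rng.normal(size=G * n_per_group)              # single predictor
true_beta = rng.normal(0.5, 0.2, size=G)          # simulated "true" slopes
y = true_beta[group] * x + rng.normal(size=G * n_per_group)

with pm.Model() as hierarchical_model:
    # Shared hyperprior on the group-level slopes -- this is what induces shrinkage.
    mu_beta = pm.Normal("mu_beta", 0.0, 1.0)
    tau_beta = pm.HalfNormal("tau_beta", 1.0)
    beta = pm.Normal("beta", mu=mu_beta, sigma=tau_beta, shape=G)
    sigma = pm.HalfNormal("sigma", 1.0)
    pm.Normal("y", mu=beta[group] * x, sigma=sigma, observed=y)

    # HMC/NUTS over all groups jointly -- this is the step that becomes
    # painful at millions of rows and thousands of coefficients.
    idata = pm.sample(draws=1000, tune=1000, chains=2)
```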
Candidate Solution
However, if I forgo a hierarchical model in this case and just fit the 1,000s of regressions separately, I can parallelize them straightforwardly on a compute cluster, avoid all the standard difficulties of fitting a potentially very complicated hierarchical model, and have all the posteriors fit in under an hour, rather than potentially waiting days for a hierarchical model at this scale to give me its first failed fit with divergences, etc.
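Here’s a minimal sketch of what I mean by fitting them separately in parallel. Purely for illustration, each worker uses a cheap conjugate Gaussian posterior with a plug-in noise variance so the example runs instantly; in the real workflow each worker would instead launch its own Stan/PyMC fit on its slice of the data, but the parallel structure is the same:

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def fit_one_group(args):
    """Fit one regression and return posterior draws for its slope.

    Sketch only: conjugate N(0, prior_sd^2 I) prior on (intercept, slope)
    with a plug-in noise variance. Swap this body for an MCMC fit in practice.
    """
    x, y, prior_sd, n_draws, seed = args
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones_like(x), x])        # intercept + slope

    # Plug-in estimate of the noise variance from OLS residuals (a shortcut).
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = np.mean((y - X @ beta_ols) ** 2)

    # Closed-form Gaussian posterior for the coefficients.
    prec = X.T @ X / sigma2 + np.eye(2) / prior_sd**2
    cov = np.linalg.inv(prec)
    mean = cov @ (X.T @ y) / sigma2
    draws = rng.multivariate_normal(mean, cov, size=n_draws)
    return draws[:, 1]                               # slope draws only

if __name__ == "__main__":
    # Simulated stand-in for the real data: G groups, n rows each.
    G, n = 1000, 5000
    rng = np.random.default_rng(1)
    true_beta = rng.normal(0.5, 0.2, size=G)
    tasks = []
    for g in range(G):
        x = rng.normal(size=n)
        y = true_beta[g] * x + rng.normal(size=n)
        tasks.append((x, y, 1.0, 2000, g))

    # Embarrassingly parallel: one independent fit per group.
    with ProcessPoolExecutor() as pool:
        slope_draws = np.array(list(pool.map(fit_one_group, tasks)))  # shape (G, n_draws)
```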
Thoughts
Am I missing something in this line of reasoning? What’s wrong with doing this if I have good reason to think that going hierarchical wouldn’t change the marginal posteriors all that much, because I have so much data for each regression? Obviously I’m no longer modelling the regression coefficients jointly, but beyond that, what am I really losing?

If this approach is valid, there are pragmatic questions, such as: how many data points per regression should I be looking to have before I can think about dropping the hierarchical model and proceeding carefree with my simple separate models?

With my thousands of regression coefficients, I’d like to answer questions such as: which regressions likely have the largest effect sizes, and how confident can I be about that, as reported by posterior rank probabilities, etc.? By running each regression separately I’m obviously modelling them independently, so the regression coefficients become independent random variables. But presumably, in the limit of large n for each regression, something approximating independence is achieved for the regression coefficients anyway? Are these intuitions right, or am I way off? Any feedback or pointers to existing material related to this would be greatly appreciated.
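For the effect-size ranking question, this is the kind of summary I have in mind. It assumes a (G, n_draws) array of slope draws like the `slope_draws` produced in the sketch above, and it pairs up draws across groups, i.e. it treats the per-group posteriors as independent, which is exactly the assumption I’m asking about:

```python
import numpy as np

def rank_summaries(slope_draws, top_k=10):
    """Posterior rank probabilities from independent per-group slope draws.

    slope_draws: array of shape (G, n_draws). Each column is treated as one
    joint posterior draw, assembled by pairing draws across groups.
    """
    # Rank each joint draw by absolute effect size: rank 0 = largest in that draw.
    ranks = np.argsort(np.argsort(-np.abs(slope_draws), axis=0), axis=0)

    # P(group g has the single largest effect size).
    p_largest = np.mean(ranks == 0, axis=1)

    # P(group g is among the top_k largest effect sizes).
    p_top_k = np.mean(ranks < top_k, axis=1)
    return p_largest, p_top_k

# Example usage with the slope_draws array from the parallel-fit sketch:
# p_largest, p_top10 = rank_summaries(slope_draws, top_k=10)
# likely_winners = np.argsort(-p_largest)[:10]   # groups most likely to be #1
```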
Thanks,
Aidan