"Regress out" covariates that are indepent to the rest before a full Bayesian inference

Hi there,

May I ask a statistical question here? In the linear regression model y = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_N x_N + \epsilon, we have N covariates x_n. Suppose my domain knowledge tells me that a subset of the covariates, x_{4:N}, is independent of the rest, x_{1:3} (the corresponding \beta_{1:3} are the parameters of interest). For a point estimate, one can (a code sketch follows the list):

  1. first “regress out” x_{4:N} from both y and x_{1:3}, using OLS,

  2. then solve y_{\textrm{reg}} = \beta_1 x_{\textrm{reg},1} + \beta_2 x_{\textrm{reg},2} + \beta_3 x_{\textrm{reg},3} + \epsilon.
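A minimal numpy sketch of this two-step point-estimate procedure, using simulated data and made-up variable names (nothing here is from the original post):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: the first 3 covariates are of interest, the remaining
# ones are nuisance covariates generated independently of them.
n_obs, n_int, n_nuis = 500, 3, 10
X_int = rng.normal(size=(n_obs, n_int))
X_nuis = rng.normal(size=(n_obs, n_nuis))
beta_int = np.array([1.0, -0.5, 2.0])
beta_nuis = rng.normal(size=n_nuis)
y = X_int @ beta_int + X_nuis @ beta_nuis + rng.normal(scale=0.3, size=n_obs)

# Step 1: "regress out" the nuisance covariates from y and from each
# covariate of interest by keeping the OLS residuals.
def residualize(target, Z):
    coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
    return target - Z @ coef

y_reg = residualize(y, X_nuis)
X_reg = np.column_stack([residualize(X_int[:, j], X_nuis) for j in range(n_int)])

# Step 2: OLS of the residualized outcome on the residualized covariates.
beta_hat, *_ = np.linalg.lstsq(X_reg, y_reg, rcond=None)
print(beta_hat)  # close to beta_int; identical to the X_int coefficients of a
                 # joint OLS fit on [X_int, X_nuis] (Frisch-Waugh-Lovell)
```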

I wonder if this process is recommended for Bayesian inference? For instance, we do step 1 with OLS, then full Bayes for step 2. Since the parameters of interest are only \beta_{1:3}, it would save some computation time not to estimate posteriors for \beta_{4:N}, especially when N is large. I guess the MAP estimate might be the same as if we solved everything at once, y = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_N x_N + \epsilon, with full Bayes. However, I am not sure whether the estimated uncertainties of \beta_{1:3} are the same under the two procedures. I suspect the uncertainties of \beta_{1:3} from the two-step procedure are only conditional on the mode of \beta_{4:N}. Can we gain any computational convenience from the fact that x_{4:N} are independent of x_{1:3}?

Thank you very much,
Michelle


Of course!

Only if the problem is so large-scale that it can't be tackled with full Bayes. Ideally, in that situation we'd marginalize out the nuisance parameters rather than optimize them, à la INLA (an R package for doing this). There's also a more ad hoc technique called "empirical Bayes", which often optimizes the marginalized distribution rather than sampling from it.
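To make "marginalize out" concrete: writing \beta_{1:3} for the parameters of interest and \beta_{4:N} for the nuisance parameters, the target is the marginal posterior

p(\beta_{1:3} \mid x, y) = \int p(\beta_{1:3}, \beta_{4:N} \mid x, y) \, \textrm{d}\beta_{4:N}.

MCMC gives you this for free: the draws of \beta_{1:3}, taken on their own, are draws from this marginal, regardless of what you do with the draws of \beta_{4:N}.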

MAP isn’t Bayesian inference, per se; it’s a form of penalized maximum likelihood. Full Bayesian inference averages over posterior uncertainty. A typical parameter estimate is the posterior mean, which minimizes expected squared error (if the model is well specified).
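Spelled out, the posterior mean is the estimate that minimizes expected squared error under the posterior,

\hat{\beta} = \mathbb{E}[\beta \mid x, y] = \underset{b}{\textrm{arg min}} \; \mathbb{E}\!\left[\lVert \beta - b \rVert^2 \mid x, y\right],

whereas MAP maximizes the posterior density, \beta^{\textrm{MAP}} = \underset{\beta}{\textrm{arg max}} \; p(\beta \mid x, y), without averaging over it.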

As another example, if we want to predict a new item y' given new predictors x'_{1:n} after observing some data y, x, the posterior predictive density is

\begin{eqnarray*}
p(y' \mid y, x, x') & = & \mathbb{E}\!\left[p(y' \mid x', \beta) \mid x, y\right] \\[4pt]
& = & \int p(y' \mid x', \beta) \cdot p(\beta \mid x, y) \, \textrm{d}\beta \\[4pt]
& \approx & \frac{1}{M} \sum_{m=1}^M p(y' \mid x', \beta^{(m)}),
\end{eqnarray*}

and we evaluate with MCMC using posterior draws \beta^{(m)} \sim p(\beta \mid x, y). We need the draws for all the parameters together. If we replaced \beta_{4:N} with constants, we’d systematically underestimate uncertainty.
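A rough numpy illustration of that last point, assuming we already have posterior draws beta_draws (an M-by-N array) and sigma_draws from some sampler; all the names here are made up for the example:

```python
import numpy as np

def predictive_draws(x_new, beta_draws, sigma_draws, rng):
    """Posterior predictive draws of y' that average over all posterior
    draws of the coefficients and the noise scale."""
    mu = beta_draws @ x_new             # one predictive mean per posterior draw
    return rng.normal(mu, sigma_draws)  # add observation noise, draw by draw

def plug_in_predictive_draws(x_new, beta_draws, sigma_draws, rng, nuisance_idx):
    """Same, but with the nuisance coefficients frozen at their posterior
    means, which is roughly what replacing beta_{4:N} with constants does."""
    beta_fixed = beta_draws.copy()
    beta_fixed[:, nuisance_idx] = beta_draws[:, nuisance_idx].mean(axis=0)
    return rng.normal(beta_fixed @ x_new, sigma_draws)

# Comparing np.std(...) of the two sets of draws typically shows the plug-in
# version is narrower, i.e. it understates the uncertainty in y'.
```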


Thank you very much @Bob_Carpenter. I guess the short conclusion would be: whether or not the covariates are independent of each other, we estimate the uncertainty of the parameters more accurately by solving for them all together, because this lets the uncertainty “propagate” across parameters.

Could you please elaborate a bit more on “marginalizing out” the nuisance parameters? I understand it as replacing the OLS point estimate in step 1 with a full Bayesian estimate (MCMC or variational), then averaging over (marginalizing?) the posteriors of the nuisance parameters, which yields the residualized y_{\textrm{reg}} and x_{\textrm{reg},1:3} that serve as input to step 2. So this process would essentially use the posterior means of the nuisance parameters as constants instead of the OLS point estimates, am I right? Then, if the posteriors of the nuisance parameters are close enough to Gaussian, shouldn't this process give nearly the same y_{\textrm{reg}} and x_{\textrm{reg},1:3} as the point-estimate version?

This example really helps :) Thanks again.