Using residuals from another fit (akin to frequentist 2-stage models)

Ok, this might be a bit weird, but there are so many brilliant minds here that it’d be stupid not to ask…

Can I use the residuals from another fit/model in my regression?

Suppose I have a Bivariate Normal regression model.

Y \sim \text{MVN}\left( \begin{bmatrix}\mu_1\\\mu_2\end{bmatrix}, \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix} \right).

That’s easy to fit in Stan.

Y ~ multi_normal_cholesky(mu, diag_pre_multiply(sigma, L_Omega));

It’s basically the joint distribution of the components of Y, i.e. p(y_1, y_2), plus priors (omitted here).
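For concreteness, a complete program along these lines might look like the following sketch. The data layout, variable names, and priors here are my own placeholders, not part of the original model:

data {
  int<lower=1> N;
  array[N] vector[2] Y;  // hypothetical: N observations of (y1, y2)
}
parameters {
  vector[2] mu;
  vector<lower=0>[2] sigma;
  cholesky_factor_corr[2] L_Omega;
}
model {
  mu ~ normal(0, 5);               // placeholder prior
  sigma ~ exponential(1);          // placeholder prior
  L_Omega ~ lkj_corr_cholesky(2);  // placeholder prior
  Y ~ multi_normal_cholesky(mu, diag_pre_multiply(sigma, L_Omega));
}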

I could also fit p(y_2|y_1)p(y_1)

\begin{align} y_{1,i} &\sim \text{Normal}(\mu_1, \sigma_1) \\ y_{2,i} | y_{1,i} &\sim \text{Normal} \left( \mu_2 + \rho \dfrac{\sigma_2}{\sigma_1}\epsilon_i, \sqrt{\sigma_2^2(1-\rho^2)} \right), \end{align}

where \epsilon_i = y_{1,i} - \mu_1 are the “residuals” of the “first stage”. One can also easily fit this (jointly) in Stan:

y1 ~ normal(mu[1], sigma[1]);
y2 ~ normal(mu[2] + L_Omega[2,1] * (sigma[2] / sigma[1]) * (y1 - mu[1]),
            sqrt(square(sigma[2]) * (1 - square(L_Omega[2,1]))));

And this will yield the same parameter estimates as the code above.
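(As an aside, this is why L_Omega[2,1] can stand in for \rho in the code: in the 2 \times 2 case the Cholesky factor of the correlation matrix is

L_\Omega = \begin{bmatrix} 1 & 0 \\ \rho & \sqrt{1-\rho^2} \end{bmatrix},

so its (2,1) entry is exactly \rho.)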

To frequentists, and especially economists, the second approach might be more familiar: fit the “first stage”, put the residuals into the “second stage” regression, then add , robust in Stata and all is fine… haha.

But I was wondering if I could do the same thing in two separate stages with Stan. First, a Stan model marginal1.stan, where I would fit y_1 and compute \epsilon_i in the generated quantities block:

...
model{
  y1 ~ normal(mu[1], sigma[1]);
  ...
}
generated quantities{
  //divide by sigma to get rid of it in the conditional model...
  vector[N] epsilon = (y1 - mu[1]) / sigma[1];
  ...
}
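Filling in the elided parts, the first stage could look roughly like this; the scalar parameter names and the priors are my assumptions:

data {
  int<lower=1> N;
  vector[N] y1;
}
parameters {
  real mu1;
  real<lower=0> sigma1;
}
model {
  mu1 ~ normal(0, 5);       // placeholder prior
  sigma1 ~ exponential(1);  // placeholder prior
  y1 ~ normal(mu1, sigma1);
}
generated quantities {
  // standardized residuals, so sigma1 drops out of the second stage
  vector[N] epsilon = (y1 - mu1) / sigma1;
}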

And then I would extract the posterior means of epsilon and feed them into a second Stan model, conditional2.stan:

data{
  ...
  vector[N] epsilon;
}
...
model{
  y2 ~ normal(mu[2] + L_Omega[2,1] * sigma[2] * epsilon,
              sqrt(square(sigma[2]) * (1 - square(L_Omega[2,1]))));
  ...
}
...
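A fleshed-out second stage might look like this; again, the declarations and priors are placeholders. Since only the correlation enters here, I declare rho directly instead of a Cholesky factor:

data {
  int<lower=1> N;
  vector[N] y2;
  vector[N] epsilon;  // posterior means from marginal1.stan
}
parameters {
  real mu2;
  real<lower=0> sigma2;
  real<lower=-1, upper=1> rho;  // plays the role of L_Omega[2,1];
                                // implicit uniform(-1, 1) prior
}
model {
  mu2 ~ normal(0, 5);       // placeholder prior
  sigma2 ~ exponential(1);  // placeholder prior
  y2 ~ normal(mu2 + rho * sigma2 * epsilon,
              sigma2 * sqrt(1 - square(rho)));
}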

From a few very basic (!) tests it looks like this works - but it feels really wrong. It feels like this is not “propagating uncertainty”… But then again, sometimes we want to integrate stuff out… I’m just a bit unsure about all of this…

I mean, I could also extract the Monte Carlo SEs of the means of \epsilon_i and do something like this:

data{
  ...
  vector[N] epsilon_mean;
  vector[N] epsilon_se;
}
...
parameters{
  ...
  vector[N] epsilon;
}
model{
  epsilon ~ normal(epsilon_mean, epsilon_se); // or NCP
  y2 ~ normal(mu[2] + L_Omega[2,1] * sigma[2] * epsilon,
              sqrt(square(sigma[2]) * (1 - square(L_Omega[2,1]))));
  ...
}
...
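The // or NCP comment could be spelled out as the following non-centered version, under the same placeholder assumptions as the sketches above:

data {
  int<lower=1> N;
  vector[N] y2;
  vector[N] epsilon_mean;
  vector[N] epsilon_se;
}
parameters {
  real mu2;
  real<lower=0> sigma2;
  real<lower=-1, upper=1> rho;
  vector[N] epsilon_raw;
}
transformed parameters {
  // non-centered: implies epsilon ~ normal(epsilon_mean, epsilon_se)
  vector[N] epsilon = epsilon_mean + epsilon_se .* epsilon_raw;
}
model {
  epsilon_raw ~ std_normal();
  mu2 ~ normal(0, 5);       // placeholder prior
  sigma2 ~ exponential(1);  // placeholder prior
  y2 ~ normal(mu2 + rho * sigma2 * epsilon,
              sigma2 * sqrt(1 - square(rho)));
}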

This approach feels a bit more “Bayesian”, but idk…

Anybody got some thoughts on this?

Cheers!
Max

PS: Obviously the application I need this for is a bit more elaborate and using something like this would make my life so much easier…


In general, you will have to account for uncertainty in some way. Typical frequentist two-stage estimators like 2SLS do this as well. There is a good explanation of this in Gelman and Hill edition 2.

My colleagues in accounting are slowly realizing these types of problems in their frequentist residual models.

You can get biased estimates, in the econometric sense, if the residuals are the dependent variable in the next stage (the frequentist case):
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2597429
You get different results once you propagate uncertainty (the Bayesian case). If I remember correctly, without propagation, outliers become very influential:
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3417406

For your specific problem, I think integrating stuff out might be the best way to simplify the model, but I fear you will have to do it analytically. If you have normal noise and linear relations, this is feasible. Otherwise, all bets are off.
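To make “integrating stuff out” concrete in the normal-linear case: if \epsilon_i \sim \text{Normal}(m_i, s_i) and y_{2,i} \mid \epsilon_i \sim \text{Normal}(\mu_2 + \beta \epsilon_i, \tau), then marginally

y_{2,i} \sim \text{Normal}\left( \mu_2 + \beta m_i, \sqrt{\tau^2 + \beta^2 s_i^2} \right),

so the latent \epsilon_i never needs to be sampled (here \beta plays the role of \rho \sigma_2 above). Outside the normal-linear setting, this closed form is generally unavailable.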

Sorry not to be of much help.


How about this?

model{
  y ~ normal(mu[1], sigma[1]);
  (y - mu[1]) ~ normal(mu[2], sigma[2]);
}

Here, mu[1] would presumably be constructed to reflect the effect of some set of predictors, and mu[2] the effect of some second set of predictors on the residuals from the first relationship.
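A sketch of what that could look like with explicit predictor matrices; X1, X2, and all other names here are hypothetical, and priors are omitted:

data {
  int<lower=1> N;
  int<lower=1> K1;
  int<lower=1> K2;
  matrix[N, K1] X1;  // hypothetical predictors for the first relationship
  matrix[N, K2] X2;  // hypothetical predictors for the residuals
  vector[N] y;
}
parameters {
  real alpha1;
  real alpha2;
  vector[K1] beta1;
  vector[K2] beta2;
  vector<lower=0>[2] sigma;
}
model {
  vector[N] mu1 = alpha1 + X1 * beta1;
  y ~ normal(mu1, sigma[1]);
  // the left-hand side is data minus a linear predictor,
  // so no Jacobian adjustment is required
  (y - mu1) ~ normal(alpha2 + X2 * beta2, sigma[2]);
}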