Ok, this might be a bit weird, but there are so many brilliant minds here that it’d be stupid not to ask…
Can I use the residuals from another fit/model in my regression?
Suppose I have a Bivariate Normal regression model.
That’s easy to fit in Stan.
Y ~ multi_normal_cholesky(mu, diag_pre_multiply(sigma, L_Omega));
It’s basically the joint distribution of the stuff in Y, i.e. p(y_1,y_2), plus priors (omitted here).
I could also fit p(y_2|y_1)p(y_1), with the conditional

y_{2,i} | y_{1,i} ~ Normal(\mu_2 + \rho (\sigma_2/\sigma_1) \epsilon_i, \sigma_2^2 (1-\rho^2)),

where \epsilon_i = y_{1,i} - \mu_1, the “residuals” of the “first stage”, and \rho = L_Omega[2,1] for the bivariate case. One can also easily (jointly) fit this in Stan.
y1 ~ normal(mu[1], sigma[1]);
y2 ~ normal(mu[2] + L_Omega[2,1] * (sigma[2] / sigma[1]) * (y1 - mu[1]), sigma[2] * sqrt(1 - square(L_Omega[2,1])));
And this will yield the same parameters as the code above.
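To convince myself the two parameterizations really agree, here’s a quick numerical check in plain Python (made-up parameter values; rho plays the role of L_Omega[2,1]) that the joint bivariate normal density equals marginal times conditional:

```python
import math

def norm_pdf(x, mu, sd):
    # Univariate normal density
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def binorm_pdf(y1, y2, mu1, mu2, s1, s2, rho):
    # Bivariate normal density p(y1, y2)
    z = (((y1 - mu1) / s1) ** 2
         - 2 * rho * (y1 - mu1) * (y2 - mu2) / (s1 * s2)
         + ((y2 - mu2) / s2) ** 2)
    return math.exp(-z / (2 * (1 - rho ** 2))) / (
        2 * math.pi * s1 * s2 * math.sqrt(1 - rho ** 2))

# Made-up parameter values, just for the check
mu1, mu2, s1, s2, rho = 1.0, -0.5, 2.0, 1.5, 0.6
y1, y2 = 2.3, 0.7

joint = binorm_pdf(y1, y2, mu1, mu2, s1, s2, rho)

# p(y1) * p(y2 | y1), with the conditional mean and sd from above
cond_mean = mu2 + rho * (s2 / s1) * (y1 - mu1)
cond_sd = s2 * math.sqrt(1 - rho ** 2)
factored = norm_pdf(y1, mu1, s1) * norm_pdf(y2, cond_mean, cond_sd)

print(abs(joint - factored) < 1e-12)  # True: the two parameterizations agree
```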
To Frequentists, and especially Economists, the second approach might be more familiar: fit the “first stage”, then put the residuals into the “second stage” regression. Then use , robust in Stata and all is fine… haha.
But I was wondering if I could do the same with two Stan models. First, a Stan model marginal1.stan, where I would fit y_1 and compute \epsilon_i in the generated quantities block:
...
model{
y1 ~ normal(mu[1], sigma[1]);
...
}
generated quantities{
//divide by sigma to get rid of it in the conditional model...
vector[N] epsilon = (y1 - mu[1]) / sigma[1];
...
}
And then extract the posterior means of epsilon and put them into a second Stan model conditional2.stan:
data{
...
vector[N] epsilon;
}
...
model{
y2 ~ normal(mu[2] + L_Omega[2,1] * sigma[2] * epsilon, sigma[2] * sqrt(1 - square(L_Omega[2,1])));
...
}
...
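As a sanity check outside Stan that this two-stage logic at least targets the right quantity, here’s a quick simulation in plain Python (made-up parameter values; the true parameters are plugged into the “first stage” and the “second stage” is plain OLS, just for illustration):

```python
import random
import math

random.seed(1)
mu1, mu2, s1, s2, rho = 1.0, -0.5, 2.0, 1.5, 0.6
N = 50000

# Simulate (y1, y2) from the bivariate normal via the conditional factorization
y1 = [random.gauss(mu1, s1) for _ in range(N)]
y2 = [mu2 + rho * (s2 / s1) * (x - mu1)
      + random.gauss(0, s2 * math.sqrt(1 - rho ** 2)) for x in y1]

# "First stage": standardized residuals (true parameters plugged in here)
eps = [(x - mu1) / s1 for x in y1]

# "Second stage": OLS of y2 on eps; the slope estimates rho * s2 = 0.9
m_eps = sum(eps) / N
m_y2 = sum(y2) / N
slope = (sum((e - m_eps) * (y - m_y2) for e, y in zip(eps, y2))
         / sum((e - m_eps) ** 2 for e in eps))
print(round(slope, 2))
```

With known first-stage parameters the second-stage slope recovers rho * sigma_2; the question is what happens once the residuals are themselves estimated.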
From a few very basic (!) tests it looks like this works - but it feels really wrong. It feels like this is not “propagating uncertainty”… But then again, sometimes we want to integrate stuff out… I’m just a bit unsure about all of this…
I mean I could also extract the Monte Carlo SE of the means of \epsilon_i and do something like this:
data{
...
vector[N] epsilon_mean;
vector[N] epsilon_se;
}
...
parameters{
...
vector[N] epsilon;
}
model{
epsilon ~ normal(epsilon_mean, epsilon_se); // or NCP
y2 ~ normal(mu[2] + L_Omega[2,1] * sigma[2] * epsilon, sigma[2] * sqrt(1 - square(L_Omega[2,1])));
...
}
...
Feels a bit more “Bayesian”, but Idk…
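On the “propagating uncertainty” worry: treating the posterior means of epsilon as if they were the true residuals is essentially an errors-in-variables setup. Posterior means are shrunken rather than simply noisy, so this is only a rough analogy, but a classical simulation (plain Python, made-up numbers) shows the kind of attenuation bias one might worry about when plugging in noisy point estimates:

```python
import random

random.seed(2)
N = 50000
true_slope = 0.9   # corresponds to rho * sigma_2 in the model above

# "True" standardized first-stage residuals and the second-stage outcome
eps = [random.gauss(0, 1) for _ in range(N)]
y2 = [true_slope * e + random.gauss(0, 1.2) for e in eps]

# Pretend each epsilon is only known up to some posterior / MC standard error
se = 0.5
eps_noisy = [e + random.gauss(0, se) for e in eps]

def ols_slope(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

s_clean = ols_slope(eps, y2)        # close to 0.9
s_noisy = ols_slope(eps_noisy, y2)  # attenuated, roughly 0.9 / (1 + se**2) = 0.72
print(round(s_clean, 2), round(s_noisy, 2))
```

The second model above, where epsilon gets a normal “prior” centered at epsilon_mean with scale epsilon_se, is one way of letting that uncertainty through, much like a standard measurement-error model.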
Anybody got some thoughts on this?
Cheers!
Max
PS: Obviously the application I need this for is a bit more elaborate and using something like this would make my life so much easier…