I am reading this nice post on the QR decomposition in regression:
https://mc-stan.org/users/documentation/case-studies/qr_regression.html
I am wondering about the reason for the observation under “The Importance of Centering Covariates”: without centering, the posterior on \tilde{\beta} is still quite correlated. Is it caused by the design matrix [1, Q]?
Following the setup of the original post, I ran
set.seed(0)
N <- 5000
x <- rnorm(N, 10, 1)
X <- data.matrix(data.frame(x, x * x))
Q <- qr.Q(qr(X))
design <- cbind(1, Q)
t(design) %*% design
and got
[,1] [,2] [,3]
[1,] 5000.000000 -7.036359e+01 -6.931542e+00
[2,] -70.363587 1.000000e+00 -9.419006e-18
[3,] -6.931542 -9.419006e-18 1.000000e+00
which looks reasonably okay to me. I am curious what the root cause of the strongly correlated posterior would be.
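One quick check on this matrix, as a rough sketch (the names `s`, `proj_sq`, `resid_sq`, and `frac` are mine, not from the case study): the off-diagonal entries in the first row are Q'1, so their squared sum is the squared length of the projection of the constant column onto the span of Q. From the numbers printed above that is roughly 70.36^2 + 6.93^2, or about 4999 out of N = 5000, so those entries may be less harmless than they look next to 5000.

```r
set.seed(0)
N <- 5000
x <- rnorm(N, 10, 1)
X <- cbind(x, x * x)          # same columns as in the post, built with cbind
Q <- qr.Q(qr(X))

# Inner products of each column of Q with the constant column (Q' 1)
s <- crossprod(Q, rep(1, N))

# Squared length of the projection of the constant column onto span(Q),
# and what is left over after that projection
proj_sq  <- sum(s^2)
resid_sq <- N - proj_sq

# Fraction of the constant column that already lies in span(Q)
frac <- proj_sq / N
print(c(proj_sq = proj_sq, resid_sq = resid_sq, frac = frac))
```

If `frac` is very close to 1, the intercept column is nearly a linear combination of the columns of Q, even though the columns of Q are orthonormal among themselves.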
Update:
Then I actually tried
cov2cor(solve(t(design) %*% design))
and got
[,1] [,2] [,3]
[1,] 1.0000000 0.9999072 0.9905672
[2,] 0.9999072 1.0000000 0.9904752
[3,] 0.9905672 0.9904752 1.0000000
which does indicate that the posterior would be strongly correlated. But I don’t think I fully understand the reason. Some discussion would be very helpful.
@bgoodri and @betanalpha were also discussing something related to this in QR Regression Questions.
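My tentative reading, as a sketch rather than a settled answer: writing s = Q'1 and d = N - s's, the block inverse of t(design) %*% design gives a correlation of -s_j / sqrt(d + s_j^2) between the intercept and the j-th coefficient, which is close to 1 in absolute value whenever d is small. The code below checks that formula numerically and repeats the computation with centered columns. Centering with `scale()` is my assumption about what the case study means by centering, and the names `pred`, `R_unc`, and `R_cen` are mine.

```r
set.seed(0)
N <- 5000
x <- rnorm(N, 10, 1)
X <- cbind(x, x * x)

# Uncentered case, as in the post
Q <- qr.Q(qr(X))
R_unc <- cov2cor(solve(crossprod(cbind(1, Q))))

# Closed form for the intercept correlations from the block inverse:
# s = Q' 1, d = N - s's, cor(intercept, j) = -s_j / sqrt(d + s_j^2)
s <- crossprod(Q, rep(1, N))
d <- N - sum(s^2)
pred <- as.numeric(-s / sqrt(d + s^2))
print(rbind(closed_form = pred, from_cov2cor = R_unc[1, 2:3]))

# Centered case: every column of Xc sums to zero, so every column of Qc
# (a linear combination of those columns) is orthogonal to the constant
Xc <- scale(X, center = TRUE, scale = FALSE)
Qc <- qr.Q(qr(Xc))
R_cen <- cov2cor(solve(crossprod(cbind(1, Qc))))
print(round(R_cen, 6))
```

If this reading is right, the QR step only removes the correlation among the columns of X; any correlation with the intercept remains unless the constant column is orthogonal to Q, which centering guarantees.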