QR decomposition on regression

I am reading this nice post on QR decomposition on regression.

I am wondering about the reason for the observation under “The Importance of Centering Covariates”: the posterior on \tilde{\beta} is still quite correlated without the centering. Is it because of the design matrix [1, Q]?

Following the setup of the original post, I ran

N <- 5000
x <- rnorm(N, 10, 1)
X <- data.matrix(data.frame(x, x * x))

Q <- qr.Q(qr(X)) 
design <- cbind(1, Q)

t(design) %*% design

and get

            [,1]          [,2]          [,3]
[1,] 5000.000000 -7.036359e+01 -6.931542e+00
[2,]  -70.363587  1.000000e+00 -9.419006e-18
[3,]   -6.931542 -9.419006e-18  1.000000e+00

which looks reasonable to me. I am curious about the root cause of the strongly correlated posterior.

Then I actually tried

cov2cor(solve(t(design) %*% design))

and get

          [,1]      [,2]      [,3]
[1,] 1.0000000 0.9999072 0.9905672
[2,] 0.9999072 1.0000000 0.9904752
[3,] 0.9905672 0.9904752 1.0000000

which does indicate that the posterior would be strongly correlated. But I don’t think I fully understand the reason. Some discussion would be very helpful.

@bgoodri and @betanalpha were also discussing a related issue in QR Regression Questions.

The direct reason is that Q[, 1] and Q[, 2] still exhibit a strong sample correlation, even though their dot product is 0. Orthogonal columns are only uncorrelated when they have mean zero, and without centering the columns of Q do not. So when the intercept is included after the QR decomposition, the coefficients corresponding to Q[, 1] and Q[, 2] are still only weakly identified.
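To make the difference between "orthogonal" and "uncorrelated" concrete, here is a minimal sketch that compares the QR decomposition of the raw covariates with that of the centered covariates. The seed, the helper `post_cor`, and the variable names are my own assumptions for illustration and are not from the original post; `post_cor` uses the same `cov2cor(solve(...))` check as above, which is only a flat-prior proxy for the posterior correlation.

```r
set.seed(1)  # assumption: the original post does not set a seed
N <- 5000
x <- rnorm(N, 10, 1)
X <- cbind(x, x^2)

# QR without centering: the columns of Q are orthonormal,
# but they do not have mean zero
Q_raw <- qr.Q(qr(X))
dot_raw <- sum(Q_raw[, 1] * Q_raw[, 2])  # dot product of the two columns
cor_raw <- cor(Q_raw[, 1], Q_raw[, 2])   # sample correlation of the two columns

# QR after centering: each column of Q is a linear combination of
# mean-zero columns, so it is also orthogonal to the intercept column
X_c <- scale(X, center = TRUE, scale = FALSE)
Q_c <- qr.Q(qr(X_c))
cor_c <- cor(Q_c[, 1], Q_c[, 2])

# Hypothetical helper: correlation matrix implied by (D'D)^{-1}
# for the design D = [1, Q]
post_cor <- function(Q) cov2cor(solve(crossprod(cbind(1, Q))))

print(c(dot_raw = dot_raw, cor_raw = cor_raw, cor_c = cor_c))
print(round(post_cor(Q_raw), 4))
print(round(post_cor(Q_c), 4))
```

With the raw covariates the dot product is numerically zero while the sample correlation is not, and in this three-column design the implied correlation between the two Q coefficients is minus that sample correlation. After centering, the cross-product of [1, Q] is diagonal, so the implied correlations vanish.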

Sorry for not getting to you earlier. It is not clear to me whether your last post means that you managed to resolve the question or whether you would still find some more discussion helpful…

Good luck with your journey in stats!