The variance of columns of Q in QR decomposition of linear model in rstanarm

I’m trying to understand this article “Estimating regularized linear models with rstanarm”, but I’m having trouble with the section “Priors”. As background, we are working with the linear model under QR reparameterization:

y = X\beta + \alpha + \epsilon = Q\theta + \alpha +\epsilon,

where \theta = R\beta and \epsilon \sim N(0,\sigma_Y^2).

The vignette says:

To understand this prior, think about the equations that characterize the maximum likelihood solutions before observing the data on X and especially y

then

We can write its k th element as \theta_k = \rho_k \sigma_Y \sqrt{N-1} where \rho_k is the correlation between the k th column of Q and the outcome, \sigma_Y is the standard deviation of the outcome, and 1/\sqrt{N-1} is the standard deviation of the k th column of Q.

No proof is given for any of these statements and I’m having trouble working them out for myself. I am treating X as a matrix composed of iid random rows and am trying to work out the consequent properties of the random QR decomposition.

The columns of \mathbf{Q} all have a mean of zero and a sum of squares of 1. So the sample variance of each column is 1 / (N - 1), and the sample standard deviation is the square root of that, 1 / \sqrt{N - 1}.
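
Since no proof is given in the vignette, here is a sketch of how the expression for \theta_k might follow from those two facts. It assumes that \sigma_Y denotes the sample standard deviation of the outcome (N - 1 denominator), that \rho_k is the sample correlation, and that \hat\theta = Q^\top y is the least-squares solution; the symbols q_{ik}, s_{q_k} and \bar{y} are introduced here for illustration and are not from the vignette.

```latex
% Column k of Q has entries q_{ik}, with \sum_i q_{ik} = 0 and \sum_i q_{ik}^2 = 1.
\begin{align*}
s_{q_k} &= \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} q_{ik}^2} = \frac{1}{\sqrt{N-1}} \\
\hat\theta_k &= \sum_{i=1}^{N} q_{ik} y_i
              = \sum_{i=1}^{N} q_{ik} (y_i - \bar{y})
              && \text{since } Q^\top Q = I \text{ and } \sum_i q_{ik} = 0 \\
\rho_k &= \frac{\sum_{i=1}^{N} q_{ik} (y_i - \bar{y})}{(N-1)\, s_{q_k}\, \sigma_Y}
        = \frac{\hat\theta_k}{\sqrt{N-1}\, \sigma_Y} \\
\hat\theta_k &= \rho_k \, \sigma_Y \sqrt{N-1}
\end{align*}
```

The factor \sqrt{N-1} is what remains after the N - 1 in the covariance is divided by the column's standard deviation 1/\sqrt{N-1}.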

This morning I re-read the vignette and noticed that the predictors are centered, which cleared things up immediately. Thanks for the help!

That is true, although it would be essentially the same with an \mathbf{X} that is uncentered as long as it has a column of ones.
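
A quick numerical check of this point, as a minimal sketch rather than anything rstanarm does internally: it uses plain Python with a hand-rolled modified Gram-Schmidt in place of a library QR, and the data, seed, and names (`gram_schmidt`, `results`) are made up for illustration. With a column of ones placed first, the remaining columns of Q are orthogonal to it, so they should have mean zero and standard deviation 1/\sqrt{N-1} even though X is uncentered.

```python
import math
import random
import statistics


def gram_schmidt(columns):
    """Orthonormalize columns in order (modified Gram-Schmidt).

    Returns the columns of Q from a thin QR decomposition with a
    positive diagonal in R.
    """
    q_cols = []
    for col in columns:
        v = list(col)
        for q in q_cols:
            proj = sum(a * b for a, b in zip(q, v))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        q_cols.append([a / norm for a in v])
    return q_cols


rng = random.Random(1)  # arbitrary seed, for reproducibility only
N, K = 50, 3

# Hypothetical uncentered predictors (stored column-wise) and outcome.
X = [[rng.gauss(5.0, 2.0) for _ in range(N)] for _ in range(K)]
y = [1.5 + 2.0 * X[0][i] - X[1][i] + rng.gauss(0.0, 1.0) for i in range(N)]

# Column of ones first; drop its Q column, which is the intercept direction.
Q = gram_schmidt([[1.0] * N] + X)[1:]

sd_y = statistics.stdev(y)  # sample sd, N - 1 denominator
y_bar = statistics.fmean(y)

results = []
for k, q in enumerate(Q, start=1):
    mean_q = statistics.fmean(q)
    sd_q = statistics.stdev(q)
    theta = sum(a * b for a, b in zip(q, y))  # k th element of Q'y
    cov = sum(a * (b - y_bar) for a, b in zip(q, y)) / (N - 1)
    rho = cov / (sd_q * sd_y)  # sample correlation of q with y
    rhs = rho * sd_y * math.sqrt(N - 1)
    results.append((mean_q, sd_q, theta, rhs))
    print(f"column {k}: mean={mean_q:.2e}, sd={sd_q:.6f}, "
          f"theta={theta:.6f}, rho*sd_y*sqrt(N-1)={rhs:.6f}")

print("1/sqrt(N-1) =", 1 / math.sqrt(N - 1))
```

Under these assumptions, the printed `sd` should match `1/sqrt(N-1)` and `theta` should match the right-hand side up to floating-point error.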