QR Regression Questions

I’m putting together a case study on QR regression, https://github.com/betanalpha/knitr_case_studies/tree/master/qr_regression, and I have a few questions that I’m hoping people here can answer.

  • Who wrote the QR section in the manual? I’m guessing either Ben or Jonah?

  • The manual suggests scaling the Q and R matrices by sqrt(N - 1). This approximately makes Q orthonormal, but for unit scaling don’t we want to scale by the full N? This also seems to be the case empirically as demonstrated in the case study.

  • Any thoughts on what’s causing the correlations in the transformed slopes? I thought it was the weakly informative prior on the nominal slopes, but I can’t seem to recover an isotropic posterior even with a uniform prior on the slopes. This problem should be simple enough that an isotropic posterior is achievable, no?


(1) Me
(2) My thinking was that if Q* = Q * sqrt(N - 1), then the covariance matrix of Q* is the identity matrix (for centered columns), so the coefficients on Q* would be in units of standard deviations. I don’t think that helps all that much in terms of formulating a prior on the coefficients with respect to Q* or X, though. Scaling by N is another option. Not scaling at all seems to be a bad idea for large N.
(3) Did you center both columns of X before decomposing it?
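
To make the scaling point in (2) concrete, here is a minimal NumPy sketch, not the case study's or the manual's code. The simulated X, the seed, and the names `mix`, `Q_star`, and `R_star` are made up for illustration, and it assumes the columns of X are centered before the decomposition. It compares the column standard deviations under the sqrt(N - 1) and N scalings.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 200, 3

# Hypothetical correlated design matrix; any full-rank X would do.
mix = np.array([[1.0, 0.5, 0.2],
                [0.0, 1.0, 0.3],
                [0.0, 0.0, 1.0]])
X = rng.normal(size=(N, K)) @ mix
X_centered = X - X.mean(axis=0)

# Thin QR: Q is N x K with orthonormal columns, R is K x K upper triangular.
Q, R = np.linalg.qr(X_centered)

# Scaling discussed in the thread: Q* = Q * sqrt(N - 1), R* = R / sqrt(N - 1).
Q_star = Q * np.sqrt(N - 1)
R_star = R / np.sqrt(N - 1)

# Sample covariance (N - 1 denominator) of Q* is the identity.
cov_sqrt = np.cov(Q_star, rowvar=False)

# Column standard deviations under each scaling.
sd_sqrt = Q_star.std(axis=0, ddof=1)   # each entry is 1
sd_N = (Q * N).std(axis=0, ddof=1)     # each entry is N / sqrt(N - 1)

print(np.round(cov_sqrt, 6))
print(sd_sqrt, sd_N)
```

Since the columns of Q are orthonormal and (after centering X) have mean zero, the sqrt(N - 1) scaling gives unit sample standard deviation exactly, while scaling by N gives N / sqrt(N - 1), which grows with N.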

One thing that needs to be said, which I didn’t say in the manual, is that under the QR decomposition the last coefficient on Q or Q* is proportional to the last coefficient on X. So, if you only care about, or have informative prior information about, one coefficient, put it last in X and rescale your prior accordingly.
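
The proportionality follows from R being upper triangular: writing theta = R * beta, the last row of R has a single nonzero entry, so theta_K = R_KK * beta_K. A small NumPy check, as a sketch with simulated data (the seed and the names `beta` and `theta` are illustrative, not from the manual):

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 100, 4

X = rng.normal(size=(N, K))
X = X - X.mean(axis=0)          # centered, as discussed in this thread

Q, R = np.linalg.qr(X)          # thin QR, R is K x K upper triangular

beta = rng.normal(size=K)       # hypothetical coefficients on X
theta = R @ beta                # implied coefficients on Q

# The linear predictor is unchanged by the reparameterization.
same_fit = np.allclose(Q @ theta, X @ beta)

# Only the last coefficient maps one-to-one: theta_K = R_KK * beta_K.
ratio = theta[-1] / beta[-1]
print(same_fit, ratio, R[-1, -1])
```

With the scaled version, the same argument applies with R* = R / sqrt(N - 1) in place of R, so a prior on the last coefficient of X can be carried over by multiplying its scale by |R*_KK|.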

Also, I have since come to suspect that a polar decomposition would be better than a QR decomposition.


Not scaling is definitely a bad idea.

I did not center the columns of X before decomposing. I guess the QR needs to be done on the centered columns to completely decouple the model?

If you put the intercept into X first and then QR it, that is equivalent to doing QR on the centered X without the intercept.
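
A quick numerical check of that equivalence, as a sketch with simulated data (the helper `fix_signs` is my own, added only because a QR factorization is unique up to the signs of the columns of Q and the rows of R):

```python
import numpy as np

def fix_signs(Q, R):
    """Flip signs so that diag(R) is positive; Q @ R is unchanged."""
    s = np.sign(np.diag(R))
    s[s == 0] = 1.0
    return Q * s, R * s[:, None]

rng = np.random.default_rng(2)
N, K = 50, 3
X = rng.normal(loc=5.0, size=(N, K))   # deliberately not centered

# QR with the intercept column placed first.
X1 = np.column_stack([np.ones(N), X])
Q1, R1 = fix_signs(*np.linalg.qr(X1))

# QR of the centered predictors, with no intercept column.
Xc = X - X.mean(axis=0)
Qc, Rc = fix_signs(*np.linalg.qr(Xc))

# Dropping the first column of Q1 (and first row/column of R1)
# recovers the centered decomposition.
print(np.allclose(Q1[:, 1:], Qc), np.allclose(R1[1:, 1:], Rc))
```

The first column of Q1 is the constant vector 1 / sqrt(N), so orthogonalizing the remaining columns against it is the same as subtracting their means.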

Yup, centering did the trick.


I noticed a small discrepancy between your writeup and the Stan manual:

  • In the Stan manual, Q is scaled by sqrt(N - 1). This creates a matrix whose columns all have a standard deviation of 1.

  • In your writeup, Q is scaled by N. This gives the columns a standard deviation very far from 1.

Note: Data was centered, but not scaled, before doing the QR decomposition.

See the above discussion: there’s a trade-off between scaling the variance and scaling the mean of the transformed distribution.