# QR Regression Questions

I’m putting together a case study on the QR regression, https://github.com/betanalpha/knitr_case_studies/tree/master/qr_regression, and I had a few questions that I was hoping some people could answer.

• Who wrote the QR section in the manual? I’m guessing either Ben or Jonah?

• The manual suggests scaling the Q and R matrices by sqrt(N - 1). This approximately makes Q orthonormal, but for unit scaling don’t we want to scale by the full N? This also seems to be the case empirically as demonstrated in the case study.

• Any thoughts on what’s causing the correlations in the transformed slopes? I thought it was the weakly informative prior on the nominal slopes, but I can’t seem to recover an isotropic posterior even with a uniform prior on the slopes. This problem should be simple enough that an isotropic posterior is achievable, no?

Thanks.


(1) Me
(2) My thinking was that if `Q* = Q * sqrt(N - 1)` then the correlation matrix of `Q*` is the identity matrix, so the units of the coefficients on `Q*` would be in standard deviations. I don’t think that helps all that much in formulating a prior on the coefficients with respect to `Q*` or `X`, though. Scaling by `N` is another option. Not scaling at all seems to be a bad idea for large `N`.
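A quick numpy sketch of point (2), with made-up data rather than anything from the case study: when `X` is centered, the columns of the thin-QR `Q` are orthonormal and have mean zero, so multiplying by `sqrt(N - 1)` makes each column's sample standard deviation exactly 1.

```python
import numpy as np

# Hypothetical data: N observations, K predictors (not from the thread).
rng = np.random.default_rng(0)
N, K = 1000, 3
X = rng.normal(size=(N, K))
X = X - X.mean(axis=0)          # center so Q's columns have mean zero

# Thin QR: Q has orthonormal columns, so each column's sum of squares is 1.
Q, R = np.linalg.qr(X)

# Scale by sqrt(N - 1): each column's sample sd becomes 1,
# so the correlation matrix of Q* is the identity.
Q_star = Q * np.sqrt(N - 1)
print(Q_star.std(axis=0, ddof=1))   # ~ [1. 1. 1.]
```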
(3) Did you center both columns of `X` before decomposing it?

One thing that needs to be said, which I didn’t say in the manual, is that under the QR decomposition the last coefficient on `Q` or `Q*` is proportional to the last coefficient on `X`. So, if you only care about / have informative prior information on one coefficient, put it last in `X` and rescale your prior accordingly.
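To see why the last coefficient is special (an illustrative numpy sketch, not code from the manual): with `X = Q R` and `X @ beta = Q @ theta`, we have `theta = R @ beta`, and because `R` is upper triangular the last entry of `theta` is just a rescaling of the last entry of `beta`.

```python
import numpy as np

# Hypothetical design matrix and coefficients.
rng = np.random.default_rng(1)
N, K = 200, 4
X = rng.normal(size=(N, K))
Q, R = np.linalg.qr(X)

beta = rng.normal(size=K)   # coefficients on X
theta = R @ beta            # equivalent coefficients on Q (X @ beta == Q @ theta)

# Only the last entry is a pure rescaling: the last row of the upper-triangular
# R has a single nonzero entry, R[-1, -1].
print(np.isclose(theta[-1], R[-1, -1] * beta[-1]))   # True
```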

Also, I have since developed a hunch that a polar decomposition would be better than a QR decomposition.


Thanks.

Not scaling is definitely a bad idea.

I did not center the columns of `X` before decomposing. I guess the QR needs to be done on the centered columns to completely decouple the model?

If you put the intercept into `X` first and then QR it, that is equivalent to doing QR on the centered `X` without the intercept.
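A small numpy check of this equivalence, on illustrative data: in the thin QR of `[1, X]`, the first column of `Q` absorbs the intercept direction and the remaining columns are orthogonalized against it, i.e. centered. So, up to per-column signs, they match the thin QR of the centered `X`.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 500, 3
X = rng.normal(size=(N, K))

# QR of [1, X]: the first Q column spans the intercept direction;
# the rest are orthogonal to it, i.e. implicitly centered.
Q_full, _ = np.linalg.qr(np.column_stack([np.ones(N), X]))

# QR of the centered X, without an intercept column.
Xc = X - X.mean(axis=0)
Q_c, _ = np.linalg.qr(Xc)

# The two agree up to the sign of each column.
print(np.allclose(np.abs(Q_full[:, 1:]), np.abs(Q_c)))   # True
```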


Yup, centering did the trick.

Michael,

I noticed a small discrepancy between your writeup and the Stan manual:

• In the Stan manual, Q is scaled by sqrt(N - 1). This gives a matrix whose columns each have a standard deviation of 1.

• In your writeup, Q is scaled by N. This gives column standard deviations very far from 1.

Note: Data was centered, but not scaled, before doing the QR decomposition.
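For concreteness, a numpy sketch (made-up data, not the case-study data) contrasting the two scalings on a centered but unscaled design matrix: `sqrt(N - 1)` puts the column standard deviations at exactly 1, while a plain factor of `N` puts them near `sqrt(N)`.

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 1000, 2
X = rng.normal(size=(N, K))
Xc = X - X.mean(axis=0)        # centered, not scaled
Q, _ = np.linalg.qr(Xc)

# sqrt(N - 1) scaling: column sds are exactly 1.
print((Q * np.sqrt(N - 1)).std(axis=0, ddof=1))   # ~ [1. 1.]

# N scaling: column sds are N / sqrt(N - 1) ~ sqrt(N), far from 1.
print((Q * N).std(axis=0, ddof=1))                # ~ [31.6 31.6]
```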

See the above discussion: there’s a trade-off between scaling the variance and the mean of the transformed distribution.