I’m putting together a case study on the QR regression, https://github.com/betanalpha/knitr_case_studies/tree/master/qr_regression, and I had a few questions that I was hoping some people could answer.
Who wrote the QR section in the manual? I’m guessing either Ben or Jonah?
The manual suggests scaling the Q and R matrices by sqrt(N - 1). This gives the columns of the scaled Q approximately unit standard deviation, but for unit scaling don’t we want to scale by the full N? This also seems to be the case empirically, as demonstrated in the case study.
Any thoughts on what’s causing the correlations in the transformed slopes? I thought it was the weakly informative prior on the nominal slopes, but I can’t seem to recover an isotropic posterior even with a uniform prior on the slopes. This problem should be simple enough that an isotropic posterior is achievable, no?
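A minimal numpy sketch of the scaling question in (2), using simulated data rather than the case study’s: the columns of a thin-QR Q have unit norm, so scaling by sqrt(N - 1) gives them unit sample standard deviation (exactly, once the predictors are centered), while scaling by N does not.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 1000, 3
X = rng.normal(size=(N, K))
Xc = X - X.mean(axis=0)              # center the columns first

Q, R = np.linalg.qr(Xc)              # thin QR: Q is N x K with orthonormal columns
sd_sqrt = (Q * np.sqrt(N - 1)).std(axis=0, ddof=1)  # scaling from the manual
sd_N = (Q * N).std(axis=0, ddof=1)                  # scaling from the case study

# Centering makes every column of Q mean-zero, so its sample variance is
# sum(q**2) / (N - 1) = 1 / (N - 1); scaling by sqrt(N - 1) gives sd exactly 1.
print(sd_sqrt)   # all ~1
print(sd_N)      # all ~N / sqrt(N - 1), far from 1
```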
(2) My thinking was that if Q* = Q * sqrt(N - 1), then the sample covariance matrix of Q* is the identity matrix, so the units of the coefficients on Q* would be standard deviations. I don’t think that helps all that much in terms of formulating a prior on the coefficients with respect to X, though. Scaling by N is another option. Not scaling at all seems to be a bad idea for large N.
(3) Did you center both columns of X before decomposing it?
One thing that needs to be said, which I didn’t say in the manual, is that under the QR decomposition the last coefficient on Q* is proportional to the last coefficient on X. So if you only care about, or have informative prior information for, just one coefficient, put it last in X and rescale your prior accordingly.
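A small numpy sketch of that proportionality (simulated data, arbitrary coefficients): writing theta for the coefficients on Q and beta = R⁻¹ theta for the coefficients on X, the last row of the upper-triangular system R beta = theta reduces to R[K,K] * beta[K] = theta[K].

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 500, 3
X = rng.normal(size=(N, K))
X -= X.mean(axis=0)
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(size=N)

Q, R = np.linalg.qr(X)                        # thin QR of the centered design
theta = np.linalg.lstsq(Q, y, rcond=None)[0]  # least-squares coefficients on Q
beta = np.linalg.solve(R, theta)              # back-transformed coefficients on X

# R is upper triangular, so the last equation of R @ beta = theta is
# R[-1, -1] * beta[-1] = theta[-1]: the two last coefficients are proportional.
print(beta[-1], theta[-1] / R[-1, -1])
```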
Also, I have since developed a hunch that a polar decomposition would be better than a QR decomposition.
Not scaling is definitely a bad idea.
I did not center the columns of X before decomposing – I guess the QR needs to be done on the centered columns to completely decouple the model?
If you put the intercept into X first and then QR it, that is equivalent to doing the QR on the centered X without the intercept.
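A quick numpy check of that equivalence on random data (the QR’s sign conventions can differ between the two calls, so the comparison is up to column signs): orthogonalizing against the constant column is exactly centering, so the remaining columns of Q match the QR of the centered X.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 200, 2
X = rng.normal(size=(N, K))

# QR with a leading intercept (all-ones) column
Q1, _ = np.linalg.qr(np.column_stack([np.ones(N), X]))

# QR of the centered X without an intercept
Q2, _ = np.linalg.qr(X - X.mean(axis=0))

# Columns 1..K of Q1 match Q2 up to sign.
print(np.allclose(np.abs(Q1[:, 1:]), np.abs(Q2)))  # True
```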
Yup, centering did the trick.
I noticed a small discrepancy between your writeup and the Stan manual:
In the Stan manual, Q is scaled by sqrt(N - 1), which gives every column a standard deviation of 1.
In your writeup, Q is scaled by N, which gives standard deviations very far from 1.
Note: Data was centered, but not scaled, before doing the QR decomposition.
See the above discussion – there’s a trade-off between scaling the variance and scaling the mean of the transformed distribution.