The Stan manual section on QR reparametrization (Section 9.2, p. 124 in the current version, 2.16.0) explains why QR is a good idea for regression analysis and how it can be implemented in Stan.
I had already read that in one of the case studies by Michael Betancourt (link: http://mc-stan.org/users/documentation/case-studies/qr_regression.html) and had implemented it in the model I was playing around with at the time.
What is done in the case study is to introduce a new parameter beta_transf = R * beta, where beta is the original parameter (so that X * beta = Q * beta_transf). The original beta is then recovered in the transformed parameters block and given a prior in the model block.
What is done in the manual is similar, except that the original beta is recovered in the generated quantities block and no prior is explicitly placed. (The scaling of Q and R is also different, but I found information on this in QR Regression Questions.)
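To make the comparison concrete, here is a minimal sketch of the case-study-style variant. It is an illustration under stated assumptions, not a copy of either source: the names N, K, X, y, alpha, sigma and R_inv, the prior scale of 10, and the sqrt(N - 1) scaling (borrowed from the manual) are all choices made for this example.

```stan
data {
  int<lower=1> N;        // number of observations
  int<lower=1> K;        // number of predictors (assumes N >= K)
  matrix[N, K] X;        // predictor matrix, assumed centered
  vector[N] y;           // outcome
}
transformed data {
  // Thin QR decomposition with the manual's scaling (an assumption here).
  matrix[N, K] Q = qr_Q(X)[, 1:K] * sqrt(N - 1.0);
  matrix[K, K] R = qr_R(X)[1:K, ] / sqrt(N - 1.0);
  matrix[K, K] R_inv = inverse(R);
}
parameters {
  real alpha;            // intercept, implicit flat prior
  vector[K] beta_transf; // coefficients on the columns of Q
  real<lower=0> sigma;   // residual scale, implicit flat prior
}
transformed parameters {
  // Recover the original coefficients so the model block can use them.
  vector[K] beta = R_inv * beta_transf;
}
model {
  beta ~ normal(0, 10);  // hypothetical prior on the original scale
  y ~ normal(alpha + Q * beta_transf, sigma);
}
// Manual-style variant: drop the transformed parameters block and the
// `beta ~ ...` statement, and compute beta only for reporting:
//   generated quantities { vector[K] beta = R_inv * beta_transf; }
```

In the manual-style variant nothing in the model block touches beta, so beta_transf is left with whatever prior Stan implies when no sampling statement is given for a parameter.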
Am I right to assume that the manual's approach implicitly places a non-informative prior on beta_transf?
If so, couldn’t the prior on beta_transf be way off from a non-informative prior on beta?
Is the only real difference between these approaches (transformed parameter vs. generated quantity) the parameter I place a prior on, or are there additional (computational?) points to consider?
Either way: if I want to place a weakly or highly informative prior on beta, am I right that I would need beta as a transformed parameter so that I can interact with it, rather than only recovering it as a generated quantity?
Not related to this topic, but probably someone who can answer the rest knows this as well: is the data supplied to Stan in the rstan::stan call also included in the resulting stanfit object? (I have not found anything in str(fit) or shinystan, but was unsure, as a lot more is saved than I had thought of.)
This is a lot of text for a rather simple question with straightforward answers, I believe. I feel I understood most of it, but am not completely sure and would like to erase this doubt! Thank you for enlightenment!
1. Improper uniform prior, which is non-non-informative in some ways.
2. Whatever is your prior on beta_transf, it can be rotated into a prior on beta (and vice versa). It isn’t restricted to improper uniform priors.
3. Yes
4. Since R is a constant, the Jacobian adjustment would be a constant and can be ignored when placing a prior on beta. However, it does not make sense to me to do the QR reparameterization and then put independent priors on the elements of beta. If that were reasonable, then you wouldn’t need QR. If you have a weakly informative joint prior on beta, that would be fine.
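The remark about the Jacobian can be spelled out with a short change-of-variables sketch. The symbols are introduced here for illustration: \theta stands for beta_transf, \beta for the original coefficients, and p_\beta for whatever prior density is placed on \beta.

```latex
\theta = R\beta, \qquad \beta = R^{-1}\theta

p_\theta(\theta) = p_\beta\!\left(R^{-1}\theta\right)\,\bigl\lvert \det R^{-1} \bigr\rvert

\log p_\theta(\theta) = \log p_\beta\!\left(R^{-1}\theta\right) - \log \lvert \det R \rvert
```

Because R is computed from the data and does not depend on any parameter, the term \log\lvert\det R\rvert is the same at every point in parameter space. Dropping it changes the log density only by an additive constant, which does not affect sampling.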
I am not completely sure about the fourth point (informative priors vs. QR). If I were to put independent informative priors on beta (maybe only on some elements, with uniform or other priors on the rest), would the QR decomposition not still make it easier for Stan to move around, since Q is orthogonal? Or is this property “destroyed” or made less useful by the different priors on the betas? (If so, why?)
The reason why the QR reparameterization is helpful computationally is because the columns of Q are uncorrelated with each other and also because they have the same sum of squares. So, it is perhaps reasonable to believe — before seeing the data — that the coefficients on Q are also uncorrelated and if they are also normal, then they are independent. Although it is difficult to think about the coefficients on Q, I think independent normal priors with a moderate value for the prior standard deviation will usually be fine.
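A simplified worked case may help show what the orthogonality buys. Assume, purely for illustration, a known residual scale \sigma, no intercept, and the manual's scaling so that the thin factor Q satisfies Q^\top Q = (N-1) I_K; write \theta for the coefficients on Q and \beta for the coefficients on X = QR.

```latex
y \sim \mathcal{N}\!\left(Q\theta,\ \sigma^2 I_N\right), \qquad Q^\top Q = (N-1)\, I_K

p(y \mid \theta) \propto
\exp\!\left(-\frac{1}{2\sigma^2}\,(\theta-\hat\theta)^\top Q^\top Q\,(\theta-\hat\theta)\right)
= \exp\!\left(-\frac{N-1}{2\sigma^2}\,\lVert \theta-\hat\theta \rVert^2\right),
\qquad \hat\theta = \frac{Q^\top y}{N-1}

p(y \mid \beta) \propto
\exp\!\left(-\frac{1}{2\sigma^2}\,(\beta-\hat\beta)^\top R^\top R\,(\beta-\hat\beta)\right),
\qquad \hat\beta = R^{-1}\hat\theta
```

Under these assumptions the likelihood is spherical in \theta, with the same curvature in every direction, while in \beta its curvature is R^\top R = X^\top X, which has off-diagonal terms whenever the columns of X are correlated.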
Conversely, to put independent normal priors on the coefficients on the columns of X is to say implicitly that you think the columns of X are going to be uncorrelated, which is not the case when the R matrix is not diagonal. If the columns of X were uncorrelated, then you wouldn’t need the QR reparameterization. No one really thinks that the coefficients for X are independent of each other, but no one feels comfortable specifying what they think the dependence structure is. So they just assume independence because everyone else does.
Another consideration is that a strongly informative IID prior on the original space will imply a strong correlated prior on the R-transformed space and the transformed posterior might actually end up worse and harder to fit.
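This last point can be illustrated with a small sketch under simplifying assumptions: a known residual scale \sigma, no intercept, a scaled thin QR with Q^\top Q = (N-1) I_K, and a hypothetical IID normal prior with scale \tau on the original coefficients \beta, where \theta = R\beta.

```latex
\beta \sim \mathcal{N}\!\left(0,\ \tau^2 I_K\right)
\quad\Longrightarrow\quad
\theta = R\beta \sim \mathcal{N}\!\left(0,\ \tau^2 R R^\top\right)

\text{posterior precision in } \theta:\quad
\frac{N-1}{\sigma^2}\, I_K + \frac{1}{\tau^2}\,\bigl(R R^\top\bigr)^{-1}

\text{posterior precision in } \beta:\quad
\frac{1}{\sigma^2}\, R^\top R + \frac{1}{\tau^2}\, I_K
```

When \tau is large the likelihood term dominates, and the \theta parameterization is close to spherical. When \tau is small the prior term dominates: the precision in \beta is then close to a multiple of the identity, while the precision in \theta inherits the off-diagonal structure of (R R^\top)^{-1}, so the transformed space can be the harder one to sample.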