Incomplete principal components (PC) regression on p predictors involves fitting only the first q PCs (q < p) against Y to reduce overfitting. Suppose that I replace the design matrix X with a matrix of PCs computed from the correlation matrix and hand that PC matrix to Stan to do regression. Suppose further that the coefficient priors are Gaussian with mean 0 and with standard deviations (SDs) that decrease with the component number, so that the minor components are penalized severely.
Is there a name for this procedure?
Are there recommendations for the SD formula?
Is there a more efficient way to do this?
One advantage of the procedure is that you can easily transform the posterior draws back to the original X space to do inference on the original parameters.
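To make the setup concrete, here is a minimal sketch in R (X and y are assumed to be in scope, and the prior SD formula is exactly the open question above):

```r
p   <- ncol(X)
q   <- 3                                       # number of PCs retained, q < p
pca <- prcomp(X, center = TRUE, scale. = TRUE) # PCA of the correlation matrix
Z   <- scale(pca$x[, 1:q])                     # standardized PC scores handed to Stan

# ... fit y ~ Z in Stan with theta_k ~ normal(0, sd_k), sd_k descending in k ...

# Back-transform posterior draws of theta (draws x q) to coefficients on the
# standardized X scale: beta = V_q diag(1/sdev_q) theta, since
# Z = X_std V_q diag(1/sdev_q). Divide by the column SDs of X afterwards to
# get coefficients on the raw X scale.
beta_draws <- function(theta_draws)
  theta_draws %*% diag(1 / pca$sdev[1:q], q) %*% t(pca$rotation[, 1:q])
```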
I recently saw Heaps discuss a similar idea in the context of factor analysis. It involved a product of gamma priors, where a new term enters the product for each extra component (which induces the shrinkage). See around Eq 7 here:
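I could be misremembering the details, but the construction is something like a multiplicative gamma process: with

$$\delta_1 \sim \mathrm{Gamma}(a_1, 1), \qquad \delta_h \sim \mathrm{Gamma}(a_2, 1) \;\; (h \ge 2), \qquad \tau_k = \prod_{h=1}^{k} \delta_h,$$

the loadings of component $k$ are given precision proportional to $\tau_k$, so for $a_2 > 1$ the precisions stochastically increase with $k$ and later components are shrunk ever harder.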
Just a note: if you put standard normal priors on all of the original coefficients, this implicitly induces a prior on the coefficients of all of the principal components that has the general form you’re after. I think (though I haven’t worked it out on paper, so my intuition could be wrong) that if you scale the covariates to have unit variance, the implied priors on the PC coefficients will be Gaussian with standard deviations equal to the square roots of the corresponding eigenvalues. This might be a useful starting point for thinking intuitively about the SD formula. Do you want to penalize the minor components more severely than this? It seems like it would be counterintuitive to penalize them any less severely.
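A quick prior simulation seems consistent with this intuition (just a sketch; it assumes the PC scores are standardized to unit variance, so that theta = D^{1/2} V' beta):

```r
set.seed(1)
n <- 500; p <- 4
# Correlated covariates, scaled to unit variance
X   <- scale(matrix(rnorm(n * p), n, p) %*% matrix(runif(p * p), p, p))
pca <- prcomp(X, scale. = TRUE)

# Prior draws beta ~ N(0, I) on the original coefficients, mapped to the
# coefficients of the standardized PC scores: since X_std beta = Z_std theta
# with Z_std = X_std V D^{-1/2}, we have theta = D^{1/2} V' beta.
B     <- matrix(rnorm(2e5 * p), ncol = p)
Theta <- B %*% pca$rotation %*% diag(pca$sdev)

rbind(implied_prior_sd = apply(Theta, 2, sd),
      sqrt_eigenvalue  = pca$sdev)
```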
On the other hand, I think we can definitely answer “is this procedure sensible” in the affirmative: it is at least as sensible as putting moderately regularizing priors on all the coefficients!
Edit: note that I’m unsure whether this is literally the same as, or merely quite similar to, what is achieved by passing decomp = "QR" inside a call to brms::brm() with Gaussian coefficient priors.
Thanks to both of you for the great observations. Mapping eigenvalues to prior SDs does sound good, with the last PC getting a very small prior SD. I think you could let the user specify a constant of proportionality that maps the variance explained by each PC to its prior variance.
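For concreteness, here is roughly how I picture it with brms, where `c0` is the user-specified constant of proportionality on the variance scale (`c0`, `q`, `X`, and `y` are placeholders):

```r
library(brms)

pca <- prcomp(X, center = TRUE, scale. = TRUE)
q   <- 3                                       # components retained
c0  <- 1                                       # user-specified constant of proportionality
dat <- data.frame(y = y, scale(pca$x[, 1:q]))  # standardized scores, columns PC1..PCq

# Prior variance for PC k is c0 * lambda_k, i.e. SD = sqrt(c0) * sqrt(lambda_k)
priors <- do.call(c, lapply(seq_len(q), function(k)
  set_prior(sprintf("normal(0, %.6g)", sqrt(c0) * pca$sdev[k]),
            class = "b", coef = paste0("PC", k))))

fit <- brm(y ~ ., data = dat, prior = priors)
```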
decomp = "QR" results in an orthonormal projection, but its columns are not ordered the way we need, i.e., not in order of descending variation explained. For that purpose we would need to substitute a singular value decomposition.
I think what I’m trying to say is that both the PCA version with coefficient priors based on the eigenvalues and the QR decomposition are reparameterizations of the original model with Gaussian coefficient priors. Since both yield orthonormal design matrices, it might be the case that in practice parameterizing with decomp = "QR" yields all of the same computational advantages. At the end of the day, all three are the same model with the same prior, just parameterized differently (I think).
I think that the square root of this constant would turn out to be the standard deviation of the Gaussian coefficient priors in the parameterization given by the original (untransformed) design matrix.
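If it helps, the algebra seems to check out when all $p$ components are retained: with standardized PC scores $Z = X V D^{-1/2}$ and PC coefficients $\theta \sim N(0, c\,D)$, the original coefficients are $\beta = V D^{-1/2} \theta$, so

$$\operatorname{Cov}(\beta) = V D^{-1/2}\,(c\,D)\,D^{-1/2} V^\top = c\, V V^\top = c\, I_p,$$

i.e., $\beta \sim N(0, c\,I_p)$: every original coefficient gets a Gaussian prior with SD $\sqrt{c}$.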