QR and regression with latent covariates (missing values)

For Bayesian (linear) regression, the Stan manual recommends transforming the design matrix using a QR decomposition to aid posterior exploration. This implicitly assumes that the entire design matrix is fixed, so what is the recommendation when simultaneously doing Bayesian hierarchical imputation of missing values? Is the QR still worth taking even if it cannot be precomputed?

Haha, that is an interesting question.

So the QR reparameterization works by decomposing X \beta into Q R \beta.

Then if you define \gamma = R \beta and estimate \gamma directly, the scales of \gamma are nicer, so things magically work better.
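For reference, the fixed-X version is essentially this (a minimal sketch of the User's Guide reparameterization, with its \theta renamed to \gamma to match the notation here):

```stan
data {
  int<lower=0> N;
  int<lower=0> K;
  matrix[N, K] X;
  vector[N] y;
}
transformed data {
  // thin QR of the fixed design matrix, scaled as the User's Guide suggests
  matrix[N, K] Q_ast = qr_thin_Q(X) * sqrt(N - 1);
  matrix[K, K] R_ast = qr_thin_R(X) / sqrt(N - 1);
  matrix[K, K] R_ast_inv = inverse(R_ast);
}
parameters {
  real alpha;
  vector[K] gamma;        // gamma = R_ast * beta
  real<lower=0> sigma;
}
model {
  // X * beta == Q_ast * gamma, but gamma tends to have nicer posterior geometry
  y ~ normal(alpha + Q_ast * gamma, sigma);
}
generated quantities {
  vector[K] beta = R_ast_inv * gamma;   // recover the original coefficients
}
```

The sqrt(N - 1) scaling just keeps Q_ast and R_ast on roughly unit scales; the key point is that all of the QR lives in transformed data, i.e. it assumes X is fully known.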

The issue with what you’re saying is that if X has missing values that you’re sampling, then Q and R would change as X changes, so there wouldn’t be a fixed reparameterization to do (R would be different every iteration). Does that make sense?

Maybe, though, if you just impute values for X using whatever method you like and compute a rough Q_r R_r = X_{\text{imputed}}, then you can use the parameterization \gamma = R_r \beta and just compute Q = X R_r^{-1} on every iteration. Since R_r is upper triangular, that solve might not be unbearable. If you’re doing any centering of variables and such, you’d have to work that in too.
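Roughly like this, as a sketch (the imputation model is a placeholder and everything except R_r and \gamma is a made-up name; R_r comes from a QR of the pre-imputed X, computed outside the model and passed in as data):

```stan
data {
  int<lower=0> N;
  int<lower=0> K;
  vector[N] y;
  matrix[N, K] X_obs;                    // observed entries; missing slots hold placeholders
  int<lower=0> N_mis;                    // number of missing entries
  array[N_mis, 2] int<lower=1> mis_idx;  // (row, column) index of each missing entry
  matrix[K, K] R_r;                      // upper-triangular R from a QR of the pre-imputed X
}
transformed data {
  // R_r is fixed data, so invert it once up front instead of solving each iteration
  matrix[K, K] R_r_inv = inverse(R_r);
}
parameters {
  vector[N_mis] x_mis;    // latent covariate values
  vector[K] gamma;        // gamma = R_r * beta
  real alpha;
  real<lower=0> sigma;
}
transformed parameters {
  // rebuild the full design matrix with the sampled missing values plugged in
  matrix[N, K] X = X_obs;
  for (m in 1:N_mis) {
    X[mis_idx[m, 1], mis_idx[m, 2]] = x_mis[m];
  }
}
model {
  x_mis ~ normal(0, 1);   // placeholder: substitute your actual imputation model
  // X * beta == (X * R_r_inv) * gamma; the "Q-ish" factor X * R_r_inv drifts each iteration
  y ~ normal(alpha + X * (R_r_inv * gamma), sigma);
}
generated quantities {
  vector[K] beta = R_r_inv * gamma;
}
```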

edit: added text to make imputed X stand out from regular X


If X=QR with all quantities being stochastic, then presumably you can still define a random variable \gamma=R\beta, although the density for \gamma would obviously be more complicated than in the fixed X case. So the question really seems to be whether it’s worth doing a QR decomposition every iteration.

I guess your idea is similar to a sort of preconditioning with a "best guess" QR?

Just brainstorming, but maybe one could do the imputation in Q and R instead, using a p(X|Q,R) likelihood to inform them? Maybe overly challenging, especially for more complicated hierarchical models of X.

edit: this paper incidentally writes a Gaussian density on X in terms of independent priors on Q and R.

If some columns of the design matrix are known, you can always split those out and precompute the QR. For the more general question it’s hard to tell. For example, I can imagine it would be worth it if the imputed values are very poorly identified… but you probably want to avoid that by construction.
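For example, splitting X into a fully observed block and a latent block, something like this (a hypothetical sketch; the observed-entry bookkeeping and the imputation model for the latent block are placeholders):

```stan
data {
  int<lower=0> N;
  int<lower=0> K_known;               // fully observed columns
  int<lower=0> K_lat;                 // columns containing missing entries
  matrix[N, K_known] X_known;
  vector[N] y;
}
transformed data {
  // precompute the QR for the known block only; this never changes
  matrix[N, K_known] Q_k = qr_thin_Q(X_known) * sqrt(N - 1);
  matrix[K_known, K_known] R_k_inv = inverse(qr_thin_R(X_known) / sqrt(N - 1));
}
parameters {
  vector[K_known] gamma_k;    // gamma_k = R_k * beta_known
  vector[K_lat] beta_lat;     // plain coefficients for the latent block
  matrix[N, K_lat] X_lat;     // latent columns; observed-entry handling omitted
  real<lower=0> sigma;
}
model {
  to_vector(X_lat) ~ normal(0, 1);  // placeholder imputation model
  // known block uses the precomputed rotation; latent block stays unrotated
  y ~ normal(Q_k * gamma_k + X_lat * beta_lat, sigma);
}
generated quantities {
  vector[K_known] beta_known = R_k_inv * gamma_k;
}
```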

A related, but perhaps easier, question: is QR still recommended in the n<p case where the number of parameters dominates the number of observations?

Whoops! I forgot to respond. Yeah, the best-guess interpretation is what I was going for. The idea was that it didn’t seem too hard to engineer and seemed like it’d work okay. Putting distributions on Q and R directly seems hard because they’re constrained in really weird ways.

I dunno about the second question. If X is M \times N with M < N, then R from the thin QR will be M \times N, so you don’t have enough equations in \gamma = R \beta to do the solve.
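To spell out the shapes:

X_{M \times N} = Q_{M \times M} \, R_{M \times N}, \qquad \gamma = R \beta \in \mathbb{R}^{M}, \quad \beta \in \mathbb{R}^{N},

so \gamma carries only M numbers for the N unknowns in \beta, and the triangular solve back to \beta is underdetermined.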