Taking a design matrix X and QR-orthonormalizing it is a standard and highly effective technique for improving sampling in the presence of collinearities among predictor variables. For the problem I’m about to describe, my current solution for isolating a prior for a certain parameter is to hold a column of X out of the QR process. For most parameters, though, I specify priors through contrasts; e.g., I put a prior on a treatment effect at age=50 when age interacts with treatment. I’d like to have a more general solution that allows QR to be used across all the columns of X.
This question is focused on models for which you might say that a subset of the Xs interacts with Y, in the sense of a partial proportional odds model that allows some of the Xs to not act in proportional odds, or a Cox proportional hazards model with time-dependent covariates that allows an X to not act in proportional hazards.
For example, a partial proportional odds model with two predictors and an ordinal response Y with levels 0, 1, 2 may be stated as follows, where X_2 is not assumed to act in proportional odds:
\Pr(Y \geq j | X) = \text{expit}(\alpha_{j} + \beta_{1}X_{1} + \beta_{2}X_{2} + \tau[j=2]X_{2}),
where j \in \{1, 2\}, [\cdot] is a 0/1 indicator, and expit is the inverse logit transform.
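As a concrete reading of the formula above, here is a minimal Python sketch that evaluates the exceedance and cell probabilities for a single observation. The function names (`ppo_exceedance`, `ppo_cell_probs`) and all parameter values are made up for illustration; nothing here comes from a fitted model or from any Stan front-end.

```python
import numpy as np

def expit(z):
    """Inverse logit transform."""
    return 1.0 / (1.0 + np.exp(-z))

def ppo_exceedance(x1, x2, alpha, beta1, beta2, tau):
    """Return [Pr(Y >= 1 | X), Pr(Y >= 2 | X)] for the 3-level model.

    alpha holds the two intercepts (alpha_1, alpha_2). The tau term
    enters only the j = 2 equation, and only through x2.
    """
    lp = beta1 * x1 + beta2 * x2
    p_ge1 = expit(alpha[0] + lp)             # j = 1: indicator [j=2] is 0
    p_ge2 = expit(alpha[1] + lp + tau * x2)  # j = 2: indicator [j=2] is 1
    return np.array([p_ge1, p_ge2])

def ppo_cell_probs(x1, x2, alpha, beta1, beta2, tau):
    """Return [Pr(Y = 0), Pr(Y = 1), Pr(Y = 2)] by differencing."""
    p_ge1, p_ge2 = ppo_exceedance(x1, x2, alpha, beta1, beta2, tau)
    return np.array([1.0 - p_ge1, p_ge1 - p_ge2, p_ge2])

# Hypothetical parameter values, chosen only for illustration
alpha = (0.5, -1.0)
probs = ppo_cell_probs(x1=0.3, x2=1.2, alpha=alpha,
                       beta1=0.8, beta2=-0.4, tau=0.25)
print(probs, probs.sum())
```

Note that, unlike the plain proportional odds model, nothing in this parameterization forces Pr(Y \geq 2 | X) \leq Pr(Y \geq 1 | X) for every X, which is one reason the likelihood has to be built with care.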
Just as with the Cox model with a time-dependent covariate, the likelihood needs to be carefully constructed when dealing with X \times Y “interactions”.
This raises the question of how to use QR pre-processing. If X_1 and X_2 do not have a correlation of exactly 0.0, QR-transforming the columns of X will convert a one-parameter departure from proportional odds into a 2-parameter partial proportional odds component. Perhaps inverse transforming to get back to the original X space will make the \tau-like effect have one parameter again, but I can’t wrap my head around this.
Is it the case that in such models we need to run QR separately on the regular X components and on the components that involve Y?
The general issue of Y-dependent covariate effects ultimately needs to be addressed in Stan front-ends, and I’d also love to get @paul.buerkner’s take on this.
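To make the concern concrete, here is a small NumPy sketch on simulated data (not Stan code, and not a claim about how any front-end handles this). It assumes a plain reduced QR, X = QR, without the \sqrt{n-1} rescaling that is often applied, and the names `gamma_x`, `gamma_q`, and `gamma_q_free` are hypothetical. It checks two things: what a \tau-only departure on X_2 looks like on the Q scale, and what an unconstrained Q-scale departure looks like after back-transforming.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)      # deliberately correlated with x1
X = np.column_stack([x1, x2])
X = X - X.mean(axis=0)                  # center the columns

Q, R = np.linalg.qr(X)                  # reduced QR: X = Q @ R, R upper triangular

tau = 0.5
# Non-PO part on the original scale: coefficient 0 on X1, tau on X2
gamma_x = np.array([0.0, tau])

# The same linear predictor on the Q scale: X @ gamma_x = Q @ (R @ gamma_x)
gamma_q = R @ gamma_x                   # equals tau * R[:, 1]; both entries nonzero here
print("Q-scale coefficients for a tau-only departure:", gamma_q)

# Back-transforming recovers a single-parameter departure on X2
print("back-transformed:", np.linalg.solve(R, gamma_q))

# An unconstrained Q-scale departure, here acting only on the first Q column,
# back-transforms to a departure on X1, which the original model excluded
gamma_q_free = np.array([1.0, 0.0])
back_free = np.linalg.solve(R, gamma_q_free)
print("back-transformed free departure:", back_free)
```

In this sketch, the one-parameter departure survives the round trip only if the Q-scale coefficients are constrained to be proportional to the second column of R. Two free Q-scale departure coefficients correspond to a genuine 2-parameter departure in the original X space.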