How are submodels fitted in projection predictive inference (projpred)?


I have been using the excellent projpred package to conduct variable selection on fitted models from brms.
I have also read a few papers on projection predictive inference.

Assuming I fit a simple linear regression model with 4 variables (our reference model):
brm(y ~ x1 + x2 + x3 + x4, data = data, family = gaussian())

And we would like to run Projection Predictive Inference (PPI) on this.

From what I have read, PPI with forward search starts with an intercept-only model, onto which the reference model is projected. It then builds all 4 submodels of size 1 and picks the best one, i.e., the one with the minimum KL divergence with respect to the posterior predictive distribution of the reference model. Say this turns out to be x2: we then take the one-variable model y ~ x2 and build all possible size-2 models that contain x2, and so on.

In this model-building phase (all possible size-one models, all possible size-two models, …), how exactly is the fitting done internally?

  1. Is a brms model refitted each time?
  2. Is a simple lm() call used to fit the submodels?
  3. Or do we take the reference model and, when fitting a size-one submodel with, say, x2, set the coefficients of the remaining variables to 0?

Nos. 1 and 2 were prompted by Section 4.1 of Robust and efficient projection predictive inference (2023):

[…] Following this, we fit all size-two models including the intercept and 𝑥(1) (“size-two” does not count the intercept here), and once more select the one closest to the reference model in terms of KL divergence of their posterior predictive distributions. Denote this second predictor to be selected 𝑥(2). This is repeated until either all predictors are selected, or some pre-defined limit on the model size is reached.

No. 3 was prompted by page 2 of Projection Predictive Inference for Generalized Linear and Additive Multilevel Models (2020):

In the context of variable selection, one typically constrains the projection to a smaller subset of variables where the excluded variables have their coefficients fixed at zero. Then, the projection procedure sequentially projects the posterior onto an incremental subspace, until all the variables have entered the projection.


In projpred’s traditional projection (i.e., non-latent and non-augmented-data), the projection is achieved by fitting a given submodel via maximum likelihood to the fitted values of the reference model; see Section 3 of Projective inference in high-dimensional problems: Prediction and feature selection. So your Nos. 2 and 3 are partially correct (I say “partially” because neither No. 2 nor No. 3 describes the projection in its entirety). No. 1 is not correct.
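To make the idea of "fitting a submodel to the fitted values of the reference model" concrete, here is a minimal base-R sketch. It is not projpred's internal code. It assumes a Gaussian family with identity link, uses a plain lm() fit as a stand-in for the brms reference model, and projects a single point (the stand-in's fitted values) rather than posterior draws. The helper `project()`, the simulated data, and all variable names are hypothetical. The projected sigma and the KL expression are the Gaussian-case formulas as I understand them from the papers cited above.

```r
# Sketch only: lm() stands in for the brms reference model, and its fitted
# values stand in for a single reference-model prediction (assumption).
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)          # correlated with x1
x3 <- rnorm(n)
x4 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 2 * x2 + rnorm(n)
dat <- data.frame(y, x1, x2, x3, x4)

ref_fit    <- lm(y ~ x1 + x2 + x3 + x4, data = dat)  # stand-in reference model
dat$mu_ref <- fitted(ref_fit)                        # reference fitted values
sigma_ref  <- summary(ref_fit)$sigma

# Project onto a submodel: a maximum-likelihood (here least-squares) fit
# whose response is mu_ref, not the observed y.
project <- function(terms) {
  rhs     <- if (length(terms) == 0) "1" else paste(terms, collapse = " + ")
  sub_fit <- lm(as.formula(paste("mu_ref ~", rhs)), data = dat)
  mse     <- mean((dat$mu_ref - fitted(sub_fit))^2)
  list(
    coef  = coef(sub_fit),
    # Gaussian case: projected noise absorbs the lack of fit to mu_ref
    sigma = sqrt(sigma_ref^2 + mse),
    # mean per-observation KL divergence under these Gaussian assumptions
    kl    = 0.5 * log(1 + mse / sigma_ref^2)
  )
}

# Forward search: at each size, add the candidate with the smallest KL.
candidates <- c("x1", "x2", "x3", "x4")
selected   <- character(0)
for (size in seq_along(candidates)) {
  remaining <- setdiff(candidates, selected)
  kls       <- sapply(remaining, function(v) project(c(selected, v))$kl)
  selected  <- c(selected, remaining[which.min(kls)])
}
print(selected)  # order in which predictors enter the search
```

Note that neither the reference model nor the observed y is refitted inside the loop: every submodel is a cheap fit to `mu_ref`. In projpred itself, the projection would typically be applied to posterior draws (or clusters of draws) rather than to a single point as in this sketch.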



After reading section 3.4, I do see that projection for exponential family models/GLMs is equivalent to maximum likelihood fitting.

I have to admit that I have not fully understood (down to the implementation details) all three methods (draw-by-draw, single-point, and clustering), as highlighted.

When does setting the remaining coefficients to 0 apply (in L1 search?)?

Have you seen my talk Use of reference models in variable selection? One part illustrates projecting just one point, projecting the posterior draw by draw, and setting some coefficients to 0.
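For readers without access to the talk, the following base-R sketch contrasts the three ideas on simulated data. It is an illustration under stated assumptions, not projpred code and not taken from the talk. The reference "posterior draws" are faked with a normal approximation around an lm() fit, the family is Gaussian with identity link, and all names (`draws`, `proj_draws`, `proj_point`, `zeroed`) are hypothetical.

```r
# Sketch only: a normal approximation around an lm() fit stands in for
# posterior draws from a brms reference model (assumption).
set.seed(2)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)          # correlated with x1
y  <- 1 + 0.5 * x1 + 2 * x2 + rnorm(n)
ref_fit <- lm(y ~ x1 + x2)         # stand-in reference model

S <- 100                           # number of pseudo-draws
X <- cbind(1, x1, x2)              # reference design matrix
L <- t(chol(vcov(ref_fit)))        # lower Cholesky factor of the covariance
draws    <- coef(ref_fit) + L %*% matrix(rnorm(3 * S), nrow = 3)  # 3 x S
mu_draws <- X %*% draws            # n x S fitted values, one column per draw

X_sub <- cbind(1, x2)              # design matrix of the submodel y ~ x2
# Least-squares solve via the normal equations (one column per response)
ls_fit <- function(resp) solve(crossprod(X_sub), crossprod(X_sub, resp))

# Draw-by-draw: project each draw's fitted values separately (2 x S result)
proj_draws <- ls_fit(mu_draws)

# Single point: project the posterior-mean fitted values once
proj_point <- ls_fit(rowMeans(mu_draws))

# Naive zeroing: copy the reference intercept and x2 coefficient and
# simply drop x1, without re-estimating anything
zeroed <- rowMeans(draws)[c(1, 3)]

print(rbind(draw_by_draw_mean = rowMeans(proj_draws),
            single_point      = as.numeric(proj_point),
            naive_zeroing     = zeroed))
```

In this Gaussian sketch, the mean of the draw-by-draw projections coincides with the single-point projection because least squares is linear in the response. The draw-by-draw version additionally keeps the spread across draws. The naive-zeroing row differs because x1 and x2 are correlated: the projection constrains the coefficient of x1 to 0 but re-estimates the remaining coefficients, so the slope of x2 absorbs part of the effect of x1.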

I have not watched it yet.
Thank you for sharing, I am sure it will be helpful.