I have been using the excellent projpred package to conduct variable selection on models fitted with brms. I have also read a few papers on projection predictive inference.
Assuming I fit a simple linear regression model with 4 variables (our reference model):
brm(y ~ x1 + x2 + x3 + x4, data = data, family = gaussian())
We would then like to run Projection Predictive Inference (PPI) on this reference model.
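For context, this is roughly the workflow I have in mind (a minimal sketch with placeholder data; I believe varsel(), suggest_size() and project() are the standard projpred calls, but the exact arguments may differ by version):

```r
library(brms)
library(projpred)

# Placeholder data standing in for the real data set
set.seed(1)
n <- 200
data <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
data$y <- 1 + 2 * data$x2 + 0.5 * data$x4 + rnorm(n)

# Reference model
fit <- brm(y ~ x1 + x2 + x3 + x4, data = data, family = gaussian())

# Forward search over submodels, comparing each to the reference model
vs <- varsel(fit, method = "forward")
summary(vs)
suggest_size(vs)

# Project the reference posterior onto, say, the best two-variable submodel
prj <- project(vs, nterms = 2)
```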
From what I have read, PPI with forward search starts with an intercept-only model, onto which the reference model is projected. It then builds all 4 submodels of size 1 and picks the best one, i.e. the submodel whose posterior predictive distribution has the minimum KL divergence from that of the reference model. Say this was found to be x2; we then take the one-variable model y ~ x2, build all possible size-2 models in which x2 must be present, and so on.
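To make my mental model concrete, here is a rough, self-contained sketch of how I imagine the forward search working, using plain lm() fits and a Gaussian KL divergence between predictive distributions. This is only my illustration of the search structure, not a claim about how projpred actually implements it:

```r
# Conceptual sketch of forward search (my understanding, NOT projpred internals).
# The reference predictive is approximated by the full lm() fit; each candidate
# submodel is scored by the average KL divergence between Gaussian predictive
# distributions N(mu_ref, s_ref^2) and N(mu_sub, s_sub^2) over the observations.
set.seed(1)
n <- 200
data <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
data$y <- 1 + 2 * data$x2 + 0.5 * data$x4 + rnorm(n)

ref_fit <- lm(y ~ x1 + x2 + x3 + x4, data = data)
mu_ref  <- fitted(ref_fit)
s_ref   <- summary(ref_fit)$sigma

# KL( N(mu1, s1^2) || N(mu2, s2^2) ), averaged over observations
kl_gauss <- function(mu1, s1, mu2, s2) {
  mean(log(s2 / s1) + (s1^2 + (mu1 - mu2)^2) / (2 * s2^2) - 0.5)
}

candidates <- c("x1", "x2", "x3", "x4")
selected   <- character(0)

while (length(candidates) > 0) {
  kls <- sapply(candidates, function(v) {
    rhs     <- paste(c("1", selected, v), collapse = " + ")
    sub_fit <- lm(as.formula(paste("y ~", rhs)), data = data)
    kl_gauss(mu_ref, s_ref, fitted(sub_fit), summary(sub_fit)$sigma)
  })
  best       <- names(which.min(kls))  # variable whose submodel is closest to the reference
  selected   <- c(selected, best)
  candidates <- setdiff(candidates, best)
  cat("selected so far:", paste(selected, collapse = ", "), "\n")
}
```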
In this model-building phase (all possible size-one models, all possible size-two models, …), how exactly is it done internally?
- Is a brms object refitted all these times?
- Is the plain lm() function used to fit the submodels?
- Or do we use the reference model itself, and when "fitting" a submodel of size one with, say, variable x2, simply fix the coefficients of the remaining variables at 0? (See the small sketch after this list.)
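To show what I mean by the second and third options (again only a sketch of my interpretations, not a claim about projpred internals):

```r
# Toy data (same placeholder set-up as in the sketch above)
set.seed(1)
n <- 200
data <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
data$y <- 1 + 2 * data$x2 + 0.5 * data$x4 + rnorm(n)
ref_fit <- lm(y ~ x1 + x2 + x3 + x4, data = data)

# Question 2 interpretation: refit the size-one submodel to the data
sub_refit <- lm(y ~ x2, data = data)

# Question 3 interpretation: keep the reference model's coefficients and fix
# the excluded coefficients (x1, x3, x4) at zero when forming predictions
b <- coef(ref_fit)
mu_zeroed <- b["(Intercept)"] + b["x2"] * data$x2

# In general the two notions of a "size-one model" give different predictions
head(cbind(refit = fitted(sub_refit), zeroed = mu_zeroed))
```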
Questions 1 and 2 were prompted by Section 4.1 of Robust and efficient projection predictive inference (2023):
> […] Following this, we fit all size-two models including the intercept and $x_{(1)}$ ("size-two" does not count the intercept here), and once more select the one closest to the reference model in terms of KL divergence of their posterior predictive distributions. Denote this second predictor to be selected $x_{(2)}$. This is repeated until either all predictors are selected, or some pre-defined limit on the model size is reached.
Question 3 was prompted by page 2 of Projection Predictive Inference for Generalized Linear and Additive Multilevel Models (2020):
> In the context of variable selection, one typically constrains the projection to a smaller subset of variables where the excluded variables have their coefficients fixed at zero. Then, the projection procedure sequentially projects the posterior onto an incremental subspace, until all the variables have entered the projection.
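As I understand it from these papers, the projection of a single posterior draw $\theta^{*}$ of the reference model onto a submodel (with the excluded coefficients fixed at zero) is roughly the solution of

$$
\theta_\perp \;=\; \arg\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \mathrm{KL}\!\left( p(\tilde{y}_i \mid \theta^{*}) \;\middle\|\; q(\tilde{y}_i \mid \theta) \right),
$$

which is what question 3 was getting at, though I am not sure whether projpred actually solves this draw by draw, or uses one of the refitting shortcuts in questions 1 and 2.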
Thanks,