Projpred, BMA, variable selection and joint posterior probabilities

Apologies if this is the wrong forum.

I’m interested in using Aki Vehtari’s projpred (Projection predictive variable selection – A review and recommendations for the practicing statistician) for variable selection, but I’m particularly interested in the joint posterior probabilities (JPP) rather than the marginal posterior probabilities (MPP). Does anyone have any experience or ideas on whether this is possible? My use case is a very simple logistic regression (similar to the bodyfat example linked).

Thanks in advance,

Rich


This is definitely the right forum. Tagging @AlejandroCatalina, who should know more (unfortunately, I personally don’t know much about projpred).

Thanks @martinmodrak for tagging.

Can you elaborate on your idea a bit more, @datarichard?


Projpred does not provide marginal posterior probabilities. It provides the best projection model for each model size (conditional on how well the search through the model space works). The model size is then selected based on the cross-validated log score.
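To make the size-selection step concrete, here is a minimal Python sketch of picking a model size from cross-validated log scores. It assumes you already have pointwise LOO log predictive densities for the best submodel of each size and for the reference model. The stopping rule (smallest size whose elpd is within `z` standard errors of the reference) and the function name `select_size` are hypothetical choices for illustration, not necessarily the rule projpred itself uses.

```python
import numpy as np

def select_size(loo_by_size, loo_ref, z=1.0):
    """Pick a submodel size from cross-validated log scores.

    loo_by_size[k]: pointwise LOO log predictive densities of the best
        submodel with k terms (k = 0 is intercept-only), as produced by
        some search through the model space.
    loo_ref: pointwise LOO log predictive densities of the reference model.
    z: tolerated number of standard errors below the reference
       (hypothetical heuristic, not projpred's exact rule).
    """
    ref = np.asarray(loo_ref, dtype=float)
    n = ref.size
    for size, pointwise in enumerate(loo_by_size):
        diff = np.asarray(pointwise, dtype=float) - ref
        elpd_diff = diff.sum()                   # elpd(submodel) - elpd(reference)
        se_diff = np.sqrt(n) * diff.std(ddof=1)  # standard error of that difference
        if elpd_diff + z * se_diff >= 0:
            return size
    # no submodel was close enough: fall back to the largest one
    return len(loo_by_size) - 1

# Hypothetical pointwise values for 5 observations (made up for illustration)
loo_ref = [-0.40, -0.55, -0.30, -0.62, -0.48]
loo_by_size = [
    [-0.69, -0.69, -0.69, -0.69, -0.69],  # intercept only
    [-0.45, -0.60, -0.38, -0.70, -0.50],  # 1 term
    [-0.41, -0.54, -0.31, -0.63, -0.47],  # 2 terms
    [-0.40, -0.55, -0.30, -0.62, -0.48],  # 3 terms
]
print(select_size(loo_by_size, loo_ref))
```

With these made-up numbers the sketch returns 2: the two-term submodel is within one standard error of the reference, while the one-term submodel is clearly worse. Setting `z=0` demands no loss at all and pushes the choice to 3.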


Thank you for replying, and apologies for not following up sooner. I think I now understand the difference between the marginal posterior probabilities and the best projection model a bit better, so I see that I cannot generate marginal posterior probabilities of the variables from the projections. However, when looking at how variables enter the model, is it possible to determine or describe when a combination of variables produces better predictions than each variable alone? Or, conversely, when a variable never or rarely appears in the model together with another variable (e.g., due to collinearity)? This table from the bodyfat notebook makes me think it is, since it implies (to me) the former case:

I believe that in some contexts this is known as “jointness” or the “joint posterior probability” (A Review of the ‘BMS’ Package for R with Focus on Jointness, Econometrics), although projpred obviously won’t provide the same mathematical version of jointness, since it doesn’t deal with posterior probabilities. Still, I wonder whether this table implies a sense of jointness, such that abdomen and weight have a higher jointness than abdomen and knee (or some other variable not shown).

Many thanks,

Rich
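For comparison, in the BMA setting jointness is computed from posterior model probabilities, which projpred does not produce. Below is a minimal Python sketch of two measures from the jointness literature, the Ley-Steel ratio P(i and j) / P(i or j) and the Doppelhofer-Weeks log odds ratio. The input format (a dict from frozensets of covariate names to probabilities), the function name `jointness`, and the probabilities themselves are all made up for illustration; check the formulas against the BMS jointness review before relying on them.

```python
import math

def jointness(model_probs, i, j):
    """Jointness of covariates i and j in a BMA setting.

    model_probs maps a frozenset of included covariate names to that
    model's posterior probability (assumed to sum to 1).
    Returns (P(i and j), Ley-Steel ratio, Doppelhofer-Weeks log odds ratio).
    """
    both = only_i = only_j = neither = 0.0
    for model, p in model_probs.items():
        has_i, has_j = i in model, j in model
        if has_i and has_j:
            both += p
        elif has_i:
            only_i += p
        elif has_j:
            only_j += p
        else:
            neither += p
    # Ley-Steel: P(i and j) / P(i or j)
    either = both + only_i + only_j
    ley_steel = both / either if either > 0 else math.nan
    # Doppelhofer-Weeks: log[P(i,j) P(not i,not j) / (P(i,not j) P(not i,j))];
    # undefined when any of the four cells is zero
    if both > 0 and neither > 0 and only_i > 0 and only_j > 0:
        dw = math.log((both * neither) / (only_i * only_j))
    else:
        dw = math.nan
    return both, ley_steel, dw

# Made-up posterior model probabilities, for illustration only
model_probs = {
    frozenset({"abdomen", "weight"}): 0.5,
    frozenset({"abdomen"}): 0.2,
    frozenset({"abdomen", "knee"}): 0.1,
    frozenset({"weight"}): 0.1,
    frozenset(): 0.1,
}
print(jointness(model_probs, "abdomen", "weight"))
print(jointness(model_probs, "abdomen", "knee"))
```

Positive Doppelhofer-Weeks values suggest the two covariates tend to be included together (complements); negative values suggest they substitute for each other.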

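First, marginal posterior probabilities are misleading about jointness. If there are correlated variables, the posterior probability is diluted: it gets split among models that include one or the other of the correlated variables, so each can have a low marginal probability even when at least one of them is clearly needed. That’s why I gave up on them and started developing projpred.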
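The dilution effect can be reproduced in a few lines of Python. This is a toy under loud assumptions: posterior model probabilities are approximated with BIC under a uniform prior over models (not the g-prior setup BMS uses), the data are simulated, and an exact copy of a covariate stands in as the extreme case of correlation.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1.copy()            # extreme case: x2 is an exact copy of x1
x3 = rng.normal(size=n)   # irrelevant covariate
y = 2.0 * x1 + rng.normal(size=n)
covariates = {"x1": x1, "x2": x2, "x3": x3}

def bic(names):
    # Gaussian linear model with intercept: BIC = n log(RSS/n) + k log n
    X = np.column_stack([np.ones(n)] + [covariates[c] for c in names])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + X.shape[1] * np.log(n)

# Enumerate all 2^3 models and turn BIC into approximate posterior probabilities
models = [frozenset(s) for r in range(4) for s in itertools.combinations(covariates, r)]
bics = np.array([bic(sorted(m)) for m in models])
weights = np.exp(-0.5 * (bics - bics.min()))
probs = weights / weights.sum()

def inclusion(pred):
    return float(sum(p for m, p in zip(models, probs) if pred(m)))

p_x1 = inclusion(lambda m: "x1" in m)
p_x2 = inclusion(lambda m: "x2" in m)
p_either = inclusion(lambda m: "x1" in m or "x2" in m)
print(f"P(x1)={p_x1:.2f}  P(x2)={p_x2:.2f}  P(x1 or x2)={p_either:.2f}")
```

The two copies end up with marginal inclusion probabilities only slightly above 0.5 each, while the probability that at least one of them is in the model is essentially 1.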

In projpred, select the minimal set of covariates that predicts as well as the model with all covariates, and then look at how the covariates correlate. See the example figures in Markus Paasiniemi’s MSc thesis, Methods and Tools for Interpretable Bayesian Variable Selection.
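One simple way to do the second step, sketched in Python: after selection, list the covariates that correlate strongly with each selected one. The data, the covariate names, the function name `correlated_partners`, and the 0.7 cutoff are all hypothetical, and this only looks at sample correlations between covariates, which may differ from the quantities plotted in the thesis figures.

```python
import numpy as np

def correlated_partners(X, names, selected, threshold=0.7):
    """For each selected covariate, list the other covariates whose sample
    correlation with it exceeds `threshold` in absolute value."""
    corr = np.corrcoef(X, rowvar=False)  # covariates are columns
    index = {name: k for k, name in enumerate(names)}
    partners = {}
    for s in selected:
        row = corr[index[s]]
        partners[s] = [
            (names[k], round(float(row[k]), 2))
            for k in range(len(names))
            if names[k] != s and abs(row[k]) > threshold
        ]
    return partners

# Hypothetical data: "hip" is a noisy copy of "abdomen", the rest are independent
rng = np.random.default_rng(1)
n = 500
abdomen = rng.normal(size=n)
hip = abdomen + 0.3 * rng.normal(size=n)
weight = rng.normal(size=n)
knee = rng.normal(size=n)
names = ["abdomen", "hip", "weight", "knee"]
X = np.column_stack([abdomen, hip, weight, knee])

# Suppose the search selected abdomen and weight as the minimal set
print(correlated_partners(X, names, selected=["abdomen", "weight"]))
```

Here abdomen's list picks up hip, hinting that a submodel with hip in place of abdomen might predict nearly as well, while weight has no strongly correlated substitutes.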


Thanks for the tip on the thesis. I’m guessing you are referring to Figure 7:

I think I understand how the right panel works. It would be handy to be able to recreate this plot on my dataset.


The shinyproj interface doesn’t work with the new projpred, but you can find the code showing how the figures were made on GitHub: paasim/shinyproj, an R package for interactive model selection using projpred.
