Projpred - model with horseshoe prior does not converge

This is my first time trying to use regularised regressions. I have two data sets. I want to build a predictive model (with cross-validation) on the first and then validate it on the second data set. In the first data set, I have ~670 rows (complete cases) and 155 predictors. I have removed any predictors with an absolute pairwise correlation > 0.9 (but quite a few remaining ones are still close to 0.9).

I am trying to follow the instructions from the projpred vignette (Projection predictive feature selection • projpred) to build a regularised predictive model.

I seem to already get stuck at the first step - I’ve built my model, but it keeps getting divergent transitions.

x.formula = paste0(psych, '~', paste(names(df)[1:(length(names(df)) - 1)], collapse = '+'))
# note: I have scaled the df

fit= brm(formula=(x.formula),
          data=df,
          family=gaussian(),
          prior=c(prior(normal(0,1),class='Intercept'),
                  prior(horseshoe(par_ratio=0.1),class='b')), # assuming 10% of regressors might be significant? 
          control=list(adapt_delta=0.99,max_treedepth=15),
          chains=3,iter=5000,
         seed=100)

I have ~20 divergent transitions. Rhat values are OK. Is it OK to continue with the next projpred steps, or do I need to get rid of the divergent transitions first? I see from the manuals that tuning adapt_delta or max_treedepth further is unlikely to help. I am also not sure I've set par_ratio correctly. Is there anything else I can try to tune?

I’d be very grateful for any pointers.

(As a separate question, I think projpred cannot currently be used with imputation, is that correct? I'm losing some rows and some regressors; I think I'd have ~700 rows and 170 regressors if I could include those with missing values, and they are likely not missing at random. I think I will also need to use some other modelling approaches that can deal with imputation, to make sure the results are at least similar…)

Edit: I’ve come across several further questions I’m unsure about:

  • could I just replace the horseshoe by other priors that would not produce divergent samples?
  • if the original model is not better than a null model, I'm assuming running projpred does not make sense?
  • if I want to validate my (cross-validated) model on new data, is the best way to get a prediction (with proj_linpred)? Then I'm not sure how to compare it to the new y: should I correlate the values? Repeat the prediction 10,000 times with a random shuffle of the y-values? Or compare to the predictions of an intercept-only model (using ELPD)?

If you have two predictors that are highly correlated with each other, are you removing them both? What if they would be useful for predicting the target? One of the benefits of projpred is that it can select the most useful predictors among correlating ones (and you don’t need to do pre-screening).

If you have many highly correlating predictors, you could also consider building the reference model using supervised PCA, as was done in "Projective inference in high-dimensional problems: Prediction and feature selection".

For the horseshoe, 20 divergences is not many. With your current setup the posterior is likely to have a funnel shape, but the narrow part that is not well explored (indicated by the divergences) is also likely to be something you don't actually care about. You could consider making the slab scale smaller, based on more carefully thought-out prior information about the likely scale of the non-zero weights, and you could make the slab degrees of freedom bigger to make that prior more informative about the tails.
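In brms this would mean adjusting the `scale_slab` and `df_slab` arguments of `horseshoe()`. A sketch (the specific values are only illustrative placeholders, not recommendations):

```r
library(brms)

# Sketch: tighten the slab of the regularized horseshoe.
# scale_slab: prior scale of the non-zero coefficients (smaller = more informative)
# df_slab: slab degrees of freedom (larger = lighter tails)
fit <- brm(
  formula = x.formula,
  data = df,
  family = gaussian(),
  prior = c(prior(normal(0, 1), class = "Intercept"),
            prior(horseshoe(par_ratio = 0.1, scale_slab = 0.5, df_slab = 100),
                  class = "b")),
  control = list(adapt_delta = 0.99, max_treedepth = 15),
  chains = 3, iter = 5000, seed = 100
)
```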

You can use it with multiple imputation by repeating the projection and variable selection for each imputed data set and combining the results in the end (this is the usual multiple-imputation approach).
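A rough sketch of that loop, assuming `imputed_dfs` is a hypothetical list of completed data sets (e.g., from `mice::complete()`); how you combine the per-imputation selections at the end is a judgment call (here: majority vote):

```r
library(brms)
library(projpred)

# Repeat fitting + variable selection for each imputed data set
selections <- lapply(imputed_dfs, function(d) {
  fit_m <- brm(x.formula, data = d, family = gaussian(),
               prior = prior(horseshoe(par_ratio = 0.1), class = "b"))
  cvvs_m <- cv_varsel(fit_m, validate_search = TRUE)
  # keep the terms up to the suggested model size
  solution_terms(cvvs_m)[seq_len(suggest_size(cvvs_m))]
})

# Combine: e.g., keep predictors selected in a majority of imputations
tab <- sort(table(unlist(selections)), decreasing = TRUE)
chosen <- names(tab)[tab > length(imputed_dfs) / 2]
```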

It would be best to choose the prior based on your assumptions about the distribution of the coefficient magnitudes. You may use another prior if it matches your assumptions better.

Yep. If the model with all covariates is not better than the null model, projpred is not doing anything useful.

I didn’t understand this. Can you clarify?


Thank you so much for all your help, this is wonderful!!!

Sorry, this was very condensed and not clear. OK, so I have two data sets that were acquired separately. My plan was to fit one data set (with cross-validation in projpred) and then use the second, untouched holdout data set to assess performance on completely new data. From what I understand, I can get predictions for the holdout data set from the final projpred model. Then I'm not quite sure how to test whether these predictions are 'good'. I can think of several methods to test it:

  • shuffle the y in the holdout data set, say, 10,000 times and compare some measure of model fit (say, the log predictive density) to that distribution from shuffled data (to get a p-value).
  • fit an alternative model that has only the intercept (also on the 1st data set, of course) and compare the log predictive density (on the holdout data set) between the two models (using loo_compare, with some cut-off, like elpd_diff > 4, to say 'yes, it was a good model').
  • correlate the prediction of y in the holdout data set with the actual y to get a p-value on the correlation.

I have another small issue. I want to get the predicted y for my new data (the independent test data), where I've only selected the x's that were included in the final model fitted on the train/test data. When I use proj_linpred, it says that other x variables (not included in the final model) are missing ("The following variables can neither be found in 'data' nor in 'data2'"). I am confused because I thought the model was going to be based only on the small subset of terms selected (soltrms_final)?

prj = project(fit,solution_terms=soltrms_final)

df = new_data %>%
  dplyr::select(any_of(soltrms_final)) %>%
  drop_na() %>%
  mutate_all(scale)

prj_linpred = proj_linpred(prj,newdata=df,integrated=TRUE)
prj.predict= proj_predict(prj,newdata=df)

All these are fine, but I'm not a fan of p-values and of dichotomizing into good vs. bad; it would be better to show the actual values / distributions instead of thresholding. In addition, it would be great if you have domain knowledge to assess what can be considered good predictive accuracy.


I have not much to add to what @avehtari said, only the following:

For the prior: As @avehtari said, you are free to use any prior you like in your reference model. Personally, I have had good experiences with the R2-D2 prior (Zhang et al., 2022) in cases where I was unable to make the regularized horseshoe work as desired (but my understanding of the regularized horseshoe prior might not be good enough, and I also did not always spend much time on tweaking it).
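In brms, this prior is available via `R2D2()`. A minimal sketch (the `mean_R2` and `prec_R2` values are illustrative; they encode a prior guess about the model's R²):

```r
library(brms)

# Sketch: reference model with the R2-D2 prior instead of the horseshoe.
# mean_R2: prior mean of R^2; prec_R2: prior precision of R^2 (illustrative values)
fit_r2d2 <- brm(
  formula = x.formula,
  data = df,
  family = gaussian(),
  prior = prior(R2D2(mean_R2 = 0.3, prec_R2 = 10), class = "b"),
  chains = 3, iter = 5000, seed = 100
)
```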

For the multiple imputation thing, a similar issue has come up at Using projpred with brm_multiple and mice · Issue #275 · stan-dev/projpred · GitHub (related to Projpred interpretation). The conclusion is that projpred currently does not support multiple imputation, so you would have to come up with your own solution for this (as @avehtari said: in principle, this is possible). If you find a general way to handle multiple imputation in projpred, a pull request on GitHub is very welcome.

For the validation of the model on the independent hold-out dataset, I just want to add: if you have some other plausible model, I think it's better to compare the predictive performance of your final model to that other plausible model rather than to an artificial "null hypothesis" consisting of the empty model. This goes in the same direction as @avehtari's recommendation not to use p-values and basically means the same as you suggested above:

but with the important distinction that you replace the intercept-only model with that other plausible model. For that, you won't be able to use loo::loo_compare() (as far as I know); you will have to write some code of your own to perform the ELPD comparison.
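Such a manual ELPD comparison on the hold-out data could look roughly like this. This is only a sketch: `prj_final`, `prj_alt`, and `holdout` are hypothetical objects (two `project()` results and a data frame containing the predictors plus the outcome), and it assumes `proj_linpred()` returns pointwise log predictive densities in `$lpd` when the outcome is present in `newdata`:

```r
library(projpred)

# Pointwise log predictive densities on the hold-out data for both models
lp_final <- proj_linpred(prj_final, newdata = holdout, integrated = TRUE)
lp_alt   <- proj_linpred(prj_alt,   newdata = holdout, integrated = TRUE)

# ELPD difference and its standard error (analogous to loo_compare's output)
diff_i    <- as.numeric(lp_final$lpd) - as.numeric(lp_alt$lpd)
elpd_diff <- sum(diff_i)
se_diff   <- sqrt(length(diff_i)) * sd(diff_i)
c(elpd_diff = elpd_diff, se = se_diff)
```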

Concerning the proj_linpred() issue

this is a peculiarity of brms and not caused by projpred, if I remember correctly. But you can set the unselected predictors to some dummy value (e.g., zero for numeric predictors). This should not affect the results because, as you said, only the selected terms (those from argument solution_terms) will be used. As a long-term objective, a possibility could be to create a pull request for brms adding an optional argument with the help of which not all predictors are required; projpred could then make use of that argument.
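For your snippet above, that dummy-filling could look like this (a sketch; `all_x` is a hypothetical character vector holding the names of all predictors in the reference model):

```r
library(projpred)

# Sketch: fill unselected (scaled) predictors with a dummy value of zero so
# brms can build the full design matrix; only the terms in soltrms_final
# actually influence the projected predictions.
missing_x <- setdiff(all_x, soltrms_final)
new_data[missing_x] <- 0

prj_linpred <- proj_linpred(prj, newdata = new_data, integrated = TRUE)
```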


Many thanks for your help!!

Yes, I agree that makes more sense than an empty model. I did not know I can’t use loo_compare. I will need to look into this further.

I will do as you suggest and set the other variables to a dummy value to get the model on the hold-out data to work. It's good to hear that I didn't misunderstand the whole thing and that only the selected terms would actually be used.

Ok, so I have run this now on my data. And the ‘good result’ is that when I fit models on my training-test data with cv-variable selection, I get significant predictions in my hold out data. So that seems to suggest something is right.

But I’m not quite sure what to make of the figures from cv_varsel (attached)
[attached figure: cv_varsel performance plot (ExploratoryAnalysis_BRMSprojpred_cv_intolUncert_InhibAnx)]

Based on this, the automated feature selection tells me to include 2 features. Is it normal for the values to become negative again for later fits with more terms? Doesn't it suggest more terms would be better? The graph didn't look like this in the vignette, so I'm a bit unsure whether I need to change something. Here are the settings that I used:

cvvs.final = cv_varsel(fit,
                       nterms_max = modsize_decided + 5,
                       validate_search = TRUE,
                       seed = 100)

Where I got ‘modsize_decided’ from a first run of cv_varsel with validate_search=FALSE and nterms_max=30.

And the output says:
Observations: 670
CV method: LOO search included
Search method: l1, maximum number of terms 7
Number of clusters used for selection: 1
Number of draws used for prediction: 400
Suggested Projection Size: 2

I’d be very grateful for any pointers.

Can you try again with method = "forward"?


Thanks! I'm rerunning now with 'forward'. I've noticed that there is potentially a way to run it in parallel? But just using

options(mc.cores= parallel::detectCores())
rstan_options(auto_write= TRUE)

does not automatically do it. I've not managed to find instructions on how to run it in parallel.

Some time ago, I implemented a parallelization of the projection, as far as possible. Instructions can be found in the general package documentation (on the website: Projection predictive feature selection — projpred-package • projpred). Unfortunately, there are multiple restrictions on that parallelization, as explained in the documentation. Because of these, I did not advertise that feature too heavily (e.g., the projections in the vignette are not parallelized). A better parallelization is on my to-do list.
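If I recall the documented setup correctly, it amounts to registering a foreach backend and setting projpred's trigger option; a sketch (the threshold value is illustrative, and you should check the package documentation for the exact option name and restrictions):

```r
library(doParallel)

# Register a parallel foreach backend (core count is illustrative)
registerDoParallel(cores = 4)

# Tell projpred to parallelize the projection when more than this many
# posterior draws are projected (threshold value is a placeholder)
options(projpred.prll_prj_trigger = 200)

# ... run project() / cv_varsel() as usual ...

stopImplicitCluster()
```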
