That is a challenging data set.
You need to use the `init_refmodel()` function. See the projpred documentation page "Reference model and more general information" (refmodel-init-get) and the full example code using SPCA from the paper "Using reference models in variable selection" (Computational Statistics):
https://github.com/fpavone/ref-approach-paper/blob/a15f821d76a05d6934b672865332a643a53ac8dd/code/minimal_subset.R
With that many variables, you want to use either `method='L1'` or, if using `method='forward'`, limit the search, e.g., with `nterms_max=20`. First try `validate_search=FALSE` just to test how much time one search path takes (see more in arXiv:2306.15581, "Robust and efficient projection predictive inference"). If that works, rerun with `validate_search=TRUE` and possibly with `cv_method='kfold', K=10`.
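The two-step workflow can be sketched roughly as follows. This is only an illustration under assumptions: the toy data, the variable names, and the use of an rstanarm fit with `get_refmodel()` are stand-ins of mine, and in your case the reference model object would instead come from `init_refmodel()` as in the linked example.

```r
library(rstanarm)
library(projpred)

# Hypothetical toy data standing in for the real high-dimensional data set
set.seed(1)
n <- 50
p <- 30
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("x", seq_len(p))
dat <- data.frame(y = X[, 1] - 0.5 * X[, 2] + rnorm(n), X)

# Assumption: a plain rstanarm fit serves as the reference model here;
# replace this with the object returned by init_refmodel() for a custom one
f <- reformulate(colnames(X), response = "y")
fit <- stan_glm(f, data = dat, family = gaussian(), refresh = 0)
refm <- get_refmodel(fit)

# Step 1: one search path without validating the search, to gauge run time
t_search <- system.time(
  vs_fast <- cv_varsel(refm, method = "forward", nterms_max = 20,
                       validate_search = FALSE)
)
print(t_search)

# Step 2: if the timing is acceptable, rerun with the search validated,
# here using 10-fold CV (this refits the reference model K times)
vs_full <- cv_varsel(refm, method = "forward", nterms_max = 20,
                     validate_search = TRUE, cv_method = "kfold", K = 10)
plot(vs_full, stats = "elpd")
```

Switching `method = "forward"` to `method = "L1"` in both calls gives the faster L1 search. Step 2 can take many times longer than step 1, since the search is repeated for every fold.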
You may also try a simpler and faster approach like `loc.fdr` combined with the reference model (see Section 4.2 in "Using reference models in variable selection", Computational Statistics).