As a projpred user who would like to be able to use it even more, there are a couple of things I’ve been thinking about. I hope they can be relevant for the design of the next version of projpred, or for discussion in a projpred session at StanCon (I won’t be there!).
1. Definition of the starting model for selection
Currently the search always starts from the null model and builds up from there. This is not a good setup when one wishes to enrich an existing baseline model, which may already contain known confounders or covariates known to be of value.
One way to achieve this is via the penalty argument: setting a penalty of 0 for the variables that should always be included ensures they are picked first. Unfortunately, at the moment this works only with the L1 search, not with forward selection.
At some point I managed to cook up a patch that allowed setting a penalty of 0 in forward selection, but it was not an elegant solution. One point to understand (and perhaps the reason it was not implemented in the first place, @avehtari?) is the following: does a non-zero (and non-infinite) penalty have any meaning in forward selection?
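To make the idea concrete, here is a minimal base-R sketch of forward selection with a per-variable penalty, where penalty 0 means "forced in first". This is not projpred's actual API or algorithm (the function name, the RSS-based score, and the additive use of the penalty are all my own illustrative assumptions); it only shows the semantics I had in mind for the patch.

```r
# Hypothetical sketch (NOT projpred's actual implementation): greedy forward
# selection over the columns of X. Variables with penalty == 0 are forced in
# before the search starts; a finite penalty acts as an additive complexity
# cost on the selection score, so heavily penalised variables enter later.
forward_select <- function(X, y, penalty = rep(1, ncol(X))) {
  forced <- which(penalty == 0)          # forced-in variables (penalty 0)
  selected <- forced
  candidates <- setdiff(seq_len(ncol(X)), selected)
  path <- forced                         # selection order
  while (length(candidates) > 0) {
    score <- sapply(candidates, function(j) {
      # residual sum of squares of an OLS fit with the candidate added,
      # plus the candidate's penalty
      fit <- lm.fit(cbind(1, X[, c(selected, j), drop = FALSE]), y)
      sum(fit$residuals^2) + penalty[j]
    })
    best <- candidates[which.min(score)]
    path <- c(path, best)
    selected <- c(selected, best)
    candidates <- setdiff(candidates, best)
  }
  path
}
```

With this reading, a zero penalty is a hard inclusion constraint, while intermediate penalties act as a tie-breaker between near-equivalent candidates; whether that second behaviour is statistically meaningful is exactly the open question above.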
2. Opportunities for parallelism
I don’t think anything in projpred is parallelised. There are spots where parallelising the computation would be almost trivial, such as projecting each of the posterior samples in the non-Gaussian case (project_nongaussian in projfun.R), or running multiple cross-validation folds (kfold_varsel() in cv_varsel.R). The looping over the candidate variables for forward selection happens in the C++ code, so that is perhaps less straightforward.
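The draw-wise case is embarrassingly parallel: each posterior draw is projected independently of the others, so the serial loop could in principle become a drop-in parallel map. A toy sketch with the base parallel package (the per-draw function here is a placeholder, not projpred's actual projection):

```r
library(parallel)

# Stand-in for the per-draw projection work (in projpred this would be an
# optimisation per posterior draw); the real computation is more involved.
project_one_draw <- function(draw) {
  sum(draw^2)  # placeholder computation
}

# A list of fake posterior draws, one element per draw.
draws <- replicate(100, rnorm(10), simplify = FALSE)

# Current style: serial loop over draws.
res_serial <- lapply(draws, project_one_draw)

# Draws are independent, so this is a drop-in change.
# (mclapply forks on Unix; on Windows one would use parLapply on a cluster.)
res_parallel <- mclapply(draws, project_one_draw, mc.cores = 2)
```

The same pattern applies to cross-validation folds, which are likewise independent of each other.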
3. Feedback to users (but also testing and coverage)
This last point is of less importance, but in some ways it’s the easiest to address. The package is very flexible, which allows a user to fine-tune different aspects of the algorithm. Unfortunately, this also means that inputs must be checked carefully, and errors should be reported to the user early with a helpful message, rather than surfacing after a long computation as a crash or an impenetrable error.
One way to get a handle on this is to expand the test suite so that a larger portion of the possible code paths is covered. So, in my mind, addressing this has an added benefit for developers too: more tests and higher coverage give one more confidence when making changes.
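As an illustration of the kind of early validation I mean, here is a hypothetical checker that fails fast with an actionable message before any heavy computation starts. All names and rules here are illustrative assumptions, not projpred's real arguments:

```r
# Hypothetical example of up-front argument checking. The function and
# argument names ('method', 'nterms_max', 'penalty') are illustrative only.
check_varsel_args <- function(method, nterms_max, penalty, nvars) {
  method <- match.arg(method, c("forward", "L1"))
  if (!is.numeric(nterms_max) || length(nterms_max) != 1 || nterms_max < 1)
    stop("'nterms_max' must be a single positive number, got: ",
         deparse(nterms_max))
  if (!is.null(penalty)) {
    if (length(penalty) != nvars)
      stop("'penalty' must have one entry per candidate variable (",
           nvars, "), got length ", length(penalty))
    if (method == "forward" && any(penalty == 0))
      stop("zero penalties (forced variables) are currently only ",
           "supported with method = \"L1\"")
  }
  invisible(method)
}
```

Each branch of a checker like this is also a natural, cheap unit test, which is where the coverage benefit for developers comes in.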
Hope this helps,
Marco