Projpred: Projection Predictive Variable Selection, why does "Performing selection for each fold" take a really long time in cv_varsel()?

Hi everyone,

I used brms to fit a multilevel model with two levels: 408 households (observations) at level 1 and 7 units at level 2. Everything worked well, from constructing the reference model (using horseshoe priors) and checking prior sensitivity (using priorsense) to K-fold cross-validation with brms::kfold(). Fitting the reference model (the full model, 18 predictor terms) took 336 seconds, and brms::kfold() with 5 folds took roughly 23 minutes. However, when I pass the full reference model to cv_varsel() for variable selection with K-fold cross-validation, it runs for hours without any output and just stays at "Performing selection for each fold" with 0%:
[1] "Performing cross-validation for the reference model…"
Setting 'K' to the number of folds (5)
Fitting model 1 out of 5
Fitting model 2 out of 5
Fitting model 3 out of 5
Fitting model 4 out of 5
Fitting model 5 out of 5
Start sampling
Start sampling
Start sampling
Start sampling
Start sampling
[1] "Performing selection for each fold…"
| | 0%
Here is the code I used to construct the reference model:
priors_1 <- c(prior(student_t(3, 0, 2.5), class = "Intercept"),
              prior(horseshoe(), class = "b"))
model_full_3 <- brm(totavegetablecrop ~ Gender + Edu + Literacy +
                      distancetomarket + infoAccess +
                      labourNonagri + landCult + landVege + yearExp +
                      vegeIncome + vegeCon + memberoneOrganization +
                      placegrowingvegetable + primaryPurposevegetable +
                      women_caretaker + source_total + hhsizeLabour +
                      (1 | eth_com),
                    data = data,
                    family = poisson,
                    iter = 5000,
                    prior = priors_1,
                    seed = 052023,
                    save_all_pars = TRUE)
And here is the code I used for variable selection via cv_varsel():
cvvs <- cv_varsel(model_full_3,
                  cv_method = "kfold",
                  K = 5,
                  method = "forward",
                  nclusters_pred = 20,
                  nterms_max = 8, # earlier diagnostics showed no improvement in prediction for >= 5 terms
                  seed = 12345)
Can someone give me some hints on how best to solve this problem?

I have removed the data. I will come back if I find a solution; any other suggestions would be very much appreciated.

Thanks @fweber144 for asking me to post here so that we can all benefit from shared problems and solutions, and for perhaps helping me with some suggestions.


The projpred main vignette gives some information about that Poisson case:

For multilevel Poisson models, the traditional projection may take very long, see #353. According to the simulation-based case study from #353, the latent projection should be considered as a currently available remedy.
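For illustration, here is a rough sketch of how the latent projection might be requested for the model from the question. It assumes that `model_full_3` is the fitted brmsfit shown above and that the installed brms and projpred versions support the `latent` argument of `get_refmodel()`. The object names `refm_lat` and `cvvs_lat` are made up for this example, and the remaining arguments are simply copied from the original call, so treat this as a starting point and not as a tested solution:

```r
library(brms)
library(projpred)

# Assumption: `model_full_3` is the fitted brmsfit from the question.
# latent = TRUE asks for the latent projection when the reference model
# object is built (requires sufficiently recent brms and projpred versions).
refm_lat <- get_refmodel(model_full_3, latent = TRUE)

# Same settings as in the original cv_varsel() call, but applied to the
# latent reference model object (`refm_lat` and `cvvs_lat` are hypothetical names).
cvvs_lat <- cv_varsel(
  refm_lat,
  cv_method = "kfold",
  K = 5,
  method = "forward",
  nclusters_pred = 20,
  nterms_max = 8,
  seed = 12345
)

# Inspect the predictive performance along the solution path.
plot(cvvs_lat, stats = "mlpd", deltas = TRUE)
```

If this works as intended, the submodel fits in the search no longer need the slow multilevel Poisson fitting of the traditional projection, which is where the reported run appears to get stuck.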

Thanks @fweber144. I will see if it helps. Yes, it seems to be related to the distribution family of the outcome variable here.