Using {projpred} latent projection with {brms} Weibull family models

Thanks so much for taking the time to reply, much appreciated!

If I remember correctly, the default of dis = 1 in projpred v2.0.5.9000 (branch latent_projection) did not have a theoretical meaning for many families, including the Weibull family. That’s why we switched to a default of NA for those families where a suitable default such as 1 (probit link) or 1.6 (logit link) is not obvious. As long as users don’t run “post-processing” analyses (e.g., those from plot.vsel() and summary.vsel()) on the latent scale, the dis values don’t matter (this is briefly mentioned in the latent-projection vignette in the negative binomial example). You seem to be using latent-scale analyses in question 4, but I’m not sure if you really need them or if you just ran them for comparison.

OK, so technically the implementation in v2.0.5.9000 for Weibull models using latent-scale analyses doesn’t have an obvious theoretical basis given a dis = 1 default. I ran the latent-scale analyses really just to check that the custom extend_family() functions I’d written were correct and were providing identical results between the two versions of projpred. This may just show my limited understanding of latent-scale analyses, but presumably a log-normal survival model might have a more obvious theoretical basis for dis? Not that it really matters if it’s better to look at the response scale anyway.

This is now fixed in branch master (by PR #510). You should now be able to run the CV in parallel by setting options(projpred.export_to_workers = c("refm_shape")) beforehand. If there are more objects you need to export, add their names to the character vector c("refm_shape").

This seems to be working fine, thank you!
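
For anyone finding this later, here is a minimal sketch of that setup. The option name and "refm_shape" come from the reply above; "my_helper" is a hypothetical second object added only to illustrate extending the vector. The cv_varsel() call is commented out because it needs a fitted reference model (a hypothetical object `refm` here), and it assumes a projpred version where cv_varsel() accepts parallel = TRUE and a foreach backend such as doParallel has been registered.

```r
# Names of global objects that the parallel workers need
# ("refm_shape" is the object discussed in this thread).
options(projpred.export_to_workers = c("refm_shape"))

# Hypothetical: a further object ("my_helper") that also has to be exported.
options(projpred.export_to_workers = c(
  getOption("projpred.export_to_workers"),
  "my_helper"
))

# Check what will be exported.
exported <- getOption("projpred.export_to_workers")
print(exported)

# Assumed usage, not run here (needs a reference model `refm`):
# doParallel::registerDoParallel(4)
# cvvs <- cv_varsel(refm, cv_method = "kfold", K = 5, parallel = TRUE)
```

The option has to be set before cv_varsel() is called, since the workers are set up during that call.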

To answer this, I would need to check the plots. Does your reprex give comparable plots? In any case, forward search instead of L1 search might be worth a try.

Yes, the reprex gives the plots below. They look OK when using just varsel(), where only predictors X91:X100 are valuable ($\beta_{91-95} = 0.5$ and $\beta_{96-100} = -0.5$) and correlated (panel A below), but seem to degrade when adding cross-validation with cv_varsel() (panel B below). Interestingly, the baseline ELPD doesn’t match the loo() ELPD for the reference model, but I may just be misunderstanding how that’s presented/calculated in the plot. I’ve tried the following to see what may affect the results:

  1. Comparing horseshoe(par_ratio = 0.1) and R2D2(mean_R2 = 0.1, prec_R2 = 1.0, cons_D2 = 0.5) priors, which didn’t make a difference; I’d expect similar amounts of shrinkage from them anyway
  2. Comparing LOO and K-fold CV (as a few LOO folds had Pareto k warnings), which again made no difference
  3. Centering or scaling the predictors, which also made no difference

Weibull models:

I see the same odd plots when simulating some binomial data in the same way as the reprex ($n = 100$, $p_{noise} = 90$, $p_{pred} = 10$, $\rho = 0.5$, $\beta_{91-95} = \log(0.5)$, $\beta_{96-100} = \log(2)$), even when using the latent projection. The results are in the plots below (A, traditional projection varsel vs cv_varsel; B, latent projection varsel latent vs response scale; C, latent projection cv_varsel latent vs response scale). I imagine it’s just some quirk of the way I’ve simulated the data, potentially because the coefficients all have the same magnitude? In the case of binomial models, I guess the latent-scale dis parameter has a good theoretical basis, so this is less of an issue.
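
To make the setup concrete, here is a rough base-R sketch of how binomial data with these settings could be simulated. It is not the original reprex, and several details are assumptions: the correlation $\rho = 0.5$ is applied as an exchangeable correlation among the 10 relevant predictors only, the noise predictors are independent standard normals, the intercept is zero, and the outcome is Bernoulli with a logit link.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

n <- 100
p_noise <- 90
p_pred <- 10
rho <- 0.5

# Assumption: exchangeable correlation (rho) among the relevant predictors.
Sigma <- matrix(rho, nrow = p_pred, ncol = p_pred)
diag(Sigma) <- 1

# chol() returns upper-triangular R with t(R) %*% R == Sigma, so the rows of
# Z %*% R have covariance Sigma when the rows of Z are standard normal.
X_pred <- matrix(rnorm(n * p_pred), nrow = n) %*% chol(Sigma)

# Assumption: independent noise predictors.
X_noise <- matrix(rnorm(n * p_noise), nrow = n)

# Noise predictors first, so the relevant ones are X91:X100.
X <- cbind(X_noise, X_pred)
colnames(X) <- paste0("X", seq_len(ncol(X)))

# Coefficients: zero for noise, log(0.5) for X91:X95, log(2) for X96:X100.
beta <- c(rep(0, p_noise), rep(log(0.5), 5), rep(log(2), 5))

# Assumption: zero intercept, Bernoulli outcome, logit link.
eta <- drop(X %*% beta)
y <- rbinom(n, size = 1, prob = plogis(eta))

dat <- data.frame(y = y, X)
```

Since log(0.5) = -log(2), all ten nonzero coefficients have the same absolute value here and differ only in sign.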

Binomial models: