Using {projpred} latent projection with {brms} Weibull family models

Your description is correct, although I assume that by predict(<refmodel>) you mean ref_predfun, which is an argument of init_refmodel() and accepts a function. That is different from predict.refmodel(), which is a standalone function/method (essentially a wrapper around refmodel$ref_predfun, where refmodel is the object returned by init_refmodel()).

The predict.refmodel() method is almost never used (at least I haven’t seen it in applications yet).

The ref_predfun function is used internally in various places. Usually, it doesn’t need to be specified by the user, but in this case you are right that the user would need to specify ref_predfun to take the censoring into account (via the Bernoulli distribution). I hadn’t thought that far because, in my opinion, the bigger problem is identifying the censored observations from within the latent_ll_oscale function (and from within the latent_ilink and latent_ppd_oscale functions as well). This is “just” a technical issue that should be resolvable, but it will require some changes in projpred:

  • The ideal solution would be to allow for an extension of the formula in init_refmodel() (similarly to brms’s resp_cens() term) and then to carry around the censoring (or event) indicators internally, eventually passing them to latent_ll_oscale and friends. Functions with newdata arguments (predict.refmodel(), proj_linpred(), proj_predict()) would also have to extract the censoring (or event) indicators from newdata and pass them to latent_ll_oscale etc.
  • A less elegant (but perhaps faster to implement) solution could be to pass the indices of the observations entering the latent_ll_oscale function to latent_ll_oscale (and analogously for latent_ilink and latent_ppd_oscale). This would make censoring (or event) indicators unnecessary for LOO CV and K-fold CV with validate_search = TRUE, but not for predict.refmodel(), proj_linpred(), and proj_predict(), because these three functions accept newdata and hence indices of observations from the original dataset don’t help there. Instead, for these three functions, we would probably have to pass newdata to latent_ll_oscale (and latent_ilink and latent_ppd_oscale). I’m not sure if this approach will work, but we could try it.
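To make the second option a bit more concrete, here is a rough sketch in plain R of the two pieces involved: looking up the censoring indicators (by index into the original data, or from newdata) and evaluating a right-censored Weibull log-likelihood on the original response scale. Everything here is hypothetical and for illustration only: make_cens_lookup(), weibull_cens_ll(), and the column name cens are made up and are not part of projpred. The sketch also assumes right-censoring only and brms’s parameterization of the Weibull family in terms of the mean, and it does not show how such a function would be hooked into latent_ll_oscale, since that is exactly the part that would need changes in projpred.

```r
# Hypothetical helpers for illustration; none of this is projpred API.

# Returns a function that looks up right-censoring indicators
# (1 = right-censored, 0 = event observed), either by row index into the
# original data (sufficient for LOO CV and K-fold CV) or from a column of
# `newdata` (needed for functions that accept `newdata`).
make_cens_lookup <- function(cens_full, cens_col = "cens") {
  function(idx = NULL, newdata = NULL) {
    if (!is.null(newdata)) {
      return(newdata[[cens_col]])
    }
    cens_full[idx]
  }
}

# Pointwise log-likelihood for a Weibull model with right-censoring.
# mu:    S x N matrix of draws of the mean (S draws, N observations)
# shape: length-S vector of draws of the Weibull shape parameter
# y:     length-N vector of observed (possibly censored) times
# cens:  length-N censoring indicator (1 = right-censored, 0 = event)
weibull_cens_ll <- function(mu, shape, y, cens) {
  stopifnot(
    ncol(mu) == length(y),
    length(y) == length(cens),
    nrow(mu) == length(shape)
  )
  S <- nrow(mu)
  N <- ncol(mu)
  # Assumption: the mean is modeled, so convert it to the scale parameter
  # expected by dweibull() and pweibull(). `shape` is recycled down the
  # columns, i.e., one shape value per draw (row).
  scale <- mu / gamma(1 + 1 / shape)
  shape_mat <- matrix(shape, nrow = S, ncol = N)
  y_mat <- matrix(y, nrow = S, ncol = N, byrow = TRUE)
  is_cens <- matrix(cens == 1, nrow = S, ncol = N, byrow = TRUE)
  # Observed events contribute the log density, ...
  ll_event <- dweibull(y_mat, shape = shape_mat, scale = scale, log = TRUE)
  # ... right-censored observations contribute the log survival function.
  ll_cens <- pweibull(y_mat, shape = shape_mat, scale = scale,
                      lower.tail = FALSE, log.p = TRUE)
  ifelse(is_cens, ll_cens, ll_event)
}

# Toy usage with made-up numbers: observations 2 and 3 form the test fold.
get_cens <- make_cens_lookup(cens_full = c(0, 1, 0, 0, 1))
y_full <- c(1.2, 3.4, 0.7, 2.1, 5.0)
idx_test <- c(2, 3)
mu_draws <- matrix(2, nrow = 4, ncol = length(idx_test))
shape_draws <- c(0.8, 1.0, 1.2, 1.5)
ll <- weibull_cens_ll(mu_draws, shape_draws, y_full[idx_test],
                      get_cens(idx = idx_test))
```

With indices, get_cens(idx = ...) is enough for the cross-validation folds. For the functions that accept newdata, the indicators would have to come from newdata itself, e.g., get_cens(newdata = ...), which is why indices alone don’t cover that case.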