Using {projpred} latent projection with {brms} Weibull family models

rtnliqry · April 14, 2025, 3:05pm

Hi Stan Forum!

I am currently trying to implement the use of the excellent projpred package (thanks @fweber144 and colleagues!) for variable selection with brms parametric survival models that use a Weibull response distribution (with the default scale = “log” and shape = “identity” link functions). As in Catalina (2021) you can use the latent space projection approach to do this. This works OK with the GitHub branch (v2.0.5.9000) that the authors refer to in the paper; however, I’m hoping to implement it in projpred v2.8.0 to make use of the latest features and as an exercise to improve my understanding.

Following the threads of the above paper – and other related posts on this forum and GitHub – to the relevant vignette and documentation, implementing Weibull models with projpred v.2.8.0 requires passing custom functions for latent_ilink(), latent_ll_oscale(), and latent_ppd_oscale() to extend_family(). Following the vignette, I’ve implemented these functions for the Weibull distribution as in the reprex below. This works and provides more-or-less identical results to projpred v2.0.5.9000 with the same (correct) variables being selected in the same order. Despite this, I have a few questions:

Have I implemented these extend_family() functions correctly?
projpred v2.0.5.9000 sets the latent Weibull reference model dispersion parameter (dis) to 1 by default. projpred v2.8.0 sets this to NA (presumably because there’s no default for weibull models); however, if I manually set this to 1, I can fully reproduce the results from projpred v2.0.5.9000. My ignorance is on full display here, but given its presence in v2.0.5.9000, I assume this is an acceptable thing to do?
Having to pass refm_shape through to latent_ll_oscale_weib() and latent_ppd_oscale_weib() means that the parallelisation fails, as this global environment object isn’t exported to the parallel workers. I see this issue has been raised on GitHub, but I wondered if there’s a short-term workaround by manually editing the call to foreach directly?
The elpd plots look good when on the latent scale, but bad on the original response scale (resp_oscale = TRUE) – is this problematic? I assume this is kind of the point of the approach?

The actual problem case is very high-dimensional (p ~ 2,500, n ~ 250) and hierarchical in nature, but I’ve first simulated some simpler, non-hierarchical, lower-dimensional data (p = 100, n = 100) to make a reprex below.

## Packages ----
library(brms)
library(devtools)
library(dplyr)
library(MASS)
library(Matrix)
devtools::install_github("stan-dev/projpred", ref = "latent_projection") # v2.0.5.9000
library(projpred)
library(trialr)

## Simulate data ----
# Sample from LKJ distribution for a randomised correlation matrix
set.seed(1234)
cor_mat <- trialr::rlkjcorr(n = 1, K = 100, eta = 1)

# First and second variable sets correlate with each another within each set
cor_mat[91:95, 91:95] <- matrix(rep(0.5, times = 25), ncol = 5)
cor_mat[96:100, 96:100] <- matrix(rep(0.5, times = 25), ncol = 5)

# First and second variable sets do not correlate with one another between sets
cor_mat[91:95, 96:100] <- matrix(rep(0, times = 25))
cor_mat[96:100, 91:95] <- matrix(rep(0, times = 25))
diag(cor_mat) <- 1

# Sample some standard deviations of the variables
sd <- rgamma(100, 1, 2)
  
# Convert the correlation matrix to a covariance matrix
cov_mat <- diag(sd) %*% cor_mat %*% diag(sd)
  
# Sample variables from the multivariate normal distribution
vars <- MASS::mvrnorm(
  n = 100
  ,mu = rnorm(100, 30, 2.5) 
  ,Sigma = Matrix::nearPD(cov_mat)$mat 
)
  
# Scale the variables 
scaled_vars <- apply(vars, 2, scale, scale = TRUE)
  
# Draws from Weibull distribution where only the last set of variables have predictive information 
shape <- 1.2
predictors <- scaled_vars[,91:100] %*% rep(c(0.5, -0.5), each = 5)
linpred <- exp(0 + predictors)
scale_pred <- linpred / (gamma(1 + (1 / shape))) # brms AFT parameterisation

y <- rweibull(
  n = 100
  ,shape = shape
  ,scale = scale_pred
)

cens <- runif(100, 0, 4)
time <- pmin(y, cens)
status <- as.numeric(y <= cens)

df_sim <- data.frame(
  time = time
  ,status = status
  ,censored = 1 - status
  ,vars 
)

## Fit model with all variables (ignore a few divergent transitions for now)
model_hs <- brm(
  time | cens(censored) ~ 1 + .
  ,family = weibull
  ,data = df_sim |> dplyr::select(-status)
  ,chains = 4
  ,iter = 3000
  ,prior = c(
    prior(normal(0, 100), class = Intercept)
    ,prior(horseshoe(par_ratio = 0.1), class = b)
    ,prior(gamma(1, 0.5), class = shape)
  )
  ,seed = 1234
  ,cores = 4
  ,control = list(adapt_delta = 0.95, max_treedepth = 15)
  ,save_pars = save_pars(all = TRUE) 
) 

## projpred v2.0.5.9000 ----
# Generate the reference model object with latent projection
refm_hs <- get_refmodel(model_hs, latent_proj = TRUE)
refm_hs$dis # all equal to 1 by default

ncores <- 10
doParallel::registerDoParallel(ncores)

# cv_varsel with official latent reference model
refm_hs_vsel_l1 <- cv_varsel(
  refm_hs
  ,method = "L1" # purely for speed
  ,cv_method = "LOO" # purely for speed
  ,seed = 1234
  ,validate_search = TRUE
  ,parallel = TRUE
  ,nterms_max = 10
)

## projpred v2.8.0 ----
install.packages("projpred") # install v2.8.0 from CRAN

# Custom extend_family() functions for Weibull family latent projection as per
# https://mc-stan.org/projpred/articles/latent.html#negbinex
refm_shape <- as.matrix(model_hs)[, "shape", drop = FALSE]

latent_ilink_weib <- function(
    lpreds
    ,cl_ref
    ,wdraws_ref = rep(1, length(cl_ref))
) {
  
  ilpreds <- exp(lpreds) # mu parameter link = "log"
  return(ilpreds)
  
}

latent_ll_oscale_weib <- function(
    ilpreds
    ,y_oscale
    ,wobs = rep(1, length(y_oscale))
    ,cl_ref
    ,wdraws_ref = rep(1, length(cl_ref))
) {
  
  y_oscale_mat <- matrix(
    y_oscale
    ,nrow = nrow(ilpreds)
    ,ncol = ncol(ilpreds)
    ,byrow = TRUE
  )
  
  wobs_mat <- matrix(
    wobs
    ,nrow = nrow(ilpreds)
    ,ncol = ncol(ilpreds)
    ,byrow = TRUE
  )
  
  refm_shape_agg <- cl_agg(refm_shape, cl = cl_ref, wdraws = wdraws_ref)
  ll_unw <- dweibull(
    y_oscale_mat
    ,shape = refm_shape_agg
    ,scale = ilpreds / gamma(1 + 1 / as.vector(refm_shape_agg)) # brms AFT parameterisation
    ,log = TRUE
  )
  return(wobs_mat * ll_unw)
  
}

latent_ppd_oscale_weib <- function(
    ilpreds_resamp
    ,wobs
    ,cl_ref
    ,wdraws_ref = rep(1, length(cl_ref))
    ,idxs_prjdraws
) {
  
  refm_shape_agg <- cl_agg(refm_shape, cl = cl_ref, wdraws = wdraws_ref)
  refm_shape_agg_resamp <- refm_shape_agg[idxs_prjdraws, , drop = FALSE]
  ppd <- rweibull(
    prod(dim(ilpreds_resamp))
    ,shape = refm_shape_agg_resamp
    ,scale = ilpreds_resamp / gamma(1 + 1 / as.vector(refm_shape_agg_resamp)) # brms AFT
  )
  ppd <- matrix(ppd, nrow = nrow(ilpreds_resamp), ncol = ncol(ilpreds_resamp))
  return(ppd)
  
}

# Generate the custom reference model object with latent projection
refm_hs_weib <- get_refmodel(
  model_hs
  ,latent = TRUE
  ,dis = rep(1, times = 6000) # to match v2.0.5.9000
  ,latent_ilink = latent_ilink_weib
  ,latent_ll_oscale = latent_ll_oscale_weib
  ,latent_ppd_oscale = latent_ppd_oscale_weib
)

# cv_varsel with custom latent reference model
refm_hs_weib_vsel_l1 <- cv_varsel(
  refm_hs_weib
  ,method = "L1" # purely for speed
  ,cv_method = "LOO" # purely for speed
  ,seed = 1234
  ,validate_search = TRUE
  ,parallel = FALSE # won't run due to `refm_shape` 
  ,nterms_max = 10
)

# Check the selected predictors
identical(
  refm_hs_vsel_l1$solution_terms
  ,refm_hs_weib_vsel_l1$predictor_ranking
) # TRUE

# Check the ELPD values (effectively identical)
round(refm_hs_vsel_l1$summary$elpd.loo, 1)
# [1] -239.5 -215.7 -186.3 -172.6 -165.6 -158.2 -151.5 -141.5 -134.5 -128.1 -126.7

round(summary(refm_hs_weib_vsel_l1, resp_oscale = FALSE)$perf_sub$elpd, 1)
# [1] -250.4 -226.3 -185.3 -177.7 -166.4 -159.2 -147.0 -135.5 -129.6 -126.7 -124.5

Operating System: Ubuntu 24.04.2 LTS (running in WSL2)
brms Version: 2.22.0
projpred Version(s): 2.0.5.9000 and 2.8.0
trialr Version: 0.1.6

Thanks in advance!

fweber144 · April 18, 2025, 7:50pm

Have I implemented these extend_family() functions correctly?

Yes, they seem correct to me.

projpred v2.0.5.9000 sets the latent Weibull reference model dispersion parameter (dis) to 1 by default. projpred v2.8.0 sets this to NA (presumably because there’s no default for weibull models); however, if I manually set this to 1, I can fully reproduce the results from projpred v2.0.5.9000. My ignorance is on full display here, but given its presence in v2.0.5.9000, I assume this is an acceptable thing to do?

If I remember correctly, the default of dis = 1 in projpred v2.0.5.9000 (branch latent_projection) did not have a theoretical meaning for many families, including the Weibull family. That’s why we switched to a default of NA for those families where a suitable default such as 1 (probit link) or 1.6 (logit link) is not obvious. As long as users don’t run “post-processing” analyses (e.g., those from plot.vsel() and summary.vsel()) on latent scale, the dis values don’t matter (this is shortly mentioned in the latent-projection vignette in the negative binomial example). You seem to be using latent-scale analyses in question 4, but I’m not sure if you really need them or if you just ran them for comparison.

Having to pass refm_shape through to latent_ll_oscale_weib() and latent_ppd_oscale_weib() means that the parallelisation fails, as this global environment object isn’t exported to the parallel workers. I see this issue has been raised on GitHub, but I wondered if there’s a short-term workaround by manually editing the call to foreach directly?

From my experience, editing functions from packages within a running R session is possible but quite complicated (especially in a non-interactive way). So I would suggest that I’ll try to fix this in the GitHub version as soon as possible so you can use that. Sorry that I haven’t managed to work on this yet, there has been some other stuff that I wanted to finish first.

The elpd plots look good when on the latent scale, but bad on the original response scale (resp_oscale = TRUE) – is this problematic? I assume this is kind of the point of the approach?

To answer this, I would need to check the plots. Does your reprex give comparable plots? In any case, forward search instead of L1 search might be worth a try.

fweber144 · April 22, 2025, 9:04pm

This is now fixed in branch master (by PR #510). You should now be able to run the CV in parallel by setting options(projpred.export_to_workers = c("refm_shape")) beforehand. If there are more objects you need to export, add their names to the character vector c("refm_shape").

rtnliqry · April 25, 2025, 10:46am

Thanks so much for taking the time to reply, much appreciated!

If I remember correctly, the default of dis = 1 in projpred v2.0.5.9000 (branch latent_projection) did not have a theoretical meaning for many families, including the Weibull family. That’s why we switched to a default of NA for those families where a suitable default such as 1 (probit link) or 1.6 (logit link) is not obvious. As long as users don’t run “post-processing” analyses (e.g., those from plot.vsel() and summary.vsel()) on latent scale, the dis values don’t matter (this is shortly mentioned in the latent-projection vignette in the negative binomial example). You seem to be using latent-scale analyses in question 4, but I’m not sure if you really need them or if you just ran them for comparison.

OK, so technically the implementation in v2.0.5.9000 for Weibull models using latent-scale analyses doesn’t have an obvious theoretical basis given a dis = 1 default. I ran the latent-scale analyses really just to check that the custom extend_family() functions I’d written were correct and were providing identical results between the two versions of projpred. This may just show my limited understanding of latent-scale analyses, but presumably a log-normal survival model might have a more obvious theoretical basis for dis? Not that it really matters if it’s better to look at the response scale anyway.

This is now fixed in branch master (by PR #510). You should now be able to run the CV in parallel by setting options(projpred.export_to_workers = c(“refm_shape”)) beforehand. If there are more objects you need to export, add their names to the character vector c(“refm_shape”).

This seems to be working fine, thank you!

To answer this, I would need to check the plots. Does your reprex give comparable plots? In any case, forward search instead of L1 search might be worth a try.

Yes, the reprex gives the plots below. They look OK when using just varsel, where only predictors X91:X100 are valuable (\beta_{91-95} = 0.5 and \beta_{96-100} = -0.5) and correlated (panel A below), but seem to degrade when adding in cross-validation with cv_varsel (panel B below). Interestingly, the baseline ELPD doesn’t match the loo() ELPD for the reference model, but I may just be misunderstanding how that’s presented/calculated in the plot. I’ve tried the following to see what may affect the results:

Comparing horseshoe(par_ratio = 0.1) and R2D2(mean_R2 = 0.1, prec_R2 = 1.0, cons_D2 = 0.5) priors, which didn’t make a difference - I’d expect similar amounts of shrinkage for them anyway
Comparing LOO and K-fold CV (as a few LOO folds had pareto warnings), which again made no difference
Centering or scaling the predictors, which also made no difference

Weibull models:

I see the same odd plots when simulating some binomial data in the same way as the reprex (n = 100, p_{noise} = 90, p_{pred} = 10, \rho = 0.5, \beta_{91-95} = log(0.5), \beta_{96-100} = log(2)), even when using the latent projection. The results are in plots below (A, traditional projection varsel vs cv_varsel; B, latent projection varsel latent vs response scale; C latent projection cv_varsel latent vs response scale). I imagine it’s just some quirk with the way in which I’ve simulated the data, potentially because they all have the same coefficient? In the case of binomial models, I guess the latent scale dis parameter has a good theoretical basis, so this is less of an issue.

Binomial models:

fweber144 · May 3, 2025, 7:39pm

That’s correct. However, while thinking about this, I realized that so far, projpred did not support the use of the projected dispersion parameter draws in the latent_ll_oscale and latent_ppd_oscale functions (for the families for which I have used the latent projection so far, this was not necessary). For a log-normal model, this is indeed necessary for appropriate response-scale analyses, so I have added support in #513. Hence, if you want to use a log-normal model, you should use the current GitHub version (branch master) and adapt your latent_ll_oscale and latent_ppd_oscale functions accordingly (you will also have to update your latent_ll_oscale and latent_ppd_oscale functions for the Weibull model, but the arguments dis and dis_resamp which need to be added will remain unused there).

Also note the following:

In case of the Weibull model, the latent projection is not solving the original (i.e., response-scale) projection problem exactly, which is due to the Gaussian approximation on latent scale, but also due to the fact that the latent projection cannot yield projected shape parameter draws (this can be seen above: refm_shape can only be taken into account in an “ad-hoc” manner, i.e., to get from latent scale to response scale, you are using the original reference model draws for refm_shape; these draws have not been modified by the projection).
In case of a log-normal model, the Gaussian distribution on latent scale is exact (by definition of the log-normal distribution), but the latent projection is still not solving the original (i.e., response-scale) projection problem exactly, which can be seen by writing out the original and the latent projection equations (Kullback-Leibler divergences; the point is the \frac{1}{y} Jacobian adjustment in the density of the log-normal distribution of the reference model). However, I think that the approximation error should be smaller than for the Weibull model.

Not that it really matters if it’s better to look at the response scale anyway.

Yes, I would say that in general, it’s better to look at response scale.

Yes, the reprex gives the plots below. […]

Thanks. The cv_varsel() plots are not that bad, I would say. Perhaps you can get smoother cv_varsel() plots by using more expensive argument values in cv_varsel()? For example, have you tried with exact LOO-CV (i.e., K-fold CV with K = N)?

Interestingly, the baseline ELPD doesn’t match the loo() ELPD for the reference model, but I may just be misunderstanding how that’s presented/calculated in the plot.

At the first glance, this is surprising for me as well. I will have to think about this and perhaps play around with your reprex. It might take me some time to answer this, sorry.

fweber144 · May 5, 2025, 10:15am

I’m just realizing that you have censored observations. Censoring is not taken into account by your latent_ll_oscale function, which explains the differing reference model ELPD estimates from projpred and loo() (loo() support for brms models is implemented in the brms package and in brms:::log_lik_weibull(), you can see that censoring is taken into account there). The problem is that it won’t be possible to account for censoring in your latent_ll_oscale function because this would require projpred to carry around the censoring (or event) indicators internally. I’ve heard about a possibility to take censoring into account by using pseudo-observations^[1], but I’m not sure if that works in projpred out of the box.

I’m sorry I didn’t notice the censoring earlier when you asked whether your functions were implemented correctly.

Orsini, L. , Brard, C., Lesaffre, E., Dejardin, D. & Le Teuff, G., 2023, “Bayesian survival analysis using pseudo-observations”, IWSM 2023 Proceedings Book, https://iwsm2023.statistik.tu-dortmund.de/storages/iwsm2023-statistik/r/dokumente/IWSM_2023_Conference_Proceedings.pdf ↩︎

avehtari · May 5, 2025, 4:32pm

As the censoring used is implemented by integrating out the unknown event time, the likelihood for the censored observation is equal to the Bernoulli likelihood. So it would be sufficient to compute the latent distribution for non-censored observation from Weibull and for censored from Bernoulli. And then make the projection using only the latent normal approximation.

fweber144 · May 5, 2025, 6:06pm

Yes, but in my opinion, the problem is how to identify the censored observations from within latent_ll_oscale. That’s what I meant with

this would require projpred to carry around the censoring (or event) indicators internally

avehtari · May 6, 2025, 6:51am

I thought we could define such init_refmodel() and predict(<refmodel>) that would make everything to look like normal for projpred, that is, the user can hide all the details of the model in init_refmodel() and predict(<refmodel>). Maybe I have forgotten how these functions can be used?

fweber144 · May 7, 2025, 12:44pm

Your description is correct, although I guess that with predict(<refmodel>), you mean ref_predfun which is an argument of init_refmodel() and accepts a function, in contrast to predict.refmodel() which is a standalone function/method (and essentially a wrapper around refmodel$ref_predfun where refmodel is the object returned by init_refmodel()).

The predict.refmodel() method is almost never used (at least I haven’t seen it in applications yet).

The ref_predfun function is used internally at various places. Usually, ref_predfun doesn’t need to be specified by the user, but in this case you are right that ref_predfun would need to be specified by the user to take the censoring into account (via the Bernoulli distribution). I didn’t think so far because in my opinion, the bigger problem is to identify the censored observations from within the latent_ll_oscale function (and from within the latent_ilink and latent_ppd_oscale functions as well). This is “just” a technical issue that should be resolvable, but it will require some changes in projpred:

The ideal solution would be to allow for an extension of the formula in init_refmodel() (similarly to brms’s resp_cens() term) and then to carry around the censoring (or event) indicators internally, eventually passing them to latent_ll_oscale and friends. Functions with newdata arguments (predict.refmodel(), proj_linpred(), proj_predict()) would also have to extract the censoring (or event) indicators from newdata and pass them to latent_ll_oscale etc.
A less elegant—but perhaps faster to implement—solution could be to pass the indices of the observations entering the latent_ll_oscale function to latent_ll_oscale (and analogously for latent_ilink and latent_ppd_oscale). This would make censoring (or event) indicators obsolete for LOO CV and K-fold CV with validate_search = TRUE, but not for predict.refmodel(), proj_linpred(), and proj_predict() because these 3 functions accept newdata and hence indices of observations from the original dataset don’t help there. Instead, for these 3 functions, we would probably have to pass newdata to latent_ll_oscale (and latent_ilink and latent_ppd_oscale). I’m not sure if this approach will work, but we could try it.

rtnliqry · May 10, 2025, 10:02am

Thanks again @fweber144 for taking the time to reply and thanks for your input @avehtari!

In case of a log-normal model, the Gaussian distribution on latent scale is exact (by definition of the log-normal distribution) […]

OK, I think the stronger theoretical basis in the latent space for the log-normal family seems like the better option moving forward. The shape of the hazard function will also likely be applicable to the actual problem. Thanks for making the edits to the package – I assume dis in get_refmodel() will then just be the posterior draws of the sd parameter and that, given that brms uses identity for both parameters, latent_ilink() will become irrelevant? I also assume that dis and dis_resamp in latent_ll_oscale() and latent_ppd_oscale() will just be passed through untouched to the sdlog arguments of dlnorm() and rlnorm(), respectively?

Thanks. The cv_varsel() plots are not that bad, I would say. Perhaps you can get smoother cv_varsel() plots by using more expensive argument values in cv_varsel()? For example, have you tried with exact LOO-CV (i.e., K-fold CV with K=N)?

I’ll give this a go and see what the results look like. I was mainly just interested that, even with the binomial model, you’d make quite different inferences looking at the latent vs response scales, even though the correct variables are being selected.

[…] Censoring is not taken into account by your latent_ll_oscale function […]

@avehtari The vignette for v2.0.5.9000 gives a Weibull survival model example with censoring – was it handled differently or is it just not relevant given that there’s no conversion back to the original response scale in this older version? With this in mind – putting the discussed future changes to projpred to incorporate censoring support aside – if I compromise and stick to analysis only on the latent scale (which we’ve established has a stronger theoretical basis for the log-normal anyway), does the inability of latent_ll_oscale() and latent_ppd_oscale() to account for censoring matter in the mean time?

fweber144 · May 10, 2025, 8:41pm

Yes, your conclusions are correct, except that I wouldn’t say that latent_ilink() will become irrelevant—it needs to be the identity function in this case.

It is well possible that the latent-scale plot and the response-scale plot look that different, which I will explain further below.

Sorry, we might have a misunderstanding here. I didn’t say that latent-scale analyses have a stronger theoretical basis for the log-normal distribution. It’s the latent projection which should have a stronger theoretical basis for the log-normal distribution (compared to the Weibull distribution)—at least in my opinion (I don’t know how @avehtari thinks about this). As I said above, it is generally better to perform response-scale analyses than latent-scale analyses. The reason is that on latent scale, we don’t have actual response observations to compute the performance statistics. Instead, the latent-scale response values are averages over the draw-wise latent-scale fitted values from the reference model, so they are artificial (not actual) response values. (To avoid a dependency between training and test data, this “definition” of latent-scale response values requires an adaptation in case of cv_method = "LOO" to incorporate the PSIS-LOO CV weights and is not readily applicable to cv_method = "kfold" at all, so in the latter case, these latent-scale response values are set to NAs.) Since the projection essentially consists of fitting to the fit of the reference model (here, by “fit of the reference model” we refer to response-scale fitted values in non-latent projection and to latent-scale fitted values in case of the latent projection), it is well possible that the latent-scale plots (which use the artificial latent-scale response values described above) look smoother and “better” than the response-scale plots.

So in case of the latent projection, I would generally prefer response-scale analyses over latent-scale analyses. (In case of non-latent projection, we don’t have that question in the first place—we always use response-scale analyses there.)

rtnliqry · May 12, 2025, 7:03pm

Ah, sorry, yes, I was conflating the latent projection with the latent scale. Thanks for the clarification and detailed explanation, this makes a lot more sense now. Thanks also for the continued time and effort into what is a longer thread than I expected!

Just to confirm my understanding (or lack thereof!): at what point does the censoring part of the model drop out of the analysis? I suppose it’s more of a question regarding the order in which things are processed and where latent_ll_oscale() fits into it, as it’s hard to ascertain from the source code itself.

[…] So it would be sufficient to compute the latent distribution for non-censored observation from Weibull and for censored from Bernoulli. And then make the projection using only the latent normal approximation.

I don’t fully understand how to achieve this through ref_predfun (if there’s some way to identify censored observations in the future), but I’ll have another read of the documentation and the default in the source code. I assume I could also try and emulate brms:::log_lik_censor() in a custom latent_ll_oscale() if the censored observations are identifiable.

I’ve implemented the extend_family() functions for the lognormal family, and the new dis and dis_resamp features seem to be working well. I’ve just been doing some analyses with uncensored simulated data and it seems that some of the weird cv_varsel() results might have been related to the censoring causing issues.

Edit: grammar

fweber144 · May 13, 2025, 7:53pm

In ref_predfun and in latent_ll_oscale. At least that’s my understanding of how censoring would need to be taken into account in projpred. Perhaps @avehtari knows more about this.

Here is how I imagine ref_predfun in pseudo code (but I’m not sure if this is correct; perhaps @avehtari could check this?):

ref_predfun <- function(fit, newdata = NULL) {
  out <- t(posterior_linpred(fit, newdata = newdata))
  
  if (is.null(newdata)) {
    newdata <- <original dataset>
  }
  observed_time <- newdata[, <column for observed timepoints>]
  
  invlink_function <- <get inverse link function (default for Weibull family is log) or use posterior_linpred() with transform = TRUE instead of invlink_function(out) below>
  mu_draws <- invlink_function(out)
  
  shape_draws <- as.matrix(fit)[, "shape"]
  
  for (idx_cens in <indices of censored observations in newdata>) {
    out[idx_cens, ] <- qlogis( # we could also use qnorm() for probit instead of logit
      pweibull(observed_time[idx_cens],
               shape = shape_draws,
               scale = mu_draws[idx_cens, ] / gamma(1 + 1 / shape_draws),
               lower.tail = FALSE)
    )
  }
  return(out)
}

For latent_ll_oscale (as well as latent_ilink and latent_ppd_oscale), I can try to write some pseudo code as soon as I have time for that, but perhaps @avehtari could provide pseudo code for latent_ll_oscale himself?

rtnliqry · May 20, 2025, 2:19pm

Thanks for this, @fweber144.

I had a go at writing some pseudocode for latent_ll_oscale and latent_ppd_oscale based on how brms:::log_lik_censor and brms:::rcontinuous do this for censored models, respectively (assuming I’ve understood how both of these work correctly). I’ve done this for the log-normal family just to make it easier to track how dis fits into it, rather than the aforementioned ad hoc use of the Weibull model’s shape draws. I assume latent_ilink is unaffected by censoring given that the inverse link function is the same for censored or uncensored observations? I’m by no means an expert R programmer, so I imagine there are faster ways for indexing/structuring the code. Note that idxs_cens is assumed to be a vector of column indices.

latent_ll_oscale_cens <- function(
    ilpreds 
    ,dis
    ,y_oscale
    ,wobs = rep(1, length(y_oscale))
    ,cl_ref
    ,idxs_cens
    ,wdraws_ref = rep(1, length(cl_ref))
) {
  
  y_oscale_mat <- matrix(
    y_oscale
    ,nrow = nrow(ilpreds)
    ,ncol = ncol(ilpreds)
    ,byrow = TRUE
  )
  
  wobs_mat <- matrix(
    wobs
    ,nrow = nrow(ilpreds)
    ,ncol = ncol(ilpreds)
    ,byrow = TRUE
  )
  
  ll_unw <- matrix(NA, nrow = nrow(ilpreds), ncol = ncol(ilpreds), byrow = TRUE)
  
  # y_oscale_mat and ilpreds are S x N matrices, so subset by column for now
  # Censored observations (based on brms:::log_lik_censor)
  ll_unw[, idxs_cens] <- plnorm(
    y_oscale_mat[, idxs_cens]
    ,meanlog = ilpreds[, idxs_cens]
    ,sdlog = dis
    ,lower.tail = FALSE
    ,log.p = TRUE
  )
  
  # Uncensored observations
  ll_unw[, -idxs_cens] <- dlnorm(
    y_oscale_mat[, -idxs_cens]
    ,meanlog = ilpreds[, -idxs_cens]
    ,sdlog = dis
    ,log = TRUE
  )
  
  return(wobs_mat * ll_unw)
  
}

latent_ppd_oscale_cens <- function(
    ilpreds_resamp
    ,dis_resamp
    ,wobs
    ,cl_ref
    ,wdraws_ref = rep(1, length(cl_ref))
    ,idxs_prjdraws
    ,idxs_cens
    ,newdata
) {
  
  if (is.null(newdata)) {
    newdata <- # <original dataset>
  }
  observed_times <- newdata[, <column for observed timepoints>]
  
  # Initialise S x N matrix
  ppd <- matrix(NA, nrow = nrow(ilpreds_resamp), ncol = ncol(ilpreds_resamp))
    
  for (i in <all observations in newdata>) {
    
    if (i %in% idxs_cens) {
     
      # Right-censored observations (based on brms:::rcontinuous)
      plb <- plnorm(-Inf, meanlog = ilpreds_resamp[, i], sdlog = dis_resamp)
      pub <- plnorm(observed_times[i], meanlog = ilpreds_resamp[, i], sdlog = dis_resamp)
      ppd[, i] <- runif(nrow(ilpreds_resamp), min = plb, max = pub) |> 
        qlnorm(meanlog = ilpreds_resamp[, i], sdlog = dis_resamp)
      
    } else {
      
      # Uncensored observations
      ppd[, i] <- rlnorm(
        nrow(ilpreds_resamp)
        ,meanlog = ilpreds_resamp[, i]
        ,sdlog = dis_resamp
      )
      
    }
    
  }
  
  return(ppd)
  
}

@avehtari, it would be great to know if I’m on the right track here - thanks in advance!

Edit: grammar

fweber144 · May 20, 2025, 8:59pm

My understanding of @avehtari’s idea is that latent_ilink would also have to be adapted. Here is pseudo-code for such a latent_ilink function in case of the Weibull family with the default log link (or any other family with log link where censoring should be taken into account):

latent_ilink_weib_cens <- function(
    lpreds,
    idxs_obs = seq_len(ncol(lpreds)),
    newdata = NULL,
    cl_ref,
    wdraws_ref = rep(1, length(cl_ref))
) {
  if (is.null(newdata)) {
    idxs_cens <- <indices of censored observations in original dataset after subsetting it with `idxs_obs`>
    idxs_event <- <indices of uncensored observations in original dataset after subsetting it with `idxs_obs`>
  } else {
    idxs_cens <- <indices of censored observations in newdata>
    idxs_event <- <indices of uncensored observations in newdata>
  }
  ilpreds <- matrix(nrow = nrow(lpreds), ncol = ncol(lpreds))
  ilpreds[, idxs_cens] <- plogis( # pnorm() in case of probit instead of logit (see `ref_predfun`)
    lpreds[, idxs_cens, drop = FALSE]
  )
  ilpreds[, idxs_event] <- exp(lpreds[, idxs_event, drop = FALSE])
  return(ilpreds)
}

Pseudo-code for the corresponding latent_ll_oscale function would be (in my understanding):

latent_ll_oscale_weib_cens <- function(
    ilpreds,
    dis = rep(NA, nrow(ilpreds)),
    y_oscale,
    wobs = rep(1, ncol(ilpreds)),
    idxs_obs = seq_len(ncol(ilpreds)),
    newdata = NULL,
    cl_ref,
    wdraws_ref = rep(1, length(cl_ref))
) {
  if (is.null(newdata)) {
    idxs_cens <- <indices of censored observations in original dataset after subsetting it with `idxs_obs`>
    idxs_event <- <indices of uncensored observations in original dataset after subsetting it with `idxs_obs`>
  } else {
    idxs_cens <- <indices of censored observations in newdata>
    idxs_event <- <indices of uncensored observations in newdata>
  }
  wobs_mat <- matrix(wobs, nrow = nrow(ilpreds), ncol = ncol(ilpreds),
                     byrow = TRUE)
  refm_shape_agg <- cl_agg(refm_shape, cl = cl_ref, wdraws = wdraws_ref)
  ll_unw <- matrix(nrow = nrow(ilpreds), ncol = ncol(ilpreds))
  for (idx_cens in idxs_cens) {
    # In this case, `ilpreds[, idx_cens]` contains probabilities (the upper-tail
    # integrals over the Weibull density from the censored timepoint on), so:
    ll_unw[, idx_cens] <- log(ilpreds[, idx_cens])
  }
  for (idx_event in idxs_event) {
    ll_unw[, idx_event] <- dweibull(
      y_oscale[idx_event],
      shape = refm_shape_agg,
      scale = ilpreds[, idx_event] / gamma(1 + 1 / as.vector(refm_shape_agg)),
      log = TRUE
    )
  }
  return(wobs_mat * ll_unw)
}

Before continuing with latent_ppd_oscale (for the Weibull family) and all of these 3 for the log-normal family, I’ll wait for feedback from @avehtari.

avehtari · May 23, 2025, 11:46am

Actually I was thinking something simpler, but only now realized why that would not work. I guess @fweber144 was thinking this correctly from the beginning, and probably was confused what I tried to say as it was not sensible.

Now I think we need to sample the missing censored observations from a truncated Weibull with the left truncation at the censoring time. Given the augmented observations, we can then use just the latent of Weibull (we can’t mix latent of Weibull and latent of Bernoulli).

I still need to think about this, but unfortunately a bit busy the rest of May, so I’m not certain whether I can confirm how this should be done before June.

Sorry for the confusion I have caused

rtnliqry · May 23, 2025, 3:39pm

Thanks @avehtari, much appreciated!

rtnliqry · May 29, 2025, 9:42am

@fweber144 I’ve just been thinking about this some more and how censored observations might be tracked within projpred. Rather than raw row/column indices for the censored observations, it’d probably be worth including the actual column/values from the data, so that users can carry through different types of censoring to pass through to the custom extend_family functions to be handled differently (e.g. the 0 = none, -1 = left, 1 = right, 2 = interval, that brms uses for resp_cens).

fweber144 · May 31, 2025, 9:51am

Yes, this is what I meant with

above. However, as noted above, this more general (and hence more desirable) solution is probably more tedious to implement.

Topic		Replies	Views
Projection predictive variable selection for discrete response families with finite support Publicity projpred	2	454	February 13, 2023
Projection predictive variable selection and robust regression Modeling projpred , brms	2	623	March 9, 2021
Projection predictive feature selection for multilevel phylogenetic models Interfaces projpred	8	1297	March 25, 2021
'asym_laplace' is not supported by projpred Modeling	3	336	February 13, 2023
Projpred: augmented and latent projection for hierarchical categorical models General projpred	3	55	May 26, 2025

Using {projpred} latent projection with {brms} Weibull family models

Related topics