Projpred: latent projection for hierarchical ordinal models

Hi everyone,

I am back with another question on projected variable selection for hierarchical models.

The developers kindly covered in detail different aspects of categorical models here. In the linked post, @fweber144 and @avehtari confirmed that the latent projection for hierarchical categorical models is not supported.

I am now looking at hierarchical ordinal models, and I wanted to check a few things.

When I create a hierarchical ordinal model using brms, perform appropriate posterior checks, create a reference model using refm_obj <- get_refmodel(ordinal.fit), and then run a preliminary cv_varsel(), I get a recommendation from projpred to use the latent projection. There is also the recommendation for hierarchical models to set nterms_max to one less than the number of predictors, which I did.

When I use refm_obj <- get_refmodel(ordinal.fit, latent = TRUE), running the preliminary cv_varsel() with the code below:

cvvs_fast_ordinal_multilevel <- cv_varsel(
  refm_obj,
  validate_search = FALSE,
  method = "forward",
  refit_prj = FALSE,
  nterms_max = 7
)

produces a long series of warnings:

Warning messages:
1: In warn_prj_drawwise(mssgs_warns_capts, throw_warn = throw_warn_sdivmin) :
The following messages and/or warnings have been thrown by the current submodel fitter (i.e., the current draw-wise divergence minimizer):

Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00212214 (tol = 0.002, component 1)

2: In check_conv(outdmin, lengths(mssgs_warns_capts), do_check = do_check_conv) :
1 out of 20 submodel fits (there is one submodel fit per projected draw) might not have converged (appropriately). It is recommended to inspect this in detail and (if necessary) to adjust tuning parameters via ... (the ellipsis of the employed top-level function such as project(), varsel(), or cv_varsel()) or via a custom div_minimizer function (an argument of init_refmodel()). In the present case, the submodel fits are of class(es) c("lmerMod"). Documentation for corresponding tuning parameters is linked in section “Draw-wise divergence minimizers” of ?`projpred-package`. Current submodel formula (right-hand side): ~trust_factor + voice_factor + transparency_factor + interpersonal_treatment_factor + (1 | respondent_id) + (1 | location)
3: In loo_varsel(refmodel = refmodel, method = method, nterms_max = nterms_max, :

4: In warn_pareto(n07 = sum(pareto_k > 0.7), n05 = sum(0.7 >= pareto_k & :
In the calculation of the reference model’s PSIS-LOO CV weights, 2 (out of 1739) Pareto k-values are > 0.7 and 53 (out of 1739) Pareto k-values are in the interval (0.5, 0.7]. Moment matching (see the loo package), mixture importance sampling (see the loo package), and reloo-ing (see the brms package) are not supported by projpred. If these techniques (run outside of projpred, i.e., for the reference model only; note that reloo-ing may be computationally costly) result in a markedly different reference model ELPD estimate than ordinary PSIS-LOO CV does, we recommend to use K-fold CV within projpred.
5: In warn_pareto(n07 = sum(y_lat_E$pareto_k > 0.7), n05 = sum(0.7 >= :
In the recalculation of the latent response values, 41 (out of 1739) expectation-specific Pareto k-values are > 0.7 and 109 (out of 1739) expectation-specific Pareto k-values are in the interval (0.5, 0.7]. In general, we recommend K-fold CV in this case.
6: In loo_varsel(refmodel = refmodel, method = method, nterms_max = nterms_max, :
The projected draws used for the performance evaluation have different (i.e., nonconstant) weights, so using standard importance sampling (SIS) instead of Pareto-smoothed importance sampling (PSIS). In general, PSIS is recommended over SIS.

So there are multiple issues: some of the submodel fits may not have converged, and several Pareto k-values are bad. Does anyone have any experience with this? I can run this projection without the hierarchical structure with no problems. However, the data are hierarchical, so it would be better to perform the variable selection on the hierarchical model.

I would also like to ask if anyone knows if it’s possible to adapt the code provided here by @fweber144 in the same post I linked above. The code shows how to obtain the observation-wise contributions to the GMPD for categorical models - it would be very helpful to have something similar to use as a post-projection posterior check for ordinal models.

Thank you so much for any assistance!

Hi Melissa,

Thank you for the detailed description of your problem.

  • Could you provide a two-way contingency table for respondent_id and location (i.e., for the combination of these two variables)?
  • Could you try to add control = lme4::lmerControl(optimizer = "Nelder_Mead") to the cv_varsel() call? (See the sketch after this list.)
  • Could you switch to the most recent GitHub version of projpred? There have been some improvements with regard to the Pareto-k warnings and also with regard to the use of PSIS / SIS compared to v2.8.0.
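
For the second point, a minimal sketch of what that call could look like (assuming refm_obj is the latent reference model from your post; the control argument is passed through cv_varsel()'s ellipsis to the lme4-based draw-wise submodel fitter):

# Sketch only; `refm_obj` is assumed to come from
# `get_refmodel(ordinal.fit, latent = TRUE)`. The `control` argument is
# forwarded via cv_varsel()'s ellipsis to the lme4 submodel fits:
cvvs <- cv_varsel(
  refm_obj,
  validate_search = FALSE,
  method = "forward",
  refit_prj = FALSE,
  nterms_max = 7,
  control = lme4::lmerControl(optimizer = "Nelder_Mead")
)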

I’ll try to write some code for this asap.

For me, the code from the categorical model worked for an ordinal model as well. I just had to change the reference model fit (and changed binwidth = 0.018 to binwidth = 0.008 to show all the dots from the dotplots; perhaps a different plot or a numeric summary such as quantile() would be better):

data(inhaler, package = "brms")
bfit <- brms::brm(
  formula = rating ~ period + carry + treat, # leaving out `(1 | subject)` here for illustrative purposes
  data = inhaler,
  family = brms::cumulative(),
  ### Only for the sake of speed (not recommended in general):
  iter = 500,
  chains = 2,
  ###
  refresh = 0,
  seed = 25569
)

library(projpred)
refm <- get_refmodel(bfit, brms_seed = 996644)
cvvs <- cv_varsel(
  refm,
  cv_method = "kfold", # or `cv_method = "loo"` (possibly in combination with `nloo` when using projpred's current GitHub version 2.8.0.9000)
  ### Only for the sake of speed (not recommended in general):
  K = 2,
  nclusters = 1,
  refit_prj = FALSE,
  ###
  seed = 1008792368
)

## Cross-validated predictive probabilities for the observed response categories
## (here abbreviated by "CV-PP"; these are the observation-wise contributions to
## the GMPD):

# For plots (if desired; this is just a suggestion):
library(ggplot2)
theme_set(theme_bw())
gg_cvpps <- function(cvpps_vec) {
  x_label_model <- if (identical(deparse(substitute(cvpps_vec)), "cvpps_ref")) {
    "reference model"
  } else if (identical(deparse(substitute(cvpps_vec)), "cvpps_sub")) {
    paste0("submodel of size ", nterms_sub)
  } else {
    stop("Unexpected `deparse(substitute(cvpps_vec))` value.")
  }
  return(ggplot(data = data.frame(cvpp = cvpps_vec), mapping = aes(x = cvpp)) +
           scale_x_continuous(labels = scales::label_percent()) +
           coord_cartesian(xlim = c(0, 1)) +
           labs(x = paste0("CV-PP from ", x_label_model)) +
           geom_dotplot(method = "histodot", binwidth = 0.008))
}

# Reference model:
cvlpps_ref <- cvvs[["summaries"]][["ref"]][["lppd"]]
cvpps_ref <- exp(cvlpps_ref)
print(gg_cvpps(cvpps_ref))

# Submodel of size `nterms_sub` (here, `nterms_sub <- 2` is an arbitrary choice;
# it is not based on a plot.vsel() inspection or any similar reasoning):
nterms_sub <- 2
cvlpps_sub <- cvvs[["summaries"]][["sub"]][[nterms_sub + 1]][["lppd"]]
cvpps_sub <- exp(cvlpps_sub)
print(gg_cvpps(cvpps_sub))
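
As a numeric alternative to the dotplots (the quantile() summary mentioned above), for example:

# Numeric summary of the CV-PPs instead of (or in addition to) the dotplots:
quantile(cvpps_ref, probs = c(0.05, 0.25, 0.5, 0.75, 0.95))
quantile(cvpps_sub, probs = c(0.05, 0.25, 0.5, 0.75, 0.95))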

If this doesn’t work for you, let me know.


For this, also note the following from the documentation of cv_varsel() (here from the most recent GitHub version; source: projpred/R/cv_varsel.R at commit 4efe82cf7b0ef78ce87b52045fc10208575d5f2d in stan-dev/projpred on GitHub):

If the reference model PSIS-LOO CV Pareto-\hat{k} values are good, but there are high Pareto-\hat{k} values for the projected models, you can try increasing the number of draws used for the PSIS-LOO CV (ndraws in case of refit_prj = FALSE; ndraws_pred in case of refit_prj = TRUE). If increasing the number of draws does not help and if the reference model PSIS-LOO CV Pareto-\hat{k} values are high, and the reference model PSIS-LOO CV results change substantially when using moment-matching, mixture importance sampling, or reloo-ing, we recommend to use K-fold CV within projpred.

Perhaps this helps as well?
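
For concreteness, a minimal sketch of what increasing the number of draws could look like (assuming refm_obj is your reference model object; 400 is just an illustrative value):

# Illustrative only; 400 is an arbitrary example value.
# With refit_prj = FALSE, increase `ndraws`:
cvvs <- cv_varsel(refm_obj, refit_prj = FALSE, ndraws = 400)
# With refit_prj = TRUE, increase `ndraws_pred` instead:
cvvs <- cv_varsel(refm_obj, refit_prj = TRUE, ndraws_pred = 400)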

Thank you very much for your quick reply, and sorry for my delayed response.

Could you switch to the most recent GitHub version of projpred? There have been some improvements with regard to the Pareto-k warnings and also with regard to the use of PSIS / SIS compared to v2.8.0.

Thanks for the tip, I have done so.

Could you provide a two-way contingency table for respondent_id and location (i.e., for the combination of these two variables)?

I decided to remove location as a random (grouping) factor for various reasons; respondent_id has 435 levels.
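
(For completeness, such a two-way table could have been produced with something like the following, where dat is a placeholder for the data frame used to fit the model:)

# Hypothetical sketch; `dat` stands for the data used to fit the model:
with(dat, table(respondent_id, location))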

Could you try to add control = lme4::lmerControl(optimizer = "Nelder_Mead") to the cv_varsel() call?

I tried this and the cv_varsel() call progressed. However, at the end, I received a long series of warnings; see below for a selection. From my limited understanding, it seems that the variable selection is too unstable for reliable inference.

1: In loo_varsel(refmodel = refmodel, method = method, nterms_max = nterms_max,

2: In warn_pareto(n07 = sum(pareto_k > 0.7), n05 = sum(0.7 >= ... :
In the calculation of the reference model's PSIS-LOO CV weights, 3 (out of 1739) Pareto k-values are > 0.7 and 58 (out of 1739) Pareto k-values are in the interval (0.5, 0.7]. Moment matching (see the loo package), mixture importance sampling (see the loo package), and reloo-ing (see the brms package) are not supported by projpred. If these techniques (run outside of projpred, i.e., for the reference model only; note that reloo-ing may be computationally costly) result in a markedly different reference model ELPD estimate than ordinary PSIS-LOO CV does, we recommend to use K-fold CV within projpred.

3: In warn_pareto(n07 = sum(y_lat_E$pareto_k > 0.7), n05 = sum(0.7 >= :
In the recalculation of the latent response values, 39 (out of 1739) expectation-specific Pareto k-values are > 0.7 and 118 (out of 1739) expectation-specific Pareto k-values are in the interval (0.5, 0.7]. In general, we recommend K-fold CV in this case.

4: In warn_prj_drawwise(mssgs_warns_capts, throw_warn = throw_warn_sdivmin) :
The following messages and/or warnings have been thrown by the current submodel fitter (i.e., the current draw-wise divergence minimizer):

Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00538927 (tol = 0.002, component 1)

In check_conv(outdmin, lengths(mssgs_warns_capts), do_check = do_check_conv) :
1 out of 20 submodel fits (there is one submodel fit per projected draw) might not have converged (appropriately). It is recommended to inspect this in detail and (if necessary) to adjust tuning parameters via ... (the ellipsis of the employed top-level function such as project(), varsel(), or cv_varsel()) or via a custom div_minimizer function (an argument of init_refmodel()). In the present case, the submodel fits are of class(es) c("lmerMod"). Documentation for corresponding tuning parameters is linked in section "Draw-wise divergence minimizers" of ?`projpred-package`. Current submodel formula (right-hand side): ~trust_factor + voice_factor + transparency_factor + interpersonal_treatment_factor + accountability_factor + (1 | respondent_id)

If the reference model PSIS-LOO CV Pareto-\hat{k} values are good, but there are high Pareto-\hat{k} values for the projected models, you can try increasing the number of draws used for the PSIS-LOO CV (ndraws in case of refit_prj = FALSE; ndraws_pred in case of refit_prj = TRUE). If increasing the number of draws does not help and if the reference model PSIS-LOO CV Pareto-\hat{k} values are high, and the reference model PSIS-LOO CV results change substantially when using moment-matching, mixture importance sampling, or reloo-ing, we recommend to use K-fold CV within projpred.

I received similar warnings when using K-fold CV. Given that the predictor ordering does not differ between the hierarchical model and the model with only fixed factors (i.e., a non-hierarchical model), I think I will present the results from the fixed-factor model and demonstrate that they echo the unstable selection from the hierarchical model.

For me, the code from the categorical model worked for an ordinal model as well. I just had to change the reference model fit (and changed binwidth = 0.018 to binwidth = 0.008 to show all the dots from the dotplots; perhaps a different plot or a numeric summary such as quantile() would be better):

Thank you for looking at this again. Given that I have quite a large dataset, I just had to reduce the binwidth much more to get the right plot. Sorry for not seeing this myself!

And thank you again for your help! I can’t say enough good things about the support on this package (and many other packages), and have been singing your praises to all my colleagues :)


These warning messages do not seem to come from the most recent GitHub version of projpred. Are you sure the installation from GitHub was successful?

Concerning the warning about failure of convergence: Is the reference model a cumulative one, i.e., using the brms::cumulative() family? If yes, are you able to fit the model

<original response variable> ~ trust_factor + voice_factor + transparency_factor + interpersonal_treatment_factor + accountability_factor + (1 | respondent_id)

using ordinal::clmm() outside of projpred? And could you try increasing nclusters in the cv_varsel() call, perhaps even switching to ndraws (i.e., setting nclusters = NULL and specifying for example ndraws = 400)? The warning message states that 1 out of 20 submodel fits was concerned, so it would be good to know how this changes when the number of clusters or (thinned) draws increases.
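
For the clmm() check, something like the following minimal sketch (with placeholder names for the response variable and the data) is what I have in mind:

library(ordinal)  # for clmm()
# `response` and `dat` are placeholders for the actual (ordered factor)
# response variable and the data used for the brms fit:
fit_clmm <- clmm(
  response ~ trust_factor + voice_factor + transparency_factor +
    interpersonal_treatment_factor + accountability_factor +
    (1 | respondent_id),
  data = dat
)
summary(fit_clmm)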

Sounds good.

No problem, glad to help :)

I’m just realizing that you marked this thread as resolved, so don’t feel obligated to try out what I suggested above.

I wanted to try out your suggestions in case others find it useful. I updated projpred again, to version 2.9.0 from GitHub, to be sure I am using the latest version.

Concerning the warning about failure of convergence: Is the reference model a cumulative one, i.e., using the brms::cumulative() family? If yes, are you able to fit the model using ordinal::clmm() outside of projpred?

<original response variable> ~ trust_factor + voice_factor + transparency_factor + interpersonal_treatment_factor + accountability_factor + (1 | respondent_id)

Yes, I can fit the model with no issues.

And could you try increasing nclusters in the cv_varsel() call, perhaps even switching to ndraws (i.e., setting nclusters = NULL and specifying for example ndraws = 400)? The warning message states that 1 out of 20 submodel fits was concerned, so it would be good to know how this changes when the number of clusters or (thinned) draws increases.

When I do a preliminary cv_varsel run using:

cvvs_fast_ordinal_multilevel_nclusters <- cv_varsel(
  refm_obj.latent,
  validate_search = FALSE,
  method = "forward",
  ### Only for the sake of speed (not recommended in general):
  refit_prj = FALSE,
  nterms_max = 8,
  nclusters = NULL,
  ndraws = 400,
  control = lme4::lmerControl(optimizer = "Nelder_Mead")
)

I get the error message:

Error in refmodel$family$latent_ll_oscale(mu_offs_oscale, dis = refmodel$dis,  : 
  unused argument (dis = refmodel$dis)

Release versions (i.e., without the .9000 suffix in the version number) are the same on GitHub and on CRAN, so you could also use the CRAN version.
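
For example (the GitHub tag name here is an assumption on my part):

# CRAN release:
install.packages("projpred")
# or the same release from GitHub (assuming the release is tagged "v2.9.0"):
# remotes::install_github("stan-dev/projpred@v2.9.0")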

Ok, thanks.

It looks like you haven’t updated your latent_ll_oscale function as instructed in the v2.9.0 release notes:

  • For the latent projection, the function passed to argument latent_ll_oscale of extend_family() now needs to have an argument dis (at the second position). Similarly, the function passed to argument latent_ppd_oscale of extend_family() now needs to have an argument dis_resamp (again at the second position). This makes it possible, e.g., to use the latent projection for a log-normal response family. (GitHub: #513)

However, this has been around in v2.8.0.9000 for a while, so it seems like you were really using v2.8.0 above.
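
For illustration, a custom function passed to latent_ll_oscale would now need a signature along these lines (a sketch only; the argument names other than dis are assumed here and the remaining arguments are abbreviated, see ?extend_family for the exact prototype):

# Sketch: per the release note, `dis` must be the second argument.
# The first argument (inverse-link-transformed linear predictors) and the
# remaining arguments are abbreviated; the body is a placeholder.
my_latent_ll_oscale <- function(ilpreds, dis, ...) {
  # return observation-wise log-likelihood values on the original
  # response scale (now with access to the dispersion draws `dis`)
}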