Projpred with brms object

nlpacestan · January 1, 2022, 12:03am

I have a glmm bernoulli brmsfit object. Calling projpred::varsel or cv_varsel gives an error message and no results.

The model code is:

bprior ← c(prior(student_t(3, 0, 2.5), class = ‘b’))

fit1.brmsfit ←
brm(Hypoglycemic60 ~
AgeClass +
Sex +
BMIClass +
ASA2Class +
# CaseYear +
Emergency +
AdmissionType +
AnesthesiaIntervalClass +
BypassDuration +
ElixCount +
IntraopVasoplegia +

IntraopNutrition +

IntraopHypoglycemic +
IntraopHypotensive +
IntraopInotropeVasopressor +
IntraopBetaBlocker +
EBLClass +
PRBCClass +
(1 | MPOGInstitutionID)

,
data = tmp.df,
family = ‘bernoulli’,
prior = bprior,
chains = 4,
seed = 9876,
iter = 2000,
control = list(
max_treedepth = 40)
)

fit1.brmsfit
Family: bernoulli
Links: mu = logit
Formula: Hypoglycemic60 ~ AgeClass + Sex + BMIClass + ASA2Class + Emergency + AdmissionType + AnesthesiaIntervalClass + BypassDuration + ElixCount + IntraopVasoplegia + IntraopHypoglycemic + IntraopHypotensive + IntraopInotropeVasopressor + IntraopBetaBlocker + EBLClass + PRBCClass + (1 | MPOGInstitutionID)
Data: tmp.df (Number of observations: 72610)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000

Group-Level Effects:
~MPOGInstitutionID (Number of levels: 29)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.63 0.14 0.40 0.95 1.00 1682 2464

The model fitting is OK (Rhat, ESS).

There are a few slightly high k estimates.

Computed from 4000 by 72610 log-likelihood matrix

     Estimate    SE

elpd_loo -1420.7 80.8
p_loo 44.1 3.2
looic 2841.4 161.6
Monte Carlo SE of elpd_loo is 0.1.

Pareto k diagnostic values:
Count Pct. Min. n_eff
(-Inf, 0.5] (good) 72602 100.0% 1328
(0.5, 0.7] (ok) 8 0.0% 3147
(0.7, 1] (bad) 0 0.0%
(1, Inf) (very bad) 0 0.0%

All Pareto k estimates are ok (k < 0.7).
See help(‘pareto-k-diagnostic’) for details.

The data frame tmp.df has no missing values.

The versions are brms 2.16.3, rstan 2.21.3 and projpred 2.0.2.

fit1.vsel ← varsel(fit1.brmsfit)
Error in model.frame.default(data = data, weights = weights, drop.unused.levels = TRUE, :
object is not a matrix

fit1.vsel ← cv_varsel(fit1.brmsfit)
Warning in cv_varsel.refmodel(refmodel, …) :
K provided, but cv_method is LOO.
Warning: Some Pareto k diagnostic values are slightly high. See help(‘pareto-k-diagnostic’) for details.

[1] “Computing LOOs…”
| | 0%Error in model.frame.default(data = data, weights = weights, drop.unused.levels = TRUE, :
object is not a matrix

I have searched and haven’t found an explanation for this error message.

Nathan

sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] projpred_2.0.2 mice_3.14.0 rstan_2.21.3 StanHeaders_2.21.0-7 loo_2.4.1 Hmisc_4.6-0 Formula_1.2-4 lattice_0.20-44
[9] skimr_2.1.3 readit_1.0.0 brms_2.16.3 Rcpp_1.0.7 stevemisc_1.3.0 broom.mixed_0.2.7 broom_0.7.10 sjlabelled_1.1.8
[17] sjPlot_2.8.10 lme4_1.1-27.1 Matrix_1.3-4 Epi_2.44 cobalt_4.3.1 MatchIt_4.3.2 logistf_1.24 sjmisc_2.8.9
[25] collapse_1.6.5 car_3.0-12 carData_3.0-4 coin_1.4-2 survival_3.2-13 arsenal_3.6.3 tableone_0.13.0 tables_0.9.6
[33] lubridate_1.8.0 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4 readr_2.1.1 tidyr_1.1.4 tibble_3.1.6
[41] ggplot2_3.3.5 tidyverse_1.3.1 magrittr_2.0.1 tinytex_0.36 rmarkdown_2.11 knitr_1.37

avehtari · January 10, 2022, 8:39am

ping @AlejandroCatalina @paul.buerkner

paul.buerkner · January 10, 2022, 8:47am

could you try it with the lastest projpred developmental version to be installed via remotes::install_github("stan-dev/projpred", ref = "develop")

nlpacestan · January 21, 2022, 1:32am

Thanks for pointing me to the version on github.

projpred started without error message.

However, no projpred result can be obtained after various attempts.

My glmm model has about 15 covariates with 30 predictors. The sample size is 72610, but the number of events is 226. I used a t prior.

All the model estimation properties are ok.

Running cv_varsel(model) crashes R on the linux server (134 GB RAM).

I assume that this is memory management problem.

I was able to obtain a reference model using get_refmodel. The object size is 13.8 GB.

Running cv_varsel on the reference model also crashed the R session.

I must assume that my data set with a sparse outcome is not suitable for projective prediction.

Perhaps this is an example of don’t try models unless there are 10 or more events for each predictor.

paul.buerkner · January 21, 2022, 9:04am

Hmm, interesting. I don’t really know what exactly happens. @AlejandroCatalina do you have an intuition?

AlejandroCatalina · January 21, 2022, 9:15am

Well I can assume projpred is trying to project too many posterior draws, can you try with cv_varsel(model, ndraws=10, ndraws_pred=10) or other equally small numbers?

fweber144 · January 22, 2022, 8:11pm

Just skimmed quickly through this and as far as I understand @nlpacestan’s code, there are 72610 observations here. Calling cv_varsel() with default arguments (as in @nlpacestan’s code) would then result in (PSIS-)LOO CV with a validated search, i.e., 72610 searches are run. That’s pretty much and could cause a memory issue. K-fold CV could be a remedy, or (PSIS-)LOO CV without a validated search (but that has the downside of, well, an unvalidated search).

fweber144 · January 22, 2022, 8:19pm

Note that for multilevel reference models fit in brms, the K-fold CV in projpred can (but doesn’t have to necessarily) lead to an error which I’m about to submit as an issue on GitHub soon.

EDIT: That issue may now be found here.

nlpacestan · January 24, 2022, 10:22pm

I followed this suggestion.

tmp.cvsel ← cv_varsel(fit1.brmsfit, ndraws=10, ndraws_pred=10)

Warning: Some Pareto k diagnostic values are slightly high. See help(‘pareto-k-diagnostic’) for details.

[1] “Repeating forward search for 72610 LOO folds…”

After 36 hours, the walltime on the server expired.

There was no object returned.

???

Nathan

AlejandroCatalina · January 25, 2022, 4:34am

Yeah it might be better in this case to avoid full LOO, you can start by running the same arguments with varsel only instead of cv_varsel.

Best,
Alex

nlpacestan · January 28, 2022, 12:49am

I successfully ran using vsel(object, ndraws = 10, ndraws_pred = 10).

Family: binomial
Link function: logit

Formula: Hypoglycemic60 ~ AgeClass + Sex + BMIClass + ASA2Class + Emergency +
AdmissionType + AnesthesiaIntervalClass + BypassDuration +
ElixCount + IntraopVasoplegia + IntraopHypoglycemic + IntraopHypotensive +
IntraopInotropeVasopressor + IntraopBetaBlocker + EBLClass +
PRBCClass + (1 | MPOGInstitutionID)
Observations: 72610
Search method: forward, maximum number of terms 17
Number of clusters used for selection: 10
Number of clusters used for prediction: 10
Suggested Projection Size: 17

Selection Summary:
size solution_terms elpd se diff diff.se
0 -1530.2 86.5 -153.6 18.4
1 AgeClass -1499.6 84.9 -123.0 15.4
2 ElixCount -1477.3 83.7 -100.8 13.5
3 BMIClass -1447.6 82.0 -71.1 10.4
4 BypassDuration -1436.3 81.5 -59.7 9.2
5 Sex -1427.7 81.0 -51.1 8.5
6 EBLClass -1425.3 80.9 -48.7 8.1
7 IntraopHypoglycemic -1423.0 80.8 -46.4 7.7
8 ASA2Class -1420.4 80.7 -43.8 7.6
9 IntraopHypotensive -1418.7 80.6 -42.1 7.6
10 IntraopVasoplegia -1417.6 80.5 -41.0 7.4
11 Emergency -1417.0 80.5 -40.4 7.3
12 AdmissionType -1416.5 80.5 -39.9 7.3
13 PRBCClass -1416.1 80.5 -39.5 7.4
14 AnesthesiaIntervalClass -1415.7 80.4 -39.1 7.3
15 IntraopBetaBlocker -1415.5 80.4 -38.9 7.4
16 IntraopInotropeVasopressor -1415.5 80.4 -38.9 7.4
17 (1 | MPOGInstitutionID) -8145.9 541.0 -6769.3 464.9

solution_terms(tmp.vsel)
[1] “AgeClass” “ElixCount” “BMIClass” “BypassDuration”
[5] “Sex” “EBLClass” “IntraopHypoglycemic” “ASA2Class”
[9] “IntraopHypotensive” “IntraopVasoplegia” “Emergency” “AdmissionType”
[13] “PRBCClass” “AnesthesiaIntervalClass” “IntraopBetaBlocker” “IntraopInotropeVasopressor”
[17] “(1 | MPOGInstitutionID)”
suggest_size(tmp.vsel)
[1] 17

There are 17 terms in the model. The model fit met the usual criteria. No terms can be removed from the model by projpred.

The plot of the vsel object shows elpd increasing with each term until the 17th term, the group intercept parameters. Then the elpd drops by about 6800. The grouping factor has 29 levels; there are 72610 observations, but only 226 events. Is this drop in the elpd with the 17th term expected?

I had attempted frequentist glmer models first, but couldn’t get stable estimations including boundary problems.

I was hoping that by using some informative priors (used student t) I could find some relationships. But it appears that the signal is too weak .

I appreciate the help from the experts.

Nathan

fweber144 · February 1, 2022, 8:29am

ndraws_pred = 10 seems quite low to me. I would recommend starting with the defaults for ndraws and ndraws_pred. But you need the development version of projpred (GitHub branch develop) to use the updated defaults. Apart from that, I would recommend the development version in any case.

As far as I know, the fact that a frequentist GLMM has boundary issues hints at a hierarchical standard deviation close to zero. That could perhaps explain the dropping ELPD when the group-level term comes in: If the actual hierarchical standard deviation is zero, its (marginal) posterior in the Bayesian model will still have all of its probability mass on the positive real line (because of the nature of MCMC sampling).

P.S.: Could you perhaps use code formatting for code and code output? That’s one backtick at each side of inline code and three backticks above and below multi-line code.

Topic		Replies	Views
Brms/projpred problem brms	3	905	January 27, 2021
Get_refmodel.brmsfit {brms} not working on zero_inflated_negbinomial General projpred , brms	3	598	February 4, 2021
Projpred, brms, and categorical variables Modeling projpred , brms	5	890	July 13, 2021
I cannot seem to make varsel() work from brms General projpred , brms	2	583	March 9, 2023
Brm_multiple object submitted to projpred Modeling	4	468	May 26, 2020

Projpred with brms object

IntraopNutrition +

Related topics