Errors running cv_varsel , and a search terms question

M-J-Mitchell · June 8, 2023, 4:28pm

Hi all,

I am a PhD student using brms to run mixed models to understand the effect of parental factors on the fertility of eggs. My model currently looks like this - understanding the effect ion maternal and paternal ages, ages as quadratic effects, inbreeding coefficients and interactions between age and inbreeding coefficient for each parent.

model_dev <- brm(Fertile ~ 
	#linear fixed effects   
                Pair.Inbreeding + 
	#quadratic fixed effects
                   I(Dam.age.years^2) + I(Sire.age.years^2) +
	#interactive fixed effects
                   Sire.Inbreeding*Sire.age.years + Dam.Inbreeding*Dam.age.years +
	#random effects
                   (1|Sire.Studbook.ID) + (1|Dam.Studbook.ID)+
                   (1|Lay.Year) + (1|Location),
                 data = data_dev_sc,
                 family = "bernoulli",
                 prior = priors_dev,
                 iter = 30000,
                 control=list(adapt_delta=0.99),
                 cores = 8,
                 sample_prior = TRUE,
                 seed = 13579)

Based on other research and with the help of other forum members, I have now successfully set my priors are as follows:

priors_dev <- c(set_prior("normal(0,5)", class = "b", coef = "Dam.age.years"),
            set_prior("normal(-5,5)", class = "b", coef = "Sire.age.years"),
            set_prior("normal(0,10)", class = "b", coef = "Sire.Inbreeding"),
            set_prior("normal(0,10)", class = "b", coef = "Dam.Inbreeding"),
            set_prior("normal(-40,20)", class = "b", coef = "Pair.Inbreeding"),
            set_prior("normal(0,5)", class = "b", coef = "IDam.age.yearsE2"),
            set_prior("normal(0.5,5)", class = "b", coef = “ISire.age.yearsE2"))

After running the model, I am wanting to do variable selection using projpred’s cv_varsel, but have been hitting the following error:

Error in updateGlmerDevfun(devfun, glmod$reTrms, nAGQ = nAGQ) : 
  nAGQ > 1 is only available for models with a single, scalar random-effects term

I am following the methods of a paper which uses ‘search_terms’ argument of cv_varsel to ensure that the fixed effects are explored before the random effects and are included in every model. As a result my search_terms currently look like this:

search_terms<-c(
"1",
"Pair.Inbreeding",
"Dam.age.years",
"Sire.age.years",
"Dam.Inbreeding",
"Sire.Inbreeding",
"Dam.age.years+Dam.Inbreeding+Dam.age.years:Dam.Inbreeding",
"Sire.age.years+Sire.Inbreeding+Sire.age.years:Sire.Inbreeding",
paste0(
"Sire.age.years+I(Sire.age.years^2)+Dam.age.years+I(Dam.age.years^2)+Dam.age.years:Dam.Inbreeding+",
"Dam.Inbreeding+Sire.Inbreeding+Pair.Inbreeding+Sire.age.years:Sire.Inbreeding+",
"(1|Sire.Studbook.ID)"
),
paste0(
"Sire.age.years+I(Sire.age.years^2)+Dam.age.years+I(Dam.age.years^2)+Dam.age.years:Dam.Inbreeding+",
"Dam.Inbreeding+Sire.Inbreeding+Pair.Inbreeding+Sire.age.years:Sire.Inbreeding+",
"(1|Dam.Studbook.ID)"
),
paste0(
"Sire.age.years+I(Sire.age.years^2)+Dam.age.years+I(Dam.age.years^2)+Dam.age.years:Dam.Inbreeding+",
"Dam.Inbreeding+Sire.Inbreeding+Pair.Inbreeding+Sire.age.years:Sire.Inbreeding+",
"(1|Lay.Year)"
),
paste0(
"Sire.age.years+I(Sire.age.years^2)+Dam.age.years+I(Dam.age.years^2)+Dam.age.years:Dam.Inbreeding+",
"Dam.Inbreeding+Sire.Inbreeding+Pair.Inbreeding+Sire.age.years:Sire.Inbreeding+",
"(1|Location)"
)
)

Is there a better way to do this? My aim is simply to prioritise the fixed terms over the random terms when the variable selection is happening.

My cv_varsel code is as follows:

cvvs_dev <- cv_varsel( 
  ref_dev,
  validate_search = TRUE,
### prioritise fixed effects first
  search_terms = search_terms,
  refit_prj=FALSE,
  nclusters_pred = 20,
  seed = 13579
)

Please see full code below to simulate equivalent data to recreate the error.

Many thanks!

library(tidyverse)
library(brms)
library(projpred)

set.seed(13579)
predictor_mean <- 0
predictor_sd <- .5
Pair.Inbreeding <- Dam.age.years <- Sire.age.years <- Dam.Inbreeding <- Sire.Inbreeding <- rnorm(539, mean = predictor_mean, sd = predictor_sd)
Fertile <- as.factor(sample(c("Y","N"), size = 539, replace = T))
Dam.Studbook.ID <- as.factor(sample(c(1:52), size=539, replace = T))
Sire.Studbook.ID <- as.factor(sample(c(53:104), size=539, replace = T))
Lay.Year <- as.factor(sample(c(1988:2019), size = 539, replace = T))
Location <- as.factor(sample(c(1:12), size = 539, replace = T))

data_dev_sc <- tibble(
    Fertile,
    Dam.age.years,
    Sire.age.years,
    Dam.Inbreeding,
    Sire.Inbreeding,
    Pair.Inbreeding,
    Sire.Studbook.ID,
    Dam.Studbook.ID,
    Lay.Year,
    Location
)

# Set egg_dev priors
priors_dev <- c(
    set_prior("normal(0,5)", class = "b", coef = "Dam.age.years"),
    set_prior("normal(-5,5)", class = "b", coef = "Sire.age.years"),
    set_prior("normal(0,10)", class = "b", coef = "Sire.Inbreeding"),
    set_prior("normal(0,10)", class = "b", coef = "Dam.Inbreeding"),
    set_prior("normal(-40,20)", class = "b", coef = "Pair.Inbreeding"),
    set_prior("normal(0,5)", class = "b", coef = "IDam.age.yearsE2"),
    set_prior("normal(0.5,5)", class = "b", coef = "ISire.age.yearsE2")
)

model_dev <- brm(Fertile ~ 
                   Pair.Inbreeding + 
                   I(Dam.age.years^2) + I(Sire.age.years^2) +
                   Sire.Inbreeding*Sire.age.years + Dam.Inbreeding*Dam.age.years +
                   (1|Sire.Studbook.ID) + (1|Dam.Studbook.ID)+
                   (1|Lay.Year) + (1|Location),
                 data = data_dev_sc,
                 family = "bernoulli",
                 prior = priors_dev,
                 iter = 30000,
                 control=list(adapt_delta=0.99),
                 cores = 8,
                 sample_prior = TRUE,
                 seed = 13579)
ref_dev <- brms:::get_refmodel.brmsfit(model_dev, seed = 13579)

search_terms <- c(
    "1",
    "Pair.Inbreeding",
    "Dam.age.years",
    "Sire.age.years",
    "Dam.Inbreeding",
    "Sire.Inbreeding",
    "Dam.age.years+Dam.Inbreeding+Dam.age.years:Dam.Inbreeding",
    "Sire.age.years+Sire.Inbreeding+Sire.age.years:Sire.Inbreeding",
    paste0(
        "Sire.age.years+I(Sire.age.years^2)+Dam.age.years+I(Dam.age.years^2)+Dam.age.years:Dam.Inbreeding+",
        "Dam.Inbreeding+Sire.Inbreeding+Pair.Inbreeding+Sire.age.years:Sire.Inbreeding+",
        "(1|Sire.Studbook.ID)"
    ),
    paste0(
        "Sire.age.years+I(Sire.age.years^2)+Dam.age.years+I(Dam.age.years^2)+Dam.age.years:Dam.Inbreeding+",
        "Dam.Inbreeding+Sire.Inbreeding+Pair.Inbreeding+Sire.age.years:Sire.Inbreeding+",
        "(1|Dam.Studbook.ID)"
    ),
    paste0(
        "Sire.age.years+I(Sire.age.years^2)+Dam.age.years+I(Dam.age.years^2)+Dam.age.years:Dam.Inbreeding+",
        "Dam.Inbreeding+Sire.Inbreeding+Pair.Inbreeding+Sire.age.years:Sire.Inbreeding+",
        "(1|Lay.Year)"
    ),
    paste0(
        "Sire.age.years+I(Sire.age.years^2)+Dam.age.years+I(Dam.age.years^2)+Dam.age.years:Dam.Inbreeding+",
        "Dam.Inbreeding+Sire.Inbreeding+Pair.Inbreeding+Sire.age.years:Sire.Inbreeding+",
        "(1|Location)"
    )
)

cvvs_dev <- cv_varsel(
    ref_dev,
    ### Only for the sake of speed (not recommended in general):
    validate_search = TRUE,
    ### prioritise fixed effects first
    search_terms = search_terms,
    nclusters_pred = 20,
    ### nterms_max = 12,
    seed = 13579
)

fweber144 · June 8, 2023, 8:07pm

The nAGQ error is from package lme4 which is used by projpred for performing the projection (namely by fitting an lme4 model to “the” fit of the reference model). It is caused by projpred encountering an lme4 convergence error and then trying to avoid that automatically by increasing argument nAGQ of lme4::glmer(). You could now try to play around with the lme4 tuning parameters (see lme4::glmer()) until the error disappears (without compromising the convergence of the estimation algorithm used by lme4). For that, I guess you would have to adapt projpred:::divmin() to replace the automatic nAGQ adjustment in these lines by something which works in your case. The adapted projpred:::divmin() function then needs to be passed to argument div_minimizer of init_refmodel() (via brms:::get_refmodel.brmsfit() in your case). However, instead of taking this tedious way directly, I think you should try the latent projection (Latent projection predictive feature selection • projpred) first. The latent projection might be able to avoid the lme4 convergence error underlying the nAGQ error and in general, it is also able to avoid this lme4 issue. However, the latent projection is only an approximate solution.

Concerning your search_terms question:

Is there a better way to do this?

Unfortunately, there is no better way at the moment. For the future, we plan some enhancements (see also Fixing certain terms to be included in all submodels in cv_varsel in a high dimensional setting · Issue #346 · stan-dev/projpred · GitHub), but we haven’t made it yet to implement these.

Some minor remarks:

In cv_varsel(), you are using refit_prj=FALSE and nclusters_pred = 20. The nclusters_pred = 20 part is not necessary and can be omitted because you are using the default of nclusters = 20 for the search part. Apart from that, for final results, your nclusters / refit_prj / nclusters_pred setting seems quite rough. See also my answer at Projpred: Fixing Group Effects in Search Terms and Tips for Speed? - #2 by fweber144.
In the brm() call, you have the predictor terms I(Dam.age.years^2) and I(Sire.age.years^2). These should now be supported correctly by projpred (recently, there have been some fixes for such terms), but in general (also outside of projpred), such special terms are always a bit more dangerous than pre-calculating the corresponding predictor data, appending it as new columns to the dataset, and then using these pre-calculated predictor columns as “usual” predictor terms in the model formula.

M-J-Mitchell · June 21, 2023, 9:21am

Thanks for this! Using the latent projection enabled cv_varsel to complete. However, when following the guide you kindly provided, I am hitting a new error when trying to plot the outcome of cv_varsel, as follows:

plot(cvvs_dev, stats = "mlpd", deltas = TRUE)

gives the error:

`latent_ilink` returned only `NA`s, so all performance statistics will also be `NA` as long as `resp_oscale = TRUE`.
`latent_ilink` returned only `NA`s, so all performance statistics will also be `NA` as long as `resp_oscale = TRUE`.
Error in out[["foldwise"]][, seq_len(nterms_max), drop = FALSE] : 
  subscript out of bounds

Thanks again for your continued support.

fweber144 · June 21, 2023, 2:35pm

Thank you for reporting this. Could you provide a reprex on projpred’s issue tracker, also mentioning which projpred version you used for creating cvvs_dev as well as for that plot() call?

M-J-Mitchell · January 21, 2024, 2:27pm

Hi again,

I attempted to produce a reprex, but my generated data unfortunately did not manage to reproduce the problem. I have since simplified my models to use a column with squared values in rather than a squared term, as you suggested.

I am now facing a new error (projpred 2.8.0; lme4 1.1-35.1; Matrix downgraded to 1.6-1) when using the latent projection as you suggested:

Error in eigen(sigma, symmetric = TRUE) : 
  infinite or missing values in 'x'

It appears after the following line in the terminal and aborts when it reaches 20%.

Running the search and the performance evaluation with `refit_prj = TRUE` for each of the K = 5 CV folds separately ...

This is a new error that I have begun experiencing since updating projpred and R and then downgrading Matrix to fix the “Error in initializePtr() : function ‘cholmod_factor_ldetA’ not provided by package ‘Matrix’” error.

Also, the following warning appears 15 times (with a few variations in the number of negative eigenvalues and the values themselves):

Warning messages:
1: Quick-TRANSfer stage steps exceeded maximum (= 3000000)
2: In warn_prj_drawwise(mssgs_warns_capts, throw_warn = throw_warn_sdivmin) :
  The following messages and/or warnings have been thrown by the current submodel fitter (i.e., the current draw-wise divergence minimizer):
---
  convergence code 1 from optimx: none
  unable to evaluate scaled gradient
  Model failed to converge: degenerate  Hessian with 2 negative eigenvalues
  Model failed to converge with max|grad| = 1.90098 (tol = 0.002, component 1)
  Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?
  Cholmod warning 'not positive definite' at file '../Cholesky/t_cholmod_rowfac.c', line 430
  Model failed to converge: degenerate  Hessian with 1 negative eigenvalues
  Model failed to converge with max|grad| = 0.0119061 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 0.0167231 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 1.04211 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 0.538126 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 0.0138167 (tol [... truncated]
3: In check_conv(outdmin, lengths(mssgs_warns_capts), do_check = do_check_conv) :
  20 out of 20 submodel fits (there is one submodel fit per projected draw) might not have converged (appropriately). It is recommended to inspect this in detail and (if necessary) to adjust tuning parameters via `...` (the ellipsis of the employed top-level function such as project(), varsel(), or cv_varsel()) or via a custom `div_minimizer` function (an argument of init_refmodel()). In the present case, the submodel fits are of class(es) `c("lmerMod")`. Documentation for corresponding tuning parameters is linked in section "Draw-wise divergence minimizers" of `` ?`projpred-package` ``. Current submodel formula (right-hand side): ~Sire.age.sq + Sire.age.years + Sire.Inbreeding + Pair.Inbreeding + Dam.age.years + Dam.age.sq + (1 | Sire.Studbook.ID) + (1 | Dam.Studbook.ID) + (1 | Management)
4: In warn_prj_drawwise(mssgs_warns_capts, throw_warn = throw_warn_sdivmin) :
  The following messages and/or warnings have been thrown by the current submodel fitter (i.e., the current draw-wise divergence minimizer):
---
  convergence code 1 from optimx: none
  Model failed to converge with max|grad| = 0.00528954 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 0.00651383 (tol = 0.002, component 1)
  Cholmod warning 'not positive definite' at file '../Cholesky/t_cholmod_rowfac.c', line 430
  Model failed to converge with max|grad| = 0.0095549 (tol = 0.002, component 1)
  unable to evaluate scaled gradient
  Model failed to converge: degenerate  Hessian with 1 negative eigenvalues
  Model failed to converge with max|grad| = 0.417913 (tol = 0.002, component 1)
  Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?
  Model failed to converge with max|grad| = 0.757001 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 1.75866 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 0.0942 [... truncated]
5: In check_conv(outdmin, lengths(mssgs_warns_capts), do_check = do_check_conv) :
  20 out of 20 submodel fits (there is one submodel fit per projected draw) might not have converged (appropriately). It is recommended to inspect this in detail and (if necessary) to adjust tuning parameters via `...` (the ellipsis of the employed top-level function such as project(), varsel(), or cv_varsel()) or via a custom `div_minimizer` function (an argument of init_refmodel()). In the present case, the submodel fits are of class(es) `c("lmerMod")`. Documentation for corresponding tuning parameters is linked in section "Draw-wise divergence minimizers" of `` ?`projpred-package` ``. Current submodel formula (right-hand side): ~Sire.age.sq + Sire.Inbreeding + Pair.Inbreeding + Sire.age.years + Dam.age.sq + Dam.age.years + (1 | Sire.Studbook.ID) + (1 | Dam.Studbook.ID) + (1 | Management)
6: In warn_prj_drawwise(mssgs_warns_capts, throw_warn = throw_warn_sdivmin) :
  The following messages and/or warnings have been thrown by the current submodel fitter (i.e., the current draw-wise divergence minimizer):
---
  unable to evaluate scaled gradient
  Model failed to converge: degenerate  Hessian with 1 negative eigenvalues
  Model failed to converge: degenerate  Hessian with 2 negative eigenvalues
  Model failed to converge with max|grad| = 0.0338797 (tol = 0.002, component 1)
  Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?
  Model failed to converge with max|grad| = 0.104824 (tol = 0.002, component 1)
---
7: In check_conv(outdmin, lengths(mssgs_warns_capts), do_check = do_check_conv) :
  144 out of 800 submodel fits (there is one submodel fit per projected draw) might not have converged (appropriately). It is recommended to inspect this in detail and (if necessary) to adjust tuning parameters via `...` (the ellipsis of the employed top-level function such as project(), varsel(), or cv_varsel()) or via a custom `div_minimizer` function (an argument of init_refmodel()). In the present case, the submodel fits are of class(es) `c("lmerMod")`. Documentation for corresponding tuning parameters is linked in section "Draw-wise divergence minimizers" of `` ?`projpred-package` ``. Current submodel formula (right-hand side): ~Sire.age.sq + Sire.Inbreeding + Pair.Inbreeding + Sire.age.years + Dam.age.sq + Dam.age.years + (1 | Sire.Studbook.ID) + (1 | Dam.Studbook.ID)
8: In warn_prj_drawwise(mssgs_warns_capts, throw_warn = throw_warn_sdivmin) :
  The following messages and/or warnings have been thrown by the current submodel fitter (i.e., the current draw-wise divergence minimizer):
---
  unable to evaluate scaled gradient
  Model failed to converge: degenerate  Hessian with 1 negative eigenvalues
  Model failed to converge: degenerate  Hessian with 3 negative eigenvalues
  Model failed to converge: degenerate  Hessian with 2 negative eigenvalues
boundary (singular) fit: see help('isSingular')
  Model failed to converge with max|grad| = 0.00833148 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 0.146391 (tol = 0.002, component 1)
  Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?
---
9: In check_conv(outdmin, lengths(mssgs_warns_capts), do_check = do_check_conv) :
  188 out of 800 submodel fits (there is one submodel fit per projected draw) might not have converged (appropriately). It is recommended to inspect this in detail and (if necessary) to adjust tuning parameters via `...` (the ellipsis of the employed top-level function such as project(), varsel(), or cv_varsel()) or via a custom `div_minimizer` function (an argument of init_refmodel()). In the present case, the submodel fits are of class(es) `c("lmerMod")`. Documentation for corresponding tuning parameters is linked in section "Draw-wise divergence minimizers" of `` ?`projpred-package` ``. Current submodel formula (right-hand side): ~Sire.age.sq + Sire.Inbreeding + Pair.Inbreeding + Sire.age.years + Dam.age.sq + Dam.age.years + (1 | Sire.Studbook.ID) + (1 | Dam.Studbook.ID) + (1 | Location)
10: In warn_prj_drawwise(mssgs_warns_capts, throw_warn = throw_warn_sdivmin) :
  The following messages and/or warnings have been thrown by the current submodel fitter (i.e., the current draw-wise divergence minimizer):
---
  Model failed to converge with max|grad| = 0.00220589 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 0.00282146 (tol = 0.002, component 1)
---
11: In check_conv(outdmin, lengths(mssgs_warns_capts), do_check = do_check_conv) :
  2 out of 20 submodel fits (there is one submodel fit per projected draw) might not have converged (appropriately). It is recommended to inspect this in detail and (if necessary) to adjust tuning parameters via `...` (the ellipsis of the employed top-level function such as project(), varsel(), or cv_varsel()) or via a custom `div_minimizer` function (an argument of init_refmodel()). In the present case, the submodel fits are of class(es) `c("lmerMod")`. Documentation for corresponding tuning parameters is linked in section "Draw-wise divergence minimizers" of `` ?`projpred-package` ``. Current submodel formula (right-hand side): ~Sire.age.sq + Sire.age.years + Sire.Inbreeding + Pair.Inbreeding + Dam.age.years + Dam.age.sq + (1 | Sire.Studbook.ID) + (1 | Location)
12: In warn_prj_drawwise(mssgs_warns_capts, throw_warn = throw_warn_sdivmin) :
  The following messages and/or warnings have been thrown by the current submodel fitter (i.e., the current draw-wise divergence minimizer):
---
  convergence code 1 from optimx: none
  unable to evaluate scaled gradient
  Model failed to converge: degenerate  Hessian with 2 negative eigenvalues
  Model failed to converge: degenerate  Hessian with 1 negative eigenvalues
  Model failed to converge with max|grad| = 0.110325 (tol = 0.002, component 1)
  Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?
  Model failed to converge with max|grad| = 1.53966 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 0.00835859 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 0.00307305 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 0.00254743 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 0.00313908 (tol = 0.002, component 1)
---
13: In check_conv(outdmin, lengths(mssgs_warns_capts), do_check = do_check_conv) :
  20 out of 20 submodel fits (there is one submodel fit per projected draw) might not have converged (appropriately). It is recommended to inspect this in detail and (if necessary) to adjust tuning parameters via `...` (the ellipsis of the employed top-level function such as project(), varsel(), or cv_varsel()) or via a custom `div_minimizer` function (an argument of init_refmodel()). In the present case, the submodel fits are of class(es) `c("lmerMod")`. Documentation for corresponding tuning parameters is linked in section "Draw-wise divergence minimizers" of `` ?`projpred-package` ``. Current submodel formula (right-hand side): ~Sire.age.sq + Sire.age.years + Sire.Inbreeding + Pair.Inbreeding + Dam.age.years + Dam.age.sq + (1 | Sire.Studbook.ID) + (1 | Dam.Studbook.ID) + (1 | Management)
14: In warn_prj_drawwise(mssgs_warns_capts, throw_warn = throw_warn_sdivmin) :
  The following messages and/or warnings have been thrown by the current submodel fitter (i.e., the current draw-wise divergence minimizer):
---
  unable to evaluate scaled gradient
  Model failed to converge: degenerate  Hessian with 1 negative eigenvalues
  Model failed to converge: degenerate  Hessian with 2 negative eigenvalues
  Model failed to converge with max|grad| = 0.00475809 (tol = 0.002, component 1)
---
15: In check_conv(outdmin, lengths(mssgs_warns_capts), do_check = do_check_conv) :
  89 out of 800 submodel fits (there is one submodel fit per projected draw) might not have converged (appropriately). It is recommended to inspect this in detail and (if necessary) to adjust tuning parameters via `...` (the ellipsis of the employed top-level function such as project(), varsel(), or cv_varsel()) or via a custom `div_minimizer` function (an argument of init_refmodel()). In the present case, the submodel fits are of class(es) `c("lmerMod")`. Documentation for corresponding tuning parameters is linked in section "Draw-wise divergence minimizers" of `` ?`projpred-package` ``. Current submodel formula (right-hand side): ~Sire.age.sq + Sire.age.years + Sire.Inbreeding + Pair.Inbreeding + Dam.age.years + Dam.age.sq + (1 | Sire.Studbook.ID) + (1 | Dam.Studbook.ID)
16: In warn_prj_drawwise(mssgs_warns_capts, throw_warn = throw_warn_sdivmin) :
  The following messages and/or warnings have been thrown by the current submodel fitter (i.e., the current draw-wise divergence minimizer):
---
  convergence code 1 from optimx: none
  unable to evaluate scaled gradient
  Model failed to converge: degenerate  Hessian with 1 negative eigenvalues
  Model failed to converge with max|grad| = 0.00300246 (tol = 0.002, component 1)
boundary (singular) fit: see help('isSingular')
  Model failed to converge with max|grad| = 0.0156264 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 0.0124273 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 1.30269 (tol = 0.002, component 1)
  Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?
  Model failed to converge with max|grad| = 0.0031246 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 0.00375001 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 0.00323926 (tol = 0.002, component 1)
  Model fai [... truncated]
17: In check_conv(outdmin, lengths(mssgs_warns_capts), do_check = do_check_conv) :
  800 out of 800 submodel fits (there is one submodel fit per projected draw) might not have converged (appropriately). It is recommended to inspect this in detail and (if necessary) to adjust tuning parameters via `...` (the ellipsis of the employed top-level function such as project(), varsel(), or cv_varsel()) or via a custom `div_minimizer` function (an argument of init_refmodel()). In the present case, the submodel fits are of class(es) `c("lmerMod")`. Documentation for corresponding tuning parameters is linked in section "Draw-wise divergence minimizers" of `` ?`projpred-package` ``. Current submodel formula (right-hand side): ~Sire.age.sq + Sire.age.years + Sire.Inbreeding + Pair.Inbreeding + Dam.age.years + Dam.age.sq + (1 | Sire.Studbook.ID) + (1 | Dam.Studbook.ID) + (1 | Management)

I am trying to take advantage of the new “force_search_term()” function to force the model to select each fixed term, in any combination, before the random terms. Attached is the code I am using from brms to model projection.

model_dev <- brm(Fertile ~ 
                       Dam.age.years + Dam.age.sq +
                       Sire.age.years + Sire.age.sq +
                       Sire.Inbreeding + Pair.Inbreeding +
                       (1|Sire.Studbook.ID) + (1|Dam.Studbook.ID)+
                       (1|Management) + (1|Location),
                     data = data_dev_sc,
                     family = "bernoulli",
                     prior = NULL,
                     iter = 30000,
                     control=list(adapt_delta=0.99),
                     cores = 8,
                     sample_prior = TRUE,
                     seed = 13579)

ref_dev <- brms:::get_refmodel.brmsfit(model_dev, seed = 13579, latent= TRUE, latent_y_unqs  = c("0","1"))

search_terms_forced <- force_search_terms(
  forced_terms = c("Dam.age.years",
                   "Dam.age.sq",
                   "Sire.age.years",
                   "Sire.age.sq",
                   "Sire.Inbreeding",
                   "Pair.Inbreeding"),
  optional_terms = c("(1|Sire.Studbook.ID)",
                     "(1|Dam.Studbook.ID)",
                     "(1|Management)",
                     "(1|Location)")
)
search_terms_full <- c("Dam.age.years",
                       "Dam.age.sq",
                       "Sire.age.years",
                       "Sire.age.sq",
                       "Sire.Inbreeding",
                       "Pair.Inbreeding", search_terms_forced)

cvvs_dev <- cv_varsel( 
  ref_dev,
  ### prioritise fixed effects first
  search_terms = search_terms_full,
  refit_prj = TRUE,
  ndraws_pred = 800,
  # nclusters_pred = 20,
  nterms_max = 9,
  seed = 13579,
  method = "forward",
  cv_method = "kfold",
  validate_search=TRUE
)

Reducing ntermsmax as suggested does remove that particular error but the others remain.

For the sake of clarity, ‘search_terms_full’ returns the following:

 [1] "Dam.age.years"                                                                                                       
 [2] "Dam.age.sq"                                                                                                          
 [3] "Sire.age.years"                                                                                                      
 [4] "Sire.age.sq"                                                                                                         
 [5] "Sire.Inbreeding"                                                                                                     
 [6] "Pair.Inbreeding"                                                                                                     
 [7] "Dam.age.years + Dam.age.sq + Sire.age.years + Sire.age.sq + Sire.Inbreeding + Pair.Inbreeding"                       
 [8] "Dam.age.years + Dam.age.sq + Sire.age.years + Sire.age.sq + Sire.Inbreeding + Pair.Inbreeding + (1|Sire.Studbook.ID)"
 [9] "Dam.age.years + Dam.age.sq + Sire.age.years + Sire.age.sq + Sire.Inbreeding + Pair.Inbreeding + (1|Dam.Studbook.ID)" 
[10] "Dam.age.years + Dam.age.sq + Sire.age.years + Sire.age.sq + Sire.Inbreeding + Pair.Inbreeding + (1|Management)"      
[11] "Dam.age.years + Dam.age.sq + Sire.age.years + Sire.age.sq + Sire.Inbreeding + Pair.Inbreeding + (1|Location)"

Does this do as I intend it to or am I making it over complicated? Does this have the potential to cause the above error?

Many thanks!

fweber144 · January 22, 2024, 8:46pm

This sounds again like an error during some lme4 fit, analogously to what I explained above. The warnings (except for the Quick-TRANSfer warning) are probably related to this. So you could play around with the lme4 tuning parameters (see lme4::glmer()) and see if that resolves the issue (of course, your adjustments of the tuning parameters should not compromise the convergence of the estimation algorithm used by lme4).

The warning

Quick-TRANSfer stage steps exceeded maximum (= 3000000)

seems to be due to the K-means algorithm used internally by projpred for clustering the posterior draws (see, e.g., here). This in turn might be related to the large number of posterior draws that you have. Is it possible to reduce the number of posterior draws in your case?

In your specific case, search_terms_full should be fine as you defined it.

M-J-Mitchell · June 12, 2024, 2:31pm

Hi all,

My supervisors and I have now successfully managed to get a cv_varsel to run on some models (with convergence warnings, however) but on others are getting this warning:

Error in eigen(sigma, symmetric = TRUE) : 
  infinite or missing values in 'x'
In addition: There were 21 warnings (use warnings() to see them)

I would appreciate any insight anyone may have. My attempts at creating a reprex have previously been unsuccessful but I am potentially able to share data via private message if you were happy to have a look and having the code would be useful to you.

Many thanks

fweber144 · June 12, 2024, 6:25pm

This sounds a bit like Projpred: Non-Convergence of Predictive Performance with Reference Model. I guess you still have multilevel submodels? If yes, could you try specifying control = lme4::lmerControl(optimizer = "Nelder_Mead") in the cv_varsel() call?

M-J-Mitchell · June 13, 2024, 8:37am

Hi Frank,

Thanks for this - we are using Nelder Mead optimiser; our current model and cv_varsel looks like this:

model_dev_pair_AI <- brm(Fertile ~ 0 +
                      Dam.age.years + dam_age_sq_sc+
                      Sire.age.years + sire_age_sq_sc +
                      Sire.Inbreeding + Pair.Inbreeding +
                      Artificial.incubation +
                      (1|Sire.Studbook.ID) + (1|Dam.Studbook.ID)+
                      (1|Management) + (1|Location),
                    data = data_dev_sc_AI,
                    family = "bernoulli",
                    prior = NULL,
                    iter = 5000,
                    control=list(adapt_delta=0.99),
                    cores = 8,
                    sample_prior = TRUE,
                    seed = 13579)

refm_kfold <- kfold(model_dev_pair_AI, K = 5, save_fits = TRUE)
cvfits_crr <- structure(
  list(fits = refm_kfold$fits[, "fit"]),
  K = length(refm_kfold$fits[, "fit"]),
  folds = sapply(seq_len(nrow(model_dev_pair_AI$data)), function(ii) {
    which(sapply(refm_kfold$fits[, "omitted"], "%in%", x = ii))
  })
)

ref_dev_pair_AI <- brms:::get_refmodel.brmsfit(model_dev_pair_AI, seed = 13579, latent= TRUE, 
                                                   latent_y_unqs  = c("0","1"),
                                                   cvfits = cvfits_crr, 
                                                   control = lme4::lmerControl(optimizer = "nloptwrap",
                                                                               optCtrl = list(algorithm = "NLOPT_LN_NELDERMEAD",
                                                                                              xtol_abs = 1e-08,                                                                
                                                                                              ftol_abs = 1e-08,                                                 
                                                                                              maxeval = 1e+03)))

cvvs_dev_pair_AI <- cv_varsel( #### fails 
  ref_dev_pair_AI,
  ### prioritise fixed effects first
  search_terms = search_terms_full_AI,
  # refit_prj = TRUE,
  ndraws_pred = 800,
  # nclusters_pred = 20,
  nterms_max = 10,
  seed = 13579,
  method = "forward",
  cv_method = "kfold",
  validate_search=TRUE,
  search_control = lme4::lmerControl(optimizer = "Nelder Mead",
                                     check.conv.grad     = .makeCC("message", tol = 2e-3, relTol = NULL),
                                     check.conv.singular = .makeCC(action = "message", tol = formals(isSingular)$tol),
                                     check.conv.hess     = .makeCC(action = "message", tol = 1e-6),
                                     optCtrl = list(algorithm = "NLOPT_LN_NELDERMEAD",
                                                    ftol_abs=1e-8,
                                                    xtol_abs=1e-8)))

Thanks again

fweber144 · June 13, 2024, 6:57pm

Argument search_control only controls the search (and, by the way, you would need search_control = list(control = lme4::lmerControl(optimizer = "Nelder_Mead")); I know this is clunky but any simplification would be hard to implement). Try using control = lme4::lmerControl(optimizer = "Nelder_Mead") instead of search_control = [...]. That way, the Nelder-Mead optimizer should be used during the forward search phase as well as during the re-projections for the performance evaluation phase.

Besides, in the brms:::get_refmodel.brmsfit() call, you should not need the control = [...] part.

M-J-Mitchell · June 13, 2024, 9:01pm

Thanks for the suggestion and the extra information. Unfortunately

cvvs_dev_pair_AI <- cv_varsel( 
  ref_dev_pair_AI,
  ### prioritise fixed effects first
  search_terms = search_terms_full_AI,
  # refit_prj = TRUE,
  ndraws_pred = 800,
  # nclusters_pred = 20,
  nterms_max = 10,
  seed = 13579,
  method = "forward",
  cv_method = "kfold",
  validate_search=TRUE,
  control = lme4::lmerControl(optimizer = "Nelder_Mead"))

Still gives me

Error in eigen(sigma, symmetric = TRUE) : 
  infinite or missing values in 'x'
In addition: There were 15 warnings (use warnings() to see them)

Any other ideas?

fweber144 · June 15, 2024, 6:50pm

Could you try this without search_terms = search_terms_full_AI?

M-J-Mitchell · June 17, 2024, 12:53pm

Thanks - but unfortunately still hitting the same issues

fweber144 · June 17, 2024, 5:31pm

Ok, could you perhaps send me the code and the data via private message? Then I can try out a few different things.

fweber144 · June 23, 2024, 3:39pm

Sorry, I forgot to mention that if you need to switch to the Nelder-Mead algorithm for the lme4 submodel fits, you can use control = lme4::lmerControl(optimizer = "Nelder_Mead") in case of the latent projection (as described above), but in case of the traditional (non-latent) projection, you would need control = lme4::glmerControl(optimizer = "Nelder_Mead").

Topic		Replies	Views
Errors/warnings with loo::cv_varsel() on large brms fit in R Modeling techniques , loo , brms	11	1366	January 14, 2022
Projpred with brms object General	11	1321	February 1, 2022
Projpred::cv_varsel() returning "Not enough (non-NA) data to do anything meaningful" error brms	6	1458	November 6, 2021
Accounting for measurement error during variable selection with projpred (possibly with rstanarm or brms?) Modeling	25	1333	July 31, 2025
Projpred: Non-Convergence of Predictive Performance with Reference Model General fitting-issues , projpred	20	1114	August 18, 2023

Errors running cv_varsel , and a search terms question

Related topics