Errors running cv_varsel , and a search terms question

Hi all,

I am a PhD student using brms to run mixed models to understand the effect of parental factors on the fertility of eggs. My model currently looks like this - understanding the effect ion maternal and paternal ages, ages as quadratic effects, inbreeding coefficients and interactions between age and inbreeding coefficient for each parent.

model_dev <- brm(Fertile ~ 
	#linear fixed effects   
                Pair.Inbreeding + 
	#quadratic fixed effects
                   I(Dam.age.years^2) + I(Sire.age.years^2) +
	#interactive fixed effects
                   Sire.Inbreeding*Sire.age.years + Dam.Inbreeding*Dam.age.years +
	#random effects
                   (1|Sire.Studbook.ID) + (1|Dam.Studbook.ID)+
                   (1|Lay.Year) + (1|Location),
                 data = data_dev_sc,
                 family = "bernoulli",
                 prior = priors_dev,
                 iter = 30000,
                 control=list(adapt_delta=0.99),
                 cores = 8,
                 sample_prior = TRUE,
                 seed = 13579)

Based on other research and with the help of other forum members, I have now successfully set my priors are as follows:

priors_dev <- c(set_prior("normal(0,5)", class = "b", coef = "Dam.age.years"),
            set_prior("normal(-5,5)", class = "b", coef = "Sire.age.years"),
            set_prior("normal(0,10)", class = "b", coef = "Sire.Inbreeding"),
            set_prior("normal(0,10)", class = "b", coef = "Dam.Inbreeding"),
            set_prior("normal(-40,20)", class = "b", coef = "Pair.Inbreeding"),
            set_prior("normal(0,5)", class = "b", coef = "IDam.age.yearsE2"),
            set_prior("normal(0.5,5)", class = "b", coef = ā€œISire.age.yearsE2"))

After running the model, I am wanting to do variable selection using projpredā€™s cv_varsel, but have been hitting the following error:

Error in updateGlmerDevfun(devfun, glmod$reTrms, nAGQ = nAGQ) : 
  nAGQ > 1 is only available for models with a single, scalar random-effects term

I am following the methods of a paper which uses ā€˜search_termsā€™ argument of cv_varsel to ensure that the fixed effects are explored before the random effects and are included in every model. As a result my search_terms currently look like this:

search_terms<-c(
"1",
"Pair.Inbreeding",
"Dam.age.years",
"Sire.age.years",
"Dam.Inbreeding",
"Sire.Inbreeding",
"Dam.age.years+Dam.Inbreeding+Dam.age.years:Dam.Inbreeding",
"Sire.age.years+Sire.Inbreeding+Sire.age.years:Sire.Inbreeding",
paste0(
"Sire.age.years+I(Sire.age.years^2)+Dam.age.years+I(Dam.age.years^2)+Dam.age.years:Dam.Inbreeding+",
"Dam.Inbreeding+Sire.Inbreeding+Pair.Inbreeding+Sire.age.years:Sire.Inbreeding+",
"(1|Sire.Studbook.ID)"
),
paste0(
"Sire.age.years+I(Sire.age.years^2)+Dam.age.years+I(Dam.age.years^2)+Dam.age.years:Dam.Inbreeding+",
"Dam.Inbreeding+Sire.Inbreeding+Pair.Inbreeding+Sire.age.years:Sire.Inbreeding+",
"(1|Dam.Studbook.ID)"
),
paste0(
"Sire.age.years+I(Sire.age.years^2)+Dam.age.years+I(Dam.age.years^2)+Dam.age.years:Dam.Inbreeding+",
"Dam.Inbreeding+Sire.Inbreeding+Pair.Inbreeding+Sire.age.years:Sire.Inbreeding+",
"(1|Lay.Year)"
),
paste0(
"Sire.age.years+I(Sire.age.years^2)+Dam.age.years+I(Dam.age.years^2)+Dam.age.years:Dam.Inbreeding+",
"Dam.Inbreeding+Sire.Inbreeding+Pair.Inbreeding+Sire.age.years:Sire.Inbreeding+",
"(1|Location)"
)
)

Is there a better way to do this? My aim is simply to prioritise the fixed terms over the random terms when the variable selection is happening.

My cv_varsel code is as follows:

cvvs_dev <- cv_varsel( 
  ref_dev,
  validate_search = TRUE,
### prioritise fixed effects first
  search_terms = search_terms,
  refit_prj=FALSE,
  nclusters_pred = 20,
  seed = 13579
)

Please see full code below to simulate equivalent data to recreate the error.

Many thanks!

library(tidyverse)
library(brms)
library(projpred)

set.seed(13579)
predictor_mean <- 0
predictor_sd <- .5
Pair.Inbreeding <- Dam.age.years <- Sire.age.years <- Dam.Inbreeding <- Sire.Inbreeding <- rnorm(539, mean = predictor_mean, sd = predictor_sd)
Fertile <- as.factor(sample(c("Y","N"), size = 539, replace = T))
Dam.Studbook.ID <- as.factor(sample(c(1:52), size=539, replace = T))
Sire.Studbook.ID <- as.factor(sample(c(53:104), size=539, replace = T))
Lay.Year <- as.factor(sample(c(1988:2019), size = 539, replace = T))
Location <- as.factor(sample(c(1:12), size = 539, replace = T))

data_dev_sc <- tibble(
    Fertile,
    Dam.age.years,
    Sire.age.years,
    Dam.Inbreeding,
    Sire.Inbreeding,
    Pair.Inbreeding,
    Sire.Studbook.ID,
    Dam.Studbook.ID,
    Lay.Year,
    Location
)

# Set egg_dev priors
priors_dev <- c(
    set_prior("normal(0,5)", class = "b", coef = "Dam.age.years"),
    set_prior("normal(-5,5)", class = "b", coef = "Sire.age.years"),
    set_prior("normal(0,10)", class = "b", coef = "Sire.Inbreeding"),
    set_prior("normal(0,10)", class = "b", coef = "Dam.Inbreeding"),
    set_prior("normal(-40,20)", class = "b", coef = "Pair.Inbreeding"),
    set_prior("normal(0,5)", class = "b", coef = "IDam.age.yearsE2"),
    set_prior("normal(0.5,5)", class = "b", coef = "ISire.age.yearsE2")
)

model_dev <- brm(Fertile ~ 
                   Pair.Inbreeding + 
                   I(Dam.age.years^2) + I(Sire.age.years^2) +
                   Sire.Inbreeding*Sire.age.years + Dam.Inbreeding*Dam.age.years +
                   (1|Sire.Studbook.ID) + (1|Dam.Studbook.ID)+
                   (1|Lay.Year) + (1|Location),
                 data = data_dev_sc,
                 family = "bernoulli",
                 prior = priors_dev,
                 iter = 30000,
                 control=list(adapt_delta=0.99),
                 cores = 8,
                 sample_prior = TRUE,
                 seed = 13579)
ref_dev <- brms:::get_refmodel.brmsfit(model_dev, seed = 13579)

search_terms <- c(
    "1",
    "Pair.Inbreeding",
    "Dam.age.years",
    "Sire.age.years",
    "Dam.Inbreeding",
    "Sire.Inbreeding",
    "Dam.age.years+Dam.Inbreeding+Dam.age.years:Dam.Inbreeding",
    "Sire.age.years+Sire.Inbreeding+Sire.age.years:Sire.Inbreeding",
    paste0(
        "Sire.age.years+I(Sire.age.years^2)+Dam.age.years+I(Dam.age.years^2)+Dam.age.years:Dam.Inbreeding+",
        "Dam.Inbreeding+Sire.Inbreeding+Pair.Inbreeding+Sire.age.years:Sire.Inbreeding+",
        "(1|Sire.Studbook.ID)"
    ),
    paste0(
        "Sire.age.years+I(Sire.age.years^2)+Dam.age.years+I(Dam.age.years^2)+Dam.age.years:Dam.Inbreeding+",
        "Dam.Inbreeding+Sire.Inbreeding+Pair.Inbreeding+Sire.age.years:Sire.Inbreeding+",
        "(1|Dam.Studbook.ID)"
    ),
    paste0(
        "Sire.age.years+I(Sire.age.years^2)+Dam.age.years+I(Dam.age.years^2)+Dam.age.years:Dam.Inbreeding+",
        "Dam.Inbreeding+Sire.Inbreeding+Pair.Inbreeding+Sire.age.years:Sire.Inbreeding+",
        "(1|Lay.Year)"
    ),
    paste0(
        "Sire.age.years+I(Sire.age.years^2)+Dam.age.years+I(Dam.age.years^2)+Dam.age.years:Dam.Inbreeding+",
        "Dam.Inbreeding+Sire.Inbreeding+Pair.Inbreeding+Sire.age.years:Sire.Inbreeding+",
        "(1|Location)"
    )
)

cvvs_dev <- cv_varsel(
    ref_dev,
    ### Only for the sake of speed (not recommended in general):
    validate_search = TRUE,
    ### prioritise fixed effects first
    search_terms = search_terms,
    nclusters_pred = 20,
    ### nterms_max = 12,
    seed = 13579
)

The nAGQ error is from package lme4 which is used by projpred for performing the projection (namely by fitting an lme4 model to ā€œtheā€ fit of the reference model). It is caused by projpred encountering an lme4 convergence error and then trying to avoid that automatically by increasing argument nAGQ of lme4::glmer(). You could now try to play around with the lme4 tuning parameters (see lme4::glmer()) until the error disappears (without compromising the convergence of the estimation algorithm used by lme4). For that, I guess you would have to adapt projpred:::divmin() to replace the automatic nAGQ adjustment in these lines by something which works in your case. The adapted projpred:::divmin() function then needs to be passed to argument div_minimizer of init_refmodel() (via brms:::get_refmodel.brmsfit() in your case). However, instead of taking this tedious way directly, I think you should try the latent projection (Latent projection predictive feature selection ā€¢ projpred) first. The latent projection might be able to avoid the lme4 convergence error underlying the nAGQ error and in general, it is also able to avoid this lme4 issue. However, the latent projection is only an approximate solution.

Concerning your search_terms question:

Is there a better way to do this?

Unfortunately, there is no better way at the moment. For the future, we plan some enhancements (see also Fixing certain terms to be included in all submodels in cv_varsel in a high dimensional setting Ā· Issue #346 Ā· stan-dev/projpred Ā· GitHub), but we havenā€™t made it yet to implement these.

Some minor remarks:

  1. In cv_varsel(), you are using refit_prj=FALSE and nclusters_pred = 20. The nclusters_pred = 20 part is not necessary and can be omitted because you are using the default of nclusters = 20 for the search part. Apart from that, for final results, your nclusters / refit_prj / nclusters_pred setting seems quite rough. See also my answer at Projpred: Fixing Group Effects in Search Terms and Tips for Speed? - #2 by fweber144.

  2. In the brm() call, you have the predictor terms I(Dam.age.years^2) and I(Sire.age.years^2). These should now be supported correctly by projpred (recently, there have been some fixes for such terms), but in general (also outside of projpred), such special terms are always a bit more dangerous than pre-calculating the corresponding predictor data, appending it as new columns to the dataset, and then using these pre-calculated predictor columns as ā€œusualā€ predictor terms in the model formula.

1 Like

Thanks for this! Using the latent projection enabled cv_varsel to complete. However, when following the guide you kindly provided, I am hitting a new error when trying to plot the outcome of cv_varsel, as follows:

plot(cvvs_dev, stats = "mlpd", deltas = TRUE)

gives the error:

`latent_ilink` returned only `NA`s, so all performance statistics will also be `NA` as long as `resp_oscale = TRUE`.
`latent_ilink` returned only `NA`s, so all performance statistics will also be `NA` as long as `resp_oscale = TRUE`.
Error in out[["foldwise"]][, seq_len(nterms_max), drop = FALSE] : 
  subscript out of bounds

Thanks again for your continued support.

1 Like

Thank you for reporting this. Could you provide a reprex on projpredā€™s issue tracker, also mentioning which projpred version you used for creating cvvs_dev as well as for that plot() call?

Hi again,

I attempted to produce a reprex, but my generated data unfortunately did not manage to reproduce the problem. I have since simplified my models to use a column with squared values in rather than a squared term, as you suggested.

I am now facing a new error (projpred 2.8.0; lme4 1.1-35.1; Matrix downgraded to 1.6-1) when using the latent projection as you suggested:

Error in eigen(sigma, symmetric = TRUE) : 
  infinite or missing values in 'x'

It appears after the following line in the terminal and aborts when it reaches 20%.

Running the search and the performance evaluation with `refit_prj = TRUE` for each of the K = 5 CV folds separately ...

This is a new error that I have begun experiencing since updating projpred and R and then downgrading Matrix to fix the ā€œError in initializePtr() : function ā€˜cholmod_factor_ldetAā€™ not provided by package ā€˜Matrixā€™ā€ error.

Also, the following warning appears 15 times (with a few variations in the number of negative eigenvalues and the values themselves):

Warning messages:
1: Quick-TRANSfer stage steps exceeded maximum (= 3000000)
2: In warn_prj_drawwise(mssgs_warns_capts, throw_warn = throw_warn_sdivmin) :
  The following messages and/or warnings have been thrown by the current submodel fitter (i.e., the current draw-wise divergence minimizer):
---
  convergence code 1 from optimx: none
  unable to evaluate scaled gradient
  Model failed to converge: degenerate  Hessian with 2 negative eigenvalues
  Model failed to converge with max|grad| = 1.90098 (tol = 0.002, component 1)
  Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?
  Cholmod warning 'not positive definite' at file '../Cholesky/t_cholmod_rowfac.c', line 430
  Model failed to converge: degenerate  Hessian with 1 negative eigenvalues
  Model failed to converge with max|grad| = 0.0119061 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 0.0167231 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 1.04211 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 0.538126 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 0.0138167 (tol [... truncated]
3: In check_conv(outdmin, lengths(mssgs_warns_capts), do_check = do_check_conv) :
  20 out of 20 submodel fits (there is one submodel fit per projected draw) might not have converged (appropriately). It is recommended to inspect this in detail and (if necessary) to adjust tuning parameters via `...` (the ellipsis of the employed top-level function such as project(), varsel(), or cv_varsel()) or via a custom `div_minimizer` function (an argument of init_refmodel()). In the present case, the submodel fits are of class(es) `c("lmerMod")`. Documentation for corresponding tuning parameters is linked in section "Draw-wise divergence minimizers" of `` ?`projpred-package` ``. Current submodel formula (right-hand side): ~Sire.age.sq + Sire.age.years + Sire.Inbreeding + Pair.Inbreeding + Dam.age.years + Dam.age.sq + (1 | Sire.Studbook.ID) + (1 | Dam.Studbook.ID) + (1 | Management)
4: In warn_prj_drawwise(mssgs_warns_capts, throw_warn = throw_warn_sdivmin) :
  The following messages and/or warnings have been thrown by the current submodel fitter (i.e., the current draw-wise divergence minimizer):
---
  convergence code 1 from optimx: none
  Model failed to converge with max|grad| = 0.00528954 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 0.00651383 (tol = 0.002, component 1)
  Cholmod warning 'not positive definite' at file '../Cholesky/t_cholmod_rowfac.c', line 430
  Model failed to converge with max|grad| = 0.0095549 (tol = 0.002, component 1)
  unable to evaluate scaled gradient
  Model failed to converge: degenerate  Hessian with 1 negative eigenvalues
  Model failed to converge with max|grad| = 0.417913 (tol = 0.002, component 1)
  Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?
  Model failed to converge with max|grad| = 0.757001 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 1.75866 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 0.0942 [... truncated]
5: In check_conv(outdmin, lengths(mssgs_warns_capts), do_check = do_check_conv) :
  20 out of 20 submodel fits (there is one submodel fit per projected draw) might not have converged (appropriately). It is recommended to inspect this in detail and (if necessary) to adjust tuning parameters via `...` (the ellipsis of the employed top-level function such as project(), varsel(), or cv_varsel()) or via a custom `div_minimizer` function (an argument of init_refmodel()). In the present case, the submodel fits are of class(es) `c("lmerMod")`. Documentation for corresponding tuning parameters is linked in section "Draw-wise divergence minimizers" of `` ?`projpred-package` ``. Current submodel formula (right-hand side): ~Sire.age.sq + Sire.Inbreeding + Pair.Inbreeding + Sire.age.years + Dam.age.sq + Dam.age.years + (1 | Sire.Studbook.ID) + (1 | Dam.Studbook.ID) + (1 | Management)
6: In warn_prj_drawwise(mssgs_warns_capts, throw_warn = throw_warn_sdivmin) :
  The following messages and/or warnings have been thrown by the current submodel fitter (i.e., the current draw-wise divergence minimizer):
---
  unable to evaluate scaled gradient
  Model failed to converge: degenerate  Hessian with 1 negative eigenvalues
  Model failed to converge: degenerate  Hessian with 2 negative eigenvalues
  Model failed to converge with max|grad| = 0.0338797 (tol = 0.002, component 1)
  Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?
  Model failed to converge with max|grad| = 0.104824 (tol = 0.002, component 1)
---
7: In check_conv(outdmin, lengths(mssgs_warns_capts), do_check = do_check_conv) :
  144 out of 800 submodel fits (there is one submodel fit per projected draw) might not have converged (appropriately). It is recommended to inspect this in detail and (if necessary) to adjust tuning parameters via `...` (the ellipsis of the employed top-level function such as project(), varsel(), or cv_varsel()) or via a custom `div_minimizer` function (an argument of init_refmodel()). In the present case, the submodel fits are of class(es) `c("lmerMod")`. Documentation for corresponding tuning parameters is linked in section "Draw-wise divergence minimizers" of `` ?`projpred-package` ``. Current submodel formula (right-hand side): ~Sire.age.sq + Sire.Inbreeding + Pair.Inbreeding + Sire.age.years + Dam.age.sq + Dam.age.years + (1 | Sire.Studbook.ID) + (1 | Dam.Studbook.ID)
8: In warn_prj_drawwise(mssgs_warns_capts, throw_warn = throw_warn_sdivmin) :
  The following messages and/or warnings have been thrown by the current submodel fitter (i.e., the current draw-wise divergence minimizer):
---
  unable to evaluate scaled gradient
  Model failed to converge: degenerate  Hessian with 1 negative eigenvalues
  Model failed to converge: degenerate  Hessian with 3 negative eigenvalues
  Model failed to converge: degenerate  Hessian with 2 negative eigenvalues
boundary (singular) fit: see help('isSingular')
  Model failed to converge with max|grad| = 0.00833148 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 0.146391 (tol = 0.002, component 1)
  Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?
---
9: In check_conv(outdmin, lengths(mssgs_warns_capts), do_check = do_check_conv) :
  188 out of 800 submodel fits (there is one submodel fit per projected draw) might not have converged (appropriately). It is recommended to inspect this in detail and (if necessary) to adjust tuning parameters via `...` (the ellipsis of the employed top-level function such as project(), varsel(), or cv_varsel()) or via a custom `div_minimizer` function (an argument of init_refmodel()). In the present case, the submodel fits are of class(es) `c("lmerMod")`. Documentation for corresponding tuning parameters is linked in section "Draw-wise divergence minimizers" of `` ?`projpred-package` ``. Current submodel formula (right-hand side): ~Sire.age.sq + Sire.Inbreeding + Pair.Inbreeding + Sire.age.years + Dam.age.sq + Dam.age.years + (1 | Sire.Studbook.ID) + (1 | Dam.Studbook.ID) + (1 | Location)
10: In warn_prj_drawwise(mssgs_warns_capts, throw_warn = throw_warn_sdivmin) :
  The following messages and/or warnings have been thrown by the current submodel fitter (i.e., the current draw-wise divergence minimizer):
---
  Model failed to converge with max|grad| = 0.00220589 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 0.00282146 (tol = 0.002, component 1)
---
11: In check_conv(outdmin, lengths(mssgs_warns_capts), do_check = do_check_conv) :
  2 out of 20 submodel fits (there is one submodel fit per projected draw) might not have converged (appropriately). It is recommended to inspect this in detail and (if necessary) to adjust tuning parameters via `...` (the ellipsis of the employed top-level function such as project(), varsel(), or cv_varsel()) or via a custom `div_minimizer` function (an argument of init_refmodel()). In the present case, the submodel fits are of class(es) `c("lmerMod")`. Documentation for corresponding tuning parameters is linked in section "Draw-wise divergence minimizers" of `` ?`projpred-package` ``. Current submodel formula (right-hand side): ~Sire.age.sq + Sire.age.years + Sire.Inbreeding + Pair.Inbreeding + Dam.age.years + Dam.age.sq + (1 | Sire.Studbook.ID) + (1 | Location)
12: In warn_prj_drawwise(mssgs_warns_capts, throw_warn = throw_warn_sdivmin) :
  The following messages and/or warnings have been thrown by the current submodel fitter (i.e., the current draw-wise divergence minimizer):
---
  convergence code 1 from optimx: none
  unable to evaluate scaled gradient
  Model failed to converge: degenerate  Hessian with 2 negative eigenvalues
  Model failed to converge: degenerate  Hessian with 1 negative eigenvalues
  Model failed to converge with max|grad| = 0.110325 (tol = 0.002, component 1)
  Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?
  Model failed to converge with max|grad| = 1.53966 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 0.00835859 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 0.00307305 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 0.00254743 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 0.00313908 (tol = 0.002, component 1)
---
13: In check_conv(outdmin, lengths(mssgs_warns_capts), do_check = do_check_conv) :
  20 out of 20 submodel fits (there is one submodel fit per projected draw) might not have converged (appropriately). It is recommended to inspect this in detail and (if necessary) to adjust tuning parameters via `...` (the ellipsis of the employed top-level function such as project(), varsel(), or cv_varsel()) or via a custom `div_minimizer` function (an argument of init_refmodel()). In the present case, the submodel fits are of class(es) `c("lmerMod")`. Documentation for corresponding tuning parameters is linked in section "Draw-wise divergence minimizers" of `` ?`projpred-package` ``. Current submodel formula (right-hand side): ~Sire.age.sq + Sire.age.years + Sire.Inbreeding + Pair.Inbreeding + Dam.age.years + Dam.age.sq + (1 | Sire.Studbook.ID) + (1 | Dam.Studbook.ID) + (1 | Management)
14: In warn_prj_drawwise(mssgs_warns_capts, throw_warn = throw_warn_sdivmin) :
  The following messages and/or warnings have been thrown by the current submodel fitter (i.e., the current draw-wise divergence minimizer):
---
  unable to evaluate scaled gradient
  Model failed to converge: degenerate  Hessian with 1 negative eigenvalues
  Model failed to converge: degenerate  Hessian with 2 negative eigenvalues
  Model failed to converge with max|grad| = 0.00475809 (tol = 0.002, component 1)
---
15: In check_conv(outdmin, lengths(mssgs_warns_capts), do_check = do_check_conv) :
  89 out of 800 submodel fits (there is one submodel fit per projected draw) might not have converged (appropriately). It is recommended to inspect this in detail and (if necessary) to adjust tuning parameters via `...` (the ellipsis of the employed top-level function such as project(), varsel(), or cv_varsel()) or via a custom `div_minimizer` function (an argument of init_refmodel()). In the present case, the submodel fits are of class(es) `c("lmerMod")`. Documentation for corresponding tuning parameters is linked in section "Draw-wise divergence minimizers" of `` ?`projpred-package` ``. Current submodel formula (right-hand side): ~Sire.age.sq + Sire.age.years + Sire.Inbreeding + Pair.Inbreeding + Dam.age.years + Dam.age.sq + (1 | Sire.Studbook.ID) + (1 | Dam.Studbook.ID)
16: In warn_prj_drawwise(mssgs_warns_capts, throw_warn = throw_warn_sdivmin) :
  The following messages and/or warnings have been thrown by the current submodel fitter (i.e., the current draw-wise divergence minimizer):
---
  convergence code 1 from optimx: none
  unable to evaluate scaled gradient
  Model failed to converge: degenerate  Hessian with 1 negative eigenvalues
  Model failed to converge with max|grad| = 0.00300246 (tol = 0.002, component 1)
boundary (singular) fit: see help('isSingular')
  Model failed to converge with max|grad| = 0.0156264 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 0.0124273 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 1.30269 (tol = 0.002, component 1)
  Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?
  Model failed to converge with max|grad| = 0.0031246 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 0.00375001 (tol = 0.002, component 1)
  Model failed to converge with max|grad| = 0.00323926 (tol = 0.002, component 1)
  Model fai [... truncated]
17: In check_conv(outdmin, lengths(mssgs_warns_capts), do_check = do_check_conv) :
  800 out of 800 submodel fits (there is one submodel fit per projected draw) might not have converged (appropriately). It is recommended to inspect this in detail and (if necessary) to adjust tuning parameters via `...` (the ellipsis of the employed top-level function such as project(), varsel(), or cv_varsel()) or via a custom `div_minimizer` function (an argument of init_refmodel()). In the present case, the submodel fits are of class(es) `c("lmerMod")`. Documentation for corresponding tuning parameters is linked in section "Draw-wise divergence minimizers" of `` ?`projpred-package` ``. Current submodel formula (right-hand side): ~Sire.age.sq + Sire.age.years + Sire.Inbreeding + Pair.Inbreeding + Dam.age.years + Dam.age.sq + (1 | Sire.Studbook.ID) + (1 | Dam.Studbook.ID) + (1 | Management)

I am trying to take advantage of the new ā€œforce_search_term()ā€ function to force the model to select each fixed term, in any combination, before the random terms. Attached is the code I am using from brms to model projection.

model_dev <- brm(Fertile ~ 
                       Dam.age.years + Dam.age.sq +
                       Sire.age.years + Sire.age.sq +
                       Sire.Inbreeding + Pair.Inbreeding +
                       (1|Sire.Studbook.ID) + (1|Dam.Studbook.ID)+
                       (1|Management) + (1|Location),
                     data = data_dev_sc,
                     family = "bernoulli",
                     prior = NULL,
                     iter = 30000,
                     control=list(adapt_delta=0.99),
                     cores = 8,
                     sample_prior = TRUE,
                     seed = 13579)

ref_dev <- brms:::get_refmodel.brmsfit(model_dev, seed = 13579, latent= TRUE, latent_y_unqs  = c("0","1"))

search_terms_forced <- force_search_terms(
  forced_terms = c("Dam.age.years",
                   "Dam.age.sq",
                   "Sire.age.years",
                   "Sire.age.sq",
                   "Sire.Inbreeding",
                   "Pair.Inbreeding"),
  optional_terms = c("(1|Sire.Studbook.ID)",
                     "(1|Dam.Studbook.ID)",
                     "(1|Management)",
                     "(1|Location)")
)
search_terms_full <- c("Dam.age.years",
                       "Dam.age.sq",
                       "Sire.age.years",
                       "Sire.age.sq",
                       "Sire.Inbreeding",
                       "Pair.Inbreeding", search_terms_forced)

cvvs_dev <- cv_varsel( 
  ref_dev,
  ### prioritise fixed effects first
  search_terms = search_terms_full,
  refit_prj = TRUE,
  ndraws_pred = 800,
  # nclusters_pred = 20,
  nterms_max = 9,
  seed = 13579,
  method = "forward",
  cv_method = "kfold",
  validate_search=TRUE
)

Reducing ntermsmax as suggested does remove that particular error but the others remain.

For the sake of clarity, ā€˜search_terms_fullā€™ returns the following:

 [1] "Dam.age.years"                                                                                                       
 [2] "Dam.age.sq"                                                                                                          
 [3] "Sire.age.years"                                                                                                      
 [4] "Sire.age.sq"                                                                                                         
 [5] "Sire.Inbreeding"                                                                                                     
 [6] "Pair.Inbreeding"                                                                                                     
 [7] "Dam.age.years + Dam.age.sq + Sire.age.years + Sire.age.sq + Sire.Inbreeding + Pair.Inbreeding"                       
 [8] "Dam.age.years + Dam.age.sq + Sire.age.years + Sire.age.sq + Sire.Inbreeding + Pair.Inbreeding + (1|Sire.Studbook.ID)"
 [9] "Dam.age.years + Dam.age.sq + Sire.age.years + Sire.age.sq + Sire.Inbreeding + Pair.Inbreeding + (1|Dam.Studbook.ID)" 
[10] "Dam.age.years + Dam.age.sq + Sire.age.years + Sire.age.sq + Sire.Inbreeding + Pair.Inbreeding + (1|Management)"      
[11] "Dam.age.years + Dam.age.sq + Sire.age.years + Sire.age.sq + Sire.Inbreeding + Pair.Inbreeding + (1|Location)"  

Does this do as I intend it to or am I making it over complicated? Does this have the potential to cause the above error?

Many thanks!

This sounds again like an error during some lme4 fit, analogously to what I explained above. The warnings (except for the Quick-TRANSfer warning) are probably related to this. So you could play around with the lme4 tuning parameters (see lme4::glmer()) and see if that resolves the issue (of course, your adjustments of the tuning parameters should not compromise the convergence of the estimation algorithm used by lme4).

The warning

Quick-TRANSfer stage steps exceeded maximum (= 3000000)

seems to be due to the K-means algorithm used internally by projpred for clustering the posterior draws (see, e.g., here). This in turn might be related to the large number of posterior draws that you have. Is it possible to reduce the number of posterior draws in your case?

In your specific case, search_terms_full should be fine as you defined it.

1 Like