CV Varsel Error: Infinite or missing values in 'x'

Hi again!

So… apologies for yet another query, and thanks for all the help you’ve provided before. I am running a hierarchical model with 30,000 observations and using projpred for variable selection. More details on the model can be found in this thread.

I was wondering whether anyone would be able to interpret what might be going on with this error (apologies again for my lack of understanding).

EDIT: I tried to upload my reference model (a .RDA file), but was unable to on Discourse. If anyone would like access, I am happy to find another means of sending it. I am running projpred version 2.6.0, brms version 2.19.0, and R version 4.1.

The desired updates require recompiling the model
Start sampling
-----
-----
Running the search and the performance evaluation for each of the K = 5 CV folds separately ...
  |                                                                      |   0%Error in eigen(sigma, symmetric = TRUE) :
  infinite or missing values in 'x'
Calls: cv_varsel ... repair_re -> repair_re.merMod -> t -> <Anonymous> -> eigen
In addition: Warning messages:
1: Quick-TRANSfer stage steps exceeded maximum (= 800000)
2: Quick-TRANSfer stage steps exceeded maximum (= 800000)

This happens after the reference models are refit on each of the 5 folds. Please let me know if you need any more information/context!
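For readers unfamiliar with the message itself: it comes from base R's eigen(), which stops as soon as its input matrix contains NA, NaN, or Inf. The sketch below is not projpred code, and the matrix values are made up; it only shows how the message arises and how a matrix could be checked beforehand.

```r
# A covariance-like matrix with missing entries (hypothetical values).
sigma <- matrix(c(1, NA, NA, 1), nrow = 2)

# eigen() refuses non-finite input; capture the error message instead of stopping.
msg <- tryCatch(
  {
    eigen(sigma, symmetric = TRUE)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# A quick check that could be run on a matrix before decomposing it.
all_finite <- all(is.finite(sigma))
print(all_finite)
```

In the traceback above, the matrix passed to eigen() is built inside repair_re(), so the non-finite values presumably originate in the submodel fit rather than in the user's data directly.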

Thanks again,

Léo

Hi Léo,

Sorry to hear that you are having more issues with projpred. The issue you mentioned might be related to the large dataset, but I’m not sure and would have to try it out. Since you said you were unable to upload the reference model object as an RDA file, could you send it to me via a personal message?

Thanks again for getting in touch! I’ve sent you a personal message!

Thanks for sending. I am able to load the reference model fit into my R session, but even calling get_refmodel() crashes my R session due to insufficient RAM. Are you also calling get_refmodel() on the HPC cluster you mentioned in "Projpred: Fixing Group Effects in Search Terms and Tips for Speed?"?

If yes, could you perhaps create a more lightweight reference model fit? I noticed this one has 16 000 posterior draws.
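As a rough guide to how the number of posterior draws comes about, the total is the number of chains times the post-warmup iterations per chain. The sketch below uses hypothetical sampler settings (not the ones used for this model) and assumes the usual default of warmup being half of iter.

```r
# Total post-warmup draws for a given sampler configuration.
# The default warmup = floor(iter / 2) is an assumption matching common defaults.
post_warmup_draws <- function(chains, iter, warmup = floor(iter / 2)) {
  chains * (iter - warmup)
}

# Hypothetical settings for illustration only.
heavy <- post_warmup_draws(chains = 4, iter = 8000)
light <- post_warmup_draws(chains = 4, iter = 2000)
print(c(heavy = heavy, light = light))
```

Fewer iterations per chain is one way to get a smaller fit; whether the reduced number of draws is still adequate should be checked with the usual convergence diagnostics.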

Yes I am running this on HPC cluster also. I will rerun and send through a more lightweight model. Apologies!

Rerun done and I’ve sent that through!

For those stumbling across this: With the reprex from GitHub - l-gorman/projpred_issue_reprex: A reproducible example for an issue being encountered in the ProjPred Package (which features a smaller dataset, a smaller reference model, and 2-fold CV), we received the error

Error in if (any(edgevals <- 0 < bdiff & bdiff < boundary.tol)) { :
  missing value where TRUE/FALSE needed

It turned out that this is due to #323 (for which I will add a warning in projpred) and can be avoided by setting nterms_max = 16. In this reference model, there are 17 predictor terms when not counting the intercept and when counting the two group-level terms as one (because their inclusion is forced jointly by combining them via +). So the full model has 17 predictor terms, and we want to cut off the search at 17 - 1 = 16 terms.
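The term counting described above can be sketched in base R. The formula below is made up (16 population-level terms x1 to x16 plus two group-level terms), chosen only so that the counts match the 17 and 16 mentioned here; it is not the formula of the actual reference model.

```r
# Hypothetical formula: 16 population-level terms and 2 group-level terms.
pop_terms <- paste0("x", 1:16)
grp_terms <- c("(1 | g1)", "(1 | g2)")
full_formula <- reformulate(c(pop_terms, grp_terms), response = "y")

# Term labels exclude the intercept; group-level terms contain "|".
labels <- attr(terms(full_formula), "term.labels")
is_group <- grepl("|", labels, fixed = TRUE)

# Count the jointly forced group-level terms as a single term.
n_full <- sum(!is_group) + as.integer(any(is_group))

# Stop the search one term short of the full model.
nterms_max <- n_full - 1L
print(c(n_full = n_full, nterms_max = nterms_max))
```

The resulting value would then be passed as nterms_max to cv_varsel().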

Furthermore, issue #346 is probably one of the reasons why the computations take so long in this case (in projpred:::search_forward(), it is already the creation of the list of candidate models—for a given submodel size—which takes very long).

I couldn’t observe Error in eigen(sigma, symmetric = TRUE) : infinite or missing values in 'x' with the reprex from GitHub - l-gorman/projpred_issue_reprex: A reproducible example for an issue being encountered in the ProjPred Package (when using nterms_max = 16), but feel free to post here if it still occurs.

Thanks again for all of the help @fweber144!

Ah I see, that makes sense. I will keep an eye out to see if there are any fixes on this in the future :)

I also could not reproduce this error in the smaller example. I am rerunning the model with nterms_max on the larger dataset, and I will let you know if I encounter this again (and if so will try to make a reprex that captures it).

Thanks again!

Hi @fweber144!

A bit of progress, but apologies to again be the bearer of more queries!

So, now all models get through the search, i.e.:

Running the search and the performance evaluation for each of the K = 5 CV folds separately ...
  |======================================================================| 100%
-----

Except, immediately after this, I get the following Error:

Error in simplify2array(lapply(res_cv, "[[", "summaries_sub"), higher = FALSE,  :
  unused argument (except = NULL)
Calls: cv_varsel -> cv_varsel.refmodel -> kfold_varsel

The error seems to be occurring here. It only occurs when running the reprex on the HPC. I have done a bit of digging: the HPC has R version 4.1.0 (can’t upgrade to 4.2, unfortunately) and my laptop has R version 4.2.0.

For simplify2array(), the except argument was only added in R version 4.2.0. I am happy to create my own fork, remove the except argument, and see how that works. I was just wondering whether you know what the implications might be?
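One possible workaround, short of removing the argument outright, is a small wrapper that only passes except when the running R version supports it. This is a sketch, not projpred's actual fix; the wrapper name simplify2array_compat is made up. As far as I understand, except only affects how elements of common length 0 or 1 are simplified, so results for longer elements should be the same either way.

```r
# Hypothetical compatibility wrapper: pass `except = NULL` only if
# simplify2array() has that argument (added in R 4.2.0).
simplify2array_compat <- function(x, higher = FALSE) {
  if ("except" %in% names(formals(simplify2array))) {
    simplify2array(x, higher = higher, except = NULL)
  } else {
    simplify2array(x, higher = higher)
  }
}

# Two elements of common length 2 are combined column-wise into a matrix.
res <- simplify2array_compat(list(1:2, 3:4))
print(dim(res))
```

On R < 4.2.0, the case to watch would be list elements of length 1 (or 0), where the fallback branch may return a plain vector (or list) instead of an array.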

Thanks again,

Léo

Hi Léo,

Thank you for reporting this! I will respond in Incompatability with R version <4.2.0 · Issue #423 · stan-dev/projpred.

Best,
Frank

projpred’s GitHub issue #346 has now been fixed by the addition of a helper function called force_search_terms().
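To illustrate the idea, the sketch below is a base-R stand-in written from my understanding of what force_search_terms() returns: one search term consisting of all forced terms joined by +, plus one search term per optional term with the forced terms prepended. The function name sketch_force_search_terms and the term names are hypothetical; please check the projpred documentation for the real helper's exact behavior.

```r
# Hypothetical stand-in for the helper, for illustration only.
sketch_force_search_terms <- function(forced_terms, optional_terms) {
  stopifnot(length(forced_terms) > 0, length(optional_terms) > 0)
  # Join all forced terms into a single search term.
  forced <- paste(forced_terms, collapse = " + ")
  # Each optional term is only ever considered together with the forced terms.
  c(forced, paste0(forced, " + ", optional_terms))
}

search_terms <- sketch_force_search_terms(
  forced_terms = c("(1 | g1)", "(1 | g2)"),
  optional_terms = c("x1", "x2", "x3")
)
print(search_terms)
```

A vector like this would be passed via the search_terms argument of varsel() or cv_varsel().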
