So… apologies for yet another query, and thanks for all the help you’ve provided before. I am running a hierarchical model with 30,000 observations and using projpred for variable selection; more details on the model can be found in this thread.
I was wondering whether anyone would be able to interpret what might be going on with this error (apologies again for my lack of understanding).
*** EDIT *** I tried to upload my reference model (a .RDA file), but was unable to on Discourse. If anyone would like access, I am happy to find another means of sending it. I am running projpred version 2.6.0, brms version 2.19.0, R version 4.1. ***
The desired updates require recompiling the model
Start sampling
-----
-----
Running the search and the performance evaluation for each of the K = 5 CV folds separately ...
| | 0%Error in eigen(sigma, symmetric = TRUE) :
infinite or missing values in 'x'
Calls: cv_varsel ... repair_re -> repair_re.merMod -> t -> <Anonymous> -> eigen
In addition: Warning messages:
1: Quick-TRANSfer stage steps exceeded maximum (= 800000)
2: Quick-TRANSfer stage steps exceeded maximum (= 800000)
This happens after the reference models are refit on each of the 5 folds. Please let me know if you need any more information/context!
Sorry to hear that you are having more issues with projpred. The issue you mentioned might be related to the large dataset, but I’m not sure and would have to try it out. Since you said you were unable to upload the reference model object as an RDA file, could you send it to me via a personal message?
Thanks for sending. I am able to load the reference model fit into my R session, but even calling get_refmodel() crashes my R session due to insufficient RAM. Are you also calling get_refmodel() on the HPC cluster you mentioned in Projpred: Fixing Group Effects in Search Terms and Tips for Speed??
Error in if (any(edgevals <- 0 < bdiff & bdiff < boundary.tol)) { :
missing value where TRUE/FALSE needed
It turned out that this is due to #323 (for which I will add a warning in projpred) and can be avoided by setting nterms_max = 16. In this reference model, there are 17 predictor terms when not counting the intercept and when counting the two group-level terms as one (their inclusion is forced jointly by combining them via +). The full model therefore has 17 predictor terms, and we want to cut off the search at 17 - 1 = 16 terms.
Furthermore, issue #346 is probably one of the reasons why the computations take so long in this case: in projpred:::search_forward(), creating the list of candidate models for a given submodel size already takes very long.
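To make the term counting concrete, here is a minimal base-R sketch. The formula `f` is a made-up stand-in (the real reference model has 17 predictor terms, not 4), and counting all group-level terms as a single term is an assumption that only holds when they are forced in jointly, as described above.

```r
# Hypothetical formula standing in for the reference model's formula.
f <- y ~ x1 + x2 + x3 + (1 | g1) + (1 | g2)

# Term labels exclude the intercept; group-level terms contain "|".
labs <- attr(terms(f), "term.labels")
is_group <- grepl("|", labs, fixed = TRUE)

# Count population-level terms individually and all group-level terms
# together as one term (assumption: they enter the search jointly).
n_terms_full <- sum(!is_group) + as.integer(any(is_group))

# Cut the search off one term short of the full model.
nterms_max <- n_terms_full - 1L

# Usage sketch (not run here; `refm` is a hypothetical reference model object):
# cvvs <- projpred::cv_varsel(refm, cv_method = "kfold", K = 5,
#                             nterms_max = nterms_max)
```

With the 17 terms of the model in this thread, the same arithmetic gives nterms_max = 16.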
Ah I see, that makes sense. I will keep an eye out to see if there are any fixes on this in the future :)
I also could not reproduce this error in the smaller example. I am rerunning the model with nterms_max on the larger dataset, and I will let you know if I encounter this again (and if so will try to make a reprex that captures it).
A bit of progress, but apologies to again be the bearer of more queries!
So, now all models get through the search, i.e.:
Running the search and the performance evaluation for each of the K = 5 CV folds separately ...
|======================================================================| 100%
-----
Except, immediately after this, I get the following Error:
The error seems to be occurring here. This error only occurs when running the reprex on the HPC. I have done a bit of digging. The HPC has R version 4.1.0 (can’t upgrade to 4.2 unfortunately) and my laptop has R version 4.2.0.
The except argument of simplify2array() was only added in R version 4.2.0. I am happy to create my own fork, remove the except argument, and see how that works. I was just wondering whether you know what the implications might be?
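As an alternative to deleting the argument outright, a fork could guard the call on whether the running R version supports it. The sketch below is only an illustration: `simplify2array_compat()` is a hypothetical helper, not something that exists in projpred. As far as I understand the R 4.2.0 change, the default `except = c(0L, 1L)` reproduces the older behaviour, so dropping the argument should only change results where the caller passes a non-default `except` and all list elements have length 0 or 1. Whether that applies to the call in question is not confirmed here.

```r
# Hypothetical helper (not part of projpred): forward `except` only if the
# running R version's simplify2array() has that argument (R >= 4.2.0).
simplify2array_compat <- function(x, higher = TRUE, except = c(0L, 1L)) {
  if ("except" %in% names(formals(simplify2array))) {
    simplify2array(x, higher = higher, except = except)
  } else {
    # Older R: `except` is dropped, so the pre-4.2.0 behaviour applies.
    simplify2array(x, higher = higher)
  }
}

# Two length-2 elements are simplified to a 2 x 2 matrix on any R version.
res <- simplify2array_compat(list(1:2, 3:4))
```

On R 4.1.0 the helper takes the fallback branch, so it is worth checking the shape of the object the downstream code receives in that case.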