projpred::cv_varsel() returning "Not enough (non-NA) data to do anything meaningful" error

Hi All,

I am experiencing the following issue with projection predictive variable selection, and I would very much appreciate advice on what is going on and, if possible, how to solve it.

I have a model, for instance:

```r
model <- brm(y ~ fac1 + fac2 + x1 + x2 + s(x3) + (1 | group), # x1, x2, x3 are continuous variables
             data = data,
             prior = priors,
             cores = 4, seed = 123,
             backend = "cmdstanr", # (but the same issue also occurs with "rstan")
             control = list(adapt_delta = 0.95, max_treedepth = 15))
```

The priors are defined as follows, for instance (although I assume they are not relevant to my problem):

```r
priors <- c(
  prior(normal(0, 1), class = "b"),
  prior(student_t(10, 1, 1), class = "sigma"),
  prior(student_t(10, 1, 1), class = "sd"))
```

```
prior_summary(model)
               prior     class      coef group resp dpar nlpar bound       source
        normal(0, 1)         b                                               user
        normal(0, 1)         b     fac11                             (vectorized)
        normal(0, 1)         b     fac21                             (vectorized)
        normal(0, 1)         b     sx3_1                             (vectorized)
        normal(0, 1)         b        x1                             (vectorized)
        normal(0, 1)         b        x2                             (vectorized)
student_t(3, 3, 2.5) Intercept                                            default
 student_t(10, 1, 1)        sd                                               user
 student_t(10, 1, 1)        sd           group                       (vectorized)
 student_t(10, 1, 1)        sd Intercept group                       (vectorized)
student_t(3, 0, 2.5)       sds                                            default
student_t(3, 0, 2.5)       sds     s(x3)                             (vectorized)
 student_t(10, 1, 1)     sigma                                               user
```

The model runs without problems and its diagnostics look fine, so I use it as the reference model for variable selection:

```r
ref_model <- get_refmodel(model)
```

I then try to run projection predictive variable selection with cv_varsel():

```r
variable_selection <- cv_varsel(ref_model)
```

Unfortunately, after a few seconds the process stops and I get the following message:

```
variable_selection <- cv_varsel(ref_model)
[1] "Computing LOOs..."
| | 0%
Error in model.matrix.gamm4(delete.response(terms(formula)), random = random, :
Not enough (non-NA) data to do anything meaningful
In addition: Warning messages:
1: In cv_varsel.refmodel(ref_model) : K provided, but cv_method is LOO.
```

Could someone explain what this error message means? Is there any way to make projpred::cv_varsel() work with linear/generalised linear/generalised additive (mixed-effects) models that include both mgcv::s() spline term(s) and random effect(s)? I believe the projpred package already supports LMMs/GLMMs/GAMMs; is that correct?

I use R version 4.1.0 on macOS Catalina 10.15.7. Package versions (possibly relevant to my enquiry): cmdstan (2.27.0); cmdstanr (0.4.0); brms (2.16.1); loo (2.4.1); projpred (2.0.2); rstan (2.21.2); rstanarm (2.21.2); StanHeaders (2.21.0-7); mgcv (1.8-36); gamm4 (0.2-6).

Many thanks for your advice!

Best wishes,

Tom

@AlejandroCatalina

Sorry for the delay.

Yes, see the Gaussian example in the vignette.

You could try installing the latest version from GitHub:

```r
devtools::install_github("stan-dev/projpred", build_vignettes = TRUE)
```

Can you also check whether you get the error without the factors, i.e., with `y ~ x1 + x2 + s(x3) + (1 | group)`?

@AlejandroCatalina, who has been the main developer of the new features, has a new job, and thus our responses have been very slow recently.

Hi All,

@asael_am and @avehtari, thank you very much for your responses and the link to the Gaussian example.

I have tried installing the latest development version from GitHub as well as rerunning the model without the categorical variables (both before and after installing the latest projpred version). Unfortunately, this did not solve the issue. The projection predictive variable selection still stops after a few seconds, and I get (almost*) the same error message:

```
variable_selection <- cv_varsel(ref_model)
[1] "Computing LOOs..."
|| 0%
Error in model.matrix.gamm4(delete.response(terms(formula)), random = random, :
Not enough (non-NA) data to do anything meaningful
```

*After installing the latest projpred version, the additional warning "In cv_varsel.refmodel(ref_model) : K provided, but cv_method is LOO." no longer appears.

I have also tried fitting several different models with brms::brm(), using different data and outcome distributions (mainly family = bernoulli(link = "logit")), and then running the projection predictive variable selection, but without success: I get the same error whenever the model includes both s() spline term(s) and random effect(s). Note that I have only tried random intercepts so far, avoiding more complex random-effect structures.

In contrast, cv_varsel() seemed to work well when a model included either an s() spline term or a random effect, but not both.

Is there anything else I could try in order to run cv_varsel() on a brms::brm() model (with a binary outcome) that includes both s() spline term(s) and random effect(s)?

Many thanks for any advice.

@AlejandroCatalina - congratulations on the new job!

Best wishes,

Tom

If these work separately, then there might be a bug.

If you have time, you could investigate the problem further by running your example with debugging on error turned on (if you are using RStudio, see the "Debug" menu).
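
As a non-interactive alternative to RStudio's Debug menu, the call stack at the point of failure can be recorded with base R's withCallingHandlers() and sys.calls(). The helper capture_error_stack() below is hypothetical (it is not part of projpred or brms); this is only a minimal sketch:

```r
# Hypothetical helper (not part of projpred/brms): evaluate `expr` and, if it
# throws an error, return the error message together with the call stack that
# was active when the error was signalled.
capture_error_stack <- function(expr) {
  stack <- NULL
  result <- tryCatch(
    withCallingHandlers(
      expr,
      error = function(e) {
        # The calling handler runs before the stack unwinds, so sys.calls()
        # still contains the frames that raised the error.
        stack <<- sys.calls()
      }
    ),
    error = function(e) e
  )
  if (inherits(result, "error")) {
    list(message = conditionMessage(result),
         calls = vapply(stack, function(cl) deparse(cl, nlines = 1L),
                        character(1)))
  } else {
    result
  }
}
```

Wrapping the failing call, e.g. `capture_error_stack(cv_varsel(ref_model))$calls`, should show which function passed the data on to model.matrix.gamm4(), which would be useful in a bug report. In an interactive session, base R's `options(error = recover)` offers similar information.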

Alternatively, open an issue in the projpred GitHub repository with a minimal reproducible example that produces the error, and we'll have a look.
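
A minimal reproducible example could look roughly like the sketch below. It mirrors the reduced model suggested earlier in the thread (no factors, one smooth term, one random intercept), but the simulated data set (`sim_data`), its variable names, and its effect sizes are made up for illustration, and it has not been verified that it actually triggers the error. The model is only fitted if brms and projpred are installed:

```r
# Simulated data (hypothetical): two linear predictors, one nonlinear
# predictor for s(x3), and a random intercept per group.
set.seed(123)
n_groups <- 10
n_per_group <- 20
n <- n_groups * n_per_group

sim_data <- data.frame(
  group = factor(rep(seq_len(n_groups), each = n_per_group)),
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = runif(n, min = -2, max = 2)
)
group_intercepts <- rnorm(n_groups, mean = 0, sd = 0.5)
sim_data$y <- 0.5 * sim_data$x1 - 0.3 * sim_data$x2 + sin(sim_data$x3) +
  group_intercepts[as.integer(sim_data$group)] + rnorm(n, sd = 0.5)

# Fit the reference model and run the variable selection only if both
# packages are available.
if (requireNamespace("brms", quietly = TRUE) &&
    requireNamespace("projpred", quietly = TRUE)) {
  library(brms)
  library(projpred)
  fit <- brm(y ~ x1 + x2 + s(x3) + (1 | group),
             data = sim_data, seed = 123, cores = 4)
  ref_model <- get_refmodel(fit)
  # If the bug is reproduced, this call should stop with the
  # model.matrix.gamm4 error reported above.
  variable_selection <- cv_varsel(ref_model)
}
```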