I am encountering some puzzling errors and warnings when running projpred::cv_varsel() on a large brms fit. To start, my brms model looks like this:
URAM.brm1 <- brm(URAM ~ MOTOR.avg.sc + NOHIKER.avg.sc +      # rec. disturbance
                   CR_CLOSURE.sc + PROJ_AGE_1.sc + NDVI.sc +  # forest structure
                   ag_DIST.sc +                               # proximity to resources
                   ODHE +                                     # prey species
                   linear_DIST.sc +                           # linear features
                   SLOPE.sc +                                 # land structure
                   Height.sc + Dist_T.sc +                    # camera set vars
                   #season +                                  # seasonal variation
                   (1|stat),                                  # station as a random effect
                 family = bernoulli,
                 control = list(max_treedepth = 15),
                 iter = 100000,
                 chains = 4,
                 warmup = 50000,
                 cores = 4,
                 data = dat)
…where URAM is a binary (0/1) response, all explanatory variables are numeric, centered, and scaled, and there is one random effect (stat). Once this has finished running, it spits out the first warning:
Compiling Stan program...
Start sampling
Warning message:
In system(paste(CXX, ARGS), ignore.stdout = TRUE, ignore.stderr = TRUE) :
'C:/PROGRA~1/Git/usr/mingw_/bin/g++' not found
However, the model appears to have converged (all Rhat values are 1.00 and the bulk and tail ESS values are large), and the diagnostic plots show no tell-tale signs of fit issues:
> URAM.brm1
Family: bernoulli
Links: mu = logit
Formula: URAM ~ MOTOR.avg.sc + NOHIKER.avg.sc + CR_CLOSURE.sc + PROJ_AGE_1.sc + NDVI.sc + ag_DIST.sc + ODHE + linear_DIST.sc + SLOPE.sc + Height.sc + Dist_T.sc + (1 | stat)
Data: dat (Number of observations: 3533)
Samples: 4 chains, each with iter = 1e+05; warmup = 50000; thin = 1;
total post-warmup samples = 2e+05
Group-Level Effects:
~stat (Number of levels: 58)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.08 0.19 0.76 1.51 1.00 65113 103002
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -3.46 0.20 -3.88 -3.10 1.00 117911 125988
MOTOR.avg.sc -0.26 0.14 -0.58 -0.03 1.00 238673 131665
NOHIKER.avg.sc -0.07 0.11 -0.34 0.09 1.00 187252 121781
CR_CLOSURE.sc 0.03 0.19 -0.36 0.40 1.00 112977 126476
PROJ_AGE_1.sc 0.09 0.21 -0.33 0.51 1.00 114679 128449
NDVI.sc 0.17 0.09 -0.00 0.36 1.00 306417 143086
ag_DIST.sc -0.61 0.25 -1.13 -0.12 1.00 108087 116678
ODHE 0.52 0.20 0.11 0.91 1.00 268955 152079
linear_DIST.sc -0.31 0.25 -0.83 0.16 1.00 140013 130325
SLOPE.sc -0.38 0.20 -0.79 0.01 1.00 115506 133349
Height.sc 0.20 0.16 -0.12 0.51 1.00 124956 133975
Dist_T.sc -0.20 0.19 -0.58 0.15 1.00 118030 125127
Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
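For reference, this is my reading of the model written out, with priors omitted (no prior argument was passed, so brms defaults apply). Here x_{ik} are the 11 population-level predictors for observation i, stat[i] is that observation's station, and σ_stat corresponds to sd(Intercept) in the summary:

```latex
\begin{aligned}
\text{URAM}_i &\sim \operatorname{Bernoulli}(p_i), \qquad i = 1, \dots, 3533 \\
\operatorname{logit}(p_i) &= \beta_0 + \sum_{k=1}^{11} \beta_k \, x_{ik} + u_{\text{stat}[i]} \\
u_s &\sim \mathcal{N}\!\left(0, \sigma_{\text{stat}}^2\right), \qquad s = 1, \dots, 58
\end{aligned}
```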
You’ll notice that some variables (e.g. NDVI.sc and SLOPE.sc) are nearly significant by a 95% CI test: their intervals only just include zero (-0.00 to 0.36 and -0.79 to 0.01, respectively). My goal is therefore to use cv_varsel() to determine whether any uninformative predictors can be removed from the model, which might make the more informative ones significant. I call cv_varsel() as follows:
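To make the "nearly significant" screen concrete, here is a small base-R sketch (not part of brms or projpred) that takes the rounded 95% intervals printed in the summary, flags which exclude zero, and ranks the rest by how close their nearer endpoint is to zero. The data frame `ci` and its columns are purely illustrative; in practice the bounds could be taken from fixef(URAM.brm1) (columns Q2.5 and Q97.5) instead of being retyped.

```r
# Rounded 95% interval bounds copied from the Population-Level Effects table
# (Intercept omitted). "-0.00" is typed as -0.00, which R compares equal to 0.
ci <- data.frame(
  term = c("MOTOR.avg.sc", "NOHIKER.avg.sc", "CR_CLOSURE.sc", "PROJ_AGE_1.sc",
           "NDVI.sc", "ag_DIST.sc", "ODHE", "linear_DIST.sc",
           "SLOPE.sc", "Height.sc", "Dist_T.sc"),
  lower = c(-0.58, -0.34, -0.36, -0.33, -0.00, -1.13, 0.11, -0.83, -0.79, -0.12, -0.58),
  upper = c(-0.03, 0.09, 0.40, 0.51, 0.36, -0.12, 0.91, 0.16, 0.01, 0.51, 0.15),
  stringsAsFactors = FALSE
)

# An interval excludes zero when it lies entirely above or entirely below 0.
ci$excludes_zero <- ci$lower > 0 | ci$upper < 0

# Terms whose 95% interval excludes zero.
print(ci$term[ci$excludes_zero])

# For the remaining terms, distance from zero to the nearer endpoint:
# the smaller this is, the closer the term is to "significant".
near <- ci[!ci$excludes_zero, ]
near$margin <- pmin(abs(near$lower), abs(near$upper))
print(near[order(near$margin), c("term", "lower", "upper", "margin")])
```

With these values, only MOTOR.avg.sc, ag_DIST.sc, and ODHE exclude zero, and NDVI.sc and SLOPE.sc have the smallest margins.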
URAM.1_cv <- cv_varsel(URAM.brm1, cv_method = 'LOO', cores = 4)
The call appears to run normally at first…
[1] "Computing LOOs..."
| | 0%
but then begins to print the same warning repeatedly…
boundary (singular) fit: see ?isSingular
boundary (singular) fit: see ?isSingular
boundary (singular) fit: see ?isSingular
…this is printed maybe 50-100 times before showing…
|= | 1%
…then immediately begins printing another 50-100 instances of…
boundary (singular) fit: see ?isSingular
boundary (singular) fit: see ?isSingular
boundary (singular) fit: see ?isSingular
…before ultimately culminating in:
Error in pwrssUpdate(pp, resp, tol = tolPwrss, GQmat = GQmat, compDev = compDev, :
pwrssUpdate did not converge in (maxit) iterations
In addition: Warning messages:
1: In cv_varsel.refmodel(refmodel, ...) :
K provided, but cv_method is LOO.
2: Quick-TRANSfer stage steps exceeded maximum (= 10000000)
3: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
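The third warning refers to the Pareto k diagnostic from PSIS-LOO. As a rough sketch, the hypothetical helper `classify_k()` below bins k values into the fixed categories reported by older versions of the loo package (good ≤ 0.5, ok ≤ 0.7, bad ≤ 1, very bad > 1). Newer loo releases use a sample-size-dependent threshold instead, so help('pareto-k-diagnostic') for the installed version is the authority. The k values here are made up for illustration; with a real fit they would come from something like loo::pareto_k_values(loo(URAM.brm1)).

```r
# Hypothetical helper: bin Pareto k values into the fixed categories used by
# older loo versions. Intervals are right-closed, e.g. (0.5, 0.7] is "ok".
classify_k <- function(k) {
  cut(k,
      breaks = c(-Inf, 0.5, 0.7, 1, Inf),
      labels = c("good", "ok", "bad", "very bad"),
      right = TRUE)
}

# Made-up k values for illustration only, NOT taken from this model.
k_example <- c(0.12, 0.55, 0.71, 1.30)
print(table(classify_k(k_example)))
```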
I am unsure whether the issue lies with the initial brms fit or with the cv_varsel() call, so I would greatly appreciate input from anyone who has experienced similar issues. I suspect the Bayesian model may be “overfit”, since the repeated “boundary (singular) fit” message usually accompanies overfit models in lme4, but I am fairly new to brms, so I am unsure whether the same applies here.