Error message from projpred with binomial regression model fitted in rstanarm

I’m running into an error message that I don’t understand and can’t find an answer for online when running a binomial regression model using rstanarm and then trying to use some functions from the R package projpred. I can’t share the data, but my best guess is that there’s a minor issue with the data I’m using that’s causing some issues. I’m following the approach detailed here:

The model runs fine:

n <- 227 
D <- 27
p0 <- 5 # prior guess for the number of relevant variables
tau0 <- p0/(D-p0) * 1/sqrt(n) 
prior_coeff <- hs(global_scale = tau0, slab_scale = 1) # regularized horseshoe prior
binregcov <- stan_glm(formula = cbind(PresenceTotal, NumberOfSamples - PresenceTotal)~., family = binomial(link="logit"), prior = prior_coeff, 
                      data = NTMgrouped%>%dplyr::select(-landcover3))

But then once I try:

cvs <- cv_varsel(binregcov, method='forward', cv_method='kfold', K=5)

I get the following error:

STATS is longer than the extent of 'dim(x)[MARGIN]'Error in dimnames(x) <- dn : 
  length of 'dimnames' [2] not equal to array extent

There are only 227 observations and 27 covariate values, some categorical variables, etc. I can run the examples provided here, but it’s not quite working out with my data. Perhaps it’s not the best data set for this, but I wanted to try it out anyway.

Has anyone come across this error message before?

I’ll add that I’ve run the model using brms and used the kfold function in loo, and it seems to work fine. Hopefully what I’ve been doing so far has been correct. I haven’t used the Stan affiliated R packages too often.

binregcov <- brm(data = NTMgrouped%>%dplyr::select(-landcover3), formula = PresenceTotal| trials(NumberOfSamples)~ ., family = "binomial", prior = prior(normal(0, 1), class = b), control = list(max_treedepth = 15))
kfold(binregcov, K=5)

It sounds like a bug in projpred.

1 Like

Ah, okay!

Figured it out (hopefully). In my data argument in stan_glm(), I specify

data = NTMgrouped%>%dplyr::select(-landcover3)

but that data frame has NAs for some of the covariate values, which rstanarm removes (so a few rows of the data set are removed automatically). However, it seems to cause the issue when trying to use projpred. At least it’s working now that I’ve removed the rows with NAs.

1 Like