Projpred "inherits(formula, "formula") is not TRUE"


I’m trying to replicate the body fat example (at least the model and selection part) from the Pavone et al. (2022) paper. I have the data file from the GitHub repo and the following code (pretty much identical to the code in that repo):


library(rstanarm)  # stan_glm(), hs()
library(projpred)  # cv_varsel()

# get data
df <- read.table("bodyfat.txt", header = TRUE, sep = ";")

# centre & scale all predictor vars
df[,4:19] <- scale(df[,4:19])
df <- as.data.frame(df)
# n obs
n <- nrow(df)
# height & weight colnames are currently height in inches & weight in pounds
# name SI cols as height & weight
colnames(df[c("weight_kg", "height_cm")]) <- c("weight", "height")
# select predictors
pred <- c("age", "weight", "height", "neck", "chest", "abdomen", "hip",
          "thigh", "knee", "ankle", "biceps", "forearm", "wrist")
# select outcome
target <- "siri"
# create formula for model
model_formula <- paste("siri~", paste(pred, collapse = "+"))
# n predictors
p <- length(pred)
# dataframe of just outcome & predictors
df <- df[,c(target,pred)]

# set up regularised horseshoe prior
p0 <- 2 # prior guess for the number of relevant variables
tau0 <- p0/(p-p0) * 1/sqrt(n)
rhs_prior <- hs(global_scale=tau0)

# fit model
fit_model <- stan_glm(formula = model_formula, data = df,
                prior = rhs_prior,
                QR = TRUE, 
                seed = 1, 
                refresh = 0)

# perform the projection predictive variable selection
bcvvs <- cv_varsel(fit_model, method='forward', cv_method='LOO', nloo = n,
                   verbose = FALSE)

cv_varsel() errors out with

Error in get_refmodel.stanreg(object, ...) : 
  inherits(formula, "formula") is not TRUE

Any help would be appreciated.



R version 4.1.1 (2021-08-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.6.8

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

Paper referred to:

Pavone, F., Piironen, J., Bürkner, P.-C., & Vehtari, A. (2022). Using reference models in variable selection. Computational Statistics.

GitHub repo file referred to: bodyfat_notebook.R in fpavone/ref-approach-paper (master branch)


I don’t know which projpred version Pavone et al. (2022) were using. I took a quick glance at their repo as well as at their paper again, but I could not find the projpred version. Perhaps you could file an issue on their GitHub issue tracker, asking for that version?

Thanks @fweber144

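I found the problem. In the original notebook the formula is passed as a character string, but cv_varsel() needs a formula object: the error comes from the inherits(formula, "formula") check in get_refmodel.stanreg(). stan_glm() itself accepted the string without complaint, which is why the failure only shows up at the cv_varsel() step.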

Original notebook: formula <- paste("siri~", paste(pred, collapse = "+"))
Actually works: model_formula <- formula(paste("siri~", paste(pred, collapse = "+")))
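
For anyone hitting the same error, here is a minimal base-R sketch of the difference. It only assumes the column names used above (with a shortened, illustrative predictor list) and does not need rstanarm or projpred; the names formula_chr, formula_obj and formula_ref are made up for this example.

```r
# Illustrative subset of the predictors (assumption: not the full list)
pred <- c("age", "weight", "height")

# paste() returns a character string, not a formula object
formula_chr <- paste("siri ~", paste(pred, collapse = " + "))
inherits(formula_chr, "formula")  # FALSE: this is the check that fails

# Option 1: convert the string explicitly
formula_obj <- as.formula(formula_chr)

# Option 2: build the formula directly from the variable names
formula_ref <- reformulate(pred, response = "siri")

inherits(formula_obj, "formula")  # TRUE
inherits(formula_ref, "formula")  # TRUE
```

Either object can then be passed as the formula argument when fitting the reference model, so that the formula stored in the fit is a real formula object by the time cv_varsel() inspects it.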