Projpred "inherits(formula, "formula") is not TRUE"


I’m trying to replicate the body fat example (at least the model & selection part) from the Pavone 2022 paper. I have the data file from the github repo and the following code (pretty much identical to the same repo):


# get data
df <- read.table("bodyfat.txt", header = T, sep = ";")

# centre & scale all predictor vars
df[,4:19] <- scale(df[,4:19])
df <-
# n obs
n <- nrow(df)
# height & weight colnames are currently height in inches & weight in pounds
# name SI cols as height & weight
colnames(df[c("weight_kg", "height_cm")]) <- c("weight", "height")
# select predictors
pred <- c("age", "weight", "height", "neck", "chest", "abdomen", "hip",
          "thigh", "knee", "ankle", "biceps", "forearm", "wrist")
# select outcome
target <- "siri"
# create formula for model
model_formula <- paste("siri~", paste(pred, collapse = "+"))
# n predictors
p <- length(pred)
# dataframe of just outcome & predictors
df <- df[,c(target,pred)]

# set up regularised horseshoe prior
p0 <- 2 # prior guess for the number of relevant variables
tau0 <- p0/(p-p0) * 1/sqrt(n)
rhs_prior <- hs(global_scale=tau0)

# fit model
fit_model <- stan_glm(formula = model_formula, data = df,
                prior = rhs_prior,
                QR = TRUE, 
                seed = 1, 
                refresh = 0)

# perform the projection predictive variable selection
bcvvs <- cv_varsel(fit_model, method='forward', cv_method='LOO', nloo = n,
                   verbose = FALSE)

cv_varsel() errors out with

Error in get_refmodel.stanreg(object, ...) : 
  inherits(formula, "formula") is not TRUE

Any help would be appreciated.



R version 4.1.1 (2021-08-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.6.8

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] projpred_2.2.2  rstanarm_2.21.3 Rcpp_1.0.9      forcats_0.5.2   stringr_1.4.1   dplyr_1.0.10    purrr_0.3.5     readr_2.1.3     tidyr_1.2.1    
[10] tibble_3.1.8    ggplot2_3.4.0   tidyverse_1.3.2

loaded via a namespace (and not attached):
  [1] googledrive_2.0.0    minqa_1.2.5          colorspace_2.0-3     ellipsis_0.3.2       markdown_1.4         base64enc_0.1-3     
  [7] fs_1.5.2             rstudioapi_0.14      rstan_2.21.7         DT_0.26              fansi_1.0.3          lubridate_1.9.0     
 [13] xml2_1.3.3           codetools_0.2-18     splines_4.1.1        shinythemes_1.2.0    bayesplot_1.10.0     jsonlite_1.8.3      
 [19] nloptr_2.0.3         broom_1.0.1          dbplyr_2.2.1         shiny_1.7.3          compiler_4.1.1       httr_1.4.4          
 [25] backports_1.4.1      assertthat_0.2.1     Matrix_1.5-1         fastmap_1.1.0        gargle_1.2.1         cli_3.4.1           
 [31] later_1.3.0          htmltools_0.5.3      prettyunits_1.1.1    tools_4.1.1          igraph_1.3.5         gtable_0.3.1        
 [37] glue_1.6.2           reshape2_1.4.4       cellranger_1.1.0     vctrs_0.5.1          nlme_3.1-160         crosstalk_1.2.0     
 [43] ps_1.7.2             lme4_1.1-31          rvest_1.0.3          timechange_0.1.1     mime_0.12            miniUI_0.1.1.1      
 [49] lifecycle_1.0.3      gtools_3.9.3         googlesheets4_1.0.1  MASS_7.3-58.1        zoo_1.8-11           scales_1.2.1        
 [55] colourpicker_1.2.0   hms_1.1.2            promises_1.2.0.1     parallel_4.1.1       inline_0.3.19        shinystan_2.6.0     
 [61] gamm4_0.2-6          gridExtra_2.3        loo_2.5.1            StanHeaders_2.21.0-7 stringi_1.7.8        dygraphs_1.1.1.6    
 [67] boot_1.3-28.1        pkgbuild_1.4.0       rlang_1.0.6          pkgconfig_2.0.3      matrixStats_0.63.0   lattice_0.20-45     
 [73] rstantools_2.2.0     htmlwidgets_1.5.4    processx_3.8.0       tidyselect_1.2.0     plyr_1.8.8           magrittr_2.0.3      
 [79] R6_2.5.1             generics_0.1.3       DBI_1.1.3            mgcv_1.8-41          pillar_1.8.1         haven_2.5.1         
 [85] withr_2.5.0          xts_0.12.2           survival_3.4-0       modelr_0.1.10        crayon_1.5.2         utf8_1.2.2          
 [91] tzdb_0.3.0           grid_4.1.1           readxl_1.4.1         callr_3.7.3          threejs_0.3.3        reprex_2.0.2        
 [97] digest_0.6.30        xtable_1.8-4         httpuv_1.6.6         RcppParallel_5.1.5   stats4_4.1.1         munsell_0.5.0       
[103] shinyjs_2.1.0      

Paper referred to:

Pavone, F., Piironen, J., Bürkner, P.-C., & Vehtari, A. (2022). Using reference models in variable selection. Computational Statistics. Using reference models in variable selection | SpringerLink

Github repo file referred to: ref-approach-paper/bodyfat_notebook.R at master · fpavone/ref-approach-paper · GitHub


I don’t know which projpred version Pavone et al. (2022) were using. I took a quick glance at their repo as well as at their paper again, but I could not find the projpred version. Perhaps you could file an issue on their GitHub issue tracker, asking for that version?

Thanks @fweber144

I found the problem. In the original notebook the formula is passed as a text string but for cv_varsel() to run you need to have a formula object.

Original notebook: formula <- paste("siri~", paste(pred, collapse = "+"))
Actually works: model_formula <- formula(paste("siri~", paste(pred, collapse = "+")))