Firstly, thanks so much for all of the work on brms, Stan, and projpred; these are all fantastic tools! Apologies in advance if my question is daft: I am still learning about these tools and have likely overlooked something trivial.
Main Issues
I am running a hierarchical model and hope to use projpred for variable selection. However, cv_varsel() is taking a very long time to run, and I always run out of memory before the jobs finish (I am using my university's HPC, which has a memory limit of 100 GB). My main questions are:
- Am I asking too much of projpred? My dataset contains 30,000 observations, and I am looking at 10 potential explanatory variables.
- Is there anything I can do, for example specifying search_terms, to help things run more efficiently?
- How do I ensure that certain variables are always retained throughout the search (see below for more details)?
Details
I have a dataset of 30,000 households. Each household is nested within a village, and each village is nested within a country, i.e.:
y ~ 1 + (1|village) + (1|country)
Keeping this hierarchical structure is key. I now want to include some explanatory variables, such as education level and amount of land owned. I am considering about 10 variables and was hoping to use projpred for variable selection. I have fitted a reference model in brms:
model <- brm(
  formula = bf(y ~ (1 | village) + (1 | country) + x1 + x2 + x3 + ... + x10),
  ...
)
I then take this model, load it, and run projpred. At the moment, these are the only arguments I am using:
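seed <- 123
# Build the projpred reference model from the fitted brms model
ref_model <- projpred::get_refmodel(model)
# Forward search with 5-fold cross-validation
cv_varsel_res <- cv_varsel(
  ref_model,
  method = "forward",
  cv_method = "kfold",
  K = 5,
  verbose = TRUE,
  seed = seed
)
save(cv_varsel_res, file = "...")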
Firstly, I want to make sure that the group effects, 1 + (1|village) + (1|country), are considered at every step of the search. Is there any way I can do this using search_terms?
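To make the question concrete, here is a minimal sketch of what I imagine this could look like. It assumes that search_terms accepts a character vector of candidate terms, and that repeating the group-level terms inside every candidate keeps them in the model at each step; I have not confirmed that this is the intended usage. The names x1 to x10 are placeholders for my real variables. I believe projpred also provides a helper, force_search_terms(), that builds this kind of vector.

```r
# Terms that should appear in every candidate submodel (assumption:
# repeating them in each search term keeps them throughout the search).
forced_terms <- c("(1 | village)", "(1 | country)")

# Placeholder names for the 10 candidate explanatory variables.
optional_terms <- paste0("x", 1:10)

# First entry: the forced terms alone. Remaining entries: the forced
# terms plus exactly one optional term each.
forced <- paste(forced_terms, collapse = " + ")
search_terms <- c(forced, paste(forced, optional_terms, sep = " + "))

print(search_terms[1:2])

# Hypothetical usage with the reference model from above (not run here):
# cv_varsel_res <- cv_varsel(ref_model, method = "forward",
#                            cv_method = "kfold", K = 5,
#                            search_terms = search_terms, seed = seed)
```

If this is the right idea, the search would only ever choose among the x variables, with both group-level intercepts present in every submodel.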
Secondly, is there anything I might be missing that could help this run more efficiently?
Thanks so much in advance for looking at this! Please let me know if you have any follow up questions!