Firstly, thanks so much for all of the work on brms, Stan, and projpred; these are all fantastic tools! Apologies in advance if my question is daft: I am still learning about these tools and have likely overlooked something trivial.
I am running a hierarchical model and hope to use projpred to perform variable selection.
cv_varsel is taking a very long time to run, and I am always running out of memory before jobs have finished (I am using my university’s HPC, and have a memory limit of 100GB). My main questions are:
- Am I asking too much of projpred? My dataset contains 30,000 observations, and I am looking at 10 potential explanatory variables.
- Is there anything I can do, for example specifying search_terms, to help things run more efficiently?
- How do I ensure that certain variables are always retained throughout the search (see below for more details)?
I have a dataset of 30,000 households. Each household is nested within a village, and each village is nested within a country, i.e.:
y ~ 1 + (1|village) + (1|country)
Keeping this hierarchical structure is key. I now want to include some explanatory variables, say education level, amount of land owned, etc. I am considering about 10 variables and was hoping to use projpred for variable selection. I have fitted a reference model in brms:
model <- brm( formula= bf(y ~ (1|village) + (1|country) + x1 + x2 + x3... x10), ... )
I then take this model, load it, and run projpred. At the moment, these are the only arguments I am using:
seed <- 123
ref_model <- projpred::get_refmodel(model)
cv_varsel_res <- projpred::cv_varsel(ref_model, method = 'forward', cv_method = 'kfold', K = 5, verbose = TRUE, seed = seed)
save(cv_varsel_res, file = "...")
Firstly, I want to make sure that the group effects, 1 + (1|village) + (1|country), are considered at every step of the search. Is there any way I can do this?
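To make this first question concrete, here is a minimal sketch of what I imagine, assuming I have understood search_terms correctly: every candidate passed to the search already contains the group-level terms, so no step can drop them. The names x1 to x10 are the placeholders from my formula above, and the exact spelling projpred expects for the group-level terms (with or without spaces around the bar) is an assumption on my part. I believe recent projpred versions also export a helper, force_search_terms(), that builds this kind of vector, but I have not confirmed that it behaves as I expect with multilevel terms.

```r
# Placeholder names: x1..x10 stand in for the real predictors, and the
# spelling of the group-level terms is an assumption.
forced_terms <- c("(1 | village)", "(1 | country)")
optional_terms <- paste0("x", 1:10)

# Build the candidates: the forced terms on their own, then the forced terms
# combined with each optional predictor in turn.
forced <- paste(forced_terms, collapse = " + ")
search_terms <- c(forced, paste0(forced, " + ", optional_terms))

print(search_terms)

# Only runs if projpred is installed and `ref_model` exists as in my code above.
if (requireNamespace("projpred", quietly = TRUE) && exists("ref_model")) {
  cv_varsel_res <- projpred::cv_varsel(
    ref_model,
    method = "forward",
    cv_method = "kfold",
    K = 5,
    search_terms = search_terms,
    seed = 123
  )
}
```

If this is the right idea, I would expect the group effects to be part of every submodel along the search path, but please correct me if search_terms is not meant to be used this way.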
Secondly, is there anything I might be missing that could help this run more efficiently?
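For this second question, the only levers I have found so far in the documentation are nterms_max, nclusters and nclusters_pred, plus doing a preliminary varsel() run without cross-validation. Below is a sketch of how I would try them. The specific values are guesses rather than recommendations, and I do not know whether any of this addresses the memory problem.

```r
# All values below are guesses chosen only to make the run cheaper.
seed <- 123
nterms_max <- 5       # stop the search after 5 terms instead of all 12
nclusters <- 5        # fewer clustered posterior draws during the search
nclusters_pred <- 20  # clustered draws used when evaluating the submodels

# Only runs if projpred is installed and `ref_model` exists as in my code above.
if (requireNamespace("projpred", quietly = TRUE) && exists("ref_model")) {
  # Preliminary run on the full data, without cross-validation, to get a
  # rough idea of the search path and of the run time.
  vs_prelim <- projpred::varsel(
    ref_model,
    method = "forward",
    nterms_max = nterms_max,
    nclusters = nclusters,
    nclusters_pred = nclusters_pred,
    seed = seed
  )

  # The same settings applied to the K-fold run from my original call.
  cv_varsel_res <- projpred::cv_varsel(
    ref_model,
    method = "forward",
    cv_method = "kfold",
    K = 5,
    nterms_max = nterms_max,
    nclusters = nclusters,
    nclusters_pred = nclusters_pred,
    seed = seed
  )
}
```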
Thanks so much in advance for looking at this! Please let me know if you have any follow up questions!