While fitting some of the growth curve models (see Fairbrother 2014) for one of the chapters in my dissertation, I seem to have run into an issue where the models are too large to perform approximate leave-one-out cross-validation in memory. I am not sure whether this is because I have roughly 350,000 observations across 65 countries and 249 surveys, or because I have 7,000 post-warmup draws per chain. The models are fit via brms as follows:
# Model 3: Linear Trend and Varying Time Slope with Respondent Level Predictors
hlogit_3 <- bf(
  support ~ female_wi + educ + age_cat + soc_pctfemleg_wi + soc_polyarchy_wi +
    time_gmc + (time_gmc | country) + (1 | cntry_prj_year),
  family = bernoulli(link = "logit"),
  decomp = "QR"
)
# Specify Priors for model 3
hlogit_3_priors <-
  prior(student_t(3, 0, 2.5), class = "Intercept") +
  prior(normal(0, 3), class = "b") +
  prior(exponential(0.5), class = "sd") +
  prior(lkj(3), class = "cor")
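# Side note, not part of my original script: a quick sanity check (if I am
# reading the brms docs right) that these priors map onto the model's
# parameter classes before spending hours fitting
# validate_prior(hlogit_3_priors, formula = hlogit_3, data = model_df)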
# Fit the Model using Within-Chain Threading (6 chains, 10k iterations)
fit_hlogit_3 <- brm(
  formula = hlogit_3,
  prior = hlogit_3_priors,
  data = model_df,
  cores = 12,
  chains = 6,
  iter = 10000,
  warmup = 3000,
  threads = threading(threads = 2, grainsize = 100),
  save_pars = save_pars(all = TRUE),
  seed = 123,
  backend = "cmdstanr",
  save_model = "analyses/models/Electoral Democracies/Stan/HLogit_3.stan",
  file = "analyses/models/Electoral Democracies/HLogit_3"
)
# Add LOO and Bayes R2 to the Model
fit_hlogit_3 <- add_criterion(
  fit_hlogit_3,
  model_name = "Societal Growth Curve Model 3",
  criterion = c("loo", "bayes_R2"),
  cores = 1,
  file = "analyses/models/Electoral Democracies/HLogit_3"
)
The model converges without any issues, but when I try to add LOO and Bayes R^2 I immediately get the error
Error: cannot allocate vector of size 109.3 Gb
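That figure is suspiciously close to the size of the dense pointwise log-likelihood matrix loo has to build, so I suspect that is the culprit. A back-of-the-envelope check (my observation count is approximate):
n_draws <- (10000 - 3000) * 6   # 42,000 post-warmup draws across all chains
n_obs   <- 350000               # approximate
n_draws * n_obs * 8 / 1024^3    # ~109.5 GiB of 8-byte doubles, essentially the 109.3 Gb in the error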
So I tried leave-one-out subsampling, parallelized across 12 cores (the CPU is a Ryzen 9 5900X):
# Add LOO to the Model
fit_hlogit_3 <- add_criterion(
  fit_hlogit_3,
  model_name = "Societal Growth Curve Model 3",
  criterion = "loo_subsample",
  observations = 100000,
  cores = 12,
  file = "analyses/models/Electoral Democracies/HLogit_3"
)
but for some reason it takes an excruciatingly long time regardless of what the observations argument is set to, and I have no idea why (as of posting, it has been running for longer than it took to fit the model). Is there some reason it takes so long, and are there any workarounds that would make this more computationally tractable? Any suggestions are greatly appreciated, because I'm hoping to avoid having to upgrade my memory until the new DDR5 standard is released.
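For completeness, one workaround I have been considering (but have not verified) is having brms evaluate the log-likelihood casewise so the full draws-by-observations matrix is never materialized, possibly combined with thinning the draws; treating ndraws as forwarded to log_lik() is an assumption on my part:
# pointwise = TRUE computes the log-likelihood separately for each observation,
# trading a lot of speed for bounded memory
loo_pw <- loo(fit_hlogit_3, pointwise = TRUE)
# Assumption on my part: ndraws is passed through to log_lik(), which would
# shrink the matrix from 42,000 to 4,000 rows (roughly 10.4 GiB)
loo_thin <- loo(fit_hlogit_3, ndraws = 4000)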
Tagging @avehtari, @andrewgelman, and @jonah on this one because loo is their package.