I’m currently working on a project in which the outcome vector is a proportion which I’m modeling as a bounded count via a multilevel beta binomial model. I managed to get everything working by following the brms
custom model family vignette and specifying the following functions:
# Define the Stan Functions for the custom model family----
stan_funs <- "
real beta_binomial_custom_lpmf(int y, real mu, real phi, int trials) {
return beta_binomial_lpmf(y | trials, mu * phi, (1 - mu) * phi);
}
int beta_binomial_custom_rng(real mu, real phi, int trials) {
return beta_binomial_rng(trials, mu * phi, (1 - mu) * phi);
}
"
# Add the Stan Code to the Model Block----
stan_vars <- stanvar(
scode = stan_funs,
block = "functions"
)
# Define a function to calculate the log likelihood----
log_lik_beta_binomial_custom <- function(i, prep) {
mu <- brms::get_dpar(prep, "mu", i = i)
phi <- brms::get_dpar(prep, "phi", i = i)
trials <- prep$data$trials[i]
y <- prep$data$Y[i]
beta_binomial_custom_lpmf(y, mu, phi, trials)
}
# Define a function to calculate predictions----
posterior_predict_beta_binomial_custom <- function(i, prep, ...) {
mu <- brms::get_dpar(prep, "mu", i = i)
phi <- brms::get_dpar(prep, "phi", i = i)
trials <- prep$data$trials[i]
beta_binomial_custom_rng(mu, phi, trials)
}
# Define a function to calculate expectations----
posterior_epred_beta_binomial_custom <- function(prep) {
mu <- brms::get_dpar(prep, "mu")
trials <- prep$data$trials
trials <- matrix(trials, nrow = nrow(mu), ncol = ncol(mu), byrow = TRUE)
mu * trials
}
# Defining the Custom Model Family----
beta_binomial_custom <- custom_family(
"beta_binomial_custom",
dpars = c("mu", "phi"),
links = c("logit", "log"),
lb = c(NA, 0),
type = "int",
vars = "trials[n]"
)
I then specify the following model which converges in about an hour without any clear issues (though loo identifies seven problematic observations) other than a pretty high degree of autocorrelation in the chains (thinning seems to help with this).
# Unconditional Quadratic Growth Curve
betabinom_null_qgc <- bf(
femlegs | trials(leg_seats) ~ year_z + I(year_z^2) + (1 + year_z + I(year_z^2) | cntry),
center = F,
family = beta_binomial_custom
)
# Specify Priors for the model
betabinom_null_qgc_priors <-
prior(student_t(4, 0, 3), class = "b", coef = "Intercept") +
prior(normal(0, 2), class = "b") +
prior(exponential(0.8), class = "sd") +
prior(exponential(1), class = "phi")+
prior(lkj(4), class = "cor")
# Fit the model
betabinom_null_qgc_fit <- brm(
formula = betabinom_null_qgc,
prior = betabinom_null_qgc_priors,
data = femleg_df,
cores = 6,
chains = 6,
iter = 10000,
warmup = 4000, # 2500 Warmup/5000 iter would probably be fine too
refresh = 10,
thin = 2,
save_pars = save_pars(all = TRUE),
stanvars = stan_vars,
control = list(max_treedepth = 11),
seed = 123,
backend = "cmdstanr",
file = str_c(base_dir, "BetaBinom_Null_QGC_FemLeg"),
save_model = str_c(base_dir, "stan/BetaBinom_Null_QGC_FemLeg.stan")
)
# Expose Stan Functions from the Model
expose_functions(betabinom_null_qgc_fit, vectorize = T)
The problem I’ve encountered and have been unable to find a solution for is that attempting to pass the fitted model object to any of the functions from the loo package such as the following
# Add model fit criteria to the saved object----
add_criterion(
betabinom_null_qgc_fit ,
criterion = c("loo", "loo_R2"),
cores = 6,
reloo = TRUE,
moment_match = TRUE,
save_psis = TRUE,
file = str_c(base_dir, "BetaBinom_Null_QGC_FemLeg_Validated")
)
#Error in checkForRemoteErrors(val) :
# 6 nodes produced errors; first error: could not find function "beta_binomial_custom_lpmf"
k-fold cross validation also fails but with a different error and after running for three hours
add_criterion(
betabinom_null_qgc_fit,
criterion = "kfold",
folds = "grouped",
group = "cntry",
K = 5,
save_fits = TRUE,
cores = 6,
file = str_c(base_dir, "BetaBinom_Null_QGC_FemLeg_Validated")
)
#The desired updates require recompiling the model
#Start sampling
#Error in get(out, family$env) :
# object 'log_lik_beta_binomial_custom' not found
Running them with cores = 1
works without any issues, but it also takes almost 12 hours to complete and that’s just not feasible considering this is a fairly minimal specification of the full model. I’m not sure if this is an issue with brms
or loo
but any help is appreciated and will save me a lot of time.
As far as system specifications are concerned, I’m running the model on a Windows 10 desktop with an 8 core i9 9900k and 64GB of memory under R version 4.1.0 with the latest development versions of brms
(2.16.3) and loo
(2.4.1.9000) and using cmdstan
2.28.1 as a backend.
Edit: forgot to include the version of rstan
used for moment matching/kfold/reloo. That’s 2.26.3
Data for the model: femleg_subset.csv (177.3 KB)