Hello all,
I’ve been attempting to run some model comparison procedures using brms. The model is a weighted regression where the observations are weighted according to their precision. This is needed because I do not have access to the raw data, only grouped summaries (as an aside, if anyone can suggest an alternative modeling strategy, I would be amenable to that as well).
There are two key predictors I’m interested in, along with a set of covariates. I suspect some of the problem may come from the fact that the two key predictors are highly correlated with each other (~.8). There are also some high correlations between the covariates, and between the key predictors and a number of the covariates.
The strategy I’m currently taking compares 5 models:
m1 <- y~ covariates
m2 <- y~ covariates + predictor1
m3 <- y~ covariates + predictor2
m4 <- y~ covariates + predictor1 + predictor2
m5 <- y~ covariates + predictor1*predictor2
Using the model_weights
function from brms, I’ve tried out each of the methods for obtaining out-of-sample predictive performance (kfold, waic, both loo methods, and marglik). The default method (loo2) gives warnings suggesting to use kfold because a high number of worrisome pareto-k values, but the results from using kfold validation vary substantially across runs (effectively, it flips between giving 100% of the weight to models 3, 4, or 5). The other three methods (waic, loo, and marglik) give 100% of the weight to model 5.
My main questions are whether there’s something inherent in what I’m doing that will lead to these problems, or if I can do something to ameliorate them. And second, if there isn’t anything I can do, what should I make of these results?
Full code is pasted below, and I’ve attached the dataset for reproducibility purposes.
eg_dat.csv (871.5 KB)
mod.dat <- read.csv('eg_data.csv')
m1 <- brm(mn_avg_ol|weights(1/mn_avg_ol_se) ~ total_pop + col_grads +
unemp_rate + med_income + poverty_rate + crime_rate + density +
mobility + white_prop + black_prop + b.w.ratio + dissim,
data=mod.dat, family=gaussian(),
prior=c(set_prior('normal(0,5)', class='b')), save_all_pars = T)
m2 <- brm(mn_avg_ol|weights(1/mn_avg_ol_se) ~ std_implicit + total_pop +
col_grads + unemp_rate + med_income + poverty_rate + crime_rate +
density + mobility + white_prop + black_prop + b.w.ratio + dissim,
data=mod.dat, family=gaussian(),
prior=c(set_prior('normal(0,5)', class='b')), save_all_pars = T)
m3 <- brm(mn_avg_ol|weights(1/mn_avg_ol_se) ~ std_explicit + total_pop +
col_grads + unemp_rate + med_income + poverty_rate + crime_rate +
density + mobility + white_prop + black_prop + b.w.ratio + dissim,
data=mod.dat, family=gaussian(),
prior=c(set_prior('normal(0,5)', class='b')), save_all_pars = T)
m4 <- brm(mn_avg_ol|weights(1/mn_avg_ol_se) ~ std_explicit + std_implicit +
total_pop + col_grads + unemp_rate + med_income + poverty_rate +
crime_rate + density + mobility + white_prop + black_prop +
b.w.ratio + dissim, data=mod.dat, family=gaussian(),
prior=c(set_prior('normal(0,5)', class='b')), save_all_pars = T)
m5 <- brm(mn_avg_ol|weights(1/mn_avg_ol_se) ~ std_explicit*std_implicit +
total_pop + col_grads + unemp_rate + med_income + poverty_rate +
crime_rate + density + mobility + white_prop + black_prop +
b.w.ratio + dissim, data=mod.dat, family=gaussian(),
prior=c(set_prior('normal(0,5)', class='b')), save_all_pars = T)
kf_weights <- model_weights(m1, m2, m3, m4, m5, weights = 'kfold') #unstable
m1 m2 m3 m4 m5
1.491436e-54 5.393624e-23 1.420055e-17 1.000000e+00 5.048618e-19
kf_weights2 <- model_weights(m1, m2, m3, m4, m5, weights = 'kfold') #unstable
m1 m2 m3 m4 m5
1.563694e-90 1.092874e-55 7.275380e-23 4.529283e-02 9.547072e-01
waic_weights <- model_weights(m1, m2, m3, m4, m5, weights = 'waic')
m1 m2 m3 m4 m5
3.022089e-74 3.263825e-23 8.106708e-06 1.872163e-04 9.998047e-01
loo_weights <- model_weights(m1, m2, m3, m4, m5, weights='loo')
m1 m2 m3 m4 m5
1.768809e-74 5.787978e-25 4.801307e-06 1.931451e-03 9.980637e-01
loo2_weights <- model_weights(m1, m2, m3, m4, m5, weights='loo2') #pareto-k warnings
m1 m2 m3 m4 m5
3.013783e-01 2.263873e-01 3.518062e-01 1.729065e-06 1.204264e-01
mlik_weights <- model_weights(m1, m2, m3, m4, m5, weights = 'marglik')
m1 m2 m3 m4 m5
9.532557e-82 1.448537e-29 2.345684e-09 1.809704e-07 9.999998e-01
- Operating System: Mac OS 10.11.6
- brms Version: 2.4