Interpreting varying and unstable model comparison results

Hello all,

I’ve been attempting to run some model comparison procedures using brms. The model is a weighted regression where the observations are weighted according to their precision. This is needed because I do not have access to the raw data, only grouped summaries (as an aside, if anyone can suggest an alternative modeling strategy, I would be amenable to that as well).

There are two key predictors I’m interested in, along with a set of covariates. I suspect some of the problem may come from the fact that the two key predictors are highly correlated with each other (~.8). There are also some high correlations between the covariates, and between the key predictors and a number of the covariates.
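A quick way to verify these correlations from the attached data, using the mod.dat data frame and the column names that appear in the code below (just a sketch):

round(cor(mod.dat[, c('std_implicit', 'std_explicit', 'total_pop', 'col_grads',
                      'unemp_rate', 'med_income', 'poverty_rate', 'crime_rate',
                      'density', 'mobility', 'white_prop', 'black_prop',
                      'b.w.ratio', 'dissim')],
          use = 'pairwise.complete.obs'), 2)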

The strategy I’m currently taking compares 5 models:

m1 <- y ~ covariates
m2 <- y ~ covariates + predictor1
m3 <- y ~ covariates + predictor2
m4 <- y ~ covariates + predictor1 + predictor2
m5 <- y ~ covariates + predictor1*predictor2

Using the model_weights function from brms, I’ve tried each of the methods for obtaining out-of-sample predictive performance (kfold, waic, both loo methods, and marglik). The default method (loo2) gives warnings suggesting I use kfold because of a high number of worrisome Pareto-k values, but the results from k-fold validation vary substantially across runs (effectively, it flips between giving essentially 100% of the weight to model 3, 4, or 5). The other three methods (waic, loo, and marglik) all give essentially 100% of the weight to model 5.
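As a side note, this is how the Pareto-k diagnostics can be inspected directly with standard loo/brms calls; a minimal sketch, with an arbitrary object name:

loo_m5 <- loo(m5)  # PSIS-LOO for the interaction model
print(loo_m5)      # includes the table of Pareto-k values
plot(loo_m5)       # k for each observation; values above 0.7 are the worrisome ones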

My main questions are: first, is there something inherent in what I’m doing that leads to these problems, or is there something I can do to ameliorate them? And second, if there isn’t, what should I make of these results?

Full code is pasted below, and I’ve attached the dataset for reproducibility purposes.

eg_dat.csv (871.5 KB)

library(brms)

mod.dat <- read.csv('eg_dat.csv')

# m1: covariates only
m1 <- brm(mn_avg_ol | weights(1/mn_avg_ol_se) ~ total_pop + col_grads + 
            unemp_rate + med_income + poverty_rate + crime_rate + density + 
            mobility + white_prop + black_prop + b.w.ratio + dissim, 
          data = mod.dat, family = gaussian(), 
          prior = set_prior('normal(0,5)', class = 'b'), save_all_pars = TRUE)

# m2: covariates + predictor 1 (std_implicit)
m2 <- brm(mn_avg_ol | weights(1/mn_avg_ol_se) ~ std_implicit + total_pop + 
            col_grads + unemp_rate + med_income + poverty_rate + crime_rate + 
            density + mobility + white_prop + black_prop + b.w.ratio + dissim, 
          data = mod.dat, family = gaussian(), 
          prior = set_prior('normal(0,5)', class = 'b'), save_all_pars = TRUE)

# m3: covariates + predictor 2 (std_explicit)
m3 <- brm(mn_avg_ol | weights(1/mn_avg_ol_se) ~ std_explicit + total_pop + 
            col_grads + unemp_rate + med_income + poverty_rate + crime_rate + 
            density + mobility + white_prop + black_prop + b.w.ratio + dissim, 
          data = mod.dat, family = gaussian(), 
          prior = set_prior('normal(0,5)', class = 'b'), save_all_pars = TRUE)

# m4: covariates + both key predictors
m4 <- brm(mn_avg_ol | weights(1/mn_avg_ol_se) ~ std_explicit + std_implicit + 
            total_pop + col_grads + unemp_rate + med_income + poverty_rate + 
            crime_rate + density + mobility + white_prop + black_prop + 
            b.w.ratio + dissim, 
          data = mod.dat, family = gaussian(), 
          prior = set_prior('normal(0,5)', class = 'b'), save_all_pars = TRUE)

# m5: covariates + both key predictors and their interaction
m5 <- brm(mn_avg_ol | weights(1/mn_avg_ol_se) ~ std_explicit*std_implicit + 
            total_pop + col_grads + unemp_rate + med_income + poverty_rate + 
            crime_rate + density + mobility + white_prop + black_prop + 
            b.w.ratio + dissim, 
          data = mod.dat, family = gaussian(), 
          prior = set_prior('normal(0,5)', class = 'b'), save_all_pars = TRUE)

kf_weights <- model_weights(m1, m2, m3, m4, m5, weights = 'kfold')  # unstable
#           m1           m2           m3           m4           m5
# 1.491436e-54 5.393624e-23 1.420055e-17 1.000000e+00 5.048618e-19

kf_weights2 <- model_weights(m1, m2, m3, m4, m5, weights = 'kfold')  # unstable
#           m1           m2           m3           m4           m5
# 1.563694e-90 1.092874e-55 7.275380e-23 4.529283e-02 9.547072e-01

waic_weights <- model_weights(m1, m2, m3, m4, m5, weights = 'waic')
#           m1           m2           m3           m4           m5
# 3.022089e-74 3.263825e-23 8.106708e-06 1.872163e-04 9.998047e-01

loo_weights <- model_weights(m1, m2, m3, m4, m5, weights = 'loo')
#           m1           m2           m3           m4           m5
# 1.768809e-74 5.787978e-25 4.801307e-06 1.931451e-03 9.980637e-01

loo2_weights <- model_weights(m1, m2, m3, m4, m5, weights = 'loo2')  # pareto-k warnings
#           m1           m2           m3           m4           m5
# 3.013783e-01 2.263873e-01 3.518062e-01 1.729065e-06 1.204264e-01

mlik_weights <- model_weights(m1, m2, m3, m4, m5, weights = 'marglik')
#           m1           m2           m3           m4           m5
# 9.532557e-82 1.448537e-29 2.345684e-09 1.809704e-07 9.999998e-01

  • Operating System: Mac OS 10.11.6
  • brms Version: 2.4

Do you mean the problem of kfold not returning the same results across runs?

This is only natural: in each run we partition the data randomly into 10 folds, and depending on the partitioning (and on your data and model) the results may differ by some amount.
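A minimal sketch of how to make the runs comparable, assuming the fold assignment is drawn with R's RNG (which is the case for the default random splitting): fix the seed before each call.

set.seed(123)  # fixing the seed makes the random 10-fold partition reproducible
kf_a <- model_weights(m1, m2, m3, m4, m5, weights = 'kfold')
set.seed(123)
kf_b <- model_weights(m1, m2, m3, m4, m5, weights = 'kfold')
# kf_a and kf_b should now agree; trying a few different seeds shows how
# sensitive the weights are to the particular split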

I recommend doing some posterior predictive checking (PPC). It is likely that your observation model is not good, and PPC would probably give you additional information. See, e.g., the vignette Graphical posterior predictive checks using the bayesplot package.
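A minimal sketch of what that could look like with brms' pp_check(), which wraps bayesplot (the specific plot types here are just illustrative choices):

pp_check(m5, nsamples = 100)                               # observed vs. replicated densities
pp_check(m5, type = 'stat', stat = 'max', nsamples = 100)  # can the model reproduce the extremes?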

The large Pareto-k values and the high variability of the k-fold CV results indicate that the data distribution has much longer tails (i.e., it has outliers) than the Gaussian family I assume you are using now.
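One way to check this (just a sketch, not a definitive fix) is to refit the preferred model with a Student-t observation family, keeping the formula and priors the same, and compare it to the Gaussian fit:

m5_t <- brm(mn_avg_ol | weights(1/mn_avg_ol_se) ~ std_explicit*std_implicit +
              total_pop + col_grads + unemp_rate + med_income + poverty_rate +
              crime_rate + density + mobility + white_prop + black_prop +
              b.w.ratio + dissim,
            data = mod.dat, family = student(),  # heavier-tailed likelihood
            prior = set_prior('normal(0,5)', class = 'b'),
            save_all_pars = TRUE)
loo(m5, m5_t)  # compare the Gaussian and Student-t fits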