Adding elpd_kfold estimates from several models to create a composite elpd_kfold for comparison

I would like to run the following hierarchical models in brms and then conduct kfold on each:

f1 <- bf(y1 ~ 1 + var1 + var2 + var3 + (1 + var1 + var2 + var3 | GroupID))
f2 <- bf(y2 ~ 1 + var1 + var2 + var3 + (1 + var1 + var2 + var3 | GroupID))
f3 <- bf(y1 ~ 1 + var1 + var2 + (1 + var1 + var2 | GroupID))
f4 <- bf(y2 ~ 1 + var1 + var2 + (1 + var1 + var2 | GroupID))

fit1 <- brm(formula = f1, data = df, iter = 6000, chains = 4, cores = 4, control = list(adapt_delta = 0.9), family = gaussian(link = "identity"), save_all_pars = TRUE)
fit2 <- brm(formula = f2, data = df, iter = 6000, chains = 4, cores = 4, control = list(adapt_delta = 0.9), family = gaussian(link = "identity"), save_all_pars = TRUE)
fit3 <- brm(formula = f3, data = df, iter = 6000, chains = 4, cores = 4, control = list(adapt_delta = 0.9), family = gaussian(link = "identity"), save_all_pars = TRUE)
fit4 <- brm(formula = f4, data = df, iter = 6000, chains = 4, cores = 4, control = list(adapt_delta = 0.9), family = gaussian(link = "identity"), save_all_pars = TRUE)

kfold1 <- brms::kfold(fit1, K = 10, folds = "stratified", group = "GroupID")
kfold2 <- brms::kfold(fit2, K = 10, folds = "stratified", group = "GroupID")
kfold3 <- brms::kfold(fit3, K = 10, folds = "stratified", group = "GroupID")
kfold4 <- brms::kfold(fit4, K = 10, folds = "stratified", group = "GroupID")

Then I would like to combine the resulting elpd_kfold estimates from kfold1 and kfold2 into a composite value, and compare this composite with the combined elpd_kfold estimates from kfold3 and kfold4. This way I could assess whether, on the whole, the first set of models generalizes better than the second set.

I am choosing this method over entering two complex multivariate models into kfold with compare=TRUE for a few reasons: my dataset is large (N = 4830), so running kfold on large, complex multivariate models takes too long and often exceeds my available memory; and I often get diagnostic errors when running the models together as multivariate models. Running the models separately and then combining the elpd_kfold estimates seems like the best solution, since I am not interested in modelling residual correlations.

I am wondering 1) whether this method makes sense, and 2) how to compute a composite standard error for each set of models. Thanks, I appreciate any advice.

Yes.

Do you want y1 and y2 to be equally weighted? If so, the composite can be computed with the usual equations for the sum of normally distributed random variables (see Sum of normally distributed random variables - Wikipedia):

kfold_elpd_composite <- kfold1$estimates["elpd_kfold", "Estimate"] + kfold2$estimates["elpd_kfold", "Estimate"]
kfold_elpd_se_composite <- sqrt(kfold1$estimates["elpd_kfold", "SE"]^2 + kfold2$estimates["elpd_kfold", "SE"]^2)

and the corresponding equations can also be used for the results from loo_compare (which works on kfold objects, too).
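
For example, a minimal sketch of that combination, assuming kfold1 through kfold4 as defined above (note that loo_compare puts the best model in the first row, so check the row names to make sure both differences are oriented the same way before summing):

# compare the y1 models and the y2 models separately
comp_y1 <- loo_compare(kfold1, kfold3)
comp_y2 <- loo_compare(kfold2, kfold4)

# combine as a sum of two (approximately) normal random variables
elpd_diff_composite <- comp_y1[2, "elpd_diff"] + comp_y2[2, "elpd_diff"]
se_diff_composite <- sqrt(comp_y1[2, "se_diff"]^2 + comp_y2[2, "se_diff"]^2)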

Thanks for your helpful response @avehtari! Yes, I would like y1 and y2 to be equally weighted, so those equations will work for me.

Re: computing the results from loo_compare, should I concatenate the pointwise values of the individual kfold results (e.g. c(kfold1$pointwise[, "elpd_kfold"], kfold2$pointwise[, "elpd_kfold"])) in order to compute the pointwise elpd differences? Then I could use the same equation implemented by loo_compare for calculating the standard error of the elpd difference, i.e.:

N <- length(vector_of_pointwise_elpd_diffs)
se_diff <- sqrt(N) * sd(vector_of_pointwise_elpd_diffs)

Or is there perhaps a simpler way?

You can use the concatenated pointwise results as you write, or you can use the same equations I mentioned to combine the elpd_diff and se_diff values given by loo_compare (the combination is again a sum of two normally distributed variables, or at least approximately so).
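
For instance, a minimal sketch of the concatenated-pointwise route, assuming the four kfold objects from above (the rows of $pointwise follow the rows of the original data, so the y1 models and the y2 models align observation by observation; ideally the same fold assignments are used for the models being compared):

# pointwise elpd for each model set, concatenated across the two outcomes
elpd_set1 <- c(kfold1$pointwise[, "elpd_kfold"], kfold2$pointwise[, "elpd_kfold"])
elpd_set2 <- c(kfold3$pointwise[, "elpd_kfold"], kfold4$pointwise[, "elpd_kfold"])

# pointwise differences and the loo_compare-style standard error
diffs <- elpd_set1 - elpd_set2
elpd_diff <- sum(diffs)
se_diff <- sqrt(length(diffs)) * sd(diffs)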

I see now, thanks for clarifying!

@avehtari One follow-up question: for interpreting elpd_diff and SE, this article recommends that if elpd_diff > 4 the difference is meaningful and I should compare that difference to the SE. However, I can't find much guidance on how to compare elpd_diff and SE. Is there a way to calculate a confidence interval using the SE, e.g. SE * 1.96 for a 95% confidence interval? It would be great to have a more precise idea of how to interpret the SE relative to elpd_diff, and how to implement correction for multiple comparisons across all our analyses.

elpd_diff and SE give you the parameters of a normal distribution, but we recommend reporting those rather than turning them into a hypothesis test with an interval. Alternatively, you can compute model weights as described in Using stacking to average Bayesian predictive distributions and drop models that have a weight of 0. You should not choose a single model unless it has weight 1.
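
As a sketch, stacking weights for the two composite model sets could be computed from the concatenated pointwise values with loo::stacking_weights (assuming elpd_set1 and elpd_set2 from the earlier snippet):

library(loo)

# one column of pointwise elpd values per model set
lpd_point <- cbind(set1 = elpd_set1, set2 = elpd_set2)
stacking_weights(lpd_point)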

With model weights you don't need to worry about multiple comparisons as long as you keep all the models that have non-zero weight. However, if you have many models, it is probably better to switch to reference-model-based approaches (if applicable), as discussed in Piironen and Vehtari (2017), Piironen, Paasiniemi, and Vehtari (2020), and Pavone et al. (2020).

I see, so is it enough to simply report the elpd_diff and SE and state which model is better, without any additional information about the uncertainty in those values? I'm just trying to understand how to report the results of testing a model of interest against a null model with this method. Usually when reporting the results of a hypothesis test I include some measure of certainty in my conclusion (e.g. p-values or confidence intervals), so it feels odd not to include one; but I am new to this method, so perhaps I'm misunderstanding exactly what I can conclude from evaluating elpd_diff.

You can include the probability, but don't hide elpd_diff and SE. If the probability is not 1 for one model, then do model averaging instead of declaring one model the winner. And if you have more than 2 models, stacking weights are better than many pairwise elpd_diff comparisons. Thus, to me, reporting the probability is not that useful; stacking weights are a more natural description of how to weight the models in an ensemble.
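
If you do report it, a one-line sketch of that probability under the normal approximation (using elpd_diff and se_diff from the pointwise snippet above) is:

# P(model set 1 has higher expected elpd), normal approximation
prob_set1_better <- pnorm(elpd_diff / se_diff)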
