Comparing loo for two complex multivariate models with large data

htrier · September 24, 2020, 3:53pm

This post refers to a problem I have addressed in a previous post. I have not been able to find a solution using the approach described in that post, so I would like to try a different approach. The problem:

I am running two complex multivariate models using a large dataset (N=4830). The models follow this format:

f1 = brmsformula('y1~1+var1+var2+var3+(1+var1+var2+var3|GroupID)')
f2 = brmsformula('y2~1+var1+var2+var3+(1+var1+var2+var3|GroupID)')
f3 = brmsformula('y1~1+var1+var2+(1+var1+var2|GroupID)')
f4 = brmsformula('y2~1+var1+var2+(1+var1+var2|GroupID)')
model_of_interest = brm(f1+f2,data=df,iter=6000,chains=4,cores=4,family="gaussian",control=list(adapt_delta=0.9),save_all_pars=TRUE)
null_model = brm(f3+f4,data=df,iter=6000,chains=4,cores=4,family="gaussian",control=list(adapt_delta=0.9),save_all_pars=TRUE)

I would like to use loo to compare the models. However, running brms::loo_subsample() returns Pareto k diagnostic warnings, and I am not able to run brms::loo_moment_match() (error description here). Looking at the output of loo_subsample for each of my models, the main issue is that p_loo > p, where p is the total number of parameters in the model. Could it be that loo_subsample does not work well with complex multivariate models?
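For reference, the two checks described above (p_loo against the parameter count, and the number of high Pareto k values) could be sketched as a small helper. This is only an illustration: summarise_loo_diagnostics and n_params are hypothetical names, not part of loo or brms. It assumes the object carries the estimates and diagnostics$pareto_k components that loo objects usually have, and it uses 0.7 as the conventional warning threshold.

```r
# Hypothetical helper, not part of loo or brms.
# loo_obj:  object returned by loo() or loo_subsample()
# n_params: total number of model parameters, counted by the user
summarise_loo_diagnostics <- function(loo_obj, n_params, k_threshold = 0.7) {
  # Estimated effective number of parameters
  p_loo <- loo_obj$estimates["p_loo", "Estimate"]
  # Pareto k values for the observations that were evaluated
  k <- loo_obj$diagnostics$pareto_k
  list(
    p_loo = p_loo,
    p_loo_exceeds_p = p_loo > n_params,
    n_high_k = sum(k > k_threshold),
    # Positions within the evaluated observations; with subsampling these
    # are positions in the subsample, not row numbers of the full data
    high_k_ids = which(k > k_threshold)
  )
}
```

It would be called on the object returned by brms::loo_subsample() or brms::loo(), with n_params set to the parameter count worked out by hand.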

This brings me to my idea for a different approach: I am considering running each of the formulas (f1, f2, f3, f4) as separate brms models, and computing loo for each individually. If the loo diagnostics look good for each individual model, is there a way I could combine elpd_loo for e.g. f1 and f2 to make a composite expected log pointwise predictive density? Then perhaps I could compare the two composite loo values to see which model is better. This way I could get around loo_subsample's suboptimal behaviour with complex multivariate models.

Thanks, I appreciate any advice or feedback about this approach.

paul.buerkner · September 25, 2020, 7:28am

We haven’t thought about the possibility of combining subsampling (loo_subsample) with moment matching (loo_moment_match), and we will need to discuss further whether and how this can reasonably be made possible at all. For now, using loo (perhaps with moment matching) looks like the safer, although slower, approach.

Combining the elpd_loos of the individual models is only sensible if no residual correlations are modeled. Based on what I see about your model, it seems residual correlations are modeled though.
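As a rough sketch of what combining would mean when the responses are fit as fully separate models with no residual correlation: the pointwise elpd_loo values are added observation by observation, and the standard error is computed from the summed pointwise values. combine_elpd is a hypothetical helper, not a loo or brms function, and it assumes both loo objects were computed on the same rows in the same order.

```r
# Hypothetical helper, not part of loo or brms.
# Only meaningful when the two responses are modeled independently
# (no residual correlation) and both objects cover the same observations.
combine_elpd <- function(loo_a, loo_b) {
  pw_a <- loo_a$pointwise[, "elpd_loo"]
  pw_b <- loo_b$pointwise[, "elpd_loo"]
  stopifnot(length(pw_a) == length(pw_b))
  # Joint log predictive density of (y1_i, y2_i) under independence
  pw <- pw_a + pw_b
  n <- length(pw)
  c(elpd = sum(pw), se = sqrt(n) * sd(pw))
}
```

Summing per observation before taking the standard deviation keeps any dependence between the two responses' pointwise values in the standard error, which adding the two separate standard errors would not.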

htrier · September 25, 2020, 4:13pm

Thanks for your response @paul.buerkner. I think I’ve come full circle back to my original post now where I am experiencing a bug in the loo function that doesn’t seem to have a solution.

paul.buerkner · September 28, 2020, 8:34am

I am sorry about that. I have made one change in brms that may fix some problems with moment matching, but I am not sure whether it fixes this one. Can you double-check by running with the latest GitHub version of brms?

htrier · September 28, 2020, 2:34pm

Okay, I have updated to the latest GitHub versions of both loo and brms, so my R session shows I am running brms_2.13.9, loo_2.3.1.9000, and rstan_2.21.2. Unfortunately, when I run:

loo1 <- brms::loo(fit1, moment_match=TRUE)

I am seeing the error:

Error in prep_call_sampler(object) :
  the compiled object from C++ code for this model is invalid, possible reasons:
  - compiled with save_dso=FALSE;
  - compiled on a different platform;
  - does not exist (created from reading csv files).

When I inspect fit1 I can see that it appears to be in the correct brmsfit format, so I’m not sure what aspect of the model code is invalid. I would really appreciate any advice – thank you so much for your time on this issue!

paul.buerkner · September 29, 2020, 6:00am

I am sorry this is still creating problems. This now looks like an issue in rstan or in the interaction between brms and rstan. It is definitely not something you are doing wrong.

Would it be possible for you to provide a minimal reproducible example for me to play around with it myself?

htrier · September 29, 2020, 11:48am

Thanks, I’ve made a reproducible example that replicates the error from this post (Moment matching failed), although I haven’t been able to replicate the invalid model error. But this is essentially what I’m trying to do:

# R version 4.0.2
library(loo)   # version 2.3.1.9000
library(brms)  # version 2.13.9
library(dplyr)

df <- mtcars
df <- mutate(df, ID = seq_len(nrow(df)))
f1 = brmsformula('mpg~1+wt+cyl+hp+(1+wt+cyl+hp|ID)')
f2 = brmsformula('qsec~1+wt+cyl+hp+(1+wt+cyl+hp|ID)')
f3 = brmsformula('mpg~1+wt+cyl+(1+wt+cyl|ID)')
f4 = brmsformula('qsec~1+wt+cyl+(1+wt+cyl|ID)')
model_of_interest = brm(f1+f2,data=df,iter=6000,chains=4,cores=4,family="gaussian",control=list(adapt_delta=0.9),save_all_pars=TRUE)
null_model = brm(f3+f4,data=df,iter=6000,chains=4,cores=4,family="gaussian",control=list(adapt_delta=0.9),save_all_pars=TRUE)

loo1 <- brms::loo(model_of_interest, moment_match=TRUE)
loo2 <- brms::loo(null_model, moment_match=TRUE)

paul.buerkner · September 29, 2020, 1:32pm

Thanks! I fixed at least two issues with loo_moment_match that were brms-related. Depending on the model, you may still need to install loo from GitHub as well to avoid at least one further error. Does it work now with the latest GitHub version of brms?

htrier · September 30, 2020, 9:43am

Thanks, the function is working now! I really appreciate the assistance.

avehtari · October 1, 2020, 4:58pm

@htrier, if you at some point have results to share from the comparison and how subsample and moment matching helped, please ping me as it would be really cool to know more about a successful use of these methods in real life!
