Comparing loo for two complex multivariate models with large data

This post refers to a problem I have addressed in a previous post. I have not been able to find a solution using the approach described in that post, so I would like to try a different approach. The problem:

I am running two complex multivariate models using a large dataset (N=4830). The models follow this format:

f1 = as.formula('y1~1+var1+var2+var3+(1+var1+var2+var3|GroupID)')
f2 = as.formula('y2~1+var1+var2+var3+(1+var1+var2+var3|GroupID)')
f3 = as.formula('y1~1+var1+var2+(1+var1+var2+|GroupID)')
f4 = as.formula('y2~1+var1+var2+(1+var1+var2+|GroupID)')
model_of_interest = brm(f1+f2,data=df,iter=6000,chains=4,cores=4,family="gaussian",control=list(adapt_delta=0.9),save_all_pars=TRUE)
null_model = brm(f3+f4,data=df,iter=6000,chains=4,cores=4,family="gaussian",control=list(adapt_delta=0.9),save_all_pars=TRUE)

I would like to use loo to compare the models. However, running brms::loo_subsample() returns pareto k diagnostic warnings and I am not able to run brms::loo_moment_match() (error description here). Looking at the output of loo_subsample for each of my models, the main issue is that p_loo > p, where p is the total number of parameters in the model. I am wondering whether it may be the case that loo_subsample does not work well with complex multivariate models?

This brings me to my idea for a different approach: I am considering running each of the formulas (f1, f2, f3, f4) as separate brms models, and computing loo for each individually. If the loo diagnostics look good for each individual model, is there a way I could combine elpd_loo for e.g. f1 and f2 to make a composite expected log pointwise predictive density? Then perhaps I could compare the two composite loo values to see which model is better. This way I could get around loo_subsample's suboptimal behaviour with complex multivariate models.

Thanks, I appreciate any advice or feedback about this approach.

We haven’t thought about the possibility of combining loo_subsampling and moment_matching and we will need to discuss more if/how this can be made reasonably possible at all. For now, using loo (perhaps with moment matching) looks like the safer, although slower, approach.

Combining the elpd_loos of the individual models is only sensible if no residual correlations are modeled. Based on what I see about your model, it seems residual correlations are modeled though.

Thanks for your response @paul.buerkner. I think I’ve come full circle back to my original post now where I am experiencing a bug in the loo function that doesn’t seem to have a solution.

I am sorry about that. I have done one change in brms that may fix some problems with moment matching but not sure if this one. Can you double check when running with the latest github version of brms?

Okay I have updated to the latest github versions of both loo and brms, so my R session shows I am running brms_2.13.9, loo_2.3.1.9000, and rstan_2.21.2. Unfortunately when I run:

loo1 <- brms::loo(fit1, moment_match=TRUE)

I am seeing the error:

Error in prep_call_sampler(object) :
the compiled object from C++ code for this model is invalid, possible reasons:

  • compiled with save_dso=FALSE;
  • compiled on a different platform;
  • does not exist (created from reading csv files).

When I inspect fit1 I can see that it appears to be in the correct brmsfit format, so I’m not sure what aspect of the model code is invalid. I would really appreciate any advice – thank you so much for your time on this issue!

I am sorry, this is still creating problems. Now, this is an issue in rstan or in the interaction with brms and rstan. Definitely not something you do wrong.

Would it be possible for you to provide a minimal reproducible example for me to play around with it myself?

Thanks, I’ve made a reproducible example that replicates the error from this post (Moment matching failed), although I haven’t been able to replicate the invalid model error. But this is essentially what I’m trying to do:

# R version 4.0.2
library(loo)#version 2.3.1.9000
library(brms)#version 2.13.9
library(dplyr)

df <- mtcars
df <- mutate(df,ID=c(1:nrow(df)))
f1 = brmsformula('mpg~1+wt+cyl+hp+(1+wt+cyl+hp|ID)')
f2 = brmsformula('qsec~1+wt+cyl+hp+(1+wt+cyl+hp|ID)')
f3 = brmsformula('mpg~1+wt+cyl+(1+wt+cyl|ID)')
f4 = brmsformula('qsec~1+wt+cyl+(1+wt+cyl|ID)')
model_of_interest = brm(f1+f2,data=df,iter=6000,chains=4,cores=4,family="gaussian",control=list(adapt_delta=0.9),save_all_pars=TRUE)
null_model = brm(f3+f4,data=df,iter=6000,chains=4,cores=4,family="gaussian",control=list(adapt_delta=0.9),save_all_pars=TRUE)

loo1 <- brms::loo(model_of_interest, moment_match=TRUE)
loo2 <- brms::loo(null_model, moment_match=TRUE)
1 Like

Thanks! I fixed at least 2 issues with loo_moment_match that were brms related. Depending on the model you may still need to install loo from github as well to avoid at least one further error. Does it work now with the latest brms github version?

2 Likes

Thanks the function is working now! I really appreciate the assistance.

2 Likes

@htrier, if you at some point have results to share from the comparison and how subsample and moment matching helped, please ping me as it would be really cool to know more about a successful use of these methods in real life!

2 Likes