Is there a meaningful way to compare models with a different number of observations

striatum · January 21, 2021, 8:15pm

As stated in the subject, I wonder whether there is a meaningful way to compare two models with a different number of observations? I am trying `loo(modelA, modelB)`. The models look like this:

modelA <- brm(DV ~
    (A + B + C) * D +
    (1|item) +
    (1|subject),
    data = dat,
    chains=4, iter=4000, cores=12,
    control=list(adapt_delta=.95))

modelB <- brm(DV ~
    (A + B + C) * D +
    (A + B + C) * E +
    (A + B + C) * F +
    (1|item) +
    (1|subject),
    data = dat,
    chains=4, iter=4000, cores=12,
    control=list(adapt_delta=.95))

modelA = add_criterion(modelA, 'loo')
modelB = add_criterion(modelB, 'loo')

loo(modelA, modelB)

This comparison returns an error.

Operating System: Linux
brms Version: 2.12.0

Tbh, I also do not understand the message. NAs? Or?

Tnx

Christopher-Peterson · January 21, 2021, 10:12pm

Could you provide the error message? My first guess (if you’re getting different data sizes) is that you have some missing values in E and F; that would cause those rows to be dropped in model B but not in model A.

Also, I don’t think that brms gets any benefit from having more cores than chains (unless a recent version has added some of Stan’s new within-chain parallelization features). That shouldn’t really have an effect, but it’s worth keeping in mind.

Reece_W · January 21, 2021, 11:29pm

Yea, with the way it is set up it won’t use any of the cores past the number of chains. The `threads` argument can be added to the model call, which will use `reduce_sum`, but I haven’t used it.

@striatum I’m not sure if this is what you meant when you said “a different number of observations”, but if you’re using a different data set, or only part of it, for one model, then loo will give you an error because it doesn’t make sense to compare them.

If some of your observations are missing/NA for the E and F predictors, then brms will drop those observations in the second model, which would give you the “different number of observations” error. You’ll have to deal with the missing/NA values before you will be able to compare the models using loo.
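A minimal sketch of how one might check for this before fitting, using base R only. The toy `dat` below is made up for illustration (only the column names follow the models in the first post), and `vars_A`, `vars_B` and `dat_cc` are names introduced here, not anything from brms:

```r
# Toy stand-in for `dat`; the values are hypothetical, the column names
# match the variables used in modelA and modelB.
dat <- data.frame(
  DV      = c(1.2, 0.7, 1.9, 1.1, 0.4, 1.6),
  A       = c(0, 1, 0, 1, 0, 1),
  B       = c(1, 1, 0, 0, 1, 0),
  C       = c(0, 0, 1, 1, 0, 1),
  D       = c(0.5, 0.1, 0.9, 0.3, 0.7, 0.2),
  E       = c(0.3, NA, 0.8, 0.6, 0.1, 0.4),
  F       = c(1.0, 2.0, NA, 1.5, 0.5, NA),
  item    = c("i1", "i2", "i1", "i2", "i1", "i2"),
  subject = c("s1", "s1", "s2", "s2", "s3", "s3")
)

# Variables each model uses
vars_A <- c("DV", "A", "B", "C", "D", "item", "subject")
vars_B <- c(vars_A, "E", "F")

# How many NAs does each variable have?
na_counts <- colSums(is.na(dat[, vars_B]))
print(na_counts)

# Rows each model would keep if incomplete rows are dropped
n_A <- sum(complete.cases(dat[, vars_A]))
n_B <- sum(complete.cases(dat[, vars_B]))
cat("rows usable by modelA:", n_A, "\n")
cat("rows usable by modelB:", n_B, "\n")

# Keep only rows that are complete for the larger model, so that both
# models can be fit to exactly the same observations.
dat_cc <- dat[complete.cases(dat[, vars_B]), ]
```

Both models would then be fit with `data = dat_cc`. Calling `na.omit()` on the whole data frame has a similar effect, but it also drops rows that have NAs in columns neither model uses, so subsetting to the model variables first can retain more data.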

Reece_W · January 22, 2021, 12:08am

After rereading your title, I don’t think there are really any ways to compare models with a different number of observations. You will have to deal with the NAs in some way or another to use any of the common model comparison methods, because most of them do something like sum over some function of all the observations, so they aren’t comparable.

Something I guess you could do would be to calculate the Bayesian R2 value for each model and compare, but that has its own problems and might lead you astray depending on how much data you have and how much of it is missing. Someone else with more experience probably has a better answer than me though.

striatum · January 26, 2021, 8:28pm

It was just a missing data issue. Once I tidied up my data (e.g., `na.omit(dat)`), everything ran smoothly. Thanks!

striatum · January 26, 2021, 8:29pm

Thanks for your elaborate reply! na.omit() did the job. Silly me… Thanks again!
