As stated in the subject, I wonder whether there is a meaningful way to compare two models with different numbers of observations? I am trying `loo(modelA, modelB)`. The models look like this:
```r
modelA <- brm(DV ~ (A + B + C) * D +
                (1 | item) +
                (1 | subject),
              data = dat,
              chains = 4, iter = 4000, cores = 12,
              control = list(adapt_delta = .95))

modelB <- brm(DV ~ (A + B + C) * D +
                (A + B + C) * E +
                (A + B + C) * F +
                (1 | item) +
                (1 | subject),
              data = dat,
              chains = 4, iter = 4000, cores = 12,
              control = list(adapt_delta = .95))

modelA <- add_criterion(modelA, 'loo')
modelB <- add_criterion(modelB, 'loo')

loo(modelA, modelB)
```
This comparison returns an error.
Operating System: Linux
brms Version: 2.12.0
Tbh, I also do not understand the error message. Is it about NAs, or something else?
Could you provide the error message? My first guess (if you’re getting different data sizes) is that you have some missing values in E and F; those rows would be dropped in model B but not in model A.
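A quick way to check that guess (a sketch; it assumes E and F are columns of `dat`):

```r
# Count missing values in the extra predictors
colSums(is.na(dat[, c("E", "F")]))

# Rows actually used by each fit: brms stores the post-NA-removal
# data inside the fitted object
nrow(modelA$data)
nrow(modelB$data)
```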
Also, I don’t think that brms gets any benefit from having more cores than chains (unless a recent version has added some of Stan’s new within-chain parallelization features). That shouldn’t really affect your error, but it’s worth keeping in mind.
Yeah, the way it’s set up it won’t use any cores beyond the number of chains. A `threads` argument can be added to the `brm()` call, which will use `reduce_sum`, but I haven’t used it myself; a sketch is below.
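For reference, a minimal sketch of what that looks like in newer brms releases (within-chain threading is not available in the 2.12.0 reported above, and it requires the cmdstanr backend):

```r
# 4 chains x 3 threads per chain would use all 12 cores via reduce_sum
# (requires a newer brms release and backend = "cmdstanr")
modelB <- brm(DV ~ (A + B + C) * D +
                (A + B + C) * E +
                (A + B + C) * F +
                (1 | item) +
                (1 | subject),
              data = dat,
              chains = 4, iter = 4000, cores = 4,
              threads = threading(3),
              backend = "cmdstanr",
              control = list(adapt_delta = .95))
```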
@striatum I’m not sure if this is what you meant when you said “a different number of observations”, but if you’re using a different data set, or only part of it, for one model, then loo will give you an error because the comparison doesn’t make sense.
If some of your observations are missing/NA for the E and F predictors, then brms will drop those observations in the second model, which would give you the “different number of observations” error. You’ll have to deal with the missing/NA values before you can compare the models using loo (see the sketch below).
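One way to do that, as a minimal sketch (assuming these are all the variables that enter either model, and that you’re happy with a complete-case analysis):

```r
# Keep only rows that are complete for every variable used in
# either model, then refit both models on that same subset
vars <- c("DV", "A", "B", "C", "D", "E", "F", "item", "subject")
dat_cc <- dat[complete.cases(dat[, vars]), ]

modelA <- update(modelA, newdata = dat_cc)
modelB <- update(modelB, newdata = dat_cc)

# Now both models use the same observations, so this comparison works
loo(modelA, modelB)
```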
After rereading your title, I don’t think there is really any way to compare models fit to different numbers of observations. You will have to deal with the NAs in some way or another to use any of the common model comparison methods, because most of them sum some function over all the observations, so the totals aren’t comparable when the observation sets differ.
Something you could do instead is calculate the Bayesian R2 value for each model and compare those, but that has its own problems and might lead you astray depending on how much data you have and how much of it is missing. Someone else with more experience probably has a better answer than me, though.
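For what it’s worth, that comparison is built into brms:

```r
# Posterior summary of R2 for each model; note this is not a
# substitute for loo-based comparison
bayes_R2(modelA)
bayes_R2(modelB)
```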