Should I report the results with random effects or not (brms, re_formula = NULL vs NA)?

I am struggling to understand should I report the results with random effects or not (re_formula = NULL or NA).

I have a population based data. Patients are nested in differently sized counties and I would like to report the result closest to the truth. N = 6000. I am interested in population-level effect only (country level).

An additive non-hierarchical model shows an effect with the narrowest CIs.

Hierarchical free intercept model (1 | county) is equal according to loo(), but shows the same effect with slightly wider confidence intervals. R2 CIs are overlapping with the previous model, however, the mean is 2% lower for the hierarchical model.

The effects of the previous models will have much wider confidence intervals while I set “re_formula = NULL” in conditional_effects(). But they still show significant effects.

Which results should I report? Or how should I decide?

PS! I also did some frequential statistics. R2 including random effects was 9% better compared to fixed effects only.

1 Like

Hey there! :)

I think the problem with a question like this is that the most reasonable answer would probably be “it depends”. And it mostly depends on your audience and what you want to convey. For a more frequentist inclined audience, they will probably view the RE as part of the composite error.

What does your model actually look like? (Is it the one from the other thread?)

This also points at not reporting the population results (which are re_formula = NULL, right?). The other option re_formula = NA would be the result for a new (not in the data set) county if I’m not mistaken.

This question goes into model selection and is not about how to report model results, right? In that case (I’m probably biased, haha) I’d always go for the hierarchical model. R2 is not the greatest way to choose between model, and if loo says they’re equivalent I’d go for the hierarchical model.

But in general, like I said before, it depend much more on your audience and what you want to convey. Maybe someone else has a more informed opinion on this?

Cheers,
Max

3 Likes

First of all, big thanks for taking time to answer!! As the answer depends, it’s still confusing :) Maybe the following answer makes deciding simpler?

The model is the same with another post. This is the one:

fit = brm(bf(received_treatment_hours ~ p1 + p2 + p3_fct + p4 + p5_fct + p6 + p7 + (1 | region), hu ~ p1 + p2 + p3_fct + p4 + p5_fct + p6 + p7 + (1 | region)), data = df, family = hurdle_lognormal(), cores = 3, chains = 3, prior = prior)

And these are the predictions for the predictor I am interested in. Study question is: is this predictor associated with outcome variable (received treatment hours) in the whole country. We are not interested in county/regional level effects.

Part A-s are for re_formula = NA and Part B-s are for re_formula = NULL; 1 - lognormal model, 2 binomial model.

combined plot

Should I report the results of panel A or panel B and why?

Then I’d say use re_formula = NA (the default for marginal_effects, right?), i.e. Panel A, because these will not include the uncertainty from the county/region levels. Panel B would answer the question of what marginal effects could look like in a new, not previously seen county/region.

In cases like this I think it helps to spell out the mathematical model to grasp the implications of those choices. (Also, you should probably think about the conditions of the marginal effect and see if those make sense. Here you need domain expertise, so I won’t be able to help with this.)

Hope this helps (at least my first paragraph, haha)!
Cheers,
Max

PS: You need to fix your plot axis’ labels ;)

1 Like

Does the post at Inferences marginal of random effects in multi-level models answer your question?

2 Likes

Thank you guys, this information is really helpful!!