Handling data with reported population- and group-level observations

zult · April 24, 2024, 4:13pm

Hello,

I’m currently attempting to model proportion data measured across numerous studies using a beta regression. The data are organized such that proportions are measured at numerous sites within a study. Here’s an example of the data:

Study   N  label          prop
Study1  1  site           0.9
Study1  1  site           0.93
Study1  1  site           0.89
Study2  1  site           0.8
Study2  1  site           0.82
Study3  5  study-average  0.7
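
For anyone who wants to reproduce this, here is a minimal sketch of the example rows as an R data frame. The object name `data` is chosen to match the `brm()` calls in this post; the real dataset is much larger.

```r
# Example rows from the table above; `data` matches the name used in the brm() calls.
data <- data.frame(
  Study = c("Study1", "Study1", "Study1", "Study2", "Study2", "Study3"),
  N     = c(1, 1, 1, 1, 1, 5),
  label = c("site", "site", "site", "site", "site", "study-average"),
  prop  = c(0.90, 0.93, 0.89, 0.80, 0.82, 0.70)
)

# Rows per study and label: Study3 contributes a single study-average row.
table(data$Study, data$label)
```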

If all observations were site-level within a study, I would simply run a random-intercept model to account for study-to-study variability:

mod.rnd <- brm( bf(prop ~ (1|Study), phi ~ (1|Study), family=Beta()), data=data)

However, the only observation available for Study3 is a study-level average of 5 sites and I don’t have access to the individual sites that make up that average proportion of 0.7.

What is the appropriate way to handle this type of heterogeneous data in brms? My first thought was to include a nested random intercept using the label factor:

mod.nst <- brm( bf(prop ~ (1|Study) + (1|Study:label), phi ~ (1|Study) + (1|Study:label)), family=Beta(), data=data)

The shorthand for this nesting would simply be (1|Study/label).

My basic understanding of nested random-effect models is that this would estimate random intercepts both for Study and for label within Study, and so would account for variability at the site and study levels. However, I wanted to check whether this is the appropriate way to handle this scenario, or whether there might be a better way to handle this type of data.
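
One way to sanity-check what the nested term actually groups by is to list the Study:label combinations in the data. The following is a minimal base-R sketch that assumes the six example rows are representative; the object names are just for illustration.

```r
# Example rows only; the real data would be substituted here.
data <- data.frame(
  Study = c("Study1", "Study1", "Study1", "Study2", "Study2", "Study3"),
  label = c("site", "site", "site", "site", "site", "study-average")
)

# Grouping levels for (1|Study) and for (1|Study:label).
study_levels  <- unique(data$Study)
nested_levels <- unique(paste(data$Study, data$label, sep = ":"))

length(study_levels)
length(nested_levels)
nested_levels
```

In these example rows each study carries only one label, so Study:label has exactly as many levels as Study. If that also holds in the full dataset, the nested intercept may be hard to distinguish from the Study intercept.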

Thank you very much for the help with this problem.

Operating System: RHEL 8
brms Version: 2.20.4

zacho · June 11, 2024, 9:50pm

Hey @zult, did you ever find out anything here?
(I’ve been following this thread for a while because I’m interested in approaches to the problem.)
My first thought was that you’re describing a meta-analysis model, except that you also (more often than not?) have access to the primary data rather than only the study-level summary statistic. So naturally I thought of brms’s se() term on the response side of the formula, where the se encodes variation at the study level. However, even if that did work for mixed primary/study-level data, you wouldn’t be able to use the Beta() family.
All that to say: I dunno!
My gut says this must be possible somehow though.

I’m not sure if your random effects formula really respects the fact that the primary vs. study-level observations (site vs. average) should carry different weights (e.g., maybe one study average is worth 3 observations at site level), but it’s probably something to start with.

zult · July 9, 2024, 8:13pm

Thanks for thinking it over @zacho. Unfortunately, I didn’t come up with anything very satisfying and decided to just proceed with the original formulation:

mod.rnd <- brm( bf(prop ~ (1|Study), phi ~ (1|Study), family=Beta()), data=data)

Most of the time, I only have one or two study-average observations among hundreds of individual site observations. I figure this approach at least updates the mean of prop given the study-average, even though that observation won’t contribute to the estimate of the group-level effects.

One other option I tried was using the weights to at least count that mean N times:

mod.rnd <- brm( bf(prop | weights(N) ~ (1|Study), phi ~ (1|Study), family=Beta()), data=data)

From what I understand, this will have the study-average mean contribute N times to the likelihood. Unfortunately, this resulted in excess divergences that I wasn’t able to iron out.
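
To make the weighting concrete, here is a minimal base-R sketch of how a weight of N scales one row's log-likelihood contribution. It assumes the mean/precision form of the beta density (shape1 = mu * phi, shape2 = (1 - mu) * phi); the values of `mu` and `phi` are hypothetical and chosen only for illustration, and `beta_loglik` is a helper defined here, not a brms function.

```r
# Beta log-density in mean/precision form:
# shape1 = mu * phi, shape2 = (1 - mu) * phi.
beta_loglik <- function(y, mu, phi) {
  dbeta(y, shape1 = mu * phi, shape2 = (1 - mu) * phi, log = TRUE)
}

prop <- c(0.90, 0.93, 0.89, 0.80, 0.82, 0.70)
N    <- c(1, 1, 1, 1, 1, 5)

# Hypothetical parameter values, for illustration only.
mu  <- 0.85
phi <- 20

ll         <- beta_loglik(prop, mu, phi)
unweighted <- sum(ll)      # every row counted once
weighted   <- sum(N * ll)  # study-average row counted N = 5 times

# The two totals differ by (5 - 1) times the study-average row's log-density.
all.equal(weighted - unweighted, 4 * ll[6])
```

Note that this treats the average as five identical site-level values of 0.7, which is not the same thing as modelling the mean of five sites.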

All that to say, I’m just going with the original approach to not let perfect be the enemy of good. I’m hoping it’s okay for this dataset, but it’s definitely something that I’d be interested in figuring out down the road. Sorry for the unsatisfying answer there.

zacho · July 9, 2024, 8:54pm

“not let perfect be the enemy of good”

I hear ya. At least it sounds like your data is dominated by individual observations, so your 1st model should do pretty well.
Didn’t someone quip that “a model is never finished, merely abandoned”…
