Hello,

I’m currently attempting to model proportion data measured across numerous studies using a beta regression. The data are organized such that proportions are measured at numerous sites within a study. Here’s an example of the data:

Study | N | label | prop |
---|---|---|---|
Study1 | 1 | site | 0.9 |
Study1 | 1 | site | 0.93 |
Study1 | 1 | site | 0.89 |
Study2 | 1 | site | 0.8 |
Study2 | 1 | site | 0.82 |
Study3 | 5 | study-average | 0.7 |
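For anyone who wants to reproduce this, here is a minimal sketch of the example rows as an R data frame. The object name `data` simply matches the `data=data` argument in the model calls; the real data set has more rows than these six.

```r
# Example rows only; the real data set is larger.
data <- data.frame(
  Study = c("Study1", "Study1", "Study1", "Study2", "Study2", "Study3"),
  N     = c(1, 1, 1, 1, 1, 5),  # number of sites behind each proportion
  label = c("site", "site", "site", "site", "site", "study-average"),
  prop  = c(0.90, 0.93, 0.89, 0.80, 0.82, 0.70)
)

# Cross-tabulate rows by study and observation type
table(data$Study, data$label)
```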

If all observations were site-level within a study, I would simply run a random-intercept model to account for study-to-study variability:

```
mod.rnd <- brm( bf(prop ~ (1|Study), phi ~ (1|Study), family=Beta()), data=data)
```
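Written out, my understanding is that this corresponds to the model below. This is a sketch assuming the brms defaults for the Beta family (mean-precision parameterization, logit link on the mean, log link on `phi` when it is modeled); the symbols are my own notation, with $y_{ij}$ the proportion at site $i$ in study $j$.

```latex
\begin{aligned}
y_{ij} &\sim \mathrm{Beta}\bigl(\mu_{j}\,\phi_{j},\; (1-\mu_{j})\,\phi_{j}\bigr) \\
\operatorname{logit}(\mu_{j}) &= \beta_0 + u_j, \qquad u_j \sim \mathcal{N}(0, \sigma_u^2) \\
\log(\phi_{j}) &= \gamma_0 + v_j, \qquad v_j \sim \mathcal{N}(0, \sigma_v^2)
\end{aligned}
```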

However, the only observation available for Study3 is a study-level average across 5 sites, and I don’t have access to the individual site values that make up that average proportion of 0.7.

What is the appropriate way to handle this type of heterogeneous data in brms? My first thought was to include a nested random intercept using the label factor:

```
mod.nst <- brm( bf(prop ~ (1|Study) + (1|Study:label), phi ~ (1|Study) + (1|Study:label)), family=Beta(), data=data)
```

The shorthand for this nesting would simply be `(1|Study/label)`.

My basic understanding of nested random-effect models is that this would estimate random intercepts for both grouping factors (`Study`, and `label` within `Study`) and would account for variability at the site and study level. However, I wanted to check whether this is the appropriate way to handle this scenario, or whether there might be a better way to handle this type of data.
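One thing I checked while writing this up is what groups the `Study:label` term defines. The sketch below uses only the six example rows, so it assumes the full data have the same structure, with every study carrying a single `label` value. Under that assumption the nested grouping factor has exactly as many levels as `Study`, so the two sets of intercepts may not be separable.

```r
# Study and label columns from the example rows only
Study <- c("Study1", "Study1", "Study1", "Study2", "Study2", "Study3")
label <- c("site", "site", "site", "site", "site", "study-average")

# Grouping factor implied by (1 | Study:label); drop unused combinations
study_label <- interaction(Study, label, drop = TRUE)

n_study       <- length(unique(Study))   # groups for (1 | Study)
n_study_label <- nlevels(study_label)    # groups for (1 | Study:label)

# TRUE when the nested term defines the same groups as Study alone
same_grouping <- n_study == n_study_label
print(c(n_study = n_study, n_study_label = n_study_label))
```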

Thank you very much for the help with this problem.

- Operating System: RHEL 8
- brms Version: 2.20.4