Fit Bayesian hierarchical model with three levels using brms

Hi all,

This is my first time to use brms for Bayesian hierarchical model, please excuse me if my questions are stupid.

I have a data frame with the outcome of participants (the test result of if a participant gets infected or not, say n participants in total), age group (8 age groups in total), and regions (9 regions in total). 0 and 1 indicate the test result is negative or positive. Now I want to fit a Bayesian hierarchical model on the probability of being positive in each age group in each region. So my model has three levels, 9 regions, each region has 8 age groups, and each age group has a certain number of participants. Note that the number of participants in each age group in each region is not exactly the same, but very similar.

The mathematical expression of my model would be:
y_{ijk} \sim Bernoulli (1, p_{ij})
logit(p_{ij}) = \alpha + \alpha_{age[i,j]} + \alpha_{region[j]}
\alpha_{age_{[i,j]}} \sim Normal(0, \sigma_{age})
\alpha_{region_{[i]}} \sim Normal(0, \sigma_{region})
\alpha \sim Normal(0,1)
\sigma_{age} \sim HalfCauchy(0,1)
\sigma_{region} \sim HalfCauchy(0,1)

y_{ijk}: the test result of the k-th individual in the i-th age group in the j-th region, where i = 1, …, 8, j = 1 and k = 1, … , n
\alpha_{age[i,j]}: i-th age group in j-th region
\alpha_{region[i]}: j-th region

In brms, ignore priors for now, I write the fomula as:

brm (outcome ~ 1 + (1 | region / age)).

Based on what I descbribed above, I have two following questions:

  1. Did I build correct model mathematically? If so, is there any terms can be added in the model to improve the predictive perfomance?

  2. Does the brm formula I used correctly reflect the mathematical model? If not, how should I change the formula?

Any comments and suggestions would be much appreciated!

Hi, sorry it took a while to answer. This,

is the same as (1 | region) + (1 | region:age). Is that what you want?

Have you considered using (1 + region | age), or for that matter (1 + age | region), and compare the models with LOO to look at the relative out of sample prediction performance of the models? It all depends on what questions you want to ask the model.

Hi Torkar,

Many thanks for the response!

I understand (1 | region / age) is equivalent to (1 | region) + (1 | region:age), but I am not sure that if (1 | region / age) is the right formula to reflect my equation image

I want to build a model that takes into account the dependency structures existing in the data, so I think the model should be in a nested structure, just like this screenshot:

I read the paper Advanced Bayesian Multilevel Modeling with brms, it says “The /
operator indicates nested grouping structures” in page 4, that’s why I use (1 | region / age) in the formula.

So I think my question would be finding the right formula to reflect the mathematical equation. Any advice would be much appreciated!

Best,
Vivi.

I think a nested structure is sane in your case (but I don’t have the background knowledge you have here!) I would propose to set up several competing models and look at the strength and weaknesses (use LOO to do model comparison). If your research question doesn’t require a nested model (it seems it does), then adding that complexity might not be necessary (however, it can still do better out of sample predictions!) :)