Imagine I want to estimate the prevalence of a disease. We know from previous studies that the prevalence of the disease varys by sex (with men on average having a higher prevalence than women), and prevalence increases with age.
Given that the disease is quite rare, and we have some idea about overall prevalence and relative differences in prevalence between groups, how can I use this information to specify more informative priors in the model below? I am particularly interested in the ordinal age group variable - how can I encode my prior information that prevalence increases non-linearly with age? Is it possible to specify a matrix of priors encoding the varying effects within each age-sex group? An example below:
We conduct a survey and test people for the disease:
library(tidyverse)
library(brms)
#> Loading required package: Rcpp
#> Loading 'brms' package (version 2.13.0). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#>
#> Attaching package: 'brms'
#> The following object is masked from 'package:stats':
#>
#> ar
library(tidybayes)
#> Warning: package 'tidybayes' was built under R version 4.0.1
#>
#> Attaching package: 'tidybayes'
#> The following objects are masked from 'package:brms':
#>
#> dstudent_t, pstudent_t, qstudent_t, rstudent_t
set.seed(4)
sex <- c("M", "F")
#age groups, a=youngest, e=oldest
age <- c("a", "b", "c", "d", "e")
df <- crossing(sex=sex, age=age)
#number of people surveyed per group
df <- df %>%
mutate(n=2000)
#generate some estimates of prevalence for age-sex groups
sex_mean = tibble(sex = sex,
sex_mult = c(0.005, 0.025))
age_mean = tibble(age = age,
age_mult = c(0.8, 0.9, 1.0, 1.5, 2.0))
df <- df %>%
left_join(sex_mean) %>%
left_join(age_mean)
#> Joining, by = "sex"
#> Joining, by = "age"
#simulate some data, with one line per person surveyed
#disease: 0=no disease, 1=disease
df <- df %>%
rowwise() %>%
mutate(disease = list(
rbinom(n=n, size=1, prob = sex_mult*age_mult)
)) %>%
unnest(cols = disease)
#summarise these data to make sure they look sensible
emp <- df %>%
group_by(sex, age) %>%
summarise (n = n(),
pos = sum(disease),
prev = mean(disease))
#> `summarise()` regrouping output by 'sex' (override with `.groups` argument)
emp
#> # A tibble: 10 x 5
#> # Groups: sex [2]
#> sex age n pos prev
#> <chr> <chr> <int> <int> <dbl>
#> 1 F a 2000 34 0.017
#> 2 F b 2000 45 0.0225
#> 3 F c 2000 49 0.0245
#> 4 F d 2000 65 0.0325
#> 5 F e 2000 107 0.0535
#> 6 M a 2000 9 0.0045
#> 7 M b 2000 4 0.002
#> 8 M c 2000 7 0.0035
#> 9 M d 2000 14 0.007
#> 10 M e 2000 17 0.0085
Now construct a model to estimate prevalence for each age-sex group:
m <- brm(
disease ~ (1|sex) + (1|age),
data = df,
family = binomial,
chains = 1)
nd <- crossing(sex=sex, age=age)
nd %>%
add_fitted_draws(m) %>%
ggplot(aes(x=age, y=.value, group=sex)) +
stat_interval() +
geom_point(data=emp, aes(x=age, y=prev, group=sex)) +
scale_colour_brewer() +
facet_grid(~sex)
Created on 2020-08-10 by the reprex package (v0.3.0)
Any pointers to resources with similar examples greatly appreciated.
Thanks!