Setting better priors for ordinal categorical variables

Imagine I want to estimate the prevalence of a disease. We know from previous studies that the prevalence of the disease varys by sex (with men on average having a higher prevalence than women), and prevalence increases with age.

Given that the disease is quite rare, and we have some idea about overall prevalence and relative differences in prevalence between groups, how can I use this information to specify more informative priors in the model below? I am particularly interested in the ordinal age group variable - how can I encode my prior information that prevalence increases non-linearly with age? Is it possible to specify a matrix of priors encoding the varying effects within each age-sex group? An example below:

We conduct a survey and test people for the disease:

library(tidyverse)
library(brms)
#> Loading required package: Rcpp
#> Loading 'brms' package (version 2.13.0). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#> 
#> Attaching package: 'brms'
#> The following object is masked from 'package:stats':
#> 
#>     ar
library(tidybayes)
#> Warning: package 'tidybayes' was built under R version 4.0.1
#> 
#> Attaching package: 'tidybayes'
#> The following objects are masked from 'package:brms':
#> 
#>     dstudent_t, pstudent_t, qstudent_t, rstudent_t

set.seed(4)

sex <- c("M", "F")
#age groups, a=youngest, e=oldest
age <- c("a", "b", "c", "d", "e")

df <- crossing(sex=sex, age=age)

#number of people surveyed per group
df <- df %>%
  mutate(n=2000)

#generate some estimates of prevalence for age-sex groups
sex_mean = tibble(sex = sex,
                  sex_mult = c(0.005, 0.025))

age_mean = tibble(age = age,
                  age_mult = c(0.8, 0.9, 1.0, 1.5, 2.0))

df <- df %>%
  left_join(sex_mean) %>%
  left_join(age_mean)
#> Joining, by = "sex"
#> Joining, by = "age"

#simulate some data, with one line per person surveyed
#disease: 0=no disease, 1=disease
df <- df %>%
  rowwise() %>%
  mutate(disease = list(
    rbinom(n=n, size=1, prob = sex_mult*age_mult)
  )) %>%
  unnest(cols = disease)

#summarise these data to make sure they look sensible
emp <- df %>%
  group_by(sex, age) %>%
  summarise (n = n(),
             pos = sum(disease),
             prev = mean(disease))
#> `summarise()` regrouping output by 'sex' (override with `.groups` argument)

emp
#> # A tibble: 10 x 5
#> # Groups:   sex [2]
#>    sex   age       n   pos   prev
#>    <chr> <chr> <int> <int>  <dbl>
#>  1 F     a      2000    34 0.017 
#>  2 F     b      2000    45 0.0225
#>  3 F     c      2000    49 0.0245
#>  4 F     d      2000    65 0.0325
#>  5 F     e      2000   107 0.0535
#>  6 M     a      2000     9 0.0045
#>  7 M     b      2000     4 0.002 
#>  8 M     c      2000     7 0.0035
#>  9 M     d      2000    14 0.007 
#> 10 M     e      2000    17 0.0085

Now construct a model to estimate prevalence for each age-sex group:

m <- brm(
   disease ~ (1|sex) + (1|age),
   data = df,
   family = binomial,
   chains = 1)


nd <- crossing(sex=sex, age=age)

nd %>%
  add_fitted_draws(m) %>%
  ggplot(aes(x=age, y=.value, group=sex)) +
  stat_interval() +
  geom_point(data=emp, aes(x=age, y=prev, group=sex)) +
  scale_colour_brewer() +
  facet_grid(~sex)

Created on 2020-08-10 by the reprex package (v0.3.0)

Any pointers to resources with similar examples greatly appreciated.

Thanks!

2 Likes

This is relatively straightforward, I think the “monotonic effects” feature of brms should have you covered: https://cran.r-project.org/web/packages/brms/vignettes/brms_monotonic.html

I think you can do this by specifying the coef argument in the set_prior family of functions (but you need to provide a prior separately for each coefficient), check out the docs and get_prior…

Does that answer your question?

Best of luck with modelling!

2 Likes

Thank you so much - the link to the “monotonic effects” was exactly what I was looking for. Much appreciated.

2 Likes