Setting better priors for ordinal categorical variables

petermacp · August 10, 2020, 5:51pm

Imagine I want to estimate the prevalence of a disease. We know from previous studies that the prevalence of the disease varys by sex (with men on average having a higher prevalence than women), and prevalence increases with age.

Given that the disease is quite rare, and we have some idea about overall prevalence and relative differences in prevalence between groups, how can I use this information to specify more informative priors in the model below? I am particularly interested in the ordinal age group variable - how can I encode my prior information that prevalence increases non-linearly with age? Is it possible to specify a matrix of priors encoding the varying effects within each age-sex group? An example below:

We conduct a survey and test people for the disease:

library(tidyverse)
library(brms)
#> Loading required package: Rcpp
#> Loading 'brms' package (version 2.13.0). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#> 
#> Attaching package: 'brms'
#> The following object is masked from 'package:stats':
#> 
#>     ar
library(tidybayes)
#> Warning: package 'tidybayes' was built under R version 4.0.1
#> 
#> Attaching package: 'tidybayes'
#> The following objects are masked from 'package:brms':
#> 
#>     dstudent_t, pstudent_t, qstudent_t, rstudent_t

set.seed(4)

sex <- c("M", "F")
#age groups, a=youngest, e=oldest
age <- c("a", "b", "c", "d", "e")

df <- crossing(sex=sex, age=age)

#number of people surveyed per group
df <- df %>%
  mutate(n=2000)

#generate some estimates of prevalence for age-sex groups
sex_mean = tibble(sex = sex,
                  sex_mult = c(0.005, 0.025))

age_mean = tibble(age = age,
                  age_mult = c(0.8, 0.9, 1.0, 1.5, 2.0))

df <- df %>%
  left_join(sex_mean) %>%
  left_join(age_mean)
#> Joining, by = "sex"
#> Joining, by = "age"

#simulate some data, with one line per person surveyed
#disease: 0=no disease, 1=disease
df <- df %>%
  rowwise() %>%
  mutate(disease = list(
    rbinom(n=n, size=1, prob = sex_mult*age_mult)
  )) %>%
  unnest(cols = disease)

#summarise these data to make sure they look sensible
emp <- df %>%
  group_by(sex, age) %>%
  summarise (n = n(),
             pos = sum(disease),
             prev = mean(disease))
#> `summarise()` regrouping output by 'sex' (override with `.groups` argument)

emp
#> # A tibble: 10 x 5
#> # Groups:   sex [2]
#>    sex   age       n   pos   prev
#>    <chr> <chr> <int> <int>  <dbl>
#>  1 F     a      2000    34 0.017 
#>  2 F     b      2000    45 0.0225
#>  3 F     c      2000    49 0.0245
#>  4 F     d      2000    65 0.0325
#>  5 F     e      2000   107 0.0535
#>  6 M     a      2000     9 0.0045
#>  7 M     b      2000     4 0.002 
#>  8 M     c      2000     7 0.0035
#>  9 M     d      2000    14 0.007 
#> 10 M     e      2000    17 0.0085

Now construct a model to estimate prevalence for each age-sex group:

m <- brm(
   disease ~ (1|sex) + (1|age),
   data = df,
   family = binomial,
   chains = 1)


nd <- crossing(sex=sex, age=age)

nd %>%
  add_fitted_draws(m) %>%
  ggplot(aes(x=age, y=.value, group=sex)) +
  stat_interval() +
  geom_point(data=emp, aes(x=age, y=prev, group=sex)) +
  scale_colour_brewer() +
  facet_grid(~sex)

^{Created on 2020-08-10 by the reprex package (v0.3.0)}

Any pointers to resources with similar examples greatly appreciated.

Thanks!

martinmodrak · August 16, 2020, 4:57pm

This is relatively straightforward, I think the “monotonic effects” feature of brms should have you covered: Estimating Monotonic Effects with brms

I think you can do this by specifying the coef argument in the set_prior family of functions (but you need to provide a prior separately for each coefficient), check out the docs and get_prior…

Does that answer your question?

Best of luck with modelling!

petermacp · August 16, 2020, 7:03pm

Thank you so much - the link to the “monotonic effects” was exactly what I was looking for. Much appreciated.

Topic		Replies	Views
Setting Different Priors on Category-Specific Coefficients brms	1	603	February 4, 2019
Setting priors and interpreting ordered logistic model brms	10	1720	December 30, 2021
Issues with hierarchical ordinal model (brms) brms posterior-predictive , brms	1	235	February 10, 2024
Specifying Priors for Ordinal (adjacent category) Model brms specification , prior-choice	32	2128	May 13, 2020
Setting priors with some assumptions for ordinal cumulative probit brms model Modeling ordinal-response , priors , brms	5	457	March 17, 2024

Setting better priors for ordinal categorical variables

Related topics