Custom likelihood for multinomial model on pooled data

E_michels · April 15, 2024, 3:38pm

Hi Folks,

For some context, I’m working with disease prevalence data in ticks. To reduce costs, ticks are sometimes ‘pooled’ and screened for a particular pathogen together. Here, the binomial likelihood function doesn’t work as it’s impossible to know if tick A, B or both are infected. I overcome this using a ‘misclassification model’ as outlined by Richard McElreath’s in his statistical rethinking lecture 17 (2023). https://www.youtube.com/watch?v=mt9WKbQJrI4&t=3224s.

Part 1 of the code bellow shows this implementation in the rethinking package. It seems to work. However, ticks carry multiple diseases. I wondered if anyone could help me to re-write a multinomial ‘equivalent’ custom likelihood function in brms for the simulated data in part 2.

Any help and advice is greatly appreciated,
Thankyou

part 1: data simulation and working ‘rethinking’ model

library(tidyverse)

simple_dat <- tibble(
  pool_id = c(rep(1:20, time = 2), 21:30)) %>% # 20 pools of two samples, 10 single samples
  mutate(intercept = 0.2) %>% # 20% positive rate
  expand_grid(site = 1:5) %>% # increase sample size by x5
  mutate(positive = rbinom(250, 1, intercept))) # generate outcomes (positive, negative)

(simple_dat_pld <- simple_dat %>% 
  group_by(pool_id, site) %>% 
  summarise(num_positive = sum(positive), 
            pool_positive = ifelse(num_positive >= 1, 1, 0)) %>% # reduce the level of information, to positive/negative for each pool
  mutate(n = ifelse(pool_id %in% 1:20, 2, 1),
         # label the different outcome:pool size combinations
         c = 1, # 1 labels samples which weren't pooled and were negative 
         c = replace(c, n == 1 & pool_positive == 1, 2), # 2 samples which weren't pooled + positive 
         c = replace(c, n == 2 & pool_positive == 0, 3), # 3 samples pooled, negatives 
         c = replace(c, n == 2 & pool_positive == 1, 4)) %>% 
  uncount(n))

# put the outcome variable, and information about pooling into a list 
simple_dat_l_pld <- list(bp = as.numeric(simple_dat_pld$pool_positive),
                         c = as.numeric(simple_dat_pld$c))

simple_pool_mod <- ulam(
  alist(
    
    # missclassification
    bp|c == 1 ~ custom(log(1 - p)), 
    bp|c == 2 ~ custom(log(p)), # this is the standard binomial likelihood, for non-pooled samples
    bp|c == 3 ~ custom(log((1-p)^2)), # the lower too are likelihoods for pooled samples 
    bp|c == 4 ~ custom(log(1 - ((1-p)^2))),
    
    logit(p) <- a,
    
    # intercept 
    a ~ dnorm(-1, 1)
    
  ), data = simple_dat_l_pld, chains = 4, cores = 4
)

post <- as.data.frame(extract.samples(simple_pool_mod))

# how well does the model predict prevalence ? 
ggplot(post) +
  geom_density(aes(x = inv_logit(a))) +
  geom_vline(xintercept = sum(simple_dat$positive)/length(simple_dat$pool_id))

Part 2: multinomial data simulation

(simple_dat <- tibble(
  pool_id = c(rep(1:20, time = 2), 21:30)) %>% 
  mutate(intercept_a = 0.1,
         intercept_b = 0.2) %>% 
  expand_grid(site = 1:5) %>% 
  mutate(positive_a = rbinom(250, 1, intercept_a),
         positive_b = rbinom(250, 1, intercept_b))) # generate outcomes (positive, negative)

(simple_dat_pld <- simple_dat %>% 
  group_by(pool_id, site) %>% 
  summarise(num_positive_a = sum(positive_a), 
            pool_positive_a = ifelse(num_positive_a >= 1, 1, 0),
            num_positive_b = sum(positive_b), 
            pool_positive_b = ifelse(num_positive_b >= 1, 1, 0)) %>% 
  mutate(n = ifelse(pool_id %in% 1:20, 2, 1),
         ca = 1,  
         ca = replace(ca, n == 1 & pool_positive_a == 1, 2), 
         ca = replace(ca, n == 2 & pool_positive_a == 0, 3), 
         ca = replace(ca, n == 2 & pool_positive_a == 1, 4),
         cb = 1,  
         cb = replace(cb, n == 1 & pool_positive_b == 1, 2), 
         cb = replace(cb, n == 2 & pool_positive_b == 0, 3), 
         cb = replace(cb, n == 2 & pool_positive_b == 1, 4)) %>% 
  uncount(n))

Topic		Replies	Views
Modeling many zeros and low values in a multinomial outcome brms techniques , specification , brms	1	757	December 8, 2022
Dirichlet-multinomial custom family in brms brms dirichlet-multinomial , brms	4	954	February 3, 2025
Mixed Multinomial Model built with brms::brm: diagnostic and goodness of fit Modeling	8	283	July 19, 2024
Issues with Parallel Computation for Custom Beta Binomial Model in brms brms fitting-issues , performance , loo , brms	0	546	November 28, 2021
Multinomial logistic regression with categorical family (brms) Modeling multinomial-response , brms	5	88	May 2, 2025

Custom likelihood for multinomial model on pooled data

part 1: data simulation and working ‘rethinking’ model

Part 2: multinomial data simulation

Related topics