Psychometric lapsing model for more than two possible categorical outcomes

Hi,

I’m wondering whether there is a way to use brms to fit a psychometric lapsing model for a multi-response (categorical rather than bernoulli-distributed) outcome? To make this more concrete, I’m trying to analyze which of 8 English vowels (“had”, “hid”, “head”, etc.") listeners heard based on the acoustic properties of the input while also accounting for an unknown proportion of trials on which listeners have attentional lapses (or for other reasons do not respond based on the input). On those lapsing trials, I’d like to either assume that all 8 vowels have the same uniform probability (1/8) of being responded or I might try to also fit the response bias. A concise summary of what I have in mind is given on p5 of https://www.biorxiv.org/content/10.1101/260976v2.full.pdf. (fwiw, this type of data is quite common but researchers often simplify it to ‘correct’ and ‘incorrect’ answers since they have no easy way of analyzing the data if it’s not binary).

I’ve previously used both the non-linear formula syntax (gamma + (1 - gamma - lambda) * perceptual model) and mixture syntax (lapse * bias + (1-lapse) * perceptual model) to run such models for binary outcomes. But the same approach does not seem to work when there are more than two possible categorical outcomes. For example, mixture(“categorical”, “categorical”) is not allowed:

Error: Some of the families are not allowed in mixture models.

My first question is for @paul.buerkner (if you have time) to see whether there are any plans to allow the special case when all mixture components are categorical, since that would seem to be well-formed.

My second question is whether anybody has some thoughts on how to fit such a model using brms::brm(). My first thought was that one might be able to put together a multivariate model that consists of 7 models, each predicting a binary outcome(vowel 2 vs. not vowel 2; vowel 3 vs. not vowel 3; etc.) through a non-linear or mixture formula. Is that even possible in brms? And would there be a way to constrain the lapse rate to be consistent across all seven models? Any feedback or alternative suggestions would be appreciated. It’s quite likely that I’m missing something more fundamental. Thank you in advance!

The names of categorical parameters and mixture indices may clash and create disambiguities. I consider this a design flaw of mixture models that I will fix in brms 3.0, potentially breaking post-processing compatibility of some older models. Until then, I think it is safer to not allow this combination.

Thanks, Paul!

I guess my question (for others or you) is then whether the multivariate approach I outlined above is feasible. To make it more concrete, here’s a MRE that compiles. But I don’t see a way to constrain the lambda to be identical across formulas. And that seems to cause issues, unlike the MV formulation of a simple categorical model (also included below), the lapsing hack doesn’t produce reasonable estimates, probably because the lamdas across the three modeled responses are perfectly correlated (which I confirmed)?

library(tidyverse)
library(brms)

# F/Making some data without any effect
d <- 
  crossing(
    ParticipantID = 1:10,
    Trial = 1:20) %>%
  bind_cols({
    x <- as_tibble(rmultinom(nrow(.), 1, rep(1/4, 4)), .namerepair = "unique")
    names(x) <- paste("Vowel", 1:4, sep = "")
    x}) %>%
  mutate(
    Response = case_when(
      Vowel1 == 1 ~ "Vowel 1",
      Vowel2 == 1  ~ "Vowel 2",
      Vowel3 == 1  ~ "Vowel 3",
      Vowel4 == 1  ~ "Vowel 4")) %>%
  relocate(ParticipantID, Response, everything())
  
# Mixture attempt (does not work since "categorical" family not allowed in mixture models)
# fit_mix <- brm(
#   bf(
#     Response ~ 1,
#     mu1 ~ 1,
#     mu2 ~ 1, # Add random effects
#     theta1 ~ 1),
#   data = d,
#   cores = 4,
#   iter = 4000,
#   warmup = 2000,
#   family = mixture(categorical("logit"), categorical("logit")),
#   control = list(adapt_delta = .99))

# MV formulation of multinomial (works)
fit_mv <- brm(
  bf(Vowel2 ~ 1) + bernoulli("logit") +
  bf(Vowel3 ~ 1) + bernoulli("logit") +
  bf(Vowel4 ~ 1) + bernoulli("logit") +
  set_rescor(FALSE),
  data = d,
  cores = 4,
  control = list(adapt_delta = .99))

# Attempt to extend the MV formulation into lapsing model
my_priors.mv <- c(
  prior(student_t(3, 0, 2.5), class = "b", nlpar = "lambda", resp = "Vowel2"), 
  prior(student_t(3, 0, 2.5), class = "b", nlpar = "eta", resp = "Vowel2"),
  prior(student_t(3, 0, 2.5), class = "b", nlpar = "lambda", resp = "Vowel3"), 
  prior(student_t(3, 0, 2.5), class = "b", nlpar = "eta", resp = "Vowel3"),
  prior(student_t(3, 0, 2.5), class = "b", nlpar = "lambda", resp = "Vowel4"), 
  prior(student_t(3, 0, 2.5), class = "b", nlpar = "eta", resp = "Vowel4"))

fit_mix <- brm(
  formula = 
    bf(
      Vowel2 ~ .25 + (1 - .25 - inv_logit(lambda)) * inv_logit(eta),
      eta ~ 1,
      lambda ~ 1,
      nl = T) + bernoulli("logit") +
    bf(
      Vowel3 ~ .25 + (1 - .25 - inv_logit(lambda)) * inv_logit(eta),
      eta ~ 1,
      lambda ~ 1,
      nl = T) + bernoulli("logit") +
    bf(
      Vowel4 ~ .25 + (1 - .25 - inv_logit(lambda)) * inv_logit(eta),
      eta ~ 1,
      lambda ~ 1,
      nl = T) + bernoulli("logit") +
    set_rescor(FALSE),
  prior = my_priors.mv,
  data = d,
  cores = 4,
  control = list(adapt_delta = .88))

Here’s the output of that final model:

Population-Level Effects: 
                        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Vowel2_eta_Intercept        5.38      3.51     1.68    15.03 1.00     1181      772
Vowel2_lambda_Intercept     7.17      4.00     3.24    18.15 1.01      876      477
Vowel3_eta_Intercept        4.49      2.83     1.03    12.35 1.00     1235      846
Vowel3_lambda_Intercept     6.47      3.56     2.74    15.59 1.01      855      483
Vowel4_eta_Intercept        4.94      3.47     1.20    12.85 1.00      878      469
Vowel4_lambda_Intercept     7.97     10.02     2.87    23.67 1.01      390      168

If anyone has suggestions, including alternative approaches, I’d love to hear about them. Thank you!

Currently, parameters can unfortunately not be shared across univariate models in multivariate models, i.e. bf() + bf(). This feature will be implemented only in brms 3 as it requires refactoring of the underlying formula and processed-formula objects.

1 Like