Modelling accuracy scores with ceiling effects

LachlanC · October 23, 2021, 6:10am

Hi all, I’m trying to model the effects of an intervention on an accuracy score which was part of a cognitive test. Participants had to perform a series of arithmetic calculations and were given a score between 0% and 100% for the number of responses which were correct.

I am trying to model this data in brms but have not found a satisfying method. The first issue is that many of the participants did very well and get 100% (or close to) on the test. So there is clustering and a ceiling effect at high scores. The second issue is that some participants did quite poorly and got low scores (0-10%), which are far away from the sample mean and may end up being highly influential.

I have clumsily simulated some data:

x <- rnorm(200)

acc <- rnorm(200, mean = 1, sd = 0.2)

acc <- ifelse(acc > 1, rnorm(200, mean = 0.95, sd = 0.05), acc)

acc <- ifelse(acc > 1, 1, acc)

acc[c(1:5)] <- c(0, 0.05, 0.07, 0.2, 0.3)

d <- tibble(x = x, acc = acc)

This is what the distribution of simulated acc variable looks like:

My first thought was to fit the model using a truncated normal:

m1_trunc <- brm(acc | trunc(ub = 1) ~ x,
                data = d,
                family = gaussian(),
                prior = prior(normal(0,5), class = "b"),
                cores = 4, chains = 4,
                control = list(adapt_delta = 0.99),
                backend = "cmdstanr")

But there seemed to be issues with exploring the posterior with this model, with Rhat values >= 1.01, low ESS, and parameter estimates off target.

Skew_normal() and student() distributions on the other hand have no convergence issues and return the correct parameter value. But both do a poor job of reproducing the data:

m2_skew <- brm(acc ~ x,
               family = skew_normal(),
               prior= prior(normal(0, 100), class = "alpha"),
               control = list(adapt_delta = 0.99),
               cores = 4, chains = 4,
               data = d,
               backend = "cmdstanr")

pp_check(m2_skew, ndraws = 30)

m3_student <- brm(acc ~ x,
                  family = student(),
                  data = d,
                  chains = 4, cores = 4,
                  control = list(adapt_delta = 0.99),
                  backend = "cmdstanr")

pp_check(m3_student, ndraws = 30)

In any case, neither of these models really match the process that produced the data. I am sure that there is a more appropriate approach. In the case of reaction time data, specific models have been developed which match that data well (e.g., shifted log-normal). But I’m not aware of corresponding models in the case of accuracy scores.

How best might we model data of this kind? Would some kind of mixture be appropriate which models 1) the probability of having a perfect (or very high) score and 2) the mean accuracy score?

Thanks!

LachlanC · October 23, 2021, 6:20am

As it happens, it seems like there was a very recent post discussing a similar issue which I have missed: Non linear distribution with some zeros in data

In this case, I believe that I do have access to the number of correct responses as raw data. So my question may be solved. A natural option may be to use a binomial regression with p correct responses of n trials.

But if anyone has any other recommendations, I am all ears!

mike-lawrence · October 23, 2021, 1:30pm

Precisely! Though note that you’ll want to employ a hierarchical model, so:

brm(
    formula = acc01 ~ x + (x | participant)
    , family = binomial
)

Topic		Replies	Views
Advice for modeling for dependent variable with ceiling effect with brms Modeling brms	16	906	June 12, 2024
Model specification in brms Modeling brms	0	456	May 17, 2022
Problemas de convergência em brms Modeling brms	1	45	December 13, 2024
Advice on model structure brms	3	677	October 30, 2019
Help on brms modeling Modeling	9	433	January 22, 2023

Modelling accuracy scores with ceiling effects

Related topics