Hi,
I am trying to find a clinically sensible way to describe group difference in clinical psychological symptoms.
For example, we had 200 individuals come to the lab for testing. These are their depression scores taken from a self-report that can range between 0 and 63 (BDI).
The mean of ADHD is ~10pts higher, which has clinical meaning. But - I feel that just talking about the mean misses an important aspect here. We have what seems to me as a “higher rate” of more severe case, and that is important. The extreme cases are the most meaningful even if they are few. I fitted a y~1+group in brms with a gaussian that seems to do a good job only at capturing the mean. Also disregarding that the scores can only be between 0 and 63. Here is a pp_check:
So, my question is how would you get parameters that can have a clear interpretation regarding the rate of high cases. What I tried so far is a beta_binomial
model<-brm(bf(y | trials(100) ~ 0 + Intercept + group,
phi ~ 0 + Intercept + group),
data = df ,
warmup = 8000,
iter = 10000,
cores = 4,
chains = 4,
silent = 0,
# prior = myprior,
family = beta_binomial(link = "identity", link_phi = "identity"),
# sample_prior = "only",
backend='cmdstan')
which gave me this pp_check and posterior estimates:
Yes - it looks better, but I can’t convey the phi differences in any clinically meaningful way. I converted this to alpha/beta medians and got approximately:
alpha_adhd=1.3
alpha_control = 0.95
beta_adhd = 5.16
beta_control = 8.5
I think I can say “adhd has 1.3/0.95” more right skewness" but I’m not even sure this is mathematically correct. I ended up sampling and calculating the precent of response for each level of symptoms for each group for from the posterior median parameters:
####beta
library(ggplot2)
library(dplyr)
# Define parameters
alpha_adhd <- 1.3
beta_adhd <- 5.16
alpha_control <- 0.95
beta_control <- 8.5
n <- 1000000 # Number of simulations
max_score <- 63
# Simulate scores
set.seed(123) # For reproducibility
scores_adhd <- rbeta(n, alpha_adhd, beta_adhd) * max_score
scores_control <- rbeta(n, alpha_control, beta_control) * max_score
library(ggplot2)
library(dplyr)
library(tidyr)
# Assuming scores_adhd and scores_control have been generated as before
# Prepare the data for plotting
data <- tibble(
score = c(round(scores_adhd), round(scores_control)),
group = rep(c("ADHD", "Control"), each = n)
)
# Calculate the percentage of each score within each group
data_summary <- data %>%
group_by(group, score) %>%
summarise(n_bin = n(), .groups = 'drop') %>%
mutate(percentage = (n_bin / n) * 100) %>%
ungroup() %>%
# Ensure every possible score is represented for both groups
complete(group, score, fill = list(n = 0, percentage = 0)) %>%
arrange(group, score)
# Plot
ggplot(data_summary, aes(x = score, y = percentage, color = group)) +
geom_col(position = "dodge") +
ylim(0, 20) +
labs(title = "Probability Distribution of Depression Scores",
x = "Depression Score",
y = "Percentage Chance of Each Score") +
scale_x_continuous(breaks = seq(0, max_score, by = 5)) +
scale_y_continuous(
breaks = seq(0, 20, by = 2),
labels = function(x) paste0(x, "%"),
limits = c(0, 20)
) +
scale_color_manual(values = c("ADHD" = "blue", "Control" = "red")) +
theme_minimal() +
theme(
panel.grid.major.x = element_blank(), # Remove vertical major grid lines
panel.grid.minor.x = element_blank(), # Remove vertical minor grid lines
panel.grid.minor.y = element_blank(), # Remove horizontal minor grid lines
panel.grid.major.y = element_line(color = "grey80", size = 0.5) # Customize horizontal major grid lines
)
So this now has finally more clinical strightforward interpetation I think. We can see how much more adhd response are more probable across the self-report range. But now, I lost all the nice uncertainty estimates that bayesian inferences gives.
My questions are:
- Do you have other suggestions on how to gain parameters that will have estimates that can be clearly interpreted, specifically in regards to the heaviness of the tail.
- What will be a good way to convey the change in probability of getting a specific score between control to adhd?
I should say that I rather not work with the clinical threshold, these are really arbitrary in the end. I also love the exgaussian, but for some reason brms/stan has a really hard time sampling for that (I guess this is for a different topic).
Thank you for reading, and I will really appreciate any thought!
Nitzan