Modelling intra and inter variance for participants across conditions

Let’s say we run an experiment with two conditions. In one condition participants all agree on the rating which is typically 3 on a scale of 1-5 but in the other, there is disagreement and half the people will typically rate 2 and the other half will rate 4. I have created a simulated dataset which yields this hypothetical scenario but I a bit in doubt of how to quantify this difference.

# Inter and intra variance simulated data

# Simulate data with the below functions
simulate_data <- function(mean_value, sd_value, draws) {
  random_numbers <- rnorm(draws, mean = mean_value, sd = sd_value)
  break_points <- c(-Inf, 1, 2, 3, 4, Inf)
  category_ACR <- cut(random_numbers, breaks = break_points, labels = c("1", "2", "3", "4", "5"))
  num_ACR <- as.numeric(category_ACR)
  hist(num_ACR)
  print(paste("Mean: ", mean(num_ACR)))
  print(paste("SD: ", sd(num_ACR)))
  return(num_ACR)
}

generate_ratings <- function(norm, n) {
  return(sample(norm, n))
}

generate_data <- function() {
  data <- data.frame(Participant = integer(), Condition = character(), Rating = numeric())
  for (i in 1:50) {
    ratings1 <- generate_ratings(norm2.5, 20)
    ratings2 <- if (i <= 25) generate_ratings(norm2, 20) else generate_ratings(norm3, 20)
    data <- rbind(data, data.frame(Participant = rep(i, 40), Condition = rep(c("Condition1", "Condition2"), each = 20), Rating = c(ratings1, ratings2)))
  }
  return(data)
}

mean_value <- 2
sd_value <- 1
draws <- 10000
norm2.5 <- simulate_data(2.5, sd_value, draws*2)
norm2 <- simulate_data(2, 0.85, draws)
norm3 <- simulate_data(3, 0.85, draws)
data <- generate_data()

# BRMS
GausIndCon <- brm(
  formula = bf(Rating ~ -1 + Condition+ (Condition|Participant),
               sigma ~ Condition),
  data = data,
  family = gaussian(),
  chains = 2,
  cores = 4,
  iter = 4000,
  warmup = 1000
)

consub <- make_conditions(data, vars =c("Condition"))
conditional_effects(
  GausIndCon,
  "Participant", 
  conditions = consub,
  re_formula = NULL, ordinal = F
)

When I model this data I see that the mean rating and the variance across conditions are more or less the same:

However, when I plot each participant I can visually see the difference between the conditions easily:

I am in doubt how to quantify this difference, and in my attempt to model the sigma directly I am surprisingly being told that condition 2 has a lower sigma:
sigma_ConditionCondition2 -0.17

In the cases I wish to model I have more than 2 conditions and there is not so obvious a difference, but I believe some conditions will have more individual disagreements than others.

All tips and help on how to quantify/test this will be extremely welcome. Thanks for reading.

Howdy!

This has to do with the way that you have programmed your sim (I didn’t look at it very close) and the particular plots that you making here. Try making a plot of the raw data (this is not the prettiest plot, but it does the job):

ggplot(data, aes(x=Participant, y=Rating, color=Participant)) + geom_jitter(height=0) + facet_wrap(~Condition)


Also, check the sd in the raw data for each condition:
image

As you can see, the sd for both conditions is pretty similar. However, notice that in your model results, the estimated sd for the varying slope for Condition is around 0.5. The plots that you make above, are of the mean rating for each participant for each Condition (with partial pooling). This is not the same as the sd for each Condition. The variability that you are seeing in your second plot is described in the sd for the varying slope for Condition.

Thank you for your response. I am unsure if I misunderstood you or didn’t make the first post clear enough, but i absolutely created that simulated data with a large difference between condition 1 and 2 on purpose. We see that one group of participants rated lower in condition 2 whereas another rate higher. I want to demonstrate statistically that there is a group difference in condition 2 and not condition 1 despite them essentially having the same mean and SD. I am not sure how to do this despite the difference being visually clear.

I hope it makes sense and thank once again for trying to help :)

Maybe something like this, where you model the between-participant variance for each condition?

m_wouter <- brm(
  formula = Rating ~ -1 + Condition + (1 | Participant) + 
    (1 | gr(Participant:Condition, by = Condition)),
  data = data,
  family = gaussian(),
  chains = 2,
  cores = 4,
  iter = 4000,
  warmup = 1000
)

draws <- as_draws_df(m_wouter)
draws %>% 
  select(
    Condition1 = `sd_Participant:Condition__Intercept:ConditionCondition1`,
    Condition2 = `sd_Participant:Condition__Intercept:ConditionCondition2`
  ) %>% 
  pivot_longer(
    everything(), names_to = 'condition', values_to = 'between_participant_sd'
  ) %>% 
  ggplot(aes(between_participant_sd, condition)) +
  ggdist::stat_halfeye()

1 Like

Nice! This is very similar results as the OP original model, though, since Condition is categorical. The sd of varying intercept and slope for the original model (in the sim I ran) was 0.05 (0.00 - 0.13) and 0.53 (0.42 - 0.67) respectively. For your model the sd for varying intercepts of the participant condition interaction were 0.06 (0.00 - 0.16) and 0.52 (0.41 - 0.67) respectively.

You can use the model described by @Ax3man or you can use your model with varying intercept and varying slope for Condition. As mentioned in my response, I think the variance you are looking at is described by the sd for the varying slope for Condition.

1 Like

Thank you very much! This seems helpful and is potentially exactly what I was hoping for.

I am having a bit of trouble understanding this part, however:
(1 | gr(Participant:Condition, by = Condition))

Can you link me to some documentation or explain this in a way that I will be able to relay it to a person who knows little of BRMS?

See ?gr:

by An optional factor variable, specifying sub-populations of the groups. For each level of the by variable, a separate variance-covariance matrix will be fitted. Levels of the grouping factor must be nested in levels of the by variable.

So it’s like fitting two random effects, one for each condition. You need to use Participant:Condition in your case, since participants occur in both conditions, and the by variables has to be nested in the grouping factor.

But I think @jd_c is right that this is very similar to the model you already had, which is a more common formulation, and is likely more familiar to people.