Modeling an unusual case of a repeated-measures experiment

(Apologies for the non-informative title; I'm not sure what a good title would be.)

Suppose I conduct the following 2x2 between-subjects decision-making experiment with repeated measures:

  • Each subject gets assigned to one of the 2x2=4 experimental conditions.
  • Each subject then encounters 20 unique items, each of which is repeated five times, for a total of 100 trials.
  • The dependent variable is binary (correct/incorrect).

My goal is to assess the effects of the experimental manipulations, i.e. the two main effects and the interaction. Here is one way to model the experiment:

y_{i} \sim \text{Bernoulli}(\theta_{i}) \\ \text{logit}(\theta_{i}) = \alpha + \alpha_{subject[i]} + \alpha_{item[i]} + \beta_{1} x1_{i} + \beta_{2} x2_{i} + \beta_{3} x1_{i} x2_{i}
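
In Stan, this baseline model could look roughly like the following (a minimal sketch only: x1 and x2 are assumed to be passed in as contrast-coded ±0.5 predictors, and the priors are just placeholders):

data{
    int<lower=1>                     n_observation;
    int<lower=1>                     n_subject;
    int<lower=1>                     n_item;
    int<lower=1, upper=n_subject>    subject[n_observation];
    int<lower=1, upper=n_item>       item[n_observation];
    vector[n_observation]            x1;   // contrast-coded +/-0.5
    vector[n_observation]            x2;   // contrast-coded +/-0.5
    int<lower=0, upper=1>            y[n_observation];
}

parameters{
  real                alpha;
  vector[n_subject]   alpha_subject;
  real<lower=0>       sigma_subject;
  vector[n_item]      alpha_item;
  real<lower=0>       sigma_item;
  real                beta_1;
  real                beta_2;
  real                beta_3;
}

model{
  vector[n_observation] y_hat;

  for (i in 1:n_observation)
    y_hat[i] = alpha + alpha_subject[subject[i]] + alpha_item[item[i]] +
               beta_1*x1[i] + beta_2*x2[i] + beta_3*x1[i]*x2[i];

  y             ~ bernoulli_logit(y_hat);
  alpha         ~ normal(0, 1);           // placeholder priors
  alpha_subject ~ normal(0, sigma_subject);
  alpha_item    ~ normal(0, sigma_item);
  sigma_subject ~ normal(0, 1);
  sigma_item    ~ normal(0, 1);
  beta_1        ~ normal(0, 1);
  beta_2        ~ normal(0, 1);
  beta_3        ~ normal(0, 1);
}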

What I'm struggling with is this:

What would be an “equivalent” model if one of the experimental manipulations (say x2) were instead implemented by using a different set of 20 unique items for each condition level? That is, the experimental manipulation would not be independent of the items; rather, the selection of items would be what defines the manipulation. So subjects in x2-Level 1 would see items a1, a2, … a20, and subjects in x2-Level 2 would see items b1, b2, … b20.

Any ideas on how to approach this?

One approach would be to model the items a_n and b_n hierarchically and use the higher-level parameters for the hypothesis tests.

y_i \sim \text{Bernoulli}(\theta_i) \\ \text{logit}(\theta_i) = \alpha_{subject[i]} + \alpha_{item[i]} + \beta_1 x_1 + \beta_2 x_1 \alpha_{item[i]} \\ \alpha_{item[i]} \sim \text{Normal}(\mu_a, \sigma_a) \text{ for } i \text{ in } a_1, a_2, ..., a_{20} \\ \alpha_{item[i]} \sim \text{Normal}(\mu_b, \sigma_b) \text{ for } i \text{ in } b_1, b_2, ..., b_{20}

You can calculate \mu_a - \mu_b in the generated quantities block in Stan.
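
For example (a minimal sketch, assuming the group means are declared as parameters mu_a and mu_b):

generated quantities{
  real x2_effect = mu_a - mu_b;   // difference between the two item-group means on the logit scale
}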

Thanks Stijn.

Concerning \alpha_{item}, any idea on how to code such a “dual” prior? The following is how a “plain” varying intercept could be coded. How would this be adapted?

parameters{
    vector[n_item] alpha_item;
    real           mu_item;
    real<lower=0>  sigma_item;
} 
model{
    alpha_item ~ normal(mu_item, sigma_item);
    mu_item    ~ [some prior];
    sigma_item ~ [some prior];
}

EDIT: I see that I can implement this by making \alpha_{item} a matrix and combining with an index variable.
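
Something along these lines, I think (a rough sketch; set[i] below is a hypothetical data variable indicating which item set the item on trial i belongs to):

parameters{
    matrix[n_item, 2]   alpha_item;    // column 1: items a_1..a_20, column 2: items b_1..b_20
    vector[2]           mu_item;
    vector<lower=0>[2]  sigma_item;
}
model{
    alpha_item[, 1] ~ normal(mu_item[1], sigma_item[1]);
    alpha_item[, 2] ~ normal(mu_item[2], sigma_item[2]);
    // in the likelihood, the index variable selects the right column,
    // e.g. alpha_item[item[i], set[i]]
}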

On the basis of this, here’s my current approach:

y_{i} \sim \text{Bernoulli}(\theta_{i}) \\ \text{logit}(\theta_{i}) = \alpha_{subject[i]} + \alpha_{item[i]} + \beta_{1} x1_{i} + \beta_{3} x1_{i} x2_{i} \\ \alpha_{item[i]} \sim \text{Normal}(\alpha + 0.5 \beta_{2}, \sigma_{a}) \text{ for } i \text{ in } a_{1}, a_{2}, ..., a_{20} \\ \alpha_{item[i]} \sim \text{Normal}(\alpha - 0.5 \beta_{2}, \sigma_{b}) \text{ for } i \text{ in } b_{1}, b_{2}, ..., b_{20}

… with the experimental manipulations x1 and x2 contrast-coded (0.5, -0.5).

I think this works but you can simplify it by doing the following. The only thing that changes is the interpretation of \beta_2. Sorry for making it more difficult than necessary in my previous post.

y_i \sim \text{Bernoulli}(\theta_i) \\ \text{logit}(\theta_i) = \alpha + \alpha_{subject[i]} + \alpha_{item[i]} + \beta_1 x_1 + \beta_2 x_2 \beta_3 x_1 x_2 \\ \alpha_{item[i]} \sim \text{Normal}(0, \sigma_a) \text{ for } i \text{ in } a_1, a_2, ..., a_{20} \\ \alpha_{item[i]} \sim \text{Normal}(0, \sigma_b) \text{ for } i \text{ in } b_1, b_2, ..., b_{20}

I think this parametrization will work somewhat better in Stan.

Assuming that you missed a plus sign between \beta_{2} x_{2} and \beta_{3} x_{1} x_{2}, and assuming that the coding of the predictors x1 and x2 remains the same (+/- 0.5), the two parametrizations are mathematically the same, no? That is, the interpretation of \beta_{2} should also be the same (i.e. the main effect of x2).

So we have Version 1:

y_{i} \sim \text{Bernoulli}(\theta_{i}) \\ \text{logit}(\theta_{i}) = \alpha_{subject[i]} + \alpha_{item[i]} + \beta_{1} x1_{i} + \beta_{3} x1_{i} x2_{i} \\ \alpha_{item[i]} \sim \text{Normal}(\alpha + 0.5 \beta_{2}, \sigma_{a}) \text{ for } i \text{ in } a_{1}, a_{2}, ..., a_{20} \\ \alpha_{item[i]} \sim \text{Normal}(\alpha - 0.5 \beta_{2}, \sigma_{b}) \text{ for } i \text{ in } b_{1}, b_{2}, ..., b_{20}

And Version 2:

y_{i} \sim \text{Bernoulli}(\theta_{i}) \\ \text{logit}(\theta_{i}) = \alpha + \alpha_{subject[i]} + \alpha_{item[i]} + \beta_{1} x1_{i} + \beta_{2} x2_{i} + \beta_{3} x1_{i} x2_{i} \\ \alpha_{item[i]} \sim \text{Normal}(0, \sigma_{a}) \text{ for } i \text{ in } a_{1}, a_{2}, ..., a_{20} \\ \alpha_{item[i]} \sim \text{Normal}(0, \sigma_{b}) \text{ for } i \text{ in } b_{1}, b_{2}, ..., b_{20}
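
To spell out why I think they are the same: in Version 1, write \alpha_{item[i]} = \alpha \pm 0.5 \beta_{2} + \delta_{item[i]}, with \delta_{item[i]} \sim \text{Normal}(0, \sigma_{a}) for the a-items and \delta_{item[i]} \sim \text{Normal}(0, \sigma_{b}) for the b-items. Because x2_{i} = +0.5 exactly for the a-items and -0.5 for the b-items, the \pm 0.5 \beta_{2} part equals \beta_{2} x2_{i}, so the Version 1 predictor becomes \alpha + \alpha_{subject[i]} + \delta_{item[i]} + \beta_{1} x1_{i} + \beta_{2} x2_{i} + \beta_{3} x1_{i} x2_{i}, i.e. the Version 2 predictor with the centered \delta's playing the role of the Version 2 \alpha_{item}'s.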

Running both of these gives essentially the same results, but it's actually the first version that is both much faster and yields higher effective sample sizes.

However, the estimates I get are not sensible and, apart from \sigma_{subject}, the uncertainty intervals are very wide. I'm running a simplified version of these models in which I drop x1; note that in the code below, the remaining item-based manipulation (x2 above) is labelled x1.

Here’s the Stan code for (simplified) Version 1:

data{
    int<lower=1>                     n_observation;
    int<lower=1>                     n_subject;
    int<lower=1>                     n_item;
    int<lower=1, upper=n_subject>    subject[n_observation];
    int<lower=1, upper=n_item>       item[n_observation];
    int<lower=1, upper=2>            x1_i[n_observation];   // item-set index (level of the item-based manipulation)
    int<lower=0, upper=1>            y[n_observation];
}

parameters{
  real                alpha;
  vector[n_subject]   alpha_subject;
  real<lower=0>       sigma_subject;
  matrix[n_item, 2]   alpha_item;    // one column of item intercepts per item set
  vector<lower=0>[2]  sigma_item;
  real                beta_x1;
}

transformed parameters{
  vector[n_observation] y_hat;

  for (i in 1:n_observation)
    y_hat[i] = alpha_subject[subject[i]] + alpha_item[item[i], x1_i[i]];
}

model{
  y               ~ bernoulli_logit(y_hat);
  alpha           ~ normal(1, 1);
  alpha_subject   ~ normal(0, sigma_subject);
  sigma_subject   ~ normal(0, 1);
  alpha_item[, 1] ~ normal(alpha + 0.5*beta_x1, sigma_item[1]);
  alpha_item[, 2] ~ normal(alpha - 0.5*beta_x1, sigma_item[2]);
  sigma_item      ~ normal(0, 3);
  beta_x1         ~ normal(0, 1);
}

And for Version 2:

data{
    int<lower=1>                     n_observation;
    int<lower=1>                     n_subject;
    int<lower=1>                     n_item;
    int<lower=1, upper=n_subject>    subject[n_observation];
    int<lower=1, upper=n_item>       item[n_observation];
    real<lower=-0.5, upper=0.5>      x1[n_observation];
    int<lower=1, upper=2>            x1_i[n_observation];
    int<lower=0, upper=1>            y[n_observation];
}

parameters{
  real                alpha;
  vector[n_subject]   alpha_subject;
  real<lower=0>       sigma_subject;
  matrix[n_item, 2]   alpha_item;
  vector<lower=0>[2]  sigma_item;
  real                beta_x1;
}

transformed parameters{
  vector[n_observation] y_hat;

  for (i in 1:n_observation)
    y_hat[i] = alpha + alpha_subject[subject[i]] + alpha_item[item[i], x1_i[i]] +
               beta_x1*x1[i];
}

model{
  y               ~ bernoulli_logit(y_hat);
  alpha           ~ normal(1, 1);
  alpha_subject   ~ normal(0, sigma_subject);
  sigma_subject   ~ normal(0, 1);
  alpha_item[, 1] ~ normal(0, sigma_item[1]);
  alpha_item[, 2] ~ normal(0, sigma_item[2]);
  sigma_item      ~ normal(0, 3);
  beta_x1         ~ normal(0, 1);
}

Any idea what the issue is?

After a quick scan I could not find any differences between the two parametrizations. Can you post the results and any divergence warnings?

There are no warnings, and the Rhat and the n_eff look good.

Here are the results of an attempt to do parameter recovery using Version 1 from above (blue = posterior distribution, red = true value):

And the R code that produces the fake data:

library(dplyr)

info <- data.frame(
  n_subject         = 100,
  n_item            = 25,
  n_rep             = 5,
  # parameters true values
  alpha             = 1,
  alpha_subject_sd  = 1,
  alpha_item1_sd    = 2,
  alpha_item2_sd    = 2,
  b_x2              = 0.5
)

fake_items <- data.frame(
    item        = rep(1:info$n_item, 2),
    x2          = rep(c('x2_1', 'x2_2'), each =  info$n_item),
    alpha_item  = c(
      rnorm(info$n_item, info$alpha + 0.5*info$b_x2, info$alpha_item1_sd),
      rnorm(info$n_item, info$alpha - 0.5*info$b_x2, info$alpha_item2_sd)
    )
)

fake_subjects <- data.frame(
    subject       = 1:info$n_subject,
    alpha_subject = rnorm(info$n_subject, 0, info$alpha_subject_sd)
)

fake <- data.frame(
  subject = rep(1:info$n_subject, each = info$n_item*info$n_rep),
  x2      = rep(c('x2_1', 'x2_2'), each = info$n_subject*info$n_item*info$n_rep/2),
  item    = rep(rep(1:info$n_item, each = info$n_rep), info$n_subject)
  ) %>%
  inner_join(fake_subjects) %>%
  inner_join(fake_items) %>%
  mutate(
    yhat     = alpha + alpha_subject + alpha_item,
    accuracy = rbinom(n(), 1, exp(yhat) / (exp(yhat) + 1))
  )

A couple of quick things in the simulation.

  • In the second-to-last line, you probably need info$alpha. If you still had an alpha = 2 somewhere in your R environment, that could explain the outlier-like result for alpha.
  • The sd for the items is quite high for ‘only’ 25 items. It means that the s.e. for the mean of each group of items is \frac{2}{\sqrt{25}} = 0.4, which can quickly overwhelm the effect of interest, i.e. beta_x1. You might need a couple of simulation runs to check whether your model is doing what you want.

Otherwise, I think the Stan model looks fine.


Thanks Stijn. Good catch re: alpha, although this is not responsible for the observed behavior. I’ll do more extensive simulations to try and see what’s going on.