Help conceptualizing a model

Hi there :) I have a vision for a model but I have doubts about its feasibility.

I ran two tasks:
In task A, subjs answered whether the stimuli were words or not, and I am interested in their accuracy (Bernoulli family).
In task B, subjs answered what the ink color of the stimuli was, and I am interested in their reaction times (lognormal family).

Both tasks had three conditions (Stroop's congruent, neutral, and incongruent), and all subjs did both tasks.

Ultimately, I am interested in predicting the accuracy of task A with the reaction times of task B and with the conditions as another predictor.

I can create two models and extract the effects (or random effects) of task B and use these summarized observations as predictors, but it feels off. I’m worried that will make me lose a lot of information. Any suggestions/directions for which model I should use?

Thanks!

1 Like

Rather than modeling the reaction times of B, you could just use those reaction times as un-modeled covariates for predicting A. This would require that both A and B had the same three conditions.

3 Likes

Dear Solomon,

Thank you so much for your reply!

Nevertheless, I’m not sure I understand what you meant.

Do you suggest something like this?

set.seed(111)
library(brms)
library(tidyverse)

# Task A: word/non-word judgments, accuracy as the DV
Task.A <- data.frame(
  Subjs     = rep(LETTERS[1:5], 6),
  Condition = rep(c("Congruent", "Neutral", "Incongruent"), 30),
  Correct   = sample(c(1, 0), size = 30, replace = TRUE)
)

# Task B: ink-color naming, reaction time as the DV
Task.B <- data.frame(
  Subjs     = rep(LETTERS[1:5], 6),
  Condition = rep(c("Congruent", "Neutral", "Incongruent"), 30),
  RT        = rnorm(30, mean = 250, sd = 50)
)

# join the two tasks by subject and condition
Merge.Tasks <- left_join(Task.A, Task.B, by = c("Subjs", "Condition"))

brm(
  data    = Merge.Tasks,
  family  = bernoulli,
  formula = Correct ~ Condition + RT + (Condition + RT | Subjs)
)
2 Likes

In essence, yes, that’s what I’m talking about. Here are a few related models to consider:

formula = Correct ~ 0 + Condition + RT + (0 + Condition + RT | Subjs)

formula = Correct ~ Condition + RT + Condition:RT + (Condition + RT + Condition:RT | Subjs)

formula = Correct ~ 0 + Condition + RT + Condition:RT + (0 + Condition + RT + Condition:RT | Subjs)
3 Likes

Thank you Solomon!

There are two issues that keep my mind busy, and I'd appreciate your input:

  1. Isn't it problematic to assign reaction times to the dependent variable in a somewhat arbitrary manner?
    For example, in this case:
Merge.Tasks %>%
  filter(Subjs == "A") %>%
  group_by(Correct) %>%
  summarise(Mean = mean(RT))
# A tibble: 2 x 2
  Correct  Mean
    <dbl> <dbl>
1       0  197.
2       1  262.

Incorrect answers are predicted (on average) by faster reaction times, even though the reaction times were collected from a different task.

  2. What should I do in case of a different number of trials in each task, specifically fewer trials in Task B (the reaction-time task)?

Thanks again!

2 Likes

If RT is continuous, I don’t think you can enter it as a varying slope.

You could try a multivariate approach and then examine the correlation between outcomes.

# the |p| term links the varying effects across the two formulas,
# so their correlations are estimated jointly
bf_correct <- bf(correct ~ Condition + (Condition|p|Subjs), family = bernoulli())
bf_rt <- bf(RT ~ Condition + (Condition|p|Subjs), family = Gamma())
fit <- brm(bf_correct + bf_rt, data = data)

3 Likes

Q1

Let's walk this out a bit to make sure we're on the same page. In the case of Merge.Tasks, you have two blocks of Stroop tasks, which are differentiated by whether they have Correct or RT as the DV. However, the Stroop tasks are related in that those DVs are from 36 trials within each of the 3 levels of Condition, yielding 36 * 3 = 108 trials for each subject. But since you have two kinds of Stroop tasks, there are 108 * 2 = 216 total trials for each subject. Now I see your point that it would be odd to use the RT values from individual trials as predictors. The individual trials aren't tightly linked between the DVs. I have two possible solutions:

You could just take the average RT within each of the three levels of Condition, for each subject. Thirty-six is a decent number of trials for estimating a subject's true reaction time. You could even account for the uncertainty by computing the standard error of that mean and using the brms::me() function (see here).
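
To make that concrete, here is a minimal, untested sketch of this first option, reusing the simulated Task.A and Task.B objects from earlier in the thread (object and column names like RT.Summary, RT_mean, RT_se, and fit_me are just illustrative):

library(brms)
library(tidyverse)

# summarise Task B: mean RT and its standard error per subject and condition
RT.Summary <- Task.B %>%
  group_by(Subjs, Condition) %>%
  summarise(RT_mean = mean(RT),
            RT_se   = sd(RT) / sqrt(n()),
            .groups = "drop")

# attach the Task B summaries to the Task A trials
Model.Data <- left_join(Task.A, RT.Summary, by = c("Subjs", "Condition"))

# me(mean, se) tells brms to treat the predictor as measured with error
fit_me <- brm(
  Correct ~ Condition + me(RT_mean, RT_se) + (Condition | Subjs),
  data   = Model.Data,
  family = bernoulli()
)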

Alternatively, you could use the 2-step model-based solution you alluded to in your initial post. To attenuate the loss of information, you could use the brms::me() function there, too. But this time you'd enter the posterior mean and standard deviation of the random effects, rather than the sample statistics. Either way, you'd be relying on brms::me() to capture the uncertainty in your predictor.
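
And a rough, untested sketch of that second, model-based route, again with the simulated data (here the subject-by-condition linear predictor from a lognormal RT model stands in for the random effects; fit_rt, rt_est, eta_mean, and eta_sd are just placeholder names):

# Step 1: model the Task B reaction times (lognormal, as in the original post)
fit_rt <- brm(
  RT ~ Condition + (Condition | Subjs),
  data   = Task.B,
  family = lognormal()
)

# Step 2: posterior mean and SD of each subject-by-condition linear predictor
cells <- distinct(Task.B, Subjs, Condition)

rt_est <- fitted(fit_rt, newdata = cells, scale = "linear") %>%
  as_tibble() %>%
  bind_cols(cells, .) %>%
  select(Subjs, Condition, eta_mean = Estimate, eta_sd = Est.Error)

# Step 3: feed those estimates into the accuracy model as a
# measurement-error predictor
Model.Data2 <- left_join(Task.A, rt_est, by = c("Subjs", "Condition"))

fit_2step <- brm(
  Correct ~ Condition + me(eta_mean, eta_sd) + (Condition | Subjs),
  data   = Model.Data2,
  family = bernoulli()
)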

Q2

As far as I can tell, both of the solutions above would be robust to differences in trial numbers between the DVs. But as always, more trials are better than fewer.

2 Likes

This is awesome. I was not familiar with brms::me(). Thank you so much!!

2 Likes

Thank you @JLC! To be honest, I am still quite a beginner and have never tried multivariate approaches. After looking at the more complex models in this vignette, I still can't really see how this procedure results in the data of bf_correct being controlled/predicted (or something similar) by the data of bf_rt. I would be thankful for some elaboration and/or sources.

1 Like

My apologies @S_H !

I didn’t look closely enough at the Merge.Tasks data frame and (incorrectly) assumed RT was different for every trial.

@Solomon is a brms ninja and his approach looks great.

If you wanted to model RT and use the modelled values as predictors, I think you would have to use brms' nonlinear syntax.

2 Likes

I actually think that if I understand the problem correctly, @JLC’s approach is worth considering (although @Solomon’s might in the end prove better). The important part here is the (Condition|p|Subjs) term that appears in both formulas. This is somewhat convoluted syntax but it can be understood as:

  • For both responses there is an unobserved “ability” of each subject to have correct answers in one task and have long reaction times in the other task (modelled as a varying effect)
  • Those abilities are correlated between the tasks, so e.g. latent “ability” to respond correctly in the congruent condition is correlated with the “ability” to have long reaction times in the congruent condition.
  • The amount and sign of the correlation are estimated from the data.

So here, the correlations between the subjects' varying effects for the two tasks serve a similar purpose as the coefficients of the reaction times in @Solomon's approach. IMHO the main conceptual difference is that when using (the average of) the reaction times to predict correct answers, the predictors are on the response scale of the reaction-time submodel and are directly influenced by measurement noise (though not much, since you are averaging). In the multivariate model, you look at correlations between variables on the latent linear scale. One would thus expect that inspecting the fitted correlations should give similar qualitative results as inspecting the fitted coefficients when using RT as a predictor.

A practical difference is also that, in my (limited) experience, correlations usually require more data to estimate well than fixed-effect coefficients.

Also, since the data from the two tasks don't correspond directly to each other, you can't directly feed the data to brms. But I think you could do something like this (code not tested):

# stack the two tasks; the DV that does not apply to a row is set to NA
Merged2 <- rbind(Task.A %>% mutate(RT = NA, isA = TRUE), 
                 Task.B %>% mutate(Correct = NA, isA = FALSE))

bf_correct <- bf(Correct | subset(isA) ~ Condition + (Condition|p|Subjs), family = bernoulli())
bf_rt <- bf(RT | subset(!isA) ~ Condition + (Condition|p|Subjs), family = "gamma")
fit <- brm(bf_correct + bf_rt + set_rescor(FALSE), data = Merged2)

Here the subset term tells brms to use only some of the rows for the relevant submodel. I am speculating a bit, so feel free to ask for clarifications if you'd like to implement the model but encounter difficulties.

EDIT: I just tested the code above, and after slight modifications it leads to a model that compiles and runs, although the simulated dataset gives divergences (possibly because it is too small).

Best of luck with your model!

3 Likes

If y’all can get the multivariate model up and running, I think that would be a great approach.

1 Like