Help with a factorial, repeated-measures design

Hi everyone,

I am a fairly new Stan convert and would like some help specifying a model. This feels like it should be a textbook case, since it is a fairly standard analysis in neuroscience. I have found bits of information here and there, but I am still struggling to make all the pieces fit together.

Background

I am analysing the results of an EEG experiment in which 30 subjects each performed ~100 trials (10 seconds each) of a behavioural task.

For each trial and time point, I record a dependent variable (score).

The design factors were:

  • subject: 1, ..., 30
  • trial_difficulty: EASY, HARD
  • time of measurement within the trial: 0s, 2s, 4s, 6s

My data looks like this:

subject condition time_point trial_number score
1       EASY      0s          1           +0.53
1       EASY      2s          1           +0.32
1       EASY      4s          1           +0.12
1       EASY      6s          1           +0.05
1       HARD      0s          2           +0.26
1       HARD      2s          2           -0.19
...

Notes

  • Conditions were randomized for each subject across trials.
  • Each subject performed about 100 trials, but the dataset is not fully balanced as there were more EASY trials than HARD trials.
  • The similarity scores lie in the interval [-2, +2].

Scientific questions

I would like to answer the following two scientific questions:

  1. Is there an effect of trial difficulty (EASY vs HARD)?
  2. Is this effect present at all time points?
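The two questions can be restated in terms of the parameters of the Stan model in this post. This assumes difficulty is coded x = -1 for EASY and x = +1 for HARD. The declared bounds suggest that coding, but it is still an assumption. Write j[i] for the subject and k[i] for the time window of observation i:

```latex
\mu_i = \beta_1 + u_{1,j[i]} + w_{1,k[i]} + \left(\beta_2 + u_{2,j[i]} + w_{2,k[i]}\right) x_i

\Delta_k = \mu(x = +1, k) - \mu(x = -1, k) = 2\left(\beta_2 + w_{2k}\right)
\qquad \text{(subject deviations } u \text{ set to } 0\text{)}

\text{Q1: } \bar{\Delta} = \frac{1}{K}\sum_{k=1}^{K} \Delta_k,
\qquad
\text{Q2: } \Delta_1, \dots, \Delta_K
```

Under this reading, question 1 is about the posterior of the window-averaged difference. Question 2 is about the posterior of each window-specific difference, for example whether its credible interval excludes 0.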

Model specification questions

I have tried to take inspiration from the excellent tutorial paper by Sorensen and colleagues [1], whose paradigm is somewhat similar to mine.

However, I am struggling to account properly for the time variable, and specifically for the serial correlation between consecutive time windows within a trial.
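One common way to formalise this is a first-order autoregressive, AR(1), process for the residuals. This is an assumption about the error structure, not the only option. The process runs over the K windows t = 1, ..., K of a single trial, with μ_t the model's mean for window t:

```latex
y_t = \mu_t + \varepsilon_t, \qquad
\varepsilon_t = \phi\,\varepsilon_{t-1} + \eta_t, \qquad
\eta_t \sim \mathcal{N}(0, \sigma_e^2), \qquad |\phi| < 1

\varepsilon_1 \sim \mathcal{N}\!\left(0, \frac{\sigma_e^2}{1 - \phi^2}\right), \qquad
\operatorname{corr}(\varepsilon_t, \varepsilon_s) = \phi^{|t-s|}

y_t \mid y_{t-1} \sim \mathcal{N}\!\left(\mu_t + \phi\,(y_{t-1} - \mu_{t-1}),\; \sigma_e^2\right),
\qquad t = 2, \dots, K
```

The last line is the form that is easiest to code. Each window after the first is conditioned on the previous window's residual from the same trial, so correlation decays with the distance between windows.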

STAN model specification attempt

Here is what I have attempted so far, based on the example in [1].

I am afraid this model neglects the serial correlation between time points.

data {
    int<lower=1> N;                                //number of data points
    array[N] real<lower=-2, upper=2> scores;       //scores
    array[N] real<lower=-1, upper=1> difficulty;   //predictor (e.g. -1 = EASY, +1 = HARD)
    int<lower=1> J;                                //number of subjects
    int<lower=1> K;                                //number of time points per trial
    array[N] int<lower=1, upper=J> sub;            //subject id
    array[N] int<lower=1, upper=K> win;            //window index 1..K (1 = 0s, ..., 4 = 6s), not seconds
}

parameters {
    vector[2] beta;                 //population intercept, difficulty slope
    real<lower=0> sigma_e;          //residual sd
    matrix[2, J] u;                 //subject intercepts, slopes
    vector<lower=0>[2] sigma_u;     //subject sds
    matrix[2, K] w;                 //window intercepts, slopes
    vector<lower=0>[2] sigma_w;     //window sds, one per row of w (was a scalar but indexed)
}

model {
    real mu;

    //weakly informative priors (scales are a judgment call; scores lie in [-2, 2])
    beta ~ normal(0, 1);
    sigma_e ~ normal(0, 1);
    sigma_u ~ normal(0, 1);
    sigma_w ~ normal(0, 1);
    u[1] ~ normal(0, sigma_u[1]);   //subj intercepts
    u[2] ~ normal(0, sigma_u[2]);   //subj slopes
    w[1] ~ normal(0, sigma_w[1]);   //time intercepts
    w[2] ~ normal(0, sigma_w[2]);   //time slopes

    //likelihood
    for (i in 1:N){
        mu = beta[1] + u[1, sub[i]] + w[1, win[i]] +
            (beta[2] + u[2, sub[i]] + w[2, win[i]]) * difficulty[i];
        scores[i] ~ normal(mu, sigma_e);
    }
}
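The sketch below shows one way to handle the serial correlation. It keeps the same mean structure as the model above but replaces the independent residuals with AR(1) residuals within each trial. It also adds a generated quantities block for the per-window difficulty effect (question 2).

The sketch rests on several assumptions that are not in the original post:

  • a hypothetical `trial` index, unique across subjects;
  • rows sorted by trial, then window, with no missing windows;
  • ±1 coding of difficulty;
  • prior scales chosen by me.

It has not been checked against real data.

```stan
data {
  int<lower=1> N;                              // number of observations
  array[N] real<lower=-2, upper=2> scores;     // similarity scores
  array[N] real<lower=-1, upper=1> difficulty; // assumed: -1 = EASY, +1 = HARD
  int<lower=1> J;                              // number of subjects
  int<lower=1> K;                              // windows per trial (4 here)
  array[N] int<lower=1, upper=J> sub;          // subject id
  array[N] int<lower=1, upper=K> win;          // window index: 1 = 0s, ..., 4 = 6s
  array[N] int<lower=1> trial;                 // hypothetical: trial id, unique across subjects
}
transformed data {
  // The AR(1) term uses row i - 1 as the previous window of the same trial,
  // so stop early unless rows are sorted by trial, then window, with no gaps.
  for (i in 1:N) {
    if (win[i] > 1) {
      if (i == 1) reject("the first row must be window 1 of a trial");
      if (trial[i - 1] != trial[i] || win[i - 1] != win[i] - 1)
        reject("rows must be sorted by trial, then window, with no gaps");
    }
  }
}
parameters {
  vector[2] beta;                  // population intercept, difficulty slope
  matrix[2, J] u;                  // subject intercept / slope deviations
  vector<lower=0>[2] sigma_u;
  matrix[2, K] w;                  // window intercept / slope deviations
  vector<lower=0>[2] sigma_w;
  real<lower=0> sigma_e;           // sd of the AR(1) innovations
  real<lower=-1, upper=1> phi;     // lag-1 residual correlation within a trial
}
model {
  vector[N] mu;
  for (i in 1:N)
    mu[i] = beta[1] + u[1, sub[i]] + w[1, win[i]]
            + (beta[2] + u[2, sub[i]] + w[2, win[i]]) * difficulty[i];

  // weakly informative priors (assumed scales)
  beta ~ normal(0, 1);
  sigma_u ~ normal(0, 1);
  sigma_w ~ normal(0, 1);
  sigma_e ~ normal(0, 1);
  u[1] ~ normal(0, sigma_u[1]);
  u[2] ~ normal(0, sigma_u[2]);
  w[1] ~ normal(0, sigma_w[1]);
  w[2] ~ normal(0, sigma_w[2]);

  for (i in 1:N) {
    if (win[i] == 1) {
      // first window of a trial: stationary marginal sd of the AR(1) process
      scores[i] ~ normal(mu[i], sigma_e / sqrt(1 - square(phi)));
    } else {
      // later windows: condition on the previous residual of the same trial
      scores[i] ~ normal(mu[i] + phi * (scores[i - 1] - mu[i - 1]), sigma_e);
    }
  }
}
generated quantities {
  // difficulty slope in each window; with -1/+1 coding, HARD minus EASY is 2x this
  vector[K] slope_at_win = beta[2] + w[2]';
  vector[K] diff_at_win = 2 * slope_at_win;
}
```

Because the AR(1) term conditions on the previous row, the data have to be passed in trial-then-window order, which the transformed data check enforces. With only four windows, `sigma_w` is estimated from four groups and is likely to be weakly identified. Treating the window terms as fixed effects, for example a difficulty-by-window interaction, is a common alternative.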

Any help or pointers you can give to steer this in the right direction will be greatly appreciated.

Best,
Marco

References

[1] Sorensen, T., Hohenstein, S., & Vasishth, S. (2016). Bayesian linear mixed models using Stan: A tutorial for psychologists, linguists, and cognitive scientists. The Quantitative Methods for Psychology, 12(3), 175–200. doi:10.20982/tqmp.12.3.p175
[2] https://www.r-bloggers.com/fitting-bayesian-linear-mixed-models-for-continuous-and-binary-data-using-stan-a-quick-tutorial/

I am not sure I understand your issue; I can see at least two possible interpretations:

  1. You expect the score to change monotonically with time_point. In that case, have a look at how brms implements monotonic effects: https://cran.r-project.org/web/packages/brms/vignettes/brms_monotonic.html

  2. Values for consecutive time windows will be more similar than values for time points further apart, but you cannot expect any firm structure. Then splines might be a good answer. Again, brms has an implementation you might want to use. The best example I could find quickly is https://www.fromthebottomoftheheap.net/2018/04/21/fitting-gams-with-brms/, and you can also check the docs: https://rdrr.io/cran/brms/man/s.html. An example in pure Stan is https://mc-stan.org/users/documentation/case-studies/splines_in_stan.html
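For the first interpretation, here is a hand-written Stan sketch of the monotonic-effect idea. It is my own parameterisation, not the code brms generates.

The time effect is `b_mo` times the cumulative sum of a simplex `zeta`. Because every element of `zeta` is non-negative, the effect can only move in one direction across windows. To keep it short, this is a hypothetical simplified model with no subject effects and independent residuals, and the priors use assumed scales. Here `b_mo` is the total change from the first to the last window. That is (K - 1) times the average change between adjacent categories, which is how brms interprets the coefficient of a `mo()` term.

```stan
data {
  int<lower=1> N;
  array[N] real scores;
  array[N] real difficulty;               // assumed: -1 = EASY, +1 = HARD
  int<lower=2> K;                         // number of ordered windows
  array[N] int<lower=1, upper=K> win;     // 1 = 0s, ..., K = 6s
}
parameters {
  real alpha;                             // mean score at the first window
  real b_diff;                            // difficulty slope (same in every window)
  real b_mo;                              // total change from first to last window
  simplex[K - 1] zeta;                    // share of that change in each step
  real<lower=0> sigma_e;
}
transformed parameters {
  // mo[k] = fraction of the total change reached by window k; mo[1] = 0, mo[K] = 1
  vector[K] mo = append_row(0, cumulative_sum(zeta));
}
model {
  alpha ~ normal(0, 1);
  b_diff ~ normal(0, 1);
  b_mo ~ normal(0, 1);
  zeta ~ dirichlet(rep_vector(1, K - 1));
  sigma_e ~ normal(0, 1);
  for (i in 1:N)
    scores[i] ~ normal(alpha + b_mo * mo[win[i]] + b_diff * difficulty[i], sigma_e);
}
```

This makes the time trend monotonic but keeps the difficulty effect constant across windows. On its own it therefore does not answer question 2.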

In both cases, you may find it easier to write your model in brms than to develop your own Stan code, as your use case is likely to be well within the capabilities of brms.
