Hi everyone,
I am a fairly new Stan convert and would like some help specifying a model. I feel like this one should be textbook, as it is a fairly standard analysis in neuroscience, but while I’ve found several bits of information here and there, I am still struggling to make all of the pieces fit together.
Background
I am analysing the results of an EEG experiment, during which 30 subjects performed ~100 trials (that lasted 10 seconds each) of a behavioural task.
For each trial and time point, I collect one dependent variable (score).
The experimental conditions were:
- subject: 1, ..., 30
- trial_difficulty: EASY, HARD
- time of measurement within the trial: 0s, 2s, 4s, 6s
My data looks like this:
subject  condition  time_point  trial_number  score
1        EASY       0s          1             +0.53
1        EASY       2s          1             +0.32
1        EASY       4s          1             +0.12
1        EASY       6s          1             +0.05
1        HARD       0s          2             +0.26
1        HARD       2s          2             -0.19
...
Notes
- Conditions were randomized for each subject across trials.
- Each subject performed about 100 trials, but the dataset is not fully balanced as there were more EASY trials than HARD trials.
- Similarity scores lie in the interval [-2, +2].
Scientific questions
I would like to answer the following two scientific questions:
- Is there an effect of trial difficulty (EASY vs HARD)?
- Is this effect present at all time points?
Model specification questions
I have tried to find inspiration in the excellent tutorial paper by Sorensen and colleagues [1], whose design is somewhat similar to my paradigm.
However, I am struggling to properly take the time variable into account, and specifically the fact that there will be serial correlation between consecutive time windows within a given trial.
Stan model specification attempt
Here is what I have attempted so far, based on the example in [1].
I am afraid that this version neglects the serial correlation between time points.
data {
  int<lower=1> N;                         // number of data points
  real<lower=-2, upper=2> scores[N];      // scores
  real<lower=-1, upper=1> difficulty[N];  // predictor (trial difficulty)
  int<lower=1> J;                         // number of subjects
  int<lower=1> K;                         // number of time points per trial
  int<lower=1, upper=J> sub[N];           // subject id
  int<lower=1, upper=K> win[N];           // time window index, 1..K (0s -> 1, ..., 6s -> K)
}
parameters {
  vector[2] beta;                // fixed intercept and slope
  real<lower=0> sigma_e;         // residual sd
  matrix[2, J] u;                // subject intercepts (row 1), slopes (row 2)
  vector<lower=0>[2] sigma_u;    // subject sds
  matrix[2, K] w;                // window intercepts (row 1), slopes (row 2)
  vector<lower=0>[2] sigma_w;    // window sds
}
model {
  real mu;
  // priors
  u[1] ~ normal(0, sigma_u[1]);  // subj intercepts
  u[2] ~ normal(0, sigma_u[2]);  // subj slopes
  w[1] ~ normal(0, sigma_w[1]);  // time intercepts
  w[2] ~ normal(0, sigma_w[2]);  // time slopes
  // likelihood
  for (i in 1:N) {
    mu = beta[1] + u[1, sub[i]] + w[1, win[i]] +
         (beta[2] + u[2, sub[i]] + w[2, win[i]]) * difficulty[i];
    scores[i] ~ normal(mu, sigma_e);
  }
}
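Written out as equations, the model I am trying to express is roughly the following. The symbols are my own shorthand and not taken from [1]: y_i stands for scores[i], d_i for difficulty[i], j[i] for sub[i], and k[i] for win[i].

```latex
\begin{aligned}
y_i &\sim \mathcal{N}(\mu_i,\ \sigma_e) \\
\mu_i &= \beta_1 + u_{1,j[i]} + w_{1,k[i]}
        + \left(\beta_2 + u_{2,j[i]} + w_{2,k[i]}\right) d_i \\
u_{1,j} &\sim \mathcal{N}(0,\ \sigma_{u,1}), \qquad
u_{2,j} \sim \mathcal{N}(0,\ \sigma_{u,2}), \qquad j = 1, \dots, J \\
w_{1,k} &\sim \mathcal{N}(0,\ \sigma_{w,1}), \qquad
w_{2,k} \sim \mathcal{N}(0,\ \sigma_{w,2}), \qquad k = 1, \dots, K
\end{aligned}
```

As written, the observations are conditionally independent given the subject and window effects. There is no trial-level term, so nothing in the model links the consecutive time points of the same trial, which I suspect is exactly where the serial correlation gets lost.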
Any help or pointers you can give to steer this in the right direction will be greatly appreciated.
Best,
Marco
References
[1] Sorensen, T., Hohenstein, S., & Vasishth, S. (2016). Bayesian linear mixed models using Stan: A tutorial for psychologists, linguists, and cognitive scientists. The Quantitative Methods for Psychology, 12(3), 175–200. doi:10.20982/tqmp.12.3.p175
[2] https://www.r-bloggers.com/fitting-bayesian-linear-mixed-models-for-continuous-and-binary-data-using-stan-a-quick-tutorial/