I have data from human subjects who made repeated movements under multiple conditions. I’m interested in the variability of these movements.
The data might look something like:
subject_id | cond1 | cond2 | x | y |
---|---|---|---|---|
1 | 0 | 0 | 0.00, 0.01, …, 20.0 | 0.00, 0.7, … 18.9 |
1 | 1 | 0 | 0.0, -0.01, …, -19.7 | … |
1 | 1 | 1 | … | … |
2 | 0 | 0 | … | … |
2 | 1 | 0 | … | … |
2 | 1 | 1 | … | … |
… | … | … | … | … |
Each condition for each subject will have many runs, though, not just one. The [x, y] series can vary in length depending on how long the movement took (although this could be normalised to a fixed number of points, since I have definite start and end points and can interpolate in between).
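To make the normalisation concrete, here is a minimal sketch of what I have in mind, in Python/NumPy purely for illustration. The function name `resample_trajectory`, the choice of 50 points, and the made-up straight-line runs are all my own placeholders, and it assumes the samples within a run are evenly spaced in time:

```python
import numpy as np

def resample_trajectory(x, y, n_points=50):
    """Linearly interpolate one (x, y) run onto n_points equally spaced
    values of normalised progress u in [0, 1].

    Assumption: samples are evenly spaced in time, so u = index / last index.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    u_old = np.linspace(0.0, 1.0, len(x))
    u_new = np.linspace(0.0, 1.0, n_points)
    return np.column_stack([np.interp(u_new, u_old, x),
                            np.interp(u_new, u_old, y)])

# Two made-up runs of different lengths end up on the same grid
run_a = resample_trajectory(np.linspace(0, 20, 137), np.linspace(0, 18.9, 137))
run_b = resample_trajectory(np.linspace(0, 20, 212), np.linspace(0, 18.9, 212))
print(run_a.shape, run_b.shape)
```

Resampling by time fraction keeps any speed differences within a run; resampling by distance travelled along the path would be an alternative, and I'm not sure which is more appropriate here.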
A rough visualization of what the trajectories look like:
In the spirit of McElreath’s Rethinking, I’m trying to think of a generative model for this data, but am struggling to find a way of modelling x, y that captures the information I’m interested in: the “variability” in the movement, i.e. the variation in the trajectories. For example, if the variability were zero, all trajectories within a subject and condition would overlap exactly when plotted. One way to analyse this would be to calculate e.g. the standard deviation of x and y within a subject for each condition at each point in the movement, then average it (or take key points of interest) and model that summary.
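For reference, the summary-statistic approach I describe would look roughly like this. It is a sketch only: `pointwise_spread` is a hypothetical helper, the runs are simulated rather than real data, and it assumes the runs have already been resampled to a common number of points:

```python
import numpy as np

def pointwise_spread(runs):
    """runs: array of shape (n_runs, n_points, 2) holding the resampled
    (x, y) trajectories for ONE subject in ONE condition.
    Returns the sample SD across runs at each point, shape (n_points, 2)."""
    runs = np.asarray(runs, dtype=float)
    return runs.std(axis=0, ddof=1)

rng = np.random.default_rng(1)
u = np.linspace(0, 1, 50)
mean_path = np.column_stack([20 * u, 18.9 * u])

# Fake runs: noise scaled by sin(pi * u), so start and end are pinned
noise = rng.normal(size=(30, 50, 2)) * np.sin(np.pi * u)[None, :, None]
runs = mean_path[None] + noise

sd = pointwise_spread(runs)   # SD of x and y at each point
summary = sd.mean()           # one number per subject x condition

# Zero variability: identical runs give an SD of zero everywhere
identical = np.repeat(mean_path[None], 30, axis=0)
print(summary, pointwise_spread(identical).max())
```

The single `summary` number is what would then go into a regression on condition, which is exactly the step that throws away the structure of the trajectories.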
However, that doesn’t seem like a good way of actually having a model that describes the full data generating process. So I was wondering if a Gaussian process might be the right thing? That is, fit a GP for each subject-by-condition combination, and model the GP parameters, such as length scale and marginal variation, as functions of the condition.
However, one thing I am wary of is that I would not expect the kernel to be stationary. For example, the marginal variation is likely far higher in the middle of the movement, as the start was constrained and the movement was aimed at a fixed target. Is it possible to model this with a GP? And, if it is possible, would it still allow me to model the effects of experimental condition on the marginal variation / length scale?
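As a concrete version of what I'm picturing (a sketch only, and I'm not sure it is the right approach): multiply a stationary squared-exponential kernel by an envelope a(u)·a(u′) that goes to zero at the start and end of the movement. Multiplying a valid kernel by a(u)·a(u′) still gives a valid covariance function. The envelope sin(πu), the log-linear condition effect, and the names `b0`, `b1` and `log_ls` are all my own assumptions for illustration:

```python
import numpy as np

def envelope_kernel(u, sigma, length_scale):
    """Squared-exponential kernel times a(u) * a(u'), with a(u) = sin(pi * u),
    so the marginal SD is sigma * sin(pi * u): zero at both ends, largest mid-way."""
    a = np.sin(np.pi * u)
    d = u[:, None] - u[None, :]
    return sigma**2 * np.outer(a, a) * np.exp(-0.5 * (d / length_scale) ** 2)

def simulate_deviations(u, cond, n_runs, rng, b0=0.0, b1=0.5, log_ls=np.log(0.3)):
    """Draw deviations from the mean path for one coordinate.
    Assumed condition effect: log(sigma) = b0 + b1 * cond."""
    sigma = np.exp(b0 + b1 * cond)
    K = envelope_kernel(u, sigma, np.exp(log_ls))
    # Eigen-decomposition instead of Cholesky: K is singular at the pinned ends
    w, V = np.linalg.eigh(K)
    A = V * np.sqrt(np.clip(w, 0.0, None))
    return (A @ rng.normal(size=(len(u), n_runs))).T  # shape (n_runs, n_points)

rng = np.random.default_rng(2)
u = np.linspace(0, 1, 51)
dev0 = simulate_deviations(u, cond=0, n_runs=200, rng=rng)
dev1 = simulate_deviations(u, cond=1, n_runs=200, rng=rng)
print(dev0.std(axis=0)[[0, 25, 50]], dev1.std(axis=0)[[0, 25, 50]])
```

In this simulation the spread is pinned at both ends, peaks in the middle, and is larger under `cond=1`, which is the qualitative pattern I'd expect. What I don't know is whether something like this can be fitted in Stan or brms with the condition effects on the kernel parameters.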
I’d like to (ideally) fit the model with Stan (brms if possible). If someone has a case study that I could read that models a similar problem, that would be incredibly helpful. Thanks in advance for any advice!