Analysing repeated movement trajectories - is a GP the right approach?

I have data from human subjects who each made repeated movements under several conditions. I’m interested in the variability of these movements.

The data might look something like:

subject_id  cond1  cond2  x                      y
1           0      0      0.00, 0.01, …, 20.0    0.00, 0.7, … 18.9
1           1      0      0.0, -0.01, …, -19.7
1           1      1
2           0      0
2           1      0
2           1      1

In practice, each subject has many runs per condition, not just one. The [x, y] series can vary in length depending on how long the movement took (although this could be normalised to a fixed number of points, since I have definite start and end points and can interpolate in between).
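As a rough illustration of the normalisation mentioned above, here is a minimal Python sketch that linearly resamples one run to a fixed number of points. The function name `resample_trajectory` and the example values are hypothetical, and it assumes the original samples are evenly spaced in time, which may not hold for every recording setup.

```python
import math

def resample_trajectory(xs, ys, n_points=50):
    """Linearly resample an (x, y) path to n_points samples.

    Assumes the input samples are evenly spaced in time, so that the
    sample index is proportional to elapsed movement time.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 samples")
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    last = len(xs) - 1
    out_x, out_y = [], []
    for k in range(n_points):
        pos = k * last / (n_points - 1)          # fractional index into the run
        lo = min(int(math.floor(pos)), last - 1)  # left neighbour
        w = pos - lo                              # weight of the right neighbour
        out_x.append((1 - w) * xs[lo] + w * xs[lo + 1])
        out_y.append((1 - w) * ys[lo] + w * ys[lo + 1])
    return out_x, out_y

# Made-up run with 5 samples, stretched to 9 points.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 2.0, 4.0, 6.0, 8.0]
rx, ry = resample_trajectory(xs, ys, n_points=9)
```

After this step every run has the same length, so runs can be compared point by point on a common normalised time axis.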

A rough visualization of what they look like:

In the spirit of McElreath’s Rethinking, I’m trying to think of a generative model for this data, but I’m struggling to find a way of modelling x, y that captures what I’m interested in: the “variability” of the movement, i.e. the variation between trajectories. For example, if the variability were zero, all trajectories would overlap exactly when plotted. One way to analyse this is to calculate e.g. the standard deviation of x/y within a subject for each condition at every point in the movement, then average it or take it at key points of interest, and model that summary.
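To make the summary-statistic approach concrete, here is a minimal Python sketch for one coordinate of one subject-by-condition cell. The name `pointwise_sd` and the toy numbers are made up for illustration, and it assumes the runs have already been resampled to a common length.

```python
from statistics import mean, stdev

def pointwise_sd(trials):
    """Standard deviation across runs at each normalised time point.

    trials: list of equal-length sequences, one per repeated movement,
    holding a single coordinate (x or y).
    """
    if len(trials) < 2:
        raise ValueError("need at least two runs to estimate a spread")
    if len({len(t) for t in trials}) != 1:
        raise ValueError("all runs must have the same number of points")
    # zip(*trials) groups the values of all runs at the same time point.
    return [stdev(values) for values in zip(*trials)]

# Three made-up runs that agree at the start and end but differ mid-movement.
runs = [
    [0.0, 1.0, 0.0],
    [0.0, 3.0, 0.0],
    [0.0, 5.0, 0.0],
]
sd_profile = pointwise_sd(runs)
mean_sd = mean(sd_profile)   # one possible scalar summary
peak_sd = max(sd_profile)    # another: spread at the most variable point
```

Summaries such as `mean_sd` or `peak_sd` could then be used as the outcome in an ordinary regression, at the cost of discarding the shape of the variability profile.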

However, that doesn’t seem like a good way of getting a model that describes the full data-generating process. So I was wondering whether a Gaussian process might be the right thing, i.e. fit a GP for each subject-by-condition combination, and model the GP parameters, such as length scale and marginal variation, as functions of the condition.

However, one thing I am wary of is that I would not expect the kernel to be stationary. For example, the marginal variation is likely far higher in the middle of the movement, as the start was constrained and the movement was aimed at a fixed target. I don’t know if this is possible to model with a GP? And, if it is possible, whether it would still allow me to model the effects of experimental condition on the marginal variation / length scale?
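One standard way to get this kind of non-stationarity is to scale a stationary kernel by an input-dependent amplitude, k(t, t') = s(t) s(t') k_SE(t, t'), which is still a valid covariance function. The Python sketch below is only an illustration under assumptions of my own: time is normalised to [0, 1], the amplitude is s(t) = alpha * sin(pi * t) (so it vanishes at the start and end and peaks mid-movement), and the names `amplitude` and `nonstationary_cov` are hypothetical.

```python
import math

def amplitude(t, alpha):
    """Assumed marginal SD profile: zero at t = 0 and t = 1, alpha at t = 0.5."""
    return alpha * math.sin(math.pi * t)

def nonstationary_cov(times, alpha, lengthscale):
    """Covariance matrix for k(t, t') = s(t) * s(t') * k_SE(t, t')."""
    cov = []
    for ti in times:
        row = []
        for tj in times:
            # Stationary squared-exponential correlation between ti and tj.
            se = math.exp(-0.5 * ((ti - tj) / lengthscale) ** 2)
            row.append(amplitude(ti, alpha) * amplitude(tj, alpha) * se)
        cov.append(row)
    return cov

times = [i / 10 for i in range(11)]   # normalised movement time
cov = nonstationary_cov(times, alpha=2.0, lengthscale=0.3)

# The diagonal holds the marginal variance at each time point.
marginal_sd = [math.sqrt(cov[i][i]) for i in range(len(times))]
```

In a setup like this, condition effects could in principle enter through `alpha` and `lengthscale`, for example with a linear predictor on the log scale, though whether that is easy to express in brms is a separate question.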

I’d like to (ideally) fit the model with Stan (brms if possible). If someone has a case study that I could read that models a similar problem, that would be incredibly helpful. Thanks in advance for any advice!

I think a colleague of mine did precisely this a few years back for his MSc, and he has a repo of his code here.

Yes, you’d be doing what might be called a “heteroskedastic GP”. In brms you should be able to specify a Gaussian likelihood with mu ~ gp and sigma ~ gp to achieve this (I’m not an expert in brms though, so I don’t know the exact formula that would also give a hierarchical treatment of the GP parameters).
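Written out, that specification corresponds roughly to the model below. This is a sketch rather than the exact brms parameterisation: the symbols f, g, k_f and k_g are notation introduced here, intercepts are omitted, and the log link on sigma is the brms default for distributional models.

$$
\begin{aligned}
y_i &\sim \mathcal{N}\big(f(t_i),\ \sigma(t_i)^2\big) \\
f &\sim \mathcal{GP}(0, k_f) \\
\log \sigma(t) &= g(t), \qquad g \sim \mathcal{GP}(0, k_g)
\end{aligned}
$$

Here f describes the mean trajectory and g lets the spread around it change smoothly over the course of the movement.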

For related code, see here and here.

Thanks, that does look very related! I’ll take a look at those links. Is the MSc thesis available anywhere? Might help me understand the code :)

Here’s the MSc, and it might also be published somewhere too.


Fantastic, thank you so much! Will report back with results.
