# Latent Gaussian processes for repeated measures

I am interested in implementing latent Gaussian processes for repeated measures on multiple subjects. That is, each subject has their own short one-dimensional time series, not necessarily aligned with, or sharing the same spacing as, the other subjects' series. The subjects fall into two or more mutually exclusive groups, and I'd like each group to have its own Gaussian process parameters, so we can investigate whether the length-scale, amplitude, etc. differ between groups.

The idea is based on a model presented in https://doi.org/10.1111/bmsp.12180, though there the model is fitted with a kind of EM algorithm.

I know I can fit (latent) Gaussian processes in brms, and I have an understanding of how the basic approach can be sped up with the basis-function approximation. But a model of the form `y ~ gp(t, by = group) + (1 | id)` doesn't seem appropriate, because it would treat responses from different subjects with similar timestamps as 'close together' whenever they are in the same group. At the same time, `by = id` would fit a completely separate GP for every subject, which seems inefficient and would not pool the parameters the way I imagine.

In other words: if I have 5 subjects in group A and 5 subjects in group B, then I want a ‘separate’ GP for every subject, but only two different sets of parameters: one for group A and one for group B. And maybe a measurement-error parameter `sigma` that’s common to all subjects.
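For concreteness, here's a rough Stan sketch of the structure I have in mind (untested, and the variable names, priors, and squared-exponential kernel are my own placeholders rather than anything from the linked paper): data sorted by subject, one latent GP draw per subject, with the length-scale and amplitude indexed by the subject's group and a single shared noise SD.

```stan
data {
  int<lower=1> N;                        // total number of observations
  int<lower=1> J;                        // number of subjects
  int<lower=1> G;                        // number of groups
  array[J] int<lower=1, upper=N> start;  // first row for subject j (data sorted by subject)
  array[J] int<lower=1, upper=N> end;    // last row for subject j
  array[J] int<lower=1, upper=G> group;  // group membership of each subject
  vector[N] t;                           // observation times
  vector[N] y;                           // responses
}
parameters {
  vector<lower=0>[G] rho;    // group-specific length-scales
  vector<lower=0>[G] alpha;  // group-specific marginal SDs
  real<lower=0> sigma;       // measurement-error SD, shared by all subjects
  vector[N] eta;             // standardised latent values (non-centred)
}
model {
  rho ~ inv_gamma(5, 5);     // placeholder priors
  alpha ~ std_normal();
  sigma ~ std_normal();
  eta ~ std_normal();
  for (j in 1:J) {
    int n_j = end[j] - start[j] + 1;
    // covariance for this subject only, using its group's hyperparameters
    matrix[n_j, n_j] K
      = add_diag(gp_exp_quad_cov(to_array_1d(segment(t, start[j], n_j)),
                                 alpha[group[j]], rho[group[j]]),
                 1e-9);  // jitter for numerical stability
    // non-centred parameterisation: f_j = L * eta_j
    vector[n_j] f_j = cholesky_decompose(K) * segment(eta, start[j], n_j);
    segment(y, start[j], n_j) ~ normal(f_j, sigma);
  }
}
```

Because the per-subject covariance matrices never mix observations from different subjects, this is effectively one big block-diagonal GP, which is exactly the "separate GP per subject, shared parameters per group" idea.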

Does this make sense? Has it been done before? What would be a good way to go about doing it?

Here is something perhaps close to what I’m after. A keyword I didn’t mention above was ‘multi-level’.

That particular example assumes every subject/trial has the same number of observations and `x` indices, so I’ll have to code up something more complicated. Or maybe there’s a clever trick for doing it with `brms` and `by = id` (to take advantage of the faster approximate implementation) with some sort of priors on the GP parameters.
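One way the "priors on the GP parameters" idea could look, as a Stan fragment (purely illustrative; I haven't tried wiring this into brms-generated code): give every subject its own log length-scale, partially pooled towards a group-level mean, so subjects within a group inform each other without being forced identical.

```stan
parameters {
  vector[J] log_rho_subj;  // per-subject log length-scales
  vector[G] mu_log_rho;    // group-level mean log length-scales
  real<lower=0> tau;       // between-subject SD within a group
}
model {
  // group[j] gives the group index of subject j;
  // multiple indexing makes mu_log_rho[group] a vector of length J
  log_rho_subj ~ normal(mu_log_rho[group], tau);
  mu_log_rho ~ normal(0, 1);  // placeholder priors
  tau ~ std_normal();
}
```

Setting `tau` to (nearly) zero recovers the "exactly one set of parameters per group" version from my earlier post.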

I know I’m replying to myself, but I’ve found a recent paper on just this topic and an associated R package, `gppm`. It only uses the L-BFGS optimizer rather than full MCMC, and doesn’t appear to handle non-Gaussian likelihoods, but it’s close enough to adapt to my needs.

Here’s some old code for hierarchical GPs I posted ages ago; I don’t think anything in it is so out of date that it won’t work. Though maybe see here for a nicer GP parameterization you could merge with the hierarchical structure instead.


@Selby did you end up going with the `gppm` package, or did you churn something out in Stan? I found this paper useful in showing how hierarchical GPs can be modelled, but the data structure there assumes all observations are recorded at the same time points.

I am working with data from multiple plots, and within each plot I want to model the spatial autocorrelation between observations. The number of observations and their locations differ for each plot. Also, some of the response variables are counts, so based on what you said, `gppm` wouldn’t work. Sorry for bringing up a zombie thread!

I haven’t come up with a complete solution yet, as I’ve been working on other projects. The latest iteration is something in Stan, based on the code within `gppm`, but it’s very slow at the moment and doesn’t scale well (unless I use `optimizing` instead of `sampling`, which doesn’t feel very satisfactory).