Auto-grouping option in brms' gp()

Hi all / @paul.buerkner ,

I wonder whether the brms Gaussian Process (GP) support through gp actually supports hierarchical GPs when setting gr=T.

What I mean by hierarchical GP:

Let \vec{M} and \vec{0} be m-dimensional.

So we have

\vec{M} \sim \mathcal{GP}(\vec{0}, \mathbf{K}_1)

where \mathbf{K}_1 is m\times m dimensional. This is what I would call the population mean GP. Now for each of the N individuals we have that the observation \vec{y}_i (also m-dimensional) of individual i is i.i.d. from the following GP:

\vec{y}_i \sim \mathcal{GP}(\vec{M}, \mathbf{K}_2)

where \mathbf{K}_2 is another m\times m kernel. Now suppose each individual has observations for the same m-dimensional vector of time points, couldn’t one then do in brms

fit <- brms(y ~ gp(time,gr=T) + gp(time,gr=F), data=df)

and the term gp(time,gr=T) would correspond to the population GP with kernel \mathbf{K}_1 and gp(time) to the individual GP with kernel \mathbf{K}_2.

I think this would indeed imply a hierachical GP in your formulation. You could try running the model, but I would expect it to have serious convergence issues to the the GPs fighting over the same variance. For this to work reliably we may need to joint priors over the GPs, which are currently not implemented. See also

Thanks Paul. In order to get closer to an understanding of what such a joint prior would need to look like, could you help me collecting the properties it needs to fulfil more precisely (e.g. which pathologies it needs to avoid)?

  • I understand that in the above additive 2-level setting one can “freely” add a constant varying vector \vec{d} to each second-level GP realisation (that is to each \vec{y}_i) and add the negative of it to the first-level GP realisation \vec{M}.

Think of it as a multilevel model. If you had an overall intercept and an individual intercept per group without any hard or soft-centering (the latter via priors) we will run into linear dependence and non-idenfitied models. The same will presumably happen here. I am not sure about the best solution. Likely people have worked on this (@avehtari?). If not, this is actually a relevant research topic I would say.

1 Like

One option is decov priors used in rstanarm
See also Bayesian functional ANOVA modeling using Gaussian process prior distributions

1 Like

Thank you!

That is something like that for the two marginal standard deviations corresponding to the two exponentiated square kernels?

\\ ...
data {
 vector<lower=0>[2] concentration;
parameters {
  real<lower=0> tau;
  simplex[2] pi;
transformed parameters {
  real<lower=0> alpha1;
  real<lower=0> alpha2;
model {
    pi ~ dirichlet(concentration);
    tau ~ gamma(1,1);