Gaussian process model for several functional observations with common mean

I’m trying to specify a Gaussian process model for several functional observations with shared mean and observation-level random effects. The model I have coded is a bit unwieldy since all the functional observations are stored in a vector. It would be nicer if I could have them in a matrix with each row belonging to one observation, but the shared mean makes this difficult for me to figure out how to do it.

I have several functional observations y_i, i=1,..,N over a common grid of time values t. The model is

y_i(t) = \mu(t) + \theta_i + \epsilon_i(t),

\theta_i \sim N(0, \sigma^2_\theta),

\mu \sim GP(0,K_{\tau_1,l_1}),

\epsilon_i \sim GP(0,K_{\tau_2,l_2} + \sigma^2 \delta),

where K are squared exponential covariance functions and \delta the identity function. Observations have covariance

Cov(y_i(t),y_j(t')) = K_\mu(t,t') + \delta_{ij}K_\epsilon(t,t') + \delta(t,t')\sigma^2.

That is, each observation has a common mean \mu(t), a structured error \epsilon_i(t) that is autocorrelated within observations and independent between observations, and unstructured error common to account for noise in measurement.

The priors are irrelevant to my problem.

My model block is

data {
  int<lower=1> n_obs; // number of functional observations
  int<lower=1> len; // length of each functional observation
  int ID[(n_obs*len)]; // ID of each observational unit
  vector[(n_obs*len)] y; // Functional observations
  real x[(n_obs*len)]; // Time values of each observation
}
transformed data {
  int N = n_obs*len; // Total number of data points
}
parameters {
  real<lower=0> l1;
  real<lower=0> t1;
  real<lower=0> l2;
  real<lower=0> t2;
  real<lower=0> sigma;
  real<lower=0> sigma.theta;
  vector[n_obs] theta;
}

model {
  matrix[N, N] L_K;
  matrix[N, N] K = cov_exp_quad(x, t1, l1); // Covariance matrix
  real sq_sigma = square(sigma); // Unstructured error variance

  // Add unstructured error covariance
  for (i in 1:N)
    K[i, i] = K[i, i] + sq_sigma;
    
  // Add structured error covariance
  for (i in 1:n_obs){
    int a = (i-1)*len + 1;
    int b = i*len;
    K[a:b, a:b] = K[a:b, a:b] + cov_exp_quad(x[1:len], x[1:len], t2, l2);
  }

  L_K = cholesky_decompose(K);

  l1 ~ inv_gamma(5, 5);
  t1 ~ std_normal();
  l2 ~ inv_gamma(5, 5);
  t2 ~ std_normal();
  sigma ~ std_normal();
  tau ~ std_normal();

  for(i in 1:n_obs){
    theta[i] ~ normal(0,tau);
  }

  y ~ multi_normal_cholesky(theta[ID], L_K);
}

Hey @PeteyCoco! Welcome to the forums. I hope the community can get you the answers you need. Take my answers “with a grain of salt”, meaning I’m not an expert.

So looking at the additive model you’ve specified:

y(t) = \mu(t) + \theta_i + \epsilon_i(t)

Specifying functional model with more than one observation, we would format the input data like this:

vector[N] x[T];

We can think of this as an T \times N matrix where each row is a time point, and each column is an observation. If N=1, a “1-D” GP (in Stan code, the instantiation is vector x[T];), then the GP(0, K(x, x')) would render a function that spans (or intersects) each time point, where K(x, x') is a T \times T covariance matrix.

But based on your data block, since X, Y both have a “length of functional observations” (or number of time points) and a “number of observations” I might restructure it so that your data block looks something like this:

data {
  int<lower=1> n_obs;
  int<lower=1> len;    // this is what I call "T" above
  vector[n_obs] y[len];
  vector[n_obs] x[len];
}
// no transformed parameters block needed

Then, building the covariance matrix is more or less straight forward. First, remember that the sum of any two covariance matrices is still a valid covariance matrix. So I’ll build two covariance matrices, then sum them up: Let \mu(t) := K_{\mu} and \epsilon(t) = K_{\epsilon} be the two covariance matrices in the additive model.

We can do something similar to what you’ve done above for K_{\mu}. However, we don’t need to multiply the output to get big “N”, we simply have a “len by len” or T \times T covariance matrix. But first, we need to handle the oberservation level intercept. I think it would work to declare a parameter of length “n_obs” and then add it to each column of the data matrix, then construct the covariance matrix as usual. So for K_{\mu} (however I’m not sure if Stan will like which block I’ve declared these in, this is close to pseudo code):

parameters {
// rest same as above
  vector[n_obs] theta;
}
model {
  matrix[len, len] K;
  matrix[len, len] K_mu;
  matrix[len, len] K_epsilon;
  matrix[len, n_obs] x_temp;
  matrix[len, len] L_K;
  real sq_sigma = square(sigma);
  for (i in 1:n_obs) {
    // add each theta_i sequentially to each column in x
    x_temp[, i] = x[,i] + theta[i]; // this is in R pseudo code
  }
  // now building K_mu:
  K_mu = cov_exp_quad(x_temp, t1, l1);
  for (i in 1:len) {
    K_mu[i, i] = K_mu[i, i] + sq_sigma;
  }  

}

For the \epsilon_i \sim GP(o, K + \sigma^2 \delta), it would look similar, where we sum the two covariance matrices for the full covariance matrix at the end:

... // previous stuff ^^^
K_epsilon = cov_exp_quad(x_new, t2, l2);
for (i in 1:len) {
   K_epsilon[i, i] = K_epsilon[i,i] + sigma_sq_2 * delta; // provided we've defined delta, sig2
}
K = K_epsilon + K_mu;
L_K = cholesky_decompose(K);

And then since we have a multi-output/multivariate GP

...
  for (i in 1:len) 
   target += multi_normal_cholesky(y[i] | rep_vector(0, len), L_K);
  // this indexing may need a closer look
}

I hope this is what you were looking for? I haven’t tested this, so there’s likely syntax errors, and I’d hope that you test this on simulated data and recover parameters before any real experiments. Please let me know if it gives you any trouble.

Aside:
This can be vectorized:

  for(i in 1:n_obs){
    theta[i] ~ normal(0,tau);
  }

to:

theta ~ normal(0, tau)

provided you’re happy with iid variance assumption on each theta.

Andre

1 Like

Thanks for the help Andre, you mentioned some useful things I didn’t know about. However, I failed to mention a critical detail in my model statement. I just added this:

Observations have covariance

Cov(y_i(t),y_j(t')) = K_\mu(t,t') + \delta_{ij}K_\epsilon(t,t') + \delta_{ij}\delta(t,t')\sigma^2.

That is, each observation has a common mean \mu(t), a structured error \epsilon_i(t) that is autocorrelated within observations independent between observations, and unstructured error to account for noise in measurements.

The code that I showed produces the correct results on simulated data, but it feels clunky. Namely the for-loop where I add the block diagonal structured error covariance terms:

model{
//...
  // Add structured error covariance
  for (i in 1:n_obs){
    int a = (i-1)*len + 1;
    int b = i*len;
    K[a:b, a:b] = K[a:b, a:b] + cov_exp_quad(x[1:len], x[1:len], t2, l2);
  }
//...
}

I’m going to try to incorporate the array of vectors that you use for the data to clean things up a bit. It may be that there isn’t a clean way to code this, but I was looking to see if anyone had ideas!

Based on this, I don’t think my suggestion to create an T \times N matrix will work, as any covariance structure you impose on the T \times T covariance will be the same across all observations.

So now I’m beginning to see why you’ve implemented it the way you did. I don’t think there’s another way to do it. You have to specify explicitly between each observation the autocorrelation is 0 (or a block of zeros).

Edit:

If you want observations to be modeled independently, why are you including them in the same model? You can just use separate models for each observation. I don’t think there’s any benefit if including them in the same model.

1 Like