Hilbert space Gaussian process for multiple time series

Hi everyone,

I have been using Gaussian processes to model the north-south migration of GPS-tagged animals over time. Due to the number of observations (from several hundred to several thousand obs. per individual, meaning around 40000 obs. in total) I have been using the Hilbert space basis function approximation.

Currently, my most elaborate model is a heteroskedastic GP with a slow trend and yearly cyclic trend (adapting the code from here and here) - this model samples without issue.

As a next step, I would like to extend the model to incorporate the individual ID of each animal - although individuals generally follow a similar migration pattern, there is variation in the exact timing and overall distance that they migrate. I am thinking that a way to model this would be some sort of hierarchical GP, a bit like the one in this tutorial. - each individual would have its own GP (planning to use Matérn 1/2 kernel for these) , but would contribute to an overall average migration trajectory (which would presumably be modelled by the slow trend + cyclic GP as in my current model).

However, I am struggling to code a Hilbert basis function version of this type of model. In particular, it’s unclear to me exactly how the indexing would work - presumably I’d need to have separate basis functions and spectral densities for each individual, and combine these using append_row in normal_id_glm? Am I thinking along the right lines here, or am I barking up the wrong tree?

To further complicate things, the GPS data for each individual have different start and end points (not all individuals were tagged at once, tags were collected at different times) and thus quite different lengths. Also, data from different individuals may be spaced differently (e.g. some GPS tags return daily fixes, while others only return weekly fixes).

Any help with this problem would be greatly appreciated!

Heteroskedastic model code (no individual ID):

functions{
  
  vector diagSPD_EQ(real alpha, real rho, real L, int M) {
    return sqrt((alpha) * sqrt(2*pi()) * rho * exp(-0.5*(rho*pi()/2/L)^2 * linspaced_vector(M, 1, M)^2));
  }
  
  vector diagSPD_periodic(real alpha, real rho, int M) {
    real a = 1/rho^2;
    array[M] int one_to_M;
    for (m in 1:M) one_to_M[m] = m;
    vector[M] q = sqrt(alpha * 2 / exp(a) * to_vector(modified_bessel_first_kind(one_to_M, a)));
    return append_row(q,q);
  }

  matrix PHI(int N, int M, real L, vector x) {
    return sin(diag_post_multiply(rep_matrix(pi()/(2*L) * (x+L), M), linspaced_vector(M, 1, M)))/sqrt(L);
  }
  
  matrix PHI_periodic(int N, int M, real w0, vector x) {
    matrix[N,M] mw0x = diag_post_multiply(rep_matrix(w0*x, M), linspaced_vector(M, 1, M));
    return append_col(cos(mw0x), sin(mw0x));
  }
  
}

data{
  int N; // Number of observations
  array[N] real y; // Northing
  array[N] real x; // Time (day)

  
  real<lower=0> c_f1;  // factor c to determine the boundary value L
  int<lower=1> M_f1;   // number of basis functions for smooth function
  
  int<lower=1> J_f2;   // number of cos and sin functions for periodic function
  
  real<lower=0> c_g1;  // factor c to determine the boundary value L
  int<lower=1> M_g1;   // number of basis functions for smooth function

}

transformed data {
  // Standardize data
  real xmean = mean(x);
  real ymean = mean(y);
  real xsd = sd(x);
  real ysd = sd(y);
  vector[N] xn = (to_vector(x) - xmean)/xsd;
  vector[N] yn = (to_vector(y) - ymean)/ysd;
  

  // Basis functions for f1
  real L_f1 = c_f1*max(xn);
  matrix[N,M_f1] PHI_f1 = PHI(N, M_f1, L_f1, xn);
  
  // Basis functions for f2
  real period_year = 365.25/xsd;
  matrix[N,2*J_f2] PHI_f2 = PHI_periodic(N, J_f2, 2*pi()/period_year, xn);
  
  // Basis functions for g1
  real L_g1 = c_g1*max(xn);
  matrix[N,M_g1] PHI_g1 = PHI(N, M_g1, L_g1, xn);
  
  // Concatenated basis functions
  matrix[N,M_f1+2*J_f2] PHI_f = append_col(PHI_f1, PHI_f2);
}

parameters {
  vector[M_f1] beta_f1;         // the basis functions coefficients for f1
  vector[2*J_f2] beta_f2;       // the basis functions coefficients for f2
  vector[M_g1] beta_g1;         // the basis functions coefficients for g1
  real<lower=0> lengthscale_f1; // lengthscale for f1
  real<lower=0> lengthscale_f2; // lengthscale for f2
  real<lower=0> lengthscale_g1; // lengthscale for g1

  real<lower=0> sigma_f1;       // scale of f1
  real<lower=0> sigma_f2;       // scale of f2
  real<lower=0> sigma_g1;       // scale of g1
  
  real intercept_g; 
}

model {
  // spectral densities for f1, f2, and g1
  vector[M_f1] diagSPD_f1 = diagSPD_EQ(sigma_f1, lengthscale_f1, L_f1, M_f1);
  vector[2*J_f2] diagSPD_f2 = diagSPD_periodic(sigma_f2, lengthscale_f2, J_f2);

  vector[M_f1] diagSPD_g1 = diagSPD_EQ(sigma_g1, lengthscale_g1, L_g1, M_g1);


  // priors
  beta_f1 ~ normal(0, 1);
  beta_f2 ~ normal(0, 1);
  beta_g1 ~ normal(0, 1);

  lengthscale_f1 ~ exponential(1);
  lengthscale_f2 ~ normal(0, 0.1);
  lengthscale_g1 ~ exponential(1);

  sigma_f1 ~ exponential(1);
  sigma_f2 ~ exponential(1);
    
  sigma_g1 ~ exponential(1);
  intercept_g ~ normal(0, 0.1);

  // Model
  yn ~ normal_id_glm(PHI_f,
		     0.0,
		     append_row(diagSPD_f1 .* beta_f1, diagSPD_f2 .* beta_f2),
		     exp(intercept_g + PHI_g1 * (diagSPD_g1 .* beta_g1))); 
}

generated quantities {
  vector[N] f1;
  vector[N] f2;
  vector[N] f;
  vector[N] sigma;
  vector[N] log_lik;
  {
    // spectral densities for f1, f2, and g1
    vector[M_f1] diagSPD_f1 = diagSPD_EQ(sigma_f1, lengthscale_f1, L_f1, M_f1);
    vector[2*J_f2] diagSPD_f2 = diagSPD_periodic(sigma_f2, lengthscale_f2, J_f2);
      vector[M_f1] diagSPD_g1 = diagSPD_EQ(sigma_g1, lengthscale_g1, L_g1, M_g1);

    // functions scaled back to original scale
    f1 = (0.0 + PHI_f1 * (diagSPD_f1 .* beta_f1))*ysd;
    f2 = (PHI_f2 * (diagSPD_f2 .* beta_f2))*ysd;
    f = f1 + f2 + ymean;
    sigma = exp(intercept_g + PHI_g1 * (diagSPD_g1 .* beta_g1))*ysd;
    
    // log_liks for loo
    for (n in 1:N) log_lik[n] = normal_lpdf(y[n] | f[n], sigma[n]);
  }
}

Update:

I now have this model, in which I’m treating individual animal ID (AID) like the day-of-week effect in the Birthdays case study, except using the Matérn 1/2 kernel for the beta_AID effect. This isn’t quite what I’m after yet, as I’d like each individual to have a separate Matérn 1/2 kernel, rather than one kernel shared among all individuals, but I haven’t quite figured out how to code this yet.

The model so far:

functions{
  
  vector diagSPD_EQ(real alpha, real rho, real L, int M) {
    return sqrt((alpha) * sqrt(2*pi()) * rho * exp(-0.5*(rho*pi()/2/L)^2 * linspaced_vector(M, 1, M)^2));
  }
  
  vector diagSPD_periodic(real alpha, real rho, int M) {
    real a = 1/rho^2;
    array[M] int one_to_M;
    for (m in 1:M) one_to_M[m] = m;
    vector[M] q = sqrt(alpha * 2 / exp(a) * to_vector(modified_bessel_first_kind(one_to_M, a)));
    return append_row(q,q);
  }
  
  vector diagSPD_Matern12(real alpha, real rho, real L, int M) {
    vector[M] indices = linspaced_vector(M, 1, M);
    real factor = 2;
    vector[M] denom = rho * ((1 / rho)^2 + pow(pi() / 2 / L * indices, 2));
    return alpha * sqrt(factor * inv(denom));
  }
  
  matrix PHI(int N, int M, real L, vector x) {
    return sin(diag_post_multiply(rep_matrix(pi()/(2*L) * (x+L), M), linspaced_vector(M, 1, M)))/sqrt(L);
  }
  
  matrix PHI_periodic(int N, int M, real w0, vector x) {
    matrix[N,M] mw0x = diag_post_multiply(rep_matrix(w0*x, M), linspaced_vector(M, 1, M));
    return append_col(cos(mw0x), sin(mw0x));
  }
  
}

data{
  int N; // Number of observations
  array[N] real y; // Northing
  array[N] real x; // Time (days)
  
  array[N] int AID; // Animal ID for each observation
  int N_animals; // Total number of animals
  
  real<lower=0> c_f1;  // factor c to determine the boundary value L
  int<lower=1> M_f1;   // number of basis functions for smooth function
  
  int<lower=1> J_f2;   // number of cos and sin functions for periodic function
  
  real<lower=0> c_f3;  // factor c to determine the boundary value L
  int<lower=1> M_f3;   // number of basis functions for smooth function
}

transformed data {
  // Normalize data
  real xmean = mean(x);
  real ymean = mean(y);
  real xsd = sd(x);
  real ysd = sd(y);
  vector[N] xn = (to_vector(x) - xmean)/xsd;
  vector[N] yn = (to_vector(y) - ymean)/ysd;
  
  
  // Basis functions for f1
  real L_f1 = c_f1*max(xn);
  matrix[N,M_f1] PHI_f1 = PHI(N, M_f1, L_f1, xn);
  
  // Basis functions for f2
  real period_year = 365.25/xsd;
  matrix[N,2*J_f2] PHI_f2 = PHI_periodic(N, J_f2, 2*pi()/period_year, xn);
  
  // Basis functions for f3
  real L_f3= c_f3*max(xn);
  matrix[N,M_f3] PHI_f3 = PHI(N, M_f3, L_f3, xn);
  
  // Concatenated basis functions - f1 and f2
  matrix[N,M_f1+2*J_f2] PHI_f = append_col(PHI_f1, PHI_f2);
}

parameters {
  vector[M_f1] beta_f1;         // the basis functions coefficients for f1
  vector[2*J_f2] beta_f2;       // the basis functions coefficients for f2
  vector[M_f3] beta_f3;         // the basis functions coefficients for f1
  
  vector[N_animals] beta_AID;    // individual-level offset
  
  real<lower=0> lengthscale_f1; // lengthscale for f1
  real<lower=0> lengthscale_f2; // lengthscale for f2
  real<lower=0> lengthscale_f3; // lengthscale for f3

  real<lower=0> sigma_f1;       // scale of f1
  real<lower=0> sigma_f2;       // scale of f2
  real<lower=0> sigma_f3;       // scale of f3
  
  real<lower=0> sigma;          // residual scale  
}

model {
  // spectral densities for f1, f2, and f3
  vector[M_f1] diagSPD_f1 = diagSPD_EQ(sigma_f1, lengthscale_f1, L_f1, M_f1);
  vector[2*J_f2] diagSPD_f2 = diagSPD_periodic(sigma_f2, lengthscale_f2, J_f2);
  vector[M_f3] diagSPD_f3 = diagSPD_Matern12(sigma_f3, lengthscale_f3, L_f3, M_f3);
  
  // priors
  beta_f1 ~ normal(0, 1);
  beta_f2 ~ normal(0, 1);
  beta_f3 ~ normal(0, 1);
  beta_AID ~ normal(0, 0.1);

  lengthscale_f1 ~ exponential(1);
  lengthscale_f2 ~ normal(0, 0.1);
  lengthscale_f3 ~ exponential(1);
  
  sigma_f1 ~ exponential(1);
  sigma_f2 ~ exponential(1);
  sigma_f3 ~ exponential(0.5);
  
  sigma ~ exponential(1);

  // Individual-level offset
  vector[N] f3 = PHI_f3 * (diagSPD_f3 .* beta_f3);
  vector[N] intercept = 0.0 + f3 .* beta_AID[AID];
  
  // Model
  yn ~ normal_id_glm(PHI_f,
                     intercept,
                     append_row(diagSPD_f1 .* beta_f1, diagSPD_f2 .* beta_f2),
                     sigma); 
}

generated quantities {
  vector[N] f1;
  vector[N] f2;
  vector[N] f3;
  vector[N] f;
  vector[N] log_lik;
  {
    // spectral densities for f1, f2, and f3
    vector[M_f1] diagSPD_f1 = diagSPD_EQ(sigma_f1, lengthscale_f1, L_f1, M_f1);
    vector[2*J_f2] diagSPD_f2 = diagSPD_periodic(sigma_f2, lengthscale_f2, J_f2);
    vector[M_f3] diagSPD_f3 = diagSPD_Matern12(sigma_f3, lengthscale_f3, L_f3, M_f3);

    // functions scaled back to original scale
    f1 = (0.0 + PHI_f1 * (diagSPD_f1 .* beta_f1))*ysd;
    f2 = (PHI_f2 * (diagSPD_f2 .* beta_f2))*ysd;
    f3 = ((PHI_f3 * (diagSPD_f3 .* beta_f3)) .* beta_AID[AID])*ysd;
    
    f = f1 + f2 + f3 + ymean;

    // log_liks for loo
    for (n in 1:N) log_lik[n] = normal_lpdf(y[n] | f[n], sigma);
  }
}

Is it possible to do this with brms? I know @paul.buerkner has recently added more kernels to gp(), including some Materns

Using HSGPs, for the basis function coefficients, have you considered modeling those as functions with hyper-priors?

brms does not yet implement HSGPs for periodic kernels but there is already an open issue for it: Add periodic kernel for HSGP · Issue #1691 · paul-buerkner/brms · GitHub

1 Like

I haven’t - do you mean modelling e.g. beta_f3 as a function of AID?

Good to know, thanks! If we omit the periodic kernel from the model - is it possible to use brms to build a model with a shared squared exponential kernel, and then an individual-level Matérn 1/2 kernel?

Why don’t you try both, your model and BRMS?

The diagSPD_Matern12 function:

/**
  * Spectral density for 1/2 Matern (Ornstein-Uhlenbeck) kernel
  *
  * @param alpha Scaling parameter
  * @param rho Length scale parameter
  * @param L Length of the interval
  * @param M Number of basis functions
  * @return A vector of spectral densities
  */
vector diagSPD_Matern12(real alpha, real rho, real L, int M) {
  vector[M] indices = linspaced_vector(M, 1, M);
  real factor = 2;
  vector[M] denom = rho * ((1 / rho)^2 + (pi() / 2 / L) * indices);
  return alpha * sqrt(factor * inv(denom));
}

Change

  vector[M_f3] diagSPD_f3 = diagSPD_Matern12(sigma_f3, lengthscale_f3, L_f3, M_f3);

to matrix

  matrix[M_f3, N_animals] diagSPD_f3;
  matrix[M_f3, N_animals] beta_f3_raw;

Then use a loop to fill the matrix beta_f3:

for(i in 1:N_animals)
  beta_f3[, i] = beta_f3_raw[, i] .* diagSPD_Matern12(sigma_f3[i], lengthscale_f3[i], L_f3, M_f3);
vector[N] mu = 
columns_dot_product(PHI_f3'
  ,   beta_f3[, AID])
+ 
...
+ intercept;
y ~ normal(mu, sigma);
2 Likes

If you assume that a prior smoothness of the ID specific functions is the same over IDs, then you can use the same length scale for all IDs, which then means that you can use the same basis functions and spectral densities for all IDs, and you only need ID specific coefficients. You might assume that the average function is smoother and you could have separate length scale and magnitude for the average, and then common length scale and magnitude for how the individuals are different from the average. Although you have a hierarchical model also for the length scales, it is likely not to help as the length scales are weakly identified from the data, and you would need very different behavior for different individuals to see the effect. Remember that the length scale parameter is part of the prior specification, and the posterior wiggliness of each individual GP depends on the coefficient values (in case HSGP).

This doesn’t matter. You are doing basis function regression and different individuals can have different covariate values.

1 Like

Thanks for the code, this looks great!

This is great to know, thanks!