Modelling phylogenetic and spatial autocorrelations as Gaussian processes with multiple observations per species and quadrat

I am building a phylogenetic model based on multiple observations per species. In such cases, standard approach stemming from quantitative genetics is to fit a multi-level model with separate varying-intercept (‘random’) effects for phylogeny and species identity estimating phylogenetic and non-heritable species-specific components, respectively (e.g. chapter 7 in Garamszegi’s Modern Phylogenetic Comparative Methods, 2014). In brms notation, the model would be:

y_var ~ x_var + (1 | gr(species_id)) + (1 | gr(phylo), cov = A_mat)

where A_mat is is phylogenetic variance-covariance matrix based on Brownian motion model and species_id and phylo are vectors, both containing species names.

Instead of Brownian motion or Pagel’s lambda model, I would like to fit Ornstein-Uhlenbeck model as a Gaussian process with a following kernel function:

K_{i,j} = \eta^2 exp(-\frac{D_{i,j}}{\rho}) + \delta_{i,j} \sigma_{k}^2

where D_{i,j} is patristic (phylogenetic) distance between species i and species j provided in a separate matrix. This is not possible in brms, so I am trying to fit the model directly in Stan.

I have several questions:

  1. Is it a good idea to fit species_id as a varying-intercept effect alongside with the Gaussian OU process for phylogeny similarly to the first model above? My attempt in Stan looks as follows:
  matrix cov_GP_OU(matrix x, real eta, real rho, real sigma_k) {
    int N = dims(x)[1];
    matrix[N, N] K;
    for (i in 1:(N - 1)) {
      K[i, i] = square( eta) + square( sigma_k );
      for (j in (i + 1):N) {
        K[i, j] = square( eta ) * exp( -x[i, j] / rho );
        K[j, i] = K[i,j];
    K[N, N] = square( eta) + square( sigma_k );
    return K;

data {
  int<lower=1> N;
  int<lower=1> N_spp;
  vector[N] y_var;
  vector[N] x_var;
  int<lower=1> spec_id[N];
  matrix[N_spp, N_spp] Amat;

parameters {
  vector[N_spp] z_S;
  vector[N_spp] z_a_spec;
  real a_bar;
  real b1;
  real<lower=0> sigma_a_spec;
  real<lower=0> eta;
  real<lower=0> rho;
  real<lower=0> sigma_k;
  real<lower=0> res_sigma;

model {
  matrix[N_spp,N_spp] SIGMA = cov_GP_OU( Amat , eta , rho , sigma_k );
  matrix[N_spp,N_spp] L_SIGMA = cholesky_decompose( SIGMA );
  vector[N_spp] a_phylo = L_SIGMA * z_S;
  vector[N] mu;
  res_sigma ~ exponential( 1 );
  rho ~ inv_gamma(4.6, 22.1);
  eta ~ normal( 1 , 1 );
  sigma_k ~ exponential( 1 );
  sigma_a_spec ~ exponential( 1 );
  b1 ~ normal( 0 , 2 );
  a_bar ~ normal( 12 , 5 );
  z_a_spec ~ normal( 0 , 1 );
  z_S ~ normal( 0 , 1 );
  for ( i in 1:N ) {
    mu[i] = a_bar + z_a_spec[spec_id[i]] * sigma_a_spec + a_phylo[spec_id[i]] + b1 * x_var[i];
  y_var ~ normal( mu , res_sigma );
  1. Is it possible to partition variance explained by Gaussian process and by species_id? Which parameter gives the variance explained by GP? Is it \sigma_{k}^2? Or is it \eta^2 + \sigma_{k}^2, which gives the values in the diagonal of the K variance-covariance matrix (i.e. where phylogenetic distance equals zero)? What is actually the interpretation of \eta in the diagonal of the K matrix?

  2. The same questions relate to the extended model, in which I plan to add another GP with spatial distance matrix and quadratic exponential kernel. The distances are calculated between the centres of quadrats of a spatial grid. Could grid_id be added alongside with this GP, similarly to the simultaneous fit of species_id and the phylogenetic OU GP presented above?

Any help or comments would be greatly appreciated.

1 Like

I just realized that my \sigma_{k}^2 (sigma_k in my Stan code) is the within-species variance and, hence, probably the same as residual variance (res_sigma in my Stan code). I added the term \delta_{i,j} \sigma_{k}^2 (where \delta_{i,j} is Kronecker delta) to the covariance function based on McElreath’s Statistical Rethinking (Example 14.5.2 in the second edition) and @betanalpha post on GP modeling (Robust Gaussian Process Modeling). However, McElreath’s phylogenetic model is specified as a multivariate response model Y \tilde{} MVNormal(\mu , S) (where S is a covariance matrix) and hence he needed to include residual variance in the diagonal of S. Unlike McElreath’s model, my model is specified as a univariate model Y \tilde{} Normal(\mu , \sigma_{res}) and GP is included within the linear predictor \mu. Hence, I guess the inclusion of the term \sigma_{k}^2 in my covariance function for the OU process is redundant and should be removed. Is that correct?

In this formulation you may still need to add a small fixed term often called jitter to the diagonal of the covariance matrix to avoid computational issues. As you have GP prior and normal observation model you could also combine them and also use the multivariate normal directly which would likely improve the inference.

1 Like

Thank you very much @avehtari for your suggestions. You are right that when I remove \sigma_{k} from the covariance function, I need to add a small jitter to \eta^2 in the diagonal. So is my understanding correct that I should not include \sigma_{k} in my formulation because it is duplicated residual variance?

Do you mean that I can combine the phylogenetic Ornstein-Uhlenbeck and spatial exponentiated quadratic process into a single covariance function? How would that work if there are different numbers of species and sites (quadrats) and one species can be sampled in multiple sites. My current understanding is that I need two covariance matrices: N_{species} \times N_{species} matrix for phylogeny and N_{sites} \times N_{sites} matrix for spatial autocorrelation. That was the reason why I planned to include both GPs within the linear predictor. Or is there a possibility to combine both into a single covariance matrix and fit a multivariate normal model?

As for the improved inference of the multivariate normal model. Let’s have the normal model:

Y \tilde{} Normal(\mu, \sigma)
\mu = \beta_{0} + \beta_{1} x + k_{species}
k_{species} \tilde{} MVNormal(0, K)

and the multivariate normal model:

Y \tilde{} MVN(\mu, K)
\mu = \beta_{0} + \beta_{1} x

where K is the covariance matrix of the Ornstein-Uhlenbeck process. Does the improved inference of the multivariate model mean that these models are not equivalent?

1 Like

It’s ok to have a small fixed jitter \eta and unknown residual scale \sigma, and the total unstructured variance is the sum of squares of those. If \eta \ll \sigma then there is not much difference to \eta=0 except the improved numerical stability.

Yes. If f_1 \sim \mathrm{GP}(\mu_1, K_1) and f_2 \sim \mathrm{GP}(\mu_2, K_2), then f_1 + f_2 \sim \mathrm{GP}(\mu_1 + \mu_2, K_1 + K_2).

You just need compute K_1 and K_2 for all observations.

The downside of combining the covariances is that the combined covariance matrix can be big, making the (basic) multivariate normal density evaluation slow.

The downside of not combining them is strong dependencies in the posterior, which can make the sampling slow.

1 Like

It may help to review Section 2.3 and Section 2.4 of my case study that you references. In Section 2.3 I build Stan programs that separate out the latent functional behavior being modeled by the gaussian process and the observational model that adds additional variation (both a normal observational model and a Poisson observational model). In Section 2.4 I then consider the analytic results available in the case of the normal observational model where the latent gaussian process can be integrated out to give a “posterior” gaussian process.

1 Like

Following the discussion with @avehtari and reading the @betanalpha 's case study, I got the model running and found the answer to my question about combining GPs with varying intercept for species/site ID, which I am posting here. The Gaussian processes already include variance due to species identity so there is NO need to fit species identity along with the Gaussian process as opposed to standard approach, where phylogeny and varying intercept for species ID are fitted together in a model (e.g. chapter 7 in Garamszegi’s Modern Phylogenetic Comparative Methods, 2014). The same applies for the spatial GP and site ID. In the GP, the variance explained by species/site ID is expressed by the squared marginal deviation parametr (\eta in my case, or \alpha in @betanalpha 's case study referenced above).

As proposed by @avehtari , I combined both GPs in a single covariance matrix, which works well, even without the need for a diagonal jitter. Following Aki’s recommendation, I even included a varying intercept for a person, who collected the observation (collector ID) within the covariance matrix, which should be more efficient. The resulting model looks as follows:

  // Ornstein-Uhlenbeck + exponentiated quadratic covariance function
  matrix cov_GP( matrix P, matrix S, matrix C, real eta_p, real rho_p, real eta_s, real rho_s, real sigma_c, real sigma_n ) {
    real sq_eta_p = square( eta_p );
    real sq_eta_s = square( eta_s );
    real sq_sigma_c = square( sigma_c );
    real sq_sigma_n = square( sigma_n );
    int N = dims( P )[1];
    matrix[N, N] K;
    for (i in 1:(N - 1)) {
      K[i, i] = sq_eta_p + sq_eta_s + sq_sigma_c + sq_sigma_n;
      for (j in (i + 1):N) {
        K[i, j] = sq_eta_p * exp( -P[i, j] / rho_p ) + // phylogenetic Ornstein-Uhlenbeck covariance function
                  sq_eta_s * exp( -0.5 * square( S[i, j] / rho_s ) ) + // spatial exponentiated quadratic covariance function
                  sq_sigma_c * C[i, j]; // collector ID varying intercept, C[i, j] is 1 when both observations were collected by the same person, 0 otherwise
        K[j, i] = K[i, j];
    K[N, N] = sq_eta_p + sq_eta_s + sq_sigma_c + sq_sigma_n;
    return K;

data {
  int<lower=1> N;
  vector[N] y_var;
  vector[N] x_var;
  matrix[N, N] P_mat;
  matrix[N, N] S_mat;
  matrix[N, N] C_mat;

parameters {
  real a;
  real b;
  real<lower=0> eta_p;
  real<lower=0> rho_p;
  real<lower=0> eta_s;
  real<lower=0> rho_s;
  real<lower=0> sigma_c;
  real<lower=0> sigma_n;

model {
  matrix[N, N] K = cov_GP( P_mat, S_mat, C_mat, eta_p, rho_p, eta_s, rho_s, sigma_c, sigma_n );
  vector[N] mu;
  a ~ normal( 0 , 1 );
  b ~ normal( 0 , 0.5 );
  eta_p ~ normal( 0 , 1 );
  rho_p ~ inv_gamma( 1.5 , 0.057 );
  eta_s ~ normal( 0 , 1 );
  rho_s ~ inv_gamma( 1.5 , 0.057 );
  sigma_c ~ normal( 0 , 1 );
  sigma_n ~ normal( 0 , 1 );
  mu =  a + b * x_var;
  y_var ~ multi_normal( mu , K );

The only problem is that sampling takes a lot of time with high number of observations (thousands), which is understandable. Another problem is the R memory limit for a single objects, which prevents running a model with very large number of observations (ca. 10K). But for smaller data sets this approach seems to work better than two separate latent GP. Thanks a lot for all the help!


Instead of looping, how about,

K = add_diag(sq_eta_p * exp(-P / rho_p) + 
                         sa_eta_s * exp( -0.5 * square(S / rho_s ))  +
                         sq_sigma_c * C, 
                       sq_eta_p + sq_eta_s + sq_sigma_c + sq_sigma_n);

At some point, when the varmat support improves, this should be much faster and more memory efficient (maybe in the release after the next release?).

1 Like

Thank you for this advise! I did not occur to me that add_diag can be used to create the whole matrix this way.

In the end, I fitted each GP as a latent process and it runs considerably faster than the combined observation-based matrix. And more importantly, I do not hit the R size limit for a single object, which was the problem with the combined matrix and 10K observations.

1 Like