Matching group IDs to corresponding rows and columns of distance or covariance matrices

oldrich_tomasek · July 24, 2021, 2:58pm

I am building a model, in which I use Gaussian processes to control for phylogenetic and spatial autocorrelation among species and sites, respectively. There are multiple observations per both species and site. Let say my data look something like this:

> dat
      y_var     x_var spec_id site_id
1  12.36641 12.712033       3       2
2  24.04981 10.194881       1       1
3  23.45062 13.711540       4       4
4  24.75315  9.677793       2       2
5  26.77541 13.200210       1       3
6  18.57922 11.704592       3       3
7  18.49895 13.516832       3       1
8  15.51658 10.905999       4       1
9  32.79624 13.046312       2       4
10 19.54373 12.702731       1       2

In addition, my data contain two matrices: P_mat is N_{species} \times N_{species} matrix with phylogenetic distances between species and D_mat is N_{sites} \times N_{sites} matrix with spatial distances between sampling sites (quadrats).

My question is how Stan matches the species in spec_id and the sites in site_id to the corresponding rows and columns in the respective distance matrices.

The relevant parts of my model currently look as follows:

matrix[N_spp,N_spp] SIGMA_phylo = cov_GP_phylo( P_mat , eta_phylo , rho_phylo );
matrix[N_spp,N_spp] L_SIGMA_phylo = cholesky_decompose( SIGMA_phylo );
vector[N_spp] f_phylo = L_SIGMA_phylo * z_phylo;

matrix[N_sites,N_sites] SIGMA_space = cov_GP_space( D_mat , eta_space , rho_space );
matrix[N_sites,N_sites] L_SIGMA_space = cholesky_decompose( SIGMA_space );
vector[N_sites] f_space = L_SIGMA_space * z_space;

mu = b0 + b1 * x_var + f_phylo[spec_id] + f_space[site_id]
y_var ~ normal( mu , sigma_n )

Will Stan match f_phylo and z_phylo to rows and columns in P_mat in ascending order (and the same for sites)? In other words, if I order P_mat so that species 1 is in the first row and and first column, species 2 is in the second row and second column and so on (and the same for D_mat), will everything match, irrespective of the order in the data set with individual observations shown above?

jsocolar · July 24, 2021, 3:22pm

What are the functions cov_GP_space and cov_GP_phylo?

oldrich_tomasek · July 24, 2021, 3:33pm

Sorry, the functions are Gaussian process covariance functions defined as follows:

functions{
  // Ornstein-Uhlenbeck covariance function
  matrix cov_GP_phylo(matrix x, real eta_ou, real rho_ou) {
    int N = dims(x)[1];
    matrix[N, N] K_OU;
    for (i in 1:(N - 1)) {
      K_OU[i, i] = square( eta_ou ) + 0.00001;
      for (j in (i + 1):N) {
        K_OU[i, j] = square( eta_ou ) * exp( -x[i, j] / rho_ou );
        K_OU[j, i] = K_OU[i,j];
      }
    }
    K_OU[N, N] = square( eta_ou ) + 0.00001;
    return K_OU;
  }
  // Exponentiated quadratic covariance function
  matrix cov_GP_space(matrix x, real eta_eq, real rho_eq) {
    int N = dims(x)[1];
    matrix[N, N] K_EQ;
    for (i in 1:(N - 1)) {
      K_EQ[i, i] = square( eta_eq ) + 0.00001;
      for (j in (i + 1):N) {
        K_EQ[i, j] = square( eta_eq ) * exp( -0.5 * square( x[i, j] / rho_eq ));
        K_EQ[j, i] = K_EQ[i,j];
      }
    }
    K_EQ[N, N] = square( eta_eq ) + 0.00001;
    return K_EQ;
  }
}

Topic		Replies	Views
Modelling phylogenetic and spatial autocorrelations as Gaussian processes with multiple observations per species and quadrat Modeling gaussian-process , spatial , phylogenetic	8	1231	September 9, 2021
Gaussian process and QR reparametrization Modeling specification , performance , gaussian-process	6	791	September 8, 2021
Hierarchical spatial GP with uneven sampling coordinates Modeling rstan , techniques , gaussian-process	2	1302	February 8, 2022
Symetric covariance function Modeling	2	410	March 5, 2020
Speedups for Wishart model Modeling specification , performance	24	4303	November 10, 2020

Matching group IDs to corresponding rows and columns of distance or covariance matrices

Related topics