Hierarchical spatial GP with uneven sampling coordinates

Lisar · December 20, 2021, 7:35pm

Hi all, for starters I’m an ecologist who happens to use Stan, so what might be obvious to a statistician might not be for me. Also, this is my first real question here…

I’m working with some forest data and we’re interested in estimating the spatial autocorrelation of samples within the plots. Some of the samples will be collected at the same coordinates in each plot, but other samples will be surveyed in the whole plot and have unique coordinates.

I simulated count data and assumed that it would be collected at fixed coordinates. I used Hierarchical Gaussian Processes in Stan (mc-stan.org) as a guide and it worked beautifully. See below for the model.

data{
  int<lower=1> N; //total number of observations
  int<lower=1> L; // number of sampling locations in each plot
  int<lower=1> P; //number of plots
  int<lower=1, upper=P> plotID[N];
  int<lower=1, upper=L> locID[N];
  int<lower=0> y[N]; //response variable 
  vector[2] coords[L]; // coordinates
}
parameters{
  real a0;
  //hyperparameters for ap
  real<lower=0> sigma_ap;
  vector[P] z_ap;
  //GP parameters
  matrix[L,P] zGP; 
  real<lower=0> eta;
  real<lower=0> rho;
}
transformed parameters{ 
  vector[N] logmu; 
  vector[P] ap;
  matrix[L,L] KL;
  matrix[L,P] GP;
  
  // Define the GP matrix
  {
    matrix[L,L] K;
    K = cov_exp_quad(coords, eta, rho);
    K =  add_diag(K, 1e-9);
    KL = cholesky_decompose(K);
  }
  GP = KL * zGP;

  //plot-level varying intercept, non-centered
  ap = sigma_ap*z_ap;

  //main model
  for(i in 1:N){
    logmu[i] = a0 + ap[plotID[i]] + GP[locID[i], plotID[i]]; 
    }
}
model{
  //priors
  a0 ~ normal(1.5, 1);
  sigma_ap~ normal(0, 1);
  z_ap~ normal(0, 1);
  to_vector(zGP) ~ normal(0,1);
  eta ~ normal(0, 1);
  rho ~ normal(0, 2);
  //likelihood
  y ~ poisson_log(logmu);
}

My question is, how would I write the model so that I could fit data that aren’t sampled in the same location every time? Also, for the data that are sampled at fixed coordinates, how can I accommodate values that are NA?

Thanks in advance!

mike-lawrence · February 8, 2022, 1:13pm

I think the answer to the second question also addresses the first, and the model is already set up to accommodate both. That is, you can pass a single coords data variable with all unique locations, in which case GP gets computed as a latent quantity that contains the model predictions at every possible location for each level of plotID, and then only actually-observed combinations of plotID and locID are drawn into the likelihood by way of the indexing into GP in this bit:

  //main model
  for(i in 1:N){
    logmu[i] = a0 + ap[plotID[i]] + GP[locID[i], plotID[i]]; 
    }

Lisar · February 8, 2022, 9:46pm

You’re right in that the NAs don’t pose any problems, which was a big relief.

But for data that aren’t sampled in the same location every time, I’d run into memory issues if I used this model structure to record all unique locations. My solution is to use a NNGP model.

Topic		Replies	Views
Modeling time dynamics with GP Modeling specification , gaussian-process	12	644	April 29, 2020
Time series and Gaussian process Modeling gaussian-process	62	4471	December 20, 2019
Generate replicated data for out-of-sample data for gaussian process Modeling gaussian-process	4	757	July 29, 2020
Model for point process in Stan Modeling	3	99	August 16, 2024
Using 2D Gaussian process predictions within model Modeling gaussian-process	11	2332	February 18, 2022

Hierarchical spatial GP with uneven sampling coordinates

Related topics