# Hierarchical spatial GP with uneven sampling coordinates

Hi all, for starters I’m an ecologist who happens to use Stan, so what might be obvious to a statistician might not be for me. Also, this is my first real question here…

I’m working with some forest data and we’re interested in estimating the spatial autocorrelation of samples within the plots. Some of the samples will be collected at the same coordinates in each plot, but other samples will be surveyed in the whole plot and have unique coordinates.

I simulated count data and assumed that it would be collected at fixed coordinates. I used Hierarchical Gaussian Processes in Stan (mc-stan.org) as a guide and it worked beautifully. See below for the model.

``````data{
int<lower=1> N; //total number of observations
int<lower=1> L; // number of sampling locations in each plot
int<lower=1> P; //number of plots
int<lower=1, upper=P> plotID[N];
int<lower=1, upper=L> locID[N];
int<lower=0> y[N]; //response variable
vector[2] coords[L]; // coordinates
}
parameters{
real a0;
//hyperparameters for ap
real<lower=0> sigma_ap;
vector[P] z_ap;
//GP parameters
matrix[L,P] zGP;
real<lower=0> eta;
real<lower=0> rho;
}
transformed parameters{
vector[N] logmu;
vector[P] ap;
matrix[L,L] KL;
matrix[L,P] GP;

// Define the GP matrix
{
matrix[L,L] K;
KL = cholesky_decompose(K);
}
GP = KL * zGP;

//plot-level varying intercept, non-centered
ap = sigma_ap*z_ap;

//main model
for(i in 1:N){
logmu[i] = a0 + ap[plotID[i]] + GP[locID[i], plotID[i]];
}
}
model{
//priors
a0 ~ normal(1.5, 1);
sigma_ap~ normal(0, 1);
z_ap~ normal(0, 1);
to_vector(zGP) ~ normal(0,1);
eta ~ normal(0, 1);
rho ~ normal(0, 2);
//likelihood
y ~ poisson_log(logmu);
}

``````

My question is, how would I write the model so that I could fit data that aren’t sampled in the same location every time? Also, for the data that are sampled at fixed coordinates, how can I accommodate values that are NA?

1 Like

I think the answer to the second question also addresses the first, and the model is already set up to accommodate both. That is, you can pass a single `coords` data variable with all unique locations, in which case `GP` gets computed as a latent quantity that contains the model predictions at every possible location for each level of `plotID`, and then only actually-observed combinations of `plotID` and `locID` are drawn into the likelihood by way of the indexing into GP in this bit:

``````  //main model
for(i in 1:N){
logmu[i] = a0 + ap[plotID[i]] + GP[locID[i], plotID[i]];
}
``````

You’re right in that the NAs don’t pose any problems, which was a big relief.

But for data that aren’t sampled in the same location every time, I’d run into memory issues if I used this model structure to record all unique locations. My solution is to use a NNGP model.

1 Like