Hi all, for starters I’m an ecologist who happens to use Stan, so what might be obvious to a statistician might not be for me. Also, this is my first real question here…
I’m working with some forest data and we’re interested in estimating the spatial autocorrelation of samples within the plots. Some of the samples will be collected at the same coordinates in each plot, but other samples will be surveyed in the whole plot and have unique coordinates.
I simulated count data and assumed that it would be collected at fixed coordinates. I used Hierarchical Gaussian Processes in Stan (mc-stan.org) as a guide and it worked beautifully. See below for the model.
data{
int<lower=1> N; //total number of observations
int<lower=1> L; // number of sampling locations in each plot
int<lower=1> P; //number of plots
int<lower=1, upper=P> plotID[N];
int<lower=1, upper=L> locID[N];
int<lower=0> y[N]; //response variable
vector[2] coords[L]; // coordinates
}
parameters{
real a0;
//hyperparameters for ap
real<lower=0> sigma_ap;
vector[P] z_ap;
//GP parameters
matrix[L,P] zGP;
real<lower=0> eta;
real<lower=0> rho;
}
transformed parameters{
vector[N] logmu;
vector[P] ap;
matrix[L,L] KL;
matrix[L,P] GP;
// Define the GP matrix
{
matrix[L,L] K;
K = cov_exp_quad(coords, eta, rho);
K = add_diag(K, 1e-9);
KL = cholesky_decompose(K);
}
GP = KL * zGP;
//plot-level varying intercept, non-centered
ap = sigma_ap*z_ap;
//main model
for(i in 1:N){
logmu[i] = a0 + ap[plotID[i]] + GP[locID[i], plotID[i]];
}
}
model{
//priors
a0 ~ normal(1.5, 1);
sigma_ap~ normal(0, 1);
z_ap~ normal(0, 1);
to_vector(zGP) ~ normal(0,1);
eta ~ normal(0, 1);
rho ~ normal(0, 2);
//likelihood
y ~ poisson_log(logmu);
}
My question is, how would I write the model so that I could fit data that aren’t sampled in the same location every time? Also, for the data that are sampled at fixed coordinates, how can I accommodate values that are NA?
Thanks in advance!