Hi, I’m working on a spatiotemporal dataset. The goal is predictions in the time dimension for each location. The baseline model is a hierarchical model, where each location has its own linear regression y_{st} \sim N( \beta_{0s} + \beta_{1s} t, \sigma_s^2) with group-level priors. Linear trend and normality assumption pass the tests. However, the residuals and parameters from the baseline model have clear and strong isotropic spatial correlations, so I think that using Gaussian processes with SE kernel to model the spatial correlations might improve the model.
The first thing I tried is to use independent GPs of same scales to measure the errors in each time period since the time dimension is relatively low frequency without notable autocorrelation. The assumption is the spatial shocks are of similar scales in each time period.
model {
\\ location level trends and intercepts, univariate independent prior,
\\ non-centered parameterization
vector[N_s] beta0 = mu_b0 + eta_b0 * sigma_b0;
vector[N_s] beta1 = mu_b1 + eta_b1 * sigma_b1;
\\ independent GP but same kernel for each time period, latent GP
matrix[N_s, N_t] eta ~ std_normal();
matrix[N_s, N_s] L_K = L_cov_exp_quad(gps, alpha, rho);
matrix[N_s, N_t] f = L_K * eta;
// Likelihood
target += normal_lpdf(y | to_vector(rep_matrix(beta0, N_t)) +
to_vector(rep_matrix(beta1, N_t)) .* time +
to_vector(f), sigma[loc]);}
The posterior samples have high autocorrelations within chains and the out of sample predictions for each location perform worse than the baseline model. It seems that the independent GPs in each time period are competing with the trends. The residuals (including the GPs) in each location do not sum to zero across time. Should I look into different specifications or other types of prior to put on eta? Hyperparameters in the GP kernel seem reasonable and spatial interpolation improves significantly though. Many thanks!