Weak identification/overparameterization in spatial GPs with time trends

Hi, I’m working on a spatiotemporal dataset. The goal is predictions in the time dimension for each location. The baseline model is a hierarchical model, where each location has its own linear regression y_{st} \sim N( \beta_{0s} + \beta_{1s} t, \sigma_s^2) with group-level priors. Linear trend and normality assumption pass the tests. However, the residuals and parameters from the baseline model have clear and strong isotropic spatial correlations, so I think that using Gaussian processes with SE kernel to model the spatial correlations might improve the model.

The first thing I tried is to use independent GPs of same scales to measure the errors in each time period since the time dimension is relatively low frequency without notable autocorrelation. The assumption is the spatial shocks are of similar scales in each time period.

model {
  \\ location level trends and intercepts, univariate independent prior, 
  \\ non-centered parameterization
  vector[N_s] beta0 = mu_b0 + eta_b0 * sigma_b0;
  vector[N_s] beta1 = mu_b1 + eta_b1 * sigma_b1; 

  \\ independent GP but same kernel for each time period, latent GP
  matrix[N_s, N_t] eta ~ std_normal();
  matrix[N_s, N_s] L_K =  L_cov_exp_quad(gps, alpha, rho);
  matrix[N_s, N_t] f =  L_K * eta;

  // Likelihood 
  target += normal_lpdf(y | to_vector(rep_matrix(beta0, N_t)) + 
         to_vector(rep_matrix(beta1, N_t)) .* time + 
         to_vector(f), sigma[loc]);} 

The posterior samples have high autocorrelations within chains and the out of sample predictions for each location perform worse than the baseline model. It seems that the independent GPs in each time period are competing with the trends. The residuals (including the GPs) in each location do not sum to zero across time. Should I look into different specifications or other types of prior to put on eta? Hyperparameters in the GP kernel seem reasonable and spatial interpolation improves significantly though. Many thanks!

I’m a little fuzzy on the details of the implementation, but since there are no replies so far so I’m going to make a general comment with respect to linear time trends within GP models.

If you model the mean of the gaussian process as a linear model that will indeed compete with with the GP itself, effectively it is detrending the data, and whatever is left is what will be modeled by the GP.
Unless it’s really something you don’t want the GP to capture, I’d actually not use the linear model at all (model mean GP as constant and let the signal variance account for the trend around a that historical constant mean). But if your main interest is the linear trend maybe the GP is will indeed cause overfitting because it is large, complex and very flexible.

I’m not sure that is exactly what you asked, but I’d probably have to look more closely into more of the model.

Thank you for your reply! Sorry for the confusion. The GPs are imposed on the space dimension while the linear trends are imposed on the time dimension, so they are not directly competing. I was thinking about using a separable spatio-temporal GP with a linear kernel GP to the time dimension and a SE kernel GP to the space dimension, but couldn’t work around the implicit spatial stationarity assumption.