Gaussian Process with Dot-Product Covariance

Hi all,

I’m using a Gaussian process model to capture spatial autocorrelation between countries in a panel data setting. I have 58 countries, with multiple observations for each country. How spatial autocorrelation between countries changes with distance is the primary quantity of interest, and I model this with a Gaussian process term for country varying intercepts.

For theoretical reasons, I’d like to use a covariance function that allows negative correlations between countries as distance increases. Radial basis functions like the exponentiated quadratic in brms are restricted to zero or positive correlations. The best approximation for this seems to be a dot-product covariance through gp_dot_prod_cov, but I’m open to alternatives.

The following model with dot-product covariance fits fine, but the estimated covariance and correlations, drawn from the generated quantities block below, aren’t sensible. I get tons of NaNs when calculating correlations from the posterior median of the covariance matrix. I’m not sure if the issue is in the model, generated quantities block, or how I’m using the generated quantities.

Building on McElreath’s Gaussian process material and @avehtari’s motorcycle GP case study here’s the model so far:

data {
  int<lower = 1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower = 1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  int<lower = 1> nstate; 
  vector[2] Dmat[nstate];
  // indices of latent GP groups per observation
  int<lower=1> Jgp_1[N];
}

transformed data{
  real sigma_intercept = .1; // noise to jitter
}

parameters {
    real intercept;
    real<lower = 0> sigma; 
    real alpha; 
    vector[K] beta; // regression pars
    real<lower = 0> sigma_f;       // homogeneity of f
    real<lower = 0> sigman;         // noise sigma
    vector[nstate] alpha_loc; 

}

transformed parameters {
  
    // covariances and Cholesky decompositions
  matrix[nstate, nstate] K_f = gp_dot_prod_cov(Dmat, sigma_f) +
                     sigma_intercept;
  matrix[nstate, nstate] L_f = cholesky_decompose(add_diag(K_f, sigman));

  // non-centered intercepts
  vector[nstate] k;
  k = 0 + L_f * alpha_loc;
  
}

model{
    
    // priors
    intercept ~ student_t(3, 0, 2);
    sigma_f ~ normal(0, 1); 
    sigman ~ normal(0, 1);


    sigma ~ normal(0, 2);
    alpha ~ normal(0, 4);
    alpha_loc ~ normal(0, 1);
    
    // centered intercepts: way slower
    //k ~ multi_normal_cholesky(rep_vector(0, nstate) , K_f);
    
    Y ~ skew_normal(intercept + X*beta + k[Jgp_1], alpha, sigma);
}

generated quantities{
  
    matrix[nstate, nstate] gp_cov = gp_dot_prod_cov(Dmat, sigma_f) +
                     sigma_intercept;
}

Can write the model, which includes a linear regression with intercept \beta_1, temporal lag y_{i t-1} and regressors X_{it} \beta as:

y_{it} \sim \mbox{skew-normal}(\mu_{it}, \alpha, \sigma) \\ \mu_{it} = \beta_1 + y_{i t-1} + X_{it} \beta + \lambda^{state}_{i} \\ \lambda^{state} \sim \mbox{MVNormal}([0, ... , 0], \mathbf{\Sigma}) \\ \mathbf{\Sigma} = \sigma_f + dist_{i} \cdot dist_{j}

Last, this is the R code I’m using to fit the model and summarize the Gaussian process covariance parameters. Data is also attached.

gp-data.Rdata (44.2 KB)

# compile model code
mn.dp.model <- cmdstan_model("data/gp-dotprod.stan")


# stan model fit
fit.mn.dp <- mn.dp.model$sample(
    data = mn.data.dp,
    seed = 12,
    iter_warmup = 1000,
    iter_sampling = 1000,
    chains = 4,
    parallel_chains = 4,
    refresh = 200,
    adapt_delta = .995,
    max_treedepth = 20
)


# pull covariance matrix 
dp.cov <- fit.mn.dp$draws(format = "df", variables = c("gp_cov"))

# posterior median
dp.cov.med <- apply(dp.cov, 2, median)
sqrt(ncol(dp.cov))

# fill matrix with posterior medians
dp.cov.mat <- matrix(nrow = round(sqrt(ncol(dp.cov))),
                     ncol = round(sqrt(ncol(dp.cov))))
for(i in 1:nrow(dp.cov.mat)){
  for(j in 1:ncol(dp.cov.mat)){
    dp.cov.mat[i, j] <- dp.cov.med[i*j]
  }
  diag(dp.cov.mat) <- diag(dp.cov.mat) + 0.01
}

dp.cor.mat <- cov2cor(dp.cov.mat)

Thanks in advance for any and all help!