Hi all,
I’m using a Gaussian process model to capture spatial autocorrelation between countries in a panel data setting. I have 58 countries, with multiple observations for each country. How spatial autocorrelation between countries changes with distance is the primary quantity of interest, and I model this with a Gaussian process term for country varying intercepts.
For theoretical reasons, I’d like to use a covariance function that allows negative correlations between countries as distance increases. Radial basis functions like the exponentiated quadratic in brms
are restricted to zero or positive correlations. The best approximation for this seems to be a dot-product covariance through gp_dot_prod_cov
, but I’m open to alternatives.
The following model with dot-product covariance fits fine, but the estimated covariance and correlations, drawn from the generated quantities block below, aren’t sensible. I get tons of NaNs when calculating correlations from the posterior median of the covariance matrix. I’m not sure if the issue is in the model, generated quantities block, or how I’m using the generated quantities.
Building on McElreath’s Gaussian process material and @avehtari’s motorcycle GP case study here’s the model so far:
data {
int<lower = 1> N; // total number of observations
vector[N] Y; // response variable
int<lower = 1> K; // number of population-level effects
matrix[N, K] X; // population-level design matrix
int<lower = 1> nstate;
vector[2] Dmat[nstate];
// indices of latent GP groups per observation
int<lower=1> Jgp_1[N];
}
transformed data{
real sigma_intercept = .1; // noise to jitter
}
parameters {
real intercept;
real<lower = 0> sigma;
real alpha;
vector[K] beta; // regression pars
real<lower = 0> sigma_f; // homogeneity of f
real<lower = 0> sigman; // noise sigma
vector[nstate] alpha_loc;
}
transformed parameters {
// covariances and Cholesky decompositions
matrix[nstate, nstate] K_f = gp_dot_prod_cov(Dmat, sigma_f) +
sigma_intercept;
matrix[nstate, nstate] L_f = cholesky_decompose(add_diag(K_f, sigman));
// non-centered intercepts
vector[nstate] k;
k = 0 + L_f * alpha_loc;
}
model{
// priors
intercept ~ student_t(3, 0, 2);
sigma_f ~ normal(0, 1);
sigman ~ normal(0, 1);
sigma ~ normal(0, 2);
alpha ~ normal(0, 4);
alpha_loc ~ normal(0, 1);
// centered intercepts: way slower
//k ~ multi_normal_cholesky(rep_vector(0, nstate) , K_f);
Y ~ skew_normal(intercept + X*beta + k[Jgp_1], alpha, sigma);
}
generated quantities{
matrix[nstate, nstate] gp_cov = gp_dot_prod_cov(Dmat, sigma_f) +
sigma_intercept;
}
Can write the model, which includes a linear regression with intercept \beta_1, temporal lag y_{i t-1} and regressors X_{it} \beta as:
Last, this is the R code I’m using to fit the model and summarize the Gaussian process covariance parameters. Data is also attached.
gp-data.Rdata (44.2 KB)
# compile model code
mn.dp.model <- cmdstan_model("data/gp-dotprod.stan")
# stan model fit
fit.mn.dp <- mn.dp.model$sample(
data = mn.data.dp,
seed = 12,
iter_warmup = 1000,
iter_sampling = 1000,
chains = 4,
parallel_chains = 4,
refresh = 200,
adapt_delta = .995,
max_treedepth = 20
)
# pull covariance matrix
dp.cov <- fit.mn.dp$draws(format = "df", variables = c("gp_cov"))
# posterior median
dp.cov.med <- apply(dp.cov, 2, median)
sqrt(ncol(dp.cov))
# fill matrix with posterior medians
dp.cov.mat <- matrix(nrow = round(sqrt(ncol(dp.cov))),
ncol = round(sqrt(ncol(dp.cov))))
for(i in 1:nrow(dp.cov.mat)){
for(j in 1:ncol(dp.cov.mat)){
dp.cov.mat[i, j] <- dp.cov.med[i*j]
}
diag(dp.cov.mat) <- diag(dp.cov.mat) + 0.01
}
dp.cor.mat <- cov2cor(dp.cov.mat)
Thanks in advance for any and all help!