# Validating hierarchical GP with SBC

Hi all, I’m trying to validate hierarchical Gaussian process regressions using the SBC package. My real-world dataset is vegetation data, which was collected at many locations within many plots, and I’m interested in estimating plot-level means and local spatial structure. The basic format of the model is written below:

\begin{aligned} y_{x,p} &\sim \text{NegBinomial}(\mu_{x,p}, \theta) \\ \text{log}(\mu_{x,p}) &= \alpha_0 + \alpha_p + GP_{x,p} \\ \alpha_p &\sim \text{Normal}(0, \sigma)\\ GP_{x,p} &\sim \text{MultiNormal}(0, K_{\eta, \rho}(x)) \\ k(x_i, x_j|\eta, \rho) &= \eta^2 \text{exp}(-\frac{(x_i - x_j)^2}{2\rho^2}) \end{aligned}

where y_xp are the counts of data at location x in plot p.

To validate this model, I generated fake data in R below:


NB_apGP <- function(P, coords){

# kernel function
fkernel <- function(d, eta, rho) eta^2 * exp(-(d^2) / (2 * rho^2))

# data
L <- nrow(coords)
plotID <- rep(1:P, each = L)
locID <- rep(1:L, times = P)
N <- length(plotID)

# priors
a0 <- rnorm(1, 0, 1)
sigma_ap <- abs(rnorm(1, 0, 1))
theta <- rlnorm(1, log(.5), .7)
eta <- rgamma(1, 5, 20)
rho <- pscl::rigamma(1, 2.4, 7.3)

# deal with the GP
dmat <- fields::rdist(coords)
K <- fkernel(dmat, eta, rho)
# cholesky decomposition
zGP <- matrix(rnorm(L * P), nrow = L, ncol = P)
diag(K) <- diag(K) + 1e-9
GP <- t(chol(K)) %*% zGP

# mean model
ap <-  rnorm(P, 0, sigma_ap)
mu <- rep(NA, N)
for(i in 1:N){
mu[i] <- exp(a0 + ap[plotID[i]] + GP[locID[i], plotID[i]])
}
y <- rethinking::rgampois2(N, mu, theta)

# outputs
list(
variables = list(
a0 = a0,
sigma_ap = sigma_ap,
theta = theta,
eta = eta,
rho = rho
),
generated = list(
N = N,
P = P,
L = L,
plotID = plotID,
locID = locID,
y = y,
coords = coords
)
)
}


with the accompanied Stan model and SBC script attached.
NB_apGP.stan (1.1 KB)
SBC_NegBinom_forDiscourse.R (1.7 KB)

I’ve only run 10 simulations, but it’s clear that the model is having a hard time estimating eta and rho (*kappa is supposed to be theta in the figure). The Rhats are often >1.01 for rho too.

Is it just a matter of altering my priors, is there something fundamentally wrong with my model structure, or is there something else that I can’t think of? I’ve tried a poisson likelihood too, thinking that the model would do better if it had one less dispersion parameter, but it also has a hard time estimating rho.

Input very appreciated!

1 Like

Sorry, won’t have time for a deeper dive before new year, but there is a simple test you can do immediately: replace the rho parameter with a fixed value in both model and the simulator. If the resulting model behaves nicely, it is highly likely that the problem concerns rho. If not, then you can investigate the reduced model and see if fixing any issues there will then result in resolving the problems in the bigger model.

Best of luck with your model!

Thanks for the suggestion, and yes, holiday time is not the best time to get help. So, thanks for piping in. Maybe someone else will see this in the meantime.

I hard coded Rho = 4 and it certainly improves it, but it still doesn’t sample perfectly. Here’s the summary output:

> summary(results_NB_apGP)
SBC_results with 20 total fits.
- No fits gave warnings.
- 4 (20%) fits had at least one Rhat > 1.01. Largest Rhat was 1.015.
- All fits had tail ESS > half of the maximum rank.
- The lowest bulk ESS was 370
- No fits had failed chains.
- No fits had divergent transitions.
- No fits had iterations that saturated max treedepth.
- 20 (100%) fits had some steps rejected. Maximum number of rejections was 18.
- Maximum time per chain was 41.718 sec.


and the model still isn’t learning the true value of eta:

The prior for eta is gamma(5,20) since I thought the model had troubles when it got too close to 0. I did try changing it to normal(0,1) bounded at 0, but it led to a handful of simulations with divergent transitions.

Other directions

1. Are there reasonable ways to constrain eta and rho? I need most of the mass of the prior for rho to ideally range between 1-30 (e.g. invgamma(2.4, 7.3)), but my experiences showed that didn’t work.

2. I’ve noticed other people include 2 GPs–one for the main effect of x and the offset for each group (in this case a plot-level GP). ex/ here, @mike-lawrence’s model here, and as suggested by @bbbales2 here. This also has another advantage for my study question since I’m ultimately interested in how the spatial structure varies by another category and I could stick that into the main effect of x.

• if I decompose GP into GP_mu + GP_p, like below, I think it’s reasonable to use a Poisson likelihood since the GP_p acts as dispersion parameter.
• I did run this, but it takes ~3x as long. When I ran 10 simulations, 2 had saturated treedepth. I also fixed the rhos that go into both GPs, so I can only imagine it’ll also introduce more issues when I actually model the the rhos.
  vector[N] logmu;
vector[P] ap;
vector[L] GP_mu;
matrix[L,P] GP_p;

// Define the GP matrices
{
matrix[L,L] KL_mu;
matrix[L,L] K_mu = gp_exp_quad_cov(coords, eta_mu, rho_mu);
matrix[L,L] KL_p;
matrix[L,L] K_p = gp_exp_quad_cov(coords, eta_p, rho_p);