Thanks for the suggestion, and yes, the holidays are not the best time to get help, so thanks for chiming in. Maybe someone else will see this in the meantime.
I hard-coded rho = 4 and it certainly improves the sampling, but it still doesn't sample perfectly. Here's the summary output:
> summary(results_NB_apGP)
SBC_results with 20 total fits.
- No fits had errors.
- No fits gave warnings.
- 4 (20%) fits had at least one Rhat > 1.01. Largest Rhat was 1.015.
- All fits had tail ESS > half of the maximum rank.
- The lowest bulk ESS was 370
- No fits had failed chains.
- No fits had divergent transitions.
- No fits had iterations that saturated max treedepth.
- 20 (100%) fits had some steps rejected. Maximum number of rejections was 18.
- Maximum time per chain was 41.718 sec.
And the model still isn't learning the true value of eta. The prior for eta is gamma(5, 20), since I thought the model had trouble when eta got too close to 0. I did try changing it to a normal(0, 1) bounded at 0 (a half-normal), but that led to a handful of simulations with divergent transitions.
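For concreteness, a minimal sketch of those pieces in Stan (names assumed to match the full model; rho is the length-scale and eta the marginal SD):

```stan
transformed data {
  real rho = 4;            // hard-coded length-scale from the run above
}
parameters {
  real<lower=0> eta;       // GP marginal SD
}
model {
  eta ~ gamma(5, 20);      // current prior; keeps mass away from 0
  // eta ~ normal(0, 1);   // half-normal alternative (with the <lower=0>
                           // bound); this gave occasional divergences
}
```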
Other directions
- Are there reasonable ways to constrain eta and rho? I need most of the prior mass for rho to sit between roughly 1 and 30 (e.g. invgamma(2.4, 7.3)), but in my experience that didn't work. (There's a sketch of what I mean after the code at the end of this post.)
- I've noticed other people include two GPs: one for the main effect of x and one for the offset for each group (in this case a plot-level GP), e.g. here, @mike-lawrence's model here, and as suggested by @bbbales2 here. This also has another advantage for my study question, since I'm ultimately interested in how the spatial structure varies by another category, and I could fold that into the main effect of x.
- If I decompose the GP into GP_mu + GP_p, like below, I think it's reasonable to use a Poisson likelihood, since GP_p acts as a dispersion parameter.
- I did run this, but it takes ~3x as long. When I ran 10 simulations, 2 had saturated treedepth. I also fixed the rhos that go into both GPs, so I can only imagine it'll introduce more issues once I actually model the rhos.
// Snippet from the model block; a0, ap (plot-level intercepts), the raw
// standard-normal vectors zGP_mu and zGP_p, and the GP hyperparameters
// (eta_mu, rho_mu, eta_p, rho_p) are declared elsewhere in the full model.
vector[N] logmu;
vector[L] GP_mu;      // shared spatial effect at each location
matrix[L, P] GP_p;    // plot-specific spatial offsets

// Build the two GPs (non-centered parameterization)
{
  matrix[L, L] K_mu = gp_exp_quad_cov(coords, eta_mu, rho_mu);
  matrix[L, L] K_p = gp_exp_quad_cov(coords, eta_p, rho_p);
  matrix[L, L] KL_mu = cholesky_decompose(add_diag(K_mu, 1e-9));
  matrix[L, L] KL_p = cholesky_decompose(add_diag(K_p, 1e-9));
  GP_mu = KL_mu * zGP_mu;
  GP_p = KL_p * zGP_p;
}

for (i in 1:N) {
  logmu[i] = a0 + ap[plotID[i]] + GP_mu[locID[i]] + GP_p[locID[i], plotID[i]];
}
y ~ poisson_log(logmu);
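Going back to the constraint question in the first bullet: the corresponding parameters and model blocks would look something like this (an untested sketch, with hard bounds on the rhos added where I want the prior mass to sit):

```stan
parameters {
  real a0;                         // global intercept
  vector[P] ap;                    // plot-level intercepts
  vector[L] zGP_mu;                // raw std-normal draws for GP_mu
  matrix[L, P] zGP_p;              // raw std-normal draws for GP_p
  real<lower=0> eta_mu;
  real<lower=0> eta_p;
  real<lower=1, upper=30> rho_mu;  // hard bounds matching the 1-30 target
  real<lower=1, upper=30> rho_p;
}
model {
  zGP_mu ~ std_normal();
  to_vector(zGP_p) ~ std_normal();
  rho_mu ~ inv_gamma(2.4, 7.3);    // most prior mass between ~1 and ~30
  rho_p ~ inv_gamma(2.4, 7.3);
  // ... GP construction and poisson_log likelihood as in the snippet above
}
```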