Hello! I’m looking for a bit of general advice or some examples to learn from rather than model triage (at this point). My question relates to best practices for the specification of multiple continuous predictors in a hierarchical Gaussian process. I have been working from the excellent 2017 StanCon hierarchical GP example from @rtrangucci. The vote-share model example has been extremely helpful in specifying hierarchical GPs, and I’ve successfully used this framework to specify GPs for tree count data with size class (bole diameter) as a continuous predictor across spatial and community hierarchies. A simple version of such a model is here, where count by diameter class is collected on each plot:

```
data {
int<lower=1> N;
int<lower=1> n_diam;
int<lower=1> n_plots;
int<lower=1> plot_idx[N];
int<lower=1> diam_idx[N];
int<lower=1> diam[N];
real<lower=0> exposure[N];
int<lower=0> y[N];
int<lower=0> prior_only;
}
transformed data {
real diams[n_diam];
for (d in 1:n_diam) {
diams[d] = d;
}
}
parameters {
real mu;
real<lower=0> inverse_phi;
vector[n_diam] GP_mu_std;
real<lower=0> alpha_mu;
real<lower=0> rho_mu;
matrix[n_diam, n_plots] GP_plot_std;
real<lower=0> alpha_plot;
real<lower=0> rho_plot;
vector[n_plots] plot_std;
real<lower=0> sigma_plot;
}
transformed parameters {
vector[n_diam] GP_mu;
vector[n_plots] plot_re; // plot random effect
matrix[n_diam, n_plots] GP_plot; // plot GP
real eta[N]; // log lambda for likelihood
real phi;
plot_re = sigma_plot * plot_std;
phi = 1 / inverse_phi;
// Plot-level non-centered Gaussian Process (mvnorm std * L_cov)
{
matrix[n_diam, n_diam] cov_mu;
matrix[n_diam, n_diam] L_cov_mu;
matrix[n_diam, n_diam] cov_plot;
matrix[n_diam, n_diam] L_cov_plot;
cov_mu = cov_exp_quad(diams, alpha_mu, rho_mu);
cov_plot = cov_exp_quad(diams, alpha_plot, rho_plot);
for (d in 1:n_diam) {
cov_mu[d, d] = cov_mu[d, d] + 1e-9;
cov_plot[d, d] = cov_plot[d, d] + 1e-9;
}
L_cov_mu = cholesky_decompose(cov_mu);
L_cov_plot = cholesky_decompose(cov_plot);
GP_mu = L_cov_mu * GP_mu_std;
GP_plot = L_cov_plot * GP_plot_std;
}
for (n in 1:N) {
eta[n] = mu
+ GP_mu[diam_idx[n]]
+ GP_plot[diam_idx[n], plot_idx[n]]
+ plot_re[plot_idx[n]]
+ log(exposure[n]);
}
}
model {
target += normal_lpdf(mu | 0, 0.1);
// Mean GP
target += std_normal_lpdf(to_vector(GP_mu_std) | );
target += gamma_lpdf(alpha_mu | 5, 20);
target += inv_gamma_lpdf(rho_mu | 5, 40);
// Plot random effect
target += std_normal_lpdf(sigma_plot | );
target += normal_lpdf(plot_std | 0, 0.1);
// Plot GP
target += std_normal_lpdf(to_vector(GP_plot_std) | );
target += gamma_lpdf(alpha_plot | 5, 20);
target += inv_gamma_lpdf(rho_plot | 5, 40);
// Overdispersion
target += gamma_lpdf(inverse_phi | 5, 20);
// Likelihood
if (!prior_only) {
target += neg_binomial_2_log_lpmf(y | eta, phi);
}
}
generated quantities {
int<lower=0> y_new[N];
for (n in 1:N) {
y_new[n] = neg_binomial_2_log_rng(eta[n], phi);
}
}
```

At this point I would like to learn to extend such a model to include continuous nonlinear effects that occur “above” (and by extension “below” or within) and have interaction with the continuous effect. In the vote-share example I guess this could be variables like median income or median age for regions. For my application, this might be a plot-level variable that I expect to have some interaction with the size-effect in predicting count, like number of years since disturbance, or a climatic variable, or a derived remotely sensed predictor.

In this case, are there better methods than simply including the additional variable(s) in the X matrix? In addition to the fact that the observations occur at different levels in the data, I have vague, somewhat uninformed concerns about increasing the dimensionality of the X matrix, variable scaling for the distance computation, and the practical impacts on computation for large data sets if I can no longer lean on the neatly binned diameter class information to efficiently build the covariance matrix (there are many more plots than diameter bins). Any critiques of my approach, general guidance, or concrete examples of GP models that incorporate continuous predictors at multiple hierarchies would be appreciated.

Thanks!