I don’t yet understand fully what degree of hierarchy you have in your data, but starting from a non-hierarchical case: a given lengthscale and amplitude combination uniquely defines a covariance matrix `K`, which should be computed only once in a model. If you have a model that necessitates multiple realizations from the multivariate normal implied by `K`, then (taking the non-centered case and referring to the variable names of my example) you need a separate `gp_helper` variable for each realization, which is most easily achieved by making `gp_helper` a matrix. To illustrate, let’s say you have two observed-data vectors in `y` that you want to model as separate GP realizations but with a shared lengthscale & amplitude:
```stan
data{
  int<lower=2> n_xy ;
  array[n_xy] real x ;
  matrix[n_xy,2] y ;
}
parameters{
  real grand_mean ;
  real<lower=0> noise_amplitude ;
  real<lower=0> gp_amplitude ;
  real<lower=0> gp_lengthscale ;
  matrix[n_xy,2] gp_helper ;
}
transformed parameters{
  // gp_L: cholesky-decomposed gp covariance
  matrix[n_xy,n_xy] gp_L = cholesky_decompose(
    add_diag(
      gp_exp_quad_cov(
        x
        , gp_amplitude
        , gp_lengthscale
      )
      , square(gp_amplitude) * 1e-5 // jitter, scaled to the kernel's diagonal (the variance)
    )
  ) ;
  // gp: latent (noiseless) GP
  matrix[n_xy,2] gp ;
  gp[,1] = gp_L * gp_helper[,1] ;
  gp[,2] = gp_L * gp_helper[,2] ;
}
model{
  // Priors:
  grand_mean ~ ... ;
  noise_amplitude ~ ... ;
  gp_amplitude ~ ... ;
  gp_lengthscale ~ ... ;
  // gp_helper must be ~ std_normal() for the non-centered parameterization
  to_vector(gp_helper) ~ std_normal() ;
  // Likelihood:
  y[,1] ~ normal( grand_mean + gp[,1] , noise_amplitude ) ;
  y[,2] ~ normal( grand_mean + gp[,2] , noise_amplitude ) ;
}
```
(Note that the code is pedagogically aimed; a speedup could be achieved by using indexing to vectorize the likelihood.)
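For instance, a minimal sketch of one such vectorization (Stan's `to_vector` flattens matrices column-major, so the flattened `y` and `gp` stay element-aligned):

```stan
  // Likelihood, vectorized across both columns at once:
  to_vector(y) ~ normal( grand_mean + to_vector(gp) , noise_amplitude ) ;
```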
Again, the above instantiates a model whereby the latent GPs are independent realizations of a common latent multivariate normal, so `K` only needs to be computed once. `K` also only needs to be computed once if the two latent GPs share a lengthscale but differ in amplitude: the amplitude merely scales the kernel, so `gp_exp_quad_cov(x,amp,ls)` equals `square(amp) * gp_exp_quad_cov(x,1.0,ls)`, and the Cholesky factor of a matrix scaled by `square(amp)` is the original Cholesky factor scaled by `amp`. Hence `cholesky_decompose(gp_exp_quad_cov(x,amp,ls)) * helper` is equal to `(cholesky_decompose(gp_exp_quad_cov(x,1.0,ls)) * helper) * amp`. So in the case of multiple amplitudes but a common lengthscale you’d do:
```stan
...
parameters{
  ...
  vector<lower=0>[2] gp_amplitude ;
  ...
}
transformed parameters{
  // gp_L: cholesky-decomposed gp covariance (unit amplitude)
  matrix[n_xy,n_xy] gp_L = cholesky_decompose(
    add_diag(
      gp_exp_quad_cov(
        x
        , 1.0
        , gp_lengthscale
      )
      , 1e-5 //jitter
    )
  ) ;
  // gp: latent (noiseless) GP
  matrix[n_xy,2] gp ;
  gp[,1] = (gp_L * gp_helper[,1]) * gp_amplitude[1] ;
  gp[,2] = (gp_L * gp_helper[,2]) * gp_amplitude[2] ;
}
...
```
If the lengthscales are different, though, you have to compute the covariance matrix anew for each:
```stan
...
parameters{
  ...
  vector<lower=0>[2] gp_lengthscale ;
  ...
}
transformed parameters{
  // gp_L: cholesky-decomposed gp covariance, one per lengthscale
  array[2] matrix[n_xy,n_xy] gp_L ;
  for( i in 1:2 ){
    gp_L[i] = cholesky_decompose(
      add_diag(
        gp_exp_quad_cov(
          x
          , 1.0
          , gp_lengthscale[i]
        )
        , 1e-5 //jitter
      )
    ) ;
  }
  // gp: latent (noiseless) GP
  matrix[n_xy,2] gp ;
  gp[,1] = (gp_L[1] * gp_helper[,1]) * gp_amplitude[1] ;
  gp[,2] = (gp_L[2] * gp_helper[,2]) * gp_amplitude[2] ;
}
...
```
Now, hierarchy can be handled in different ways. One way is to impose a hierarchy on the GP parameters (amplitude & lengthscale) alone, in which case you have the above plus a few hyperparameters:
```stan
parameters{
  ...
  real<lower=0> gp_lengthscale_mean ;
  real<lower=0> gp_lengthscale_sd ;
  // note: Stan doesn't permit combining a lower bound with offset/multiplier,
  // so positivity isn't enforced here; see the log-scale note below
  vector<offset=gp_lengthscale_mean,multiplier=gp_lengthscale_sd>[2] gp_lengthscale ;
  real<lower=0> gp_amplitude_mean ;
  real<lower=0> gp_amplitude_sd ;
  vector<offset=gp_amplitude_mean,multiplier=gp_amplitude_sd>[2] gp_amplitude ;
}
...
model{
  ...
  // Priors on hyperparameters
  gp_lengthscale_mean ~ ... ;
  gp_lengthscale_sd ~ ... ;
  gp_amplitude_mean ~ ... ;
  gp_amplitude_sd ~ ... ;
  // gp parameters as hierarchical
  gp_lengthscale ~ normal( gp_lengthscale_mean , gp_lengthscale_sd ) ;
  gp_amplitude ~ normal( gp_amplitude_mean , gp_amplitude_sd ) ;
}
```
(Notes: with a hierarchical structure imposed on positive-continuous variables like these, it usually works best to model them & their hyperparameters as normal on the log scale, then exponentiate where they’re used; that also sidesteps the bound/offset/multiplier conflict flagged in the code above. Also, I’ve non-centered using the offset/multiplier syntax here, but see here for code to automate switching between centered & non-centered parameterizations.)
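To make that concrete, here’s a minimal sketch of the log-scale version for the lengthscale alone (amplitude would follow the same pattern; this just previews the approach used in the full model below):

```stan
parameters{
  ...
  real log_gp_lengthscale_mean ;
  real<lower=0> log_gp_lengthscale_sd ;
  // non-centered via offset/multiplier; no bound is needed on the log scale
  vector<offset=log_gp_lengthscale_mean,multiplier=log_gp_lengthscale_sd>[2] log_gp_lengthscale ;
  ...
}
transformed parameters{
  // exponentiate back to the positive scale expected by the kernel
  vector[2] gp_lengthscale = exp( log_gp_lengthscale ) ;
  ...
}
model{
  ...
  log_gp_lengthscale ~ normal( log_gp_lengthscale_mean , log_gp_lengthscale_sd ) ;
  ...
}
```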
So if `y` had many columns, each reflecting observations from a uniquely-identifiable unit of observation (e.g. a human participant in an experiment), the above implements a hierarchy on the GP parameters and will achieve inference on each individual’s latent GP function. Inference on the across-individual average GP function might be achieved by averaging the individual functions in each sample from the posterior, but I prefer to instead explicitly model an average latent GP from which the individual GPs are deviations:
```stan
data{
  int<lower=2> n_xy ;
  array[n_xy] real x ;
  int<lower=2> n_Ids ;
  matrix[n_xy,n_Ids] y ;
}
parameters{
  real grand_mean ;
  real<lower=0> noise_amplitude ;
  matrix[n_xy,n_Ids+1] gp_helper ; // +1 for the group-level GP
  real log_gp_lengthscale_group ;
  real log_gp_lengthscale_mean ;
  real<lower=0> log_gp_lengthscale_sd ;
  vector<offset=log_gp_lengthscale_mean,multiplier=log_gp_lengthscale_sd>[n_Ids] log_gp_lengthscale ;
  real log_gp_amplitude_group ;
  real log_gp_amplitude_mean ;
  real<lower=0> log_gp_amplitude_sd ;
  vector<offset=log_gp_amplitude_mean,multiplier=log_gp_amplitude_sd>[n_Ids] log_gp_amplitude ;
}
transformed parameters{
  // concatenate the group-level parameters with the individual-level parameters & exponentiate
  vector[n_Ids+1] gp_lengthscale = exp( append_row( log_gp_lengthscale_group , log_gp_lengthscale ) ) ;
  vector[n_Ids+1] gp_amplitude = exp( append_row( log_gp_amplitude_group , log_gp_amplitude ) ) ;
  // gp: latent (noiseless) GP; the first column is the group-level gp
  matrix[n_xy,n_Ids+1] gp ;
  for( i in 1:(n_Ids+1) ){
    gp[,i] = (
      cholesky_decompose(
        add_diag(
          gp_exp_quad_cov(
            x
            , 1.0
            , gp_lengthscale[i]
          )
          , 1e-5 //jitter
        )
      )
      * gp_helper[,i]
      * gp_amplitude[i]
    ) ;
  }
}
model{
  // Priors:
  grand_mean ~ ... ;
  noise_amplitude ~ ... ;
  log_gp_amplitude_group ~ ... ;
  log_gp_lengthscale_group ~ ... ;
  // Priors on hyperparameters
  log_gp_lengthscale_mean ~ ... ;
  log_gp_lengthscale_sd ~ ... ;
  log_gp_amplitude_mean ~ ... ;
  log_gp_amplitude_sd ~ ... ;
  // gp parameters as hierarchical
  log_gp_lengthscale ~ normal( log_gp_lengthscale_mean , log_gp_lengthscale_sd ) ;
  log_gp_amplitude ~ normal( log_gp_amplitude_mean , log_gp_amplitude_sd ) ;
  // gp_helper must be ~ std_normal() for the non-centered parameterization
  to_vector(gp_helper) ~ std_normal() ;
  // Likelihood: grand mean + group-level GP + individual deviation GP
  for( i in 1:n_Ids ){
    y[,i] ~ normal( grand_mean + gp[,1] + gp[,i+1] , noise_amplitude ) ;
  }
}
```
(Note that this time I implemented the hierarchies on the log scale, as mentioned above.)
And of course you could model the noise and maybe the grand mean hierarchically too. And while I’ve implemented the hierarchies independently above, you could instead use a multivariate normal structure to capture correlations (e.g. maybe people with relatively high amplitudes also have relatively long lengthscales).
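A rough sketch of what that correlated version could look like, grafted onto the final model above (non-centered, with an LKJ prior on the correlation; `gp_par_cor_L`, `gp_par_helper`, and `log_gp_pars` are illustrative names I’m introducing, and this would replace the independent `log_gp_lengthscale` & `log_gp_amplitude` declarations and their `normal()` statements):

```stan
parameters{
  ...
  cholesky_factor_corr[2] gp_par_cor_L ; // correlation between log-lengthscale & log-amplitude
  matrix[2,n_Ids] gp_par_helper ; // std-normal helper for non-centering
  ...
}
transformed parameters{
  // non-centered multivariate normal:
  // row 1 = individual log-lengthscales, row 2 = individual log-amplitudes
  matrix[2,n_Ids] log_gp_pars =
    rep_matrix( [ log_gp_lengthscale_mean , log_gp_amplitude_mean ]' , n_Ids )
    + diag_pre_multiply(
        [ log_gp_lengthscale_sd , log_gp_amplitude_sd ]'
        , gp_par_cor_L
      )
      * gp_par_helper ;
  vector[n_Ids+1] gp_lengthscale = exp( append_row( log_gp_lengthscale_group , log_gp_pars[1,]' ) ) ;
  vector[n_Ids+1] gp_amplitude = exp( append_row( log_gp_amplitude_group , log_gp_pars[2,]' ) ) ;
  ...
}
model{
  ...
  gp_par_cor_L ~ lkj_corr_cholesky(1) ; // shape 1 = uniform over 2x2 correlation matrices
  to_vector(gp_par_helper) ~ std_normal() ;
  ...
}
```

(An LKJ shape of 1 is uniform over correlation matrices; larger shape values concentrate the prior toward zero correlation.)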