Using 2D Gaussian process predictions within model

I don’t yet fully understand what degree of hierarchy you have in your data, but starting from the non-hierarchical case: a given lengthscale-and-amplitude combination uniquely defines a covariance matrix K, which should be computed only once in a model. If you have a model that necessitates multiple realizations from the multivariate normal implied by K, then (taking the non-centered case and referring to the variable names of my example) you need a separate gp_helper variable for each realization, which is most easily achieved by making gp_helper a matrix. To illustrate, let’s say you have two observed-data vectors in y that you want to model as separate GP realizations but with a shared lengthscale & amplitude:

data{
	int<lower=2> n_xy ;
	array[n_xy] real x ;
	matrix[n_xy,2] y ;
}
parameters{
	real grand_mean ;
	real<lower=0> noise_amplitude ;
	real<lower=0> gp_amplitude ;
	real<lower=0> gp_lengthscale ;
	matrix[n_xy,2] gp_helper ;
}
transformed parameters{
	// gp_L: cholesky-decomposed gp covariance
	matrix[n_xy,n_xy] gp_L = cholesky_decompose(
		add_diag(
			gp_exp_quad_cov(
				x
				, gp_amplitude
				, gp_lengthscale
			)
			, square(gp_amplitude) * 1e-5 //jitter, scaled to the marginal variance
		)
	) ;

	// gp: latent (noiseless) GP
	matrix[n_xy,2] gp ;
	gp[,1] = gp_L * gp_helper[,1] ;
	gp[,2] = gp_L * gp_helper[,2] ;
}
model{
	// Priors:
	grand_mean ~ ... ;
	noise_amplitude ~ ... ;
	gp_amplitude ~ ... ;
	gp_lengthscale ~ ... ;
	// gp_helper must be ~ std_normal()
	to_vector(gp_helper) ~ std_normal() ;
	// Likelihood:
	y[,1] ~ normal( grand_mean + gp[,1] , noise_amplitude ) ;
	y[,2] ~ normal( grand_mean + gp[,2] , noise_amplitude ) ;
}

(Note that the code is pedagogically aimed; a speedup could be achieved by vectorizing the likelihood, as sketched below.)
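For example, a minimal sketch of that vectorization, assuming the declarations above (to_vector flattens a matrix column-wise, so the two columns of y and gp stack into matching long vectors):

model{
	...
	// vectorized likelihood: one sampling statement covering both columns
	to_vector(y) ~ normal( grand_mean + to_vector(gp) , noise_amplitude ) ;
}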

Again, the above instantiates a model whereby the latent GPs are independent realizations of a common latent multivariate normal, so K only needs to be computed once. K also needs to be computed only once if the two latent GPs share a lengthscale but differ in amplitude: gp_exp_quad_cov(x,amp,ls) equals square(amp) times the unit-amplitude covariance, and the Cholesky factor of square(amp) * K is amp times the Cholesky factor of K, so

cholesky_decompose(gp_exp_quad_cov(x,amp,ls))*helper

is equal to

(cholesky_decompose(gp_exp_quad_cov(x,1.0,ls))*helper)*amp

So in the case of multiple amplitudes but a common lengthscale you’d do:

...
parameters{
...
	vector<lower=0>[2] gp_amplitude ;
...
}
transformed parameters{
	// gp_L: cholesky-decomposed gp covariance
	matrix[n_xy,n_xy] gp_L = cholesky_decompose(
		add_diag(
			gp_exp_quad_cov(
				x
				, 1.0
				, gp_lengthscale
			)
			, 1e-5 //jitter
		)
	) ;

	// gp: latent (noiseless) GP
	matrix[n_xy,2] gp ;
	gp[,1] = (gp_L * gp_helper[,1]) * gp_amplitude[1] ;
	gp[,2] = (gp_L * gp_helper[,2]) * gp_amplitude[2] ;
}
...

If the lengthscales are different though, you have to compute the covariance matrix anew for each:

...
parameters{
...
	vector<lower=0>[2] gp_lengthscale ;
...
}
transformed parameters{
	// gp_L: cholesky-decomposed gp covariance
	array[2] matrix[n_xy,n_xy] gp_L ;
	for( i in 1:2){
		gp_L[i] = cholesky_decompose(
			add_diag(
				gp_exp_quad_cov(
					x
					, 1.0
					, gp_lengthscale[i]
				)
				, 1e-5 //jitter
			)
		) ;
	}
	// gp: latent (noiseless) GP
	matrix[n_xy,2] gp ;
	gp[,1] = (gp_L[1] * gp_helper[,1]) * gp_amplitude[1] ;
	gp[,2] = (gp_L[2] * gp_helper[,2]) * gp_amplitude[2] ;
}
...

Now, hierarchy can be handled in different ways. One way is to impose a hierarchy on the GP parameters (amplitude & lengthscale) alone, in which case you have the above but with a few hyperparameters:

parameters{
	...
	real<lower=0> gp_lengthscale_mean ;
	real<lower=0> gp_lengthscale_sd ;
	// nb: a lower bound can't be combined with offset/multiplier in Stan
	vector<offset=gp_lengthscale_mean,multiplier=gp_lengthscale_sd>[2] gp_lengthscale ;
	real<lower=0> gp_amplitude_mean ;
	real<lower=0> gp_amplitude_sd ;
	vector<offset=gp_amplitude_mean,multiplier=gp_amplitude_sd>[2] gp_amplitude ;
}
...
model{
	...
	// Priors on hyperparameters
	gp_lengthscale_mean ~ ... ;
	gp_lengthscale_sd ~ ... ;
	gp_amplitude_mean ~ ... ;
	gp_amplitude_sd ~ ... ;
	// gp parameters as hierarchical
	gp_lengthscale ~ normal(gp_lengthscale_mean,gp_lengthscale_sd) ;
	gp_amplitude ~ normal(gp_amplitude_mean,gp_amplitude_sd) ;
}

(Notes: with a hierarchical structure imposed on positive-continuous variables like these, it usually works best to model them & their hyperparameters as normal on the log scale and then exponentiate where they’re used; a side benefit is that this avoids the lower=0 bound, which Stan won’t let you combine with offset/multiplier anyway. Also, I’ve non-centered using the offset/multiplier syntax here, but see here for code to automate switching between centered & non-centered.)

So if y had many columns, each reflecting observations from a uniquely-identifiable unit of observation (ex. a human participant in an experiment), the above implements a hierarchy on the GP parameters and will achieve inference on each individual’s latent GP function. Inference on the across-individuals average GP function might be achieved by averaging the individual functions in each sample from the posterior, but I prefer to instead explicitly model an average latent GP from which the individual GPs are deviations:

data{
	int<lower=2> n_xy ;
	array[n_xy] real x ;
	int<lower=2> n_Ids ;
	matrix[n_xy,n_Ids] y ;
}
parameters{
	real grand_mean ;
	real<lower=0> noise_amplitude ;
	matrix[n_xy,n_Ids+1] gp_helper ; //+1 for the group-level GP

	real log_gp_lengthscale_group ;
	real log_gp_lengthscale_mean ;
	real<lower=0> log_gp_lengthscale_sd ;
	vector<offset=log_gp_lengthscale_mean,multiplier=log_gp_lengthscale_sd>[n_Ids] log_gp_lengthscale ;

	real log_gp_amplitude_group ;
	real log_gp_amplitude_mean ;
	real<lower=0> log_gp_amplitude_sd ;
	vector<offset=log_gp_amplitude_mean,multiplier=log_gp_amplitude_sd>[n_Ids] log_gp_amplitude ;
}
transformed parameters{
	// concatenate the group-level parameters with the individual-level parameters & exponentiate
	vector[n_Ids+1] gp_lengthscale = exp( append_row( log_gp_lengthscale_group, log_gp_lengthscale ) ) ;
	vector[n_Ids+1] gp_amplitude = exp( append_row( log_gp_amplitude_group, log_gp_amplitude ) ) ;
	// gp: latent (noiseless) GP
	matrix[n_xy,n_Ids+1] gp ; // first is the group-level gp
	for( i in 1:(n_Ids+1)){
		gp[,i] = (
			cholesky_decompose(
				add_diag(
					gp_exp_quad_cov(
						x
						, 1.0
						, gp_lengthscale[i]
					)
					, 1e-5 //jitter
				)
			)
			* gp_helper[,i] 
			* gp_amplitude[i]
		) ;
	}
}
model{
	// Priors:
	grand_mean ~ ... ;
	noise_amplitude ~ ... ;
	log_gp_amplitude_group ~ ... ;
	log_gp_lengthscale_group ~ ... ;
	// Priors on hyperparameters
	log_gp_lengthscale_mean ~ ... ;
	log_gp_lengthscale_sd ~ ... ;
	log_gp_amplitude_mean ~ ... ;
	log_gp_amplitude_sd ~ ... ;
	// gp parameters as hierarchical
	log_gp_lengthscale ~ normal(log_gp_lengthscale_mean,log_gp_lengthscale_sd) ;
	log_gp_amplitude ~ normal(log_gp_amplitude_mean,log_gp_amplitude_sd) ;
	// gp_helper must be ~ std_normal()
	to_vector(gp_helper) ~ std_normal() ;

	// Likelihood:
	for( i in 1:n_Ids){
		y[,i] ~ normal( grand_mean + gp[,1] + gp[,i+1], noise_amplitude ) ;
	}
}

(Note that this time I did the hierarchies on the log scale, as mentioned in the notes above.)
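If you additionally want each individual's full latent function available for posterior summaries, a minimal sketch (the name f is my own) would append a generated quantities block:

generated quantities{
	// f: each individual's latent function (grand mean + group GP + individual deviation)
	matrix[n_xy,n_Ids] f ;
	for( i in 1:n_Ids){
		f[,i] = grand_mean + gp[,1] + gp[,i+1] ;
	}
}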

And of course you could model the noise and maybe the grand mean hierarchically too. And while I’ve implemented the hierarchies independently above, you could use a multivariate normal structure to instead capture correlations (ex. maybe people with relatively high amplitudes also have relatively long lengthscales); a sketch of that follows.
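Here's a minimal sketch of that last idea, building on the log-scale model above (the names gp_par_mean, gp_par_sd, gp_par_cor & gp_par_helper are my own; these pieces would replace the separate lengthscale & amplitude hierarchies):

parameters{
	...
	vector[2] gp_par_mean ; // means of [log-lengthscale, log-amplitude]
	vector<lower=0>[2] gp_par_sd ; // sds of [log-lengthscale, log-amplitude]
	cholesky_factor_corr[2] gp_par_cor ; // correlation between log-lengthscale & log-amplitude
	matrix[2,n_Ids] gp_par_helper ; // non-centered helper
}
transformed parameters{
	// each column holds [log-lengthscale, log-amplitude] for one individual
	matrix[2,n_Ids] log_gp_pars = (
		rep_matrix(gp_par_mean,n_Ids)
		+ diag_pre_multiply(gp_par_sd,gp_par_cor) * gp_par_helper
	) ;
	...
}
model{
	...
	gp_par_cor ~ lkj_corr_cholesky(1) ;
	to_vector(gp_par_helper) ~ std_normal() ;
	...
}

You'd then exponentiate the rows of log_gp_pars wherever the individual-level lengthscales & amplitudes are used above.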
