# Gaussian Process regression: done but could be improved?

Hi folks, I’ve coded up a model for Gaussian Process regression where a numeric variable X influences Y via a GP (with the possibility of multiple observations of Y per value of X), and where some other variables Q, R, S, etc. also influence the GP. The code is below and seems to work well, but I suspect that the last bit (starting with the comment “# loop over observations”) could be made more efficient. Any suggestions?

``````stan
functions {
  # GP: computes noiseless Gaussian Process
  vector GP(real volatility, real amplitude, vector normal01, int n_x, real[] x) {
    matrix[n_x, n_x] cov_mat ;
    real amplitude_sq_plus_jitter ;
    amplitude_sq_plus_jitter = amplitude^2 + 0.001 ;
    cov_mat = cov_exp_quad(x, amplitude, 1/volatility) ;
    for (i in 1:n_x) {
      cov_mat[i, i] = amplitude_sq_plus_jitter ;
    }
    return(cholesky_decompose(cov_mat) * normal01) ;
  }
}

data {

  # n_y: number of observations in y
  int<lower=1> n_y ;

  # y: vector of observations for y
  #     should be scaled to mean=0, sd=1
  vector[n_y] y ;

  # n_x: number of unique x values
  int<lower=1> n_x ;

  # x: unique values of x
  #     should be scaled to min=0, max=1
  real x[n_x] ;

  # x_index: vector indicating which x is associated with each y
  int x_index[n_y] ;

  # n_w: number of columns in predictor matrix w
  int n_w ;

  # w: predictor matrix (each column gets its own GP)
  matrix[n_y, n_w] w ;

}

parameters {

  # noise: measurement noise
  real<lower=0> noise ;

  # volatility_helper: helper for cauchy-distributed volatility (see transformed parameters)
  vector<lower=0,upper=pi()/2>[n_w] volatility_helper ;

  # amplitude: amplitude of GPs
  vector<lower=0>[n_w] amplitude ;

  # f_normal01: helper variable for GPs (see transformed parameters)
  vector[n_x] f_normal01[n_w] ;

}

transformed parameters {

  # volatility: volatility of GPs (a.k.a. inverse lengthscale)
  vector[n_w] volatility ;

  # f: GPs
  vector[n_x] f[n_w] ;

  # next line implies volatility ~ cauchy(0,10)
  volatility = 10 * tan(volatility_helper) ;

  # loop over predictors, computing GPs for each predictor
  for (wi in 1:n_w) {
    f[wi] = GP(
      volatility[wi]
      , amplitude[wi]
      , f_normal01[wi]
      , n_x , x
    ) ;
  }

}

model {

  # noise prior
  noise ~ weibull(2, 1) ; # peaked at .8ish

  # amplitude prior
  amplitude ~ weibull(2, 1) ; # peaked at .8ish

  # normal(0,1) priors on GP helpers
  for (wi in 1:n_w) {
    f_normal01[wi] ~ normal(0, 1) ;
  }

  # loop over observations
  for (yi in 1:n_y) {
    real temp ;
    temp = 0 ;
    # loop over predictors to gather sum for this observation's mean
    for (wi in 1:n_w) {
      temp = temp + w[yi,wi] * f[wi][x_index[yi]] ;
    }
    # y_scaled as temp with gaussian measurement noise
    y[yi] ~ normal( temp , noise ) ;
  }

}
``````
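In case it helps clarify what the model is doing: here is the `GP()` construction sketched in plain Python with illustrative toy values (the helper names and numbers here are mine, not anything standard). It builds the exponentiated-quadratic covariance with jitter on the diagonal, then multiplies the Cholesky factor by standard normals — the non-centered parameterization the Stan code uses.

```python
import math
import random

def exp_quad_cov(x, amplitude, volatility, jitter=0.001):
    # Mirrors cov_exp_quad(x, amplitude, 1/volatility) with jitter
    # added to the diagonal, as in the Stan GP() function above.
    n = len(x)
    cov = [[amplitude ** 2 * math.exp(-0.5 * (volatility * (x[i] - x[j])) ** 2)
            for j in range(n)] for i in range(n)]
    for i in range(n):
        cov[i][i] = amplitude ** 2 + jitter
    return cov

def cholesky(a):
    # Lower-triangular L with L L^T = a (a must be positive definite).
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def gp_draw(volatility, amplitude, normal01, x):
    # Non-centered parameterization: f = cholesky(cov) * normal01.
    n = len(x)
    L = cholesky(exp_quad_cov(x, amplitude, volatility))
    return [sum(L[i][k] * normal01[k] for k in range(n)) for i in range(n)]

random.seed(1)
x = [i / 9 for i in range(10)]           # x scaled to min=0, max=1
z = [random.gauss(0, 1) for _ in x]      # plays the role of f_normal01
f = gp_draw(volatility=4.0, amplitude=1.0, normal01=z, x=x)
```

The jitter on the diagonal is what keeps the Cholesky factorization numerically stable when nearby x values make the covariance nearly singular.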

Use vectorization. For example, this

``````stan
for (wi in 1:n_w)
  f_normal01[wi] ~ normal(0, 1);
``````

can just be

``````stan
to_vector(f_normal01) ~ normal(0, 1);
``````

if you declare it as a matrix. But then that probably causes problems with a multi-normal somewhere. Sometimes the cost of the transform isn’t worth a minor gain like this.

For the big loop, you can write

``````stan
for (yi in 1:n_y) {
  real temp ;
  temp = 0 ;
  # loop over predictors to gather sum for this observation's mean
  for (wi in 1:n_w) {
    temp = temp + w[yi,wi] * f[wi][x_index[yi]] ;
  }
  # y_scaled as temp with gaussian measurement noise
  y[yi] ~ normal( temp , noise ) ;
}
``````

as

``````stan
for (i in 1:n_y)
  y[i] ~ normal(sum(w[i] .* f[ , x_index[i]]'), noise);
``````

(with `f` declared as a `matrix[n_w, n_x]`; the transpose is needed because `.*` requires both operands to be row vectors or both to be vectors)

though I’d suggest using a name like `sigma` or `sigma_y` instead of `noise` (what you write as `noise` is actually the scale parameter of the “noise”).
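As a quick numerical check that the loop and the vectorized form compute the same mean, here is the same arithmetic in plain Python on hypothetical toy values (0-based indexing here, unlike Stan’s 1-based):

```python
# toy sizes: n_y=4 observations, n_w=2 predictors, n_x=3 unique x values
w = [[1.0, 0.5], [0.0, 2.0], [1.0, 1.0], [0.5, 0.5]]
f = [[0.1, -0.3, 0.7], [1.2, 0.4, -0.9]]   # f[wi][xi], one row per predictor
x_index = [0, 2, 1, 0]                     # which x value each y belongs to

# original loop: accumulate w[yi,wi] * f[wi][x_index[yi]] over predictors
mu_loop = []
for yi in range(len(w)):
    temp = 0.0
    for wi in range(len(f)):
        temp += w[yi][wi] * f[wi][x_index[yi]]
    mu_loop.append(temp)

# vectorized form: dot product of row w[yi] with column f[, x_index[yi]]
mu_vec = [sum(wr * fc for wr, fc in zip(w[yi], [fw[x_index[yi]] for fw in f]))
          for yi in range(len(w))]
```

Both give the same per-observation mean; the vectorized Stan statement just lets the math library do the dot product instead of the interpreter-level loop.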

P.S. May I ask what inspires creative formatting like this:

``````stan
        f[wi] = GP(
volatility[wi]
, amplitude[wi]
, f_normal01[wi]
, n_x , x
) ;
``````

It makes it very hard for those of us used to conventional formatting to follow.

Thanks Bob! And sorry for the atypical whitespace; when posting for help I usually try to strip the reasoned-but-admittedly-idiosyncratic features of my coding style lest, as you say, they cause confusion, but I forgot to in this case.

I had `f_normal01` and `f` as an array of vectors as a hold-over from a previous attempt at a more complicated model (hierarchical GPs), but now realize that making them matrices instead makes things easier (indeed, it also makes the hierarchical version easier, but I’ll post on that later).

The `sum(w[i] .* f[,x_index[i]])` trick to avoid that awkward loop also works great. I also found a much more substantial speed-up (inspired by this thread): in the typical scenario I’ve been imagining there should be lots of redundancy in the matrix `w`, so I can compute on only the unique rows of `w` and index into the result when updating the log probability. This yields nearly a 10x speedup and much higher effective sample sizes for the key variables.
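For concreteness, here is the unique-rows idea sketched in plain Python on toy data (names and values are illustrative only, not the actual model): the mean for an observation depends only on its `(w row, x index)` combination, so each distinct combination is computed once and reused.

```python
# toy data with deliberate redundancy in (w row, x index) pairs
w = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
x_index = [0, 0, 1, 0, 1]
f = [[0.2, -0.5], [0.8, 0.3]]   # f[wi][xi], one row per predictor

# naive approach: one mean per observation
mu_naive = [sum(w[yi][wi] * f[wi][x_index[yi]] for wi in range(len(f)))
            for yi in range(len(w))]

# unique-rows trick: compute each distinct (w row, x index) combo once,
# then index into the cached results per observation
combo_mu = {}
lookup = []
for yi in range(len(w)):
    key = (tuple(w[yi]), x_index[yi])
    if key not in combo_mu:
        combo_mu[key] = sum(w[yi][wi] * f[wi][x_index[yi]]
                            for wi in range(len(f)))
    lookup.append(key)
mu_unique = [combo_mu[k] for k in lookup]
```

With 5 observations but only 2 distinct combinations, the expensive part of the computation runs 2 times instead of 5; the savings grow with the amount of redundancy.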

Here’s a gist with a full example of what this code is intended to achieve, including fitting some fake data using both this more efficient approach as well as the less-efficient approach. Note that where I’ve been using “w” to refer to the predictor matrix above, in that example I switched to “z”.

Thanks much for reporting back.

When there’s structure to the data, by all means exploit it for efficiency. It’s always worth finding things not to compute, or cases where simple sufficient-statistic calculations over duplicated data items can reduce calls to expensive functions.

P.S. The biggest reason for using standard formatting conventions is that one of the most important aspects of code quality is readability.