Hi all,
I am trying to model a periodic function as a Gaussian process. The modelled function has not other constraints than the need to be periodic and smooth. My initial idea was to use squared exponential kernel, and only redefine the distance metric, so that it treats the time variable as a circle. However, for certain combinations of parameters this fails to yield a positive semi-definite matrix. Any idea what might be wrong? Or a pointer to some literature on GPs using different distance metrics than Euclidean distance?
Thanks a lot!
Here is R code that shows the problem (I encountered this why preparing simulated data for a Stan model in R)
generate_periodic_gp <- function(time, scale, length) {
# Construct the squared exponential covariance matrix
cov_matrix = array(0,c(length(time), length(time)));
maxTime = max(time)
minTime = min(time)
for(i in 1:length(time)) {
cov_matrix[i,i] = scale^2 + 0.00000001; #Adding jitter to ensure positive semi-definiteness
if (i < length(time)) {
for(j in (i+1):length(time)) {
#Take either the normal distance or the distance wrapped around the edge, whichever is shorter
distance = min(abs(time[i] - time[j]), min(time[i], time[j]) - minTime + maxTime - max(time[i], time[j]))
covariance = (scale^2) * exp( (-0.5 / (length ^ 2)) * (distance ^ 2) );
cov_matrix[i,j] = covariance
cov_matrix[j,i] = covariance
}
}
}
# Draw from the multinormal distribution using the cholesky decomposition of the cov. matrix
chol_cov = t(chol(cov_matrix));
values = chol_cov %*% rnorm(length(time));
return(values)
}
times = seq(1,10, by =0.1)
values = generate_periodic_gp(times,1,0.7)
plot(times, values, type ="l") #Works as expected, gives a smooth curve and values[1] == values[length(values)]
values = generate_periodic_gp(times,1,0.8) #Fails, not positive definite
1 Like
@flaxter has committed the crime of talking about periodic kernels in the last few days, so maybe he can give better comments here, but I’ll try to just provide pointers.
Check out: Approximate GPs with Spectral Stuff
The Stan model at the top has Fourier in the name. Not sure what it is, but anything Fourier seems like it’d be good for periodic stuff. The arxiv paper on the Poisson intensity estimation he links there has another kernel defined on a periodic domain.
I just Googled periodic RBF and this page came up: http://www.cs.toronto.edu/~duvenaud/cookbook/index.html . It has some sorta strange periodic kernel there that might work!
1 Like
Thanks for the help - periodic kernel from http://www.cs.toronto.edu/~duvenaud/cookbook/index.html does the trick. Actually, I played before with the periodic kernel from Wikipedia (very slightly different formulation) and it gave weird results, but it must have been an implementation bug on my side, because now it works :-)
For future travellers, the formula for the periodic kernel is:
covariance[i,j] = (scale^2) * exp(-2*(sin(PI * abs(x[i] - x[j]) / period)^2) / (length ^ 2))
Thanks again!
4 Likes
Sorry for the late reply–you could also use the spectral mixture kernel (https://arxiv.org/pdf/1302.4245.pdf). Here is Stan code for a two component mixture (i.e. an additive kernel with two different periodic components) in case it’s helpful to someone.
matrix[n1, n1] Sigma1;
for (i in 1:n1) {
Sigma1[i, i] <- var1 + var2;
for (j in (i+1):n1) {
Sigma1[i, j] <- var1 * exp(-(x1[i]-x1[j])^2*bw1) * cos(twopi * fabs(x1[i] - x1[j]) * period1) +
var2 * exp(-(x1[i]-x1[j])^2*bw2) * cos(twopi * fabs(x1[i] - x1[j]) * period2);
Sigma1[j, i] <- Sigma1[i, j];
}
}
2 Likes
Sorry to resurrect such an old thread, but just started playing with a periodic-kernel GP and I’m trying to wrap my head around how the period
parameter is differentiated from the lengthscale
parameter. I get that the thing that’s repeating every period might be irregularly shaped, like the example from the kernel cookbook linked earlier:
in which case lengthscale
will determine the wiggliness inside the repeating bit and period
will determine how often it repeats. But if we’re dealing with a repeating signal that is less complex, like a sine wave at the extreme of simplicity, I feel like lengthscale
isn’t going to do any “work” anymore, and threatens to become non-identified from period
, right?
1 Like
I honestly don’t know - in the application I was trying to use it, the period
was fixed, so it was not an issue. My guess is that differentiating period
and lengthscale
could really be difficult and you should probably at least constrain period
to be a priori much shorter than the observed data range while being much wider then the distance between several consecutive observations. At the same time you probably should constrain lenghtscale
to not be much larger then period
and (similarly to quadratic exponential kernel) longer the the distance between two consecutive observations.
1 Like
If we’re interpreting period
as the distance on the x-axis between repetitions, it needs to be /period1
not *period1
, right? The corrected code being:
matrix[n1, n1] Sigma1;
for (i in 1:n1) {
Sigma1[i, i] <- var1 + var2;
for (j in (i+1):n1) {
Sigma1[i, j] <- var1 * exp(-(x1[i]-x1[j])^2*bw1) * cos(twopi * fabs(x1[i] - x1[j]) / period1) +
var2 * exp(-(x1[i]-x1[j])^2*bw2) * cos(twopi * fabs(x1[i] - x1[j]) / period2);
Sigma1[j, i] <- Sigma1[i, j];
}
}```