I was trying to run Gaussian process regression in Stan with 5000 observations and 14 factors. It runs very very slowly. Are there any packages that run such problems? The Stan code is attached. The code does well for 100 observations
Thanks for any advice.
GP.stan (3.2 KB)
I don’t see any obvious routes to speedups. Presumably you’ve already tried on a GPU?
One thing that’s unlikely to help much but worth a try if you’re curious is to precompute the set of unique differences in
X. This saves a small amount of compute during sampling but if the set of unique differences is actually much smaller than the total number of differences, you might save more substantial compute. In my tests long ago I found the speedup from this didn’t match simply using cov_exp_quad, but possibly since you’re not using cov_exp_quad yourself it might come in handy.
Thanks a lot. In my case I have 14 factors. This example is one factor.
Also I wonder if there are any R packages which do Gaussian process regression? I am specifically interested in the package by Aki Vehtari.
you can do additive GP regression with Longitudinal Gaussian Process Regression • lgpr but it won’t be any faster than your code
Those use Stan to sample kernel parameters. If you want to just optimize hyperparameters, there is for example gplite: gplite Quickstart
I was not reading it very carefully, but I’ve noticed that your calcP function requires you to do a nested for loop, by the number of observations on each level. It is certainly a delaying factor on as you need O(25mln) multiplications per every step. Maybe you can vectorize it?
The Cholesky decomposition is still the computationally most demanding part there. I am certainly interested if there is some way to get less autodiff variables and speed up computation that way, but I am surprised if there is any way to get that implementation running faster than several days for > 2000 observations.
It would be nice. Do you have any suggestions?
Maybe use already implemented covariance function? 5.13 Covariance functions | Stan Functions Reference
It will certainly be more efficient than an explicit loop
Thank you. The problem is that I need length scale parameter to be a vector. The current implementation is real.
Oh, I have not noticed that. But vectorization will be easy.
You need to generate two matrices of repeated X’s one as rows and one as columns. Subtract them from each other, multiply by a diagonal matrix of rho’s from the appropriate side, call element wise square, divide by two and exponentiate.
Unless I’ve missed something it should work.
That is how you could do it if you had a one-dimensional x and you want the cov_exp_quad kernel, but here the ARD kernel was used and each x is a vector with length Ncol (=14?).
But I just have the feeling that even if you can avoid a loop in kernel matrix computation, it’s not going to help a lot here, because you still have to do cholesky for the 5000 x 5000 matrix and have to sample the 5000 eta parameters