Speeding Up Gaussian Process model. Stan message abour leapfrog steps

vicvolodina · November 23, 2017, 2:54pm

Hi Stan developers and users,

I have some issues with fitting a Gaussian process model to my data. I am defining a Gaussian process to model SCMdata variable in response to 9 input variables, dealing with 120 data points to fit my model. The linear mean and squared exponential covariance functions are defined for my GP model. You could see the data that I am using from model.data.R file attached.

I am providing initial values for parameters and very concentrated prior for my \sigma^2. However, it takes ages (at least 3 hours) to run the model. Also, I keep getting the message from Stan about leapfrog steps and adjusting expectations, that I don’t really understand.

I am attaching my R code together with Stan file and I would really appreciate any suggestions.

Thanks!

model.data.R (22.8 KB)
FitGP.stan (1.3 KB)
StanModRequest.R (1.2 KB)

bbbales2 · November 23, 2017, 3:05pm

Since your length scale is fixed, move the GP stuff into transformed data.

So do this in transformed data:

  matrix[N1, N1] Sigma;
  matrix[N1, N1] L;
  Sigma = cov_exp_quad(XScaled, 1.0, length_scale);
  for(k in 1:N1) Sigma[k, k] = Sigma[k, k] + nugget;
  L = cholesky_decompose(Sigma);

We have to change how the variance term is handled. The covariance matrix scales linearly with it so we just bring it out (I think it’s included in this function for autodiff efficiency, not math reasons, but if we can avoid autodiffing the Cholesky that’s worth more).

Instead of modeling sigma_sq, change that to just sigma, and use this for your model line:

  y1 ~ multi_normal_cholesky(Mu, sigma * L);

This should be faster since you aren’t doing the Cholesky factorization every iteration.

Hope that helps!

betanalpha · November 24, 2017, 6:26pm

The leapfrog message is a very rough estimate of the time your chains will take to run. In a complex model like a Gaussian process is can be a substantial underestimate of the true running time.

For context, depending on the size of the data and number of parameters three hours can be pretty state of the art in terms of performance. Always a good idea to separate out ideal performance verses reasonable performance.

@bbbales2’s suggestions will make a drastic improvement to your model, but also keep in mind that Gaussian processes with more than a few input dimensions, really any interpolation method in a space with more than a few dimensions, will typically perform poorly away from the input data. Very long HMC trajectories, and slow chains, can be a symptom of the model trying to do too much.

Topic		Replies	Views
Seeking expert stan modeler for help speeding up a complex stan model Jobs fitting-issues , specification , performance	4	928	July 29, 2020
Seeking advice on optimizing a Gaussian Process (GP) prior to improve my model's convergence time Modeling	5	419	June 1, 2024
Could you please help to find out why my GP model got slower Modeling gaussian-process	12	1225	April 7, 2021
Speeding up gaussian process model for spatial prediction Modeling fitting-issues , performance , gaussian-process , spatial	4	1835	July 2, 2020
Efficiency issue with Gaussian Process model Modeling	9	1786	February 11, 2022

Speeding Up Gaussian Process model. Stan message abour leapfrog steps

Related topics