Gradient evaluation time differs across chains

Dear Stan users,

I have a somewhat complex Gaussian process model, and the gradient evaluation time differs slightly across chains. Does anyone know what causes the difference? Below is a single run in rstan with 4 chains; I have also seen >0.006 seconds for the same model.

Gradient evaluation took 0.004218 seconds
1000 transitions using 10 leapfrog steps per transition would take 42.18 seconds.
Gradient evaluation took 0.004234 seconds
1000 transitions using 10 leapfrog steps per transition would take 42.34 seconds.
Gradient evaluation took 0.003572 seconds
1000 transitions using 10 leapfrog steps per transition would take 35.72 seconds.
Gradient evaluation took 0.003702 seconds
1000 transitions using 10 leapfrog steps per transition would take 37.02 seconds.

Any help is appreciated, thanks.

Computation timing is a stochastic measurement that will vary from hardware to hardware and run to run, based on the exact environment in which your code is executed. This variation becomes more pronounced the smaller the time interval you wish to measure, so it is no surprise that you see this much variation when trying to time down to the millisecond.
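
As a quick illustration (a minimal sketch in plain R, nothing Stan-specific): timing the exact same computation repeatedly on the same machine already gives a spread of values, purely from the execution environment.

```r
# Time one fixed computation many times; the spread comes entirely from the
# environment (scheduler, cache state, CPU frequency scaling, ...), since the
# work being done is identical every time.
x <- matrix(rnorm(500 * 500), 500, 500)
times <- replicate(20, system.time(solve(x))["elapsed"])
summary(times)  # min, median, and max will differ across the 20 runs
```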


Those actually look quite similar. I sometimes get things like this:

Gradient evaluation took 0.001223 seconds
1000 transitions using 10 leapfrog steps per transition would take 12.23 seconds.
Adjust your expectations accordingly!


Iteration:   1 / 400 [  0%]  (Warmup)
Iteration:  25 / 400 [  6%]  (Warmup)
Iteration:  50 / 400 [ 12%]  (Warmup)
Iteration:  75 / 400 [ 18%]  (Warmup)

SAMPLING FOR MODEL '3a010658fb94b613e3bcd4bd00c2cfe2' NOW (CHAIN 2).

Gradient evaluation took 0.017012 seconds
1000 transitions using 10 leapfrog steps per transition would take 170.12 seconds.
Adjust your expectations accordingly!

One thing that I have done to evaluate the performance of a model is to use `system.time`, `replicate`, and the `log_prob` function with random starting parameters, which gives much more consistent estimates of the running time.
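
Here is a rough sketch of that idea, assuming you already have a fitted stanfit object named `fit` (the name is just a placeholder); I use `grad_log_prob` rather than `log_prob` since the thread is about gradient timings:

```r
library(rstan)

# Assumes `fit` is an existing stanfit object for the model of interest.
# Draw random unconstrained parameter values and time repeated gradient
# evaluations of the log posterior; averaging over many calls smooths out
# the run-to-run noise in any single timing.
n_pars <- get_num_upars(fit)

elapsed <- system.time(
  replicate(1000, {
    theta <- rnorm(n_pars)
    grad_log_prob(fit, upars = theta)
  })
)["elapsed"]

elapsed / 1000  # average time per gradient evaluation, in seconds
```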

Thanks, I see. It's just that small variations can lead to big differences when the number of leapfrog steps is large. I was trying to see if I could reduce the time a bit by finding the reason for the difference in gradient evaluation time. Apparently not :(

Thanks for sharing your approaches to estimating the running time, I'll try those out.

I don’t think it has anything to do with the leapfrog steps; it's just that computational timing can be stochastic. For example, the kernel may do a context switch in the middle of an evaluation, some floating point functions can take a variable number of clock cycles depending on the values passed in, and some functions are approximate, computing up to a specified tolerance, so the number of iterations can vary with the inputs.

The number of leapfrog steps will also vary based on where you are in the posterior. Time to evaluate each log density should be consistent, but it’s subject to communication load and other processor load inside the computer (as @betanalpha already noted).

In general, yes, the dynamic HMC algorithm that drives Stan uses a different number of leapfrog steps depending on where in parameter space the sampler is. For the time extrapolation being discussed here, however, a constant number of leapfrog steps is used, so the suspects are communication and processor load.
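
If you want to see how much the actual number of leapfrog steps varies over a run (a small sketch, again assuming a fitted stanfit object named `fit`), the per-iteration sampler diagnostics include the leapfrog count:

```r
library(rstan)

# Assumes `fit` is an existing stanfit object.
# get_sampler_params() returns one matrix of per-iteration diagnostics per
# chain; the n_leapfrog__ column is the number of leapfrog steps taken in
# each post-warmup iteration.
sp <- get_sampler_params(fit, inc_warmup = FALSE)
sapply(sp, function(chain) summary(chain[, "n_leapfrog__"]))
```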