Precision of Timing Estimates

betanalpha · June 10, 2020, 2:44am

@rok_cesnovar, in the recent changes to how time differences are computed in the services, did you end up using a new base unit of time? The gradient evaluation time estimates are now returning zero (CmdStan, develop), which makes me think that the time differences are being computed in a unit without enough precision (we’d need at least microseconds for the gradient evaluation time estimates). Just wanted to check before creating an issue. Thanks!

rok_cesnovar · June 10, 2020, 4:56am

Indeed, it was set to milliseconds for all timings. I will switch to a smaller unit for the estimator, or maybe even for all timings. It’s definitely bad for the estimator; that did not occur to me.

Thanks for spotting this!

wds15 · June 10, 2020, 7:29am

We went with millisecond accuracy. The rounding can be changed to microseconds, but I doubt that there is any guarantee for all of that to work. It will likely work on most of our platforms though.

What is the rationale for this level of accuracy? Maybe we can avoid the rounding for whatever you want to do.

rok_cesnovar · June 10, 2020, 8:50am

I am still not sure why you think there is no guarantee that it will work.

From cppreference:

template<
    class Rep,
    class Period = std::ratio<1>
> class duration;

It consists of a count of ticks of type Rep and a tick period, where the tick period is a compile-time rational constant representing the number of seconds from one tick to the next.

So (count of ticks) * (number of seconds for one tick) = time in seconds. Why do you think there is no guarantee?

The other point is that std::chrono::steady_clock has been part of C++ since C++11, and if this did not work we would at least have seen some Stack Overflow questions or the like in the ~10 years since. I searched a ton when you mentioned this on the PR.

I think not rounding the duration in all places would be the best solution.

wds15 · June 10, 2020, 9:28am

I also searched a lot and what I found is that people say that what exactly is one “tick” is simply not defined… and I also can never find this clearly defined in the C++ standard docs.

The only indication that a tick is a second is the default argument of the template parameter that you quote, but I don’t see that as a proper definition in the sense of spelling out a standard.

The solution we have is good as it gives us well defined accuracy. We should just go to a higher precision of microseconds if that solves the issue. That’s my take on this (I can be wrong, of course).

rok_cesnovar · June 10, 2020, 11:30am

Does it really matter what t is in:

$$X\,\mathrm{t} \cdot \frac{Y\,\mathrm{s}}{\mathrm{t}}$$

We are interested in X*Y in seconds. What we currently do is:

double sample_delta_t
      = std::chrono::duration_cast<std::chrono::milliseconds>(end - start)
            .count()
        / 1000.0;

where

std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()

gets us X*Y*1000.

std::chrono::milliseconds is just a duration whose period is std::milli, i.e. std::ratio<1, 1000>, so its count is the time in seconds multiplied by 1000. We then compute X*Y*1000/1000.0. It does spell out milliseconds in the code though. That is a plus, I guess.

I am fine with replacing 1000 with 1000000, but I feel it’s unnecessary and would rather not use casting at all. Either way, I just want to fix this.
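To make the trade-off concrete, here is a minimal sketch comparing the three options for a sub-millisecond interval. The 250 microsecond value is made up for illustration and stands in for end - start; the variable names are hypothetical and not taken from the Stan code.

```cpp
#include <chrono>

// A hypothetical 250 microsecond gradient evaluation, standing in for
// (end - start). The value is an assumption chosen for illustration.
constexpr std::chrono::microseconds elapsed(250);

// Casting to milliseconds truncates toward zero, so the count is 0
// and the reported time is 0 seconds.
constexpr double via_milliseconds =
    std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count()
    / 1000.0;

// Casting to microseconds keeps the value: 250 / 1e6 seconds.
constexpr double via_microseconds =
    std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count()
    / 1e6;

// No cast at all: convert to a floating-point duration in seconds.
constexpr double via_double_seconds
    = std::chrono::duration<double>(elapsed).count();
```

The first variant reproduces the zero estimates reported at the top of the thread; the other two both give 0.00025 seconds.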

wds15 · June 10, 2020, 11:52am

Not sure what X and Y are… but we have the fundamental problem that the standard does not define what 1 tick is equal to. The C++ docs are only implicitly saying that the physical time unit used for counting is 1 second - it is not explicitly saying so. Thus, the safest way is to convert to a well defined time unit using the pre-defined facilities of the standard.

EDIT: We could change how we report things. Instead of writing “seconds” we write “ticks”. That would be the unit which we are guaranteed to get - whatever those ticks are on the platform is unspecified.

rok_cesnovar · June 10, 2020, 1:18pm

X is the number of ticks we measured and Y is the compile time constant representing the number of seconds in a tick.

So if we measure 20 ticks and we know Y is 0.0005 s per tick not sure why it matters what a tick is. X*Y is what duration returns.
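The X*Y arithmetic can be sketched as follows. The function names and the half_millisecond_ticks type are hypothetical, introduced only to mirror the 20 ticks at 0.0005 s per tick example; the sketch assumes nothing beyond the std::chrono::duration interface.

```cpp
#include <chrono>
#include <ratio>

// Hypothetical tick type for illustration: one tick is 1/2000 s = 0.0005 s.
using half_millisecond_ticks
    = std::chrono::duration<long long, std::ratio<1, 2000>>;

// X * Y by hand: (count of ticks) * (seconds per tick), where the
// seconds-per-tick factor is the compile-time ratio Period::num / Period::den.
template <class Rep, class Period>
constexpr double to_seconds_by_hand(std::chrono::duration<Rep, Period> d) {
  return static_cast<double>(d.count()) * Period::num / Period::den;
}

// The same conversion done by the library: a duration with a double
// representation and the default period std::ratio<1>, i.e. seconds.
template <class Rep, class Period>
constexpr double to_seconds_by_library(std::chrono::duration<Rep, Period> d) {
  return std::chrono::duration<double>(d).count();
}
```

For example, to_seconds_by_hand(half_millisecond_ticks(20)) and to_seconds_by_library(half_millisecond_ticks(20)) both evaluate to approximately 0.01, regardless of what a tick corresponds to on the hardware.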

wds15 · June 10, 2020, 1:29pm

We are getting into a discussion we already had, I think…

Anyway, I just checked and @betanalpha is right… as we multiply the time difference by 1E3 and want to present sub-second precision for that number, we just need microseconds and then we are all set.

Let’s just do that - this will solve the matter just fine… and if it looks too ugly in the code, we can introduce a utility function to handle it?

EDIT: Though we should check if we don’t overflow when Stan runs for days…
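A utility function along those lines might look like the sketch below. The name seconds_with_microsecond_precision is hypothetical and not part of Stan; it only wraps the cast-and-divide pattern discussed in this thread.

```cpp
#include <chrono>

// Hypothetical helper (not part of Stan): convert any duration to seconds,
// truncated to whole microseconds.
template <class Rep, class Period>
constexpr double seconds_with_microsecond_precision(
    std::chrono::duration<Rep, Period> elapsed) {
  return std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count()
         / 1e6;
}
```

With two time points from std::chrono::steady_clock::now(), the call site would then read seconds_with_microsecond_precision(end - start).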

betanalpha · June 10, 2020, 1:38pm

@rok_cesnovar is absolutely correct here regarding the implementation of std::chrono::duration – the documentation is clear and unambiguous. The whole point of std::chrono is to abstract away the concept of “ticks” or “cycles” or however time is actually tracked on the hardware. Instead it defines clocks that manage that progression behind the scenes and durations that translate differences in time into the number of periods of a certain length that have elapsed.

In particular

std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();

returns the number of millisecond-long periods that have elapsed between start and end. Exactly how that is calculated by the standard library is irrelevant.

If we need sub-second precision then we just use the pattern @rok_cesnovar already uses. For millisecond precision we could take the number of millisecond periods that have elapsed and then convert back to seconds,

std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() / 1000.0;

For microsecond precision we would use

std::chrono::duration_cast<std::chrono::microseconds>(end - start).count() / 1e6;

Millisecond precision is fine for the total run times, but we should use microsecond precision for the gradient evaluation time estimates. In other words changing just one line.

@rok_cesnovar do you want me to create an issue?

wds15 · June 10, 2020, 1:42pm

No, we (@rok_cesnovar and myself) are debating a different point here: what a “tick” is is ill defined. The C++ specs ARE totally ambiguous about what a tick is. The docs suggest that one tick corresponds to the physical unit of a second, but that is not written in the docs at all (only implicitly to my eyes).

Anyway, we do seem to agree about the suggested path forward here. I do fully support that we go with

std::chrono::duration_cast<std::chrono::microseconds>(end - start).count() / 1e6;

That will do it in a well defined way… just one more sanity check should be done: whether this way of calculating may overflow when Stan runs for days.
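The overflow question can be checked at compile time. The sketch below relies on the standard's requirement that std::chrono::microseconds uses a signed integer representation of at least 55 bits; the 30 day run length and the constant names are assumptions chosen for illustration.

```cpp
#include <chrono>

// Number of microseconds in one day.
constexpr long long microseconds_per_day = 24LL * 60 * 60 * 1000000;

// Largest number of whole days std::chrono::microseconds can represent
// on this implementation.
constexpr long long max_whole_days
    = std::chrono::microseconds::max().count() / microseconds_per_day;

// Even with the minimum 55 bit representation this is several hundred years.
static_assert(max_whole_days > 100LL * 365,
              "microseconds should cover well over a century");

// A 30 day run (an assumed, deliberately generous bound) converts to
// microseconds without overflow; hours -> microseconds is an exact,
// implicit conversion.
constexpr std::chrono::microseconds thirty_days = std::chrono::hours(24 * 30);
static_assert(thirty_days.count() == 30 * microseconds_per_day,
              "30 days fit in microseconds");
```

The subsequent division by 1e6 converts the count to double, which represents integers exactly up to 2^53, so microsecond counts remain exact for runs of up to a couple of centuries.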

rok_cesnovar · June 10, 2020, 1:47pm

Please just create the issue with a link to this thread so I don’t forget, and assign it to me. I will get on this by the end of the week. Thanks!

betanalpha · June 10, 2020, 1:55pm

As I wrote above there is no concept of a “tick” in the std::chrono library.

Instead there is a period defined as a template parameter in the duration type and duration_cast function which defines the unit of the intervals returned by the count method (the template parameter being a ratio of seconds, with std::chrono::milliseconds and std::chrono::microseconds being type aliases for durations with periods std::ratio<1, 1000> and std::ratio<1, 1000000> respectively).
Once a duration type has been defined there is no ambiguity in what count returns.

That is all unambiguously defined by the interface. How the intervals are counted (say as a function of processor clock cycles) is considered as an implementation detail and won’t matter unless we start pushing precisions near the clock speed of the processor.
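These interface guarantees can be spelled out as compile-time checks. This is a minimal sketch using only standard library names; it should compile on any conforming C++11 implementation.

```cpp
#include <chrono>
#include <ratio>
#include <type_traits>

// The predefined aliases are durations whose period is a fixed fraction
// of a second.
static_assert(
    std::is_same<std::chrono::milliseconds::period, std::milli>::value,
    "milliseconds count periods of std::milli seconds");
static_assert(
    std::is_same<std::chrono::microseconds::period, std::micro>::value,
    "microseconds count periods of std::micro seconds");
static_assert(std::is_same<std::milli, std::ratio<1, 1000>>::value,
              "std::milli is 1/1000");
static_assert(std::is_same<std::micro, std::ratio<1, 1000000>>::value,
              "std::micro is 1/1000000");

// The default period is std::ratio<1>, so duration<double> counts seconds,
// exactly like std::chrono::seconds but with a floating-point count.
static_assert(std::is_same<std::chrono::duration<double>::period,
                           std::chrono::seconds::period>::value,
              "duration<double> has the same period as seconds");
```

The period of the clock's own duration type (for example std::chrono::steady_clock::period) is implementation-defined, but it is likewise a compile-time ratio of a second.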

wds15 · June 10, 2020, 1:58pm

Yup, periods are defined in terms of seconds, which you get when multiplying the count with the rep thing. The rep thing relates to ticks. What a tick is is left unspecified. It’s only implicitly defined to be a second.

Anyway, if we cast into the pre-defined std::chrono::microseconds or whatever thing, then everything is just fine.

You run into trouble when you use duration<double> as you then get vanilla ticks in a unit not defined as to what physical unit it is.

rok_cesnovar · June 10, 2020, 2:32pm

Fine @wds15, we will go with chrono::microseconds. It doesn’t make sense to drag this out any longer. It’s such a small detail that I don’t want to waste more time on it, since both approaches fix this issue.

You are essentially saying that X is not fine for you, but X*1e6/1e6 is.

wds15 · June 10, 2020, 3:03pm

If someone else wants to review the PR and take the accountability (to some extent) for it… fine by me.

Changing it as discussed above is all fine for me and I am happy to approve.

Again, sorry for my persistence, but that is my job as a reviewer as I understand it… and we found a solution here which is fine for all of us, I think.

rok_cesnovar · June 10, 2020, 6:36pm

@betanalpha no need for the issue. PR is open and approved. Thanks again for the report.

betanalpha · June 11, 2020, 1:53am

Thanks for the lightning fast resolution!
