In distributed computing, the notion of "when-ness" is fundamental; Lamport's "Time, Clocks, and the Ordering of Events in a Distributed System" paper is considered one of the foundational pieces of work.
But what about locally?
In the Java APIs we have System.currentTimeMillis() and System.nanoTime() to return the time.
We experienced developers "know" that currentTimeMillis() is on the "wall clock": if things happen to that clock (manual or NTP clock shifts, VM migration), that time can suddenly jump to a new value. For that reason, nanoTime() is the one we should really be using to measure elapsed time, monotonically.
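As a reminder, the conventional pattern looks something like the following minimal sketch, where Thread.sleep() simply stands in for whatever work is being measured:

```java
public class ElapsedTime {
    public static void main(String[] args) throws InterruptedException {
        // The usual idiom: two nanoTime() readings, subtract, and only
        // ever treat the difference as meaningful, never the raw values.
        long start = System.nanoTime();
        Thread.sleep(100); // placeholder for the work being timed
        long elapsedNanos = System.nanoTime() - start;
        System.out.printf("elapsed: %d ms%n", elapsedNanos / 1_000_000);
    }
}
```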
Except I now no longer trust it. I've known for a long time that CPU frequency could change its rate, but as of this week I've discovered that on a multi-socket (and older multi-core) system, the nanoTime() value may be one or more of the following (a small probe is sketched after this list):
- Inconsistent across cores, hence non-monotonic on reads, especially reads likely to trigger thread suspend/resume (anything with sleep(), wait(), IO, accessing synchronized data under load).
- Not actually monotonic.
- Achieving consistency by querying heavyweight counters, with possibly longer execution time and lower granularity than the wall clock.
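Here is a minimal probe for the monotonicity problem; it is not from the JDK, just a sketch. It sleeps periodically to invite the scheduler to suspend, resume, and migrate the thread, then checks whether nanoTime() ever steps backwards. A clean run proves nothing; a backwards step is evidence.

```java
public class MonotonicityProbe {
    public static void main(String[] args) throws InterruptedException {
        long last = System.nanoTime();
        for (int i = 0; i < 1_000_000; i++) {
            if (i % 10_000 == 0) {
                Thread.sleep(1); // invite a suspend/resume, and with it a core migration
            }
            long now = System.nanoTime();
            if (now < last) {
                System.out.printf("backwards step of %d ns at iteration %d%n",
                        last - now, i);
            }
            last = now;
        }
        System.out.println("done (a clean run proves nothing)");
    }
}
```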
The standard way to read a nanosecond-scale timer on an x86 part is to read the TSC counter, via the RDTSC opcode. It is lightweight, though note that RDTSC is not a serializing instruction, so reads of it can be reordered relative to surrounding code unless fenced.
Except every core in a server may be running at a different speed, and so have a different value for that counter. When code runs across cores, different numbers can come back.
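One way to probe for that from Java, without any access to RDTSC itself, is a cross-thread causality check. This is a hypothetical sketch, not a rigorous test: one thread publishes a timestamp through a volatile field; the other, which by construction observes that value strictly later in real time, takes its own timestamp. If the reader's clock is ever behind the published value, the two threads are drawing on inconsistent time sources.

```java
import java.util.concurrent.atomic.AtomicLong;

public class CrossCoreSkewProbe {
    private static final AtomicLong published = new AtomicLong();

    public static void main(String[] args) {
        Thread writer = new Thread(() -> {
            while (true) {
                published.set(System.nanoTime()); // volatile publish
            }
        });
        writer.setDaemon(true);
        writer.start();

        long worstSkew = 0;
        for (long i = 0; i < 50_000_000L; i++) {
            long theirs = published.get();  // written before we read it...
            long ours = System.nanoTime();  // ...so 'ours' should be >= 'theirs'
            if (ours < theirs) {
                worstSkew = Math.max(worstSkew, theirs - ours);
            }
        }
        System.out.println(worstSkew == 0
                ? "no cross-thread skew observed (proves nothing)"
                : "clocks disagreed across threads by up to " + worstSkew + " ns");
    }
}
```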
On Intel's Nehalem microarchitecture the TSC is shared across all cores on the same die, and clocked at a rate independent of the CPU frequency: monotonic and consistent across the entire socket. Threads running on any core in the same die will get the same number from RDTSC, something that System.nanoTime() may rely on.
Fill in that second socket on your server, and you have lost that consistency, even if the parts and their TSC counters are running forwards at exactly the same rate. Any code you had which relied on TSC consistency is now going to break.
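If you cannot rule that out, one defensive tactic is to clamp measured intervals at zero. A minimal sketch (Stopwatch here is a hypothetical helper, not a JDK class):

```java
// Defensive interval measurement: never report a negative duration,
// even if nanoTime() stepped backwards between the two reads.
public final class Stopwatch {
    private final long start = System.nanoTime();

    /** Elapsed nanoseconds, clamped to zero if the clock ran backwards. */
    public long elapsedNanos() {
        return Math.max(0L, System.nanoTime() - start);
    }
}
```

This hides the symptom rather than curing it, but it does stop a backwards step from producing a negative duration that propagates through your metrics.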
This is all ignoring virtualization: the RDTSC opcode may or may not be virtualized. If it is: you are on your own.
Operating systems are aware of this problem, so they may use alternative mechanisms to return a counter, ones which may be neither monotonic nor fast.
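To see what you are paying on your own machine, a crude comparison like the following can be revealing. Treat the numbers as indicative only; there is no warmup control here, and a rigorous measurement would use a harness such as JMH:

```java
public class ClockCost {
    public static void main(String[] args) {
        final int n = 10_000_000;
        long sink = 0;

        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            sink += System.nanoTime();
        }
        long nanoCost = System.nanoTime() - t0;

        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            sink += System.currentTimeMillis();
        }
        long millisCost = System.nanoTime() - t1;

        System.out.printf("nanoTime():          ~%d ns/call%n", nanoCost / n);
        System.out.printf("currentTimeMillis(): ~%d ns/call%n", millisCost / n);
        System.out.printf("(checksum, ignore: %d)%n", sink); // keeps the loops live
    }
}
```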
Here, then, is some reading on the topic:
- Inside the Hotspot VM: Clocks, Timers and Scheduling Events - Part I
- JDK-6440250 : On Windows System.nanoTime() may be 25x slower than System.currentTimeMillis()
- JDK-6458294 : nanoTime affected by system clock change on Linux (RH9) or in general lacks monotonicity
- Redhat on timestamps in Linux
- VMWare: Timekeeping in VMware Virtual Machines
You state that: "On Intel's Nehalem microarchitecture the TSC is shared across all cores on the same die, and clocked at a rate independent of the CPU frequency: monotonic and consistent across the entire socket."
Do you have a source for this claim, or is it something you have observed experimentally?
Something that I read in those various documents. No observed data here, sorry.
For measuring intervals, should we prefer System.nanoTime()? I read that it avoids the effect of system time changes on elapsed-time calculations.
No, that's the whole problem here. System.nanoTime() is only guaranteed to be monotonically increasing on a single core in a single socket of a CPU. Reschedule threads onto a different socket, or even a different core, and your clock could go backwards.