The built-in 'Windows Time' service has never tried to do more than facilitate Kerberos ticket timestamping; it is not a precision timekeeper, and shouldn't be confused with third-party products that are. Increasingly strict requirements, such as those proposed in MiFID II, FINRA Rule 4590, or the CAT NMS Plan, make it worth re-examining Windows as a precision timekeeping platform. Can the objectives be met?
In order to determine if Windows is a suitable timing platform, a brief review of what timekeeping means, as well as how computers achieve it, will help. Regardless of the operating system and time synchronization methods used, all computers keep track of passing time in roughly the same way.
The ultimate goal of timekeeping is to have time pass within the computer at exactly one second per second. One second/second means the time of day as seen from within the computer would advance exactly as much, over a given time period, as would the world's best atomic clock. If achieved, one would only need to set the internal time of day once, and thereafter it would always be correct. Sadly, computer clocks are not this accurate. Internal time advances in steps, and the frequency of these advances dictates the size of each adjustment. Ideally, the frequency is fixed, determinate, and high, so that each adjustment may be as small as possible, yielding the illusion of a smooth, continuous time advancement, where at the end of a given period the computer has accumulated exactly one second/second.
Computers typically use an interrupt to drive the advancement of the system clock. Consider an interrupt triggered once per nanosecond: In a perfect world, the computer would only have to increment its internal time of day counter by one at each interrupt in order to maintain the clock to the nanosecond. However, this hypothetical scenario would require the computer to service one billion interrupts every second.
Even if a programmable timer on the motherboard could generate one interrupt per nanosecond, it's unlikely that any CPU could service them. Computers therefore use a much lower frequency. Even the lowly PIT can generate 64 interrupts per second, and servicing that frequency is not an undue burden on the CPU. But decreasing the frequency means increasing the size of the adjustment: at 64 Hz, the computer must add 15.625 ms at each interrupt; at 1 kHz, it must add 1 ms. These happily work out to exactly one second/second, but at the cost of our desired smooth, continuous time advancement. After one second has passed externally, one second has indeed accumulated internally; during that second, however, the time within the computer holds still between interrupts, then leaps ahead by the amount of the adjustment at each one.
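The arithmetic is simple enough to sketch. This example is illustrative only (the function is mine, not a Windows API); it works in 100-nanosecond units, the "hns" in which Windows keeps time:

```python
HNS_PER_SECOND = 10_000_000  # Windows keeps time in 100 ns units ("hns")

def adjustment_per_interrupt(interrupt_hz: int) -> float:
    """Time, in hns, that must be added at each clock interrupt so that
    exactly one second accumulates per second of real time."""
    return HNS_PER_SECOND / interrupt_hz

print(adjustment_per_interrupt(64))    # 156250.0 hns = 15.625 ms
print(adjustment_per_interrupt(1000))  # 10000.0 hns = 1 ms
```

The trade-off is visible in the numbers: the lower the interrupt rate, the larger (and lumpier) each step of the clock.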
Interrupting the System Clock
The situation is complicated further by two things: Motherboard timers are not able to generate interrupts at an exact frequency, and CPUs cannot service them in a fixed, determinate amount of time. Motherboard timers are usually controlled by the oscillation of a crystal, and crystals are affected by the thermal characteristics of the system, which change over time. Operating systems usually give the time-of-day interrupt the highest possible priority, so that servicing that interrupt may preempt other interrupts currently being serviced, but even so the servicing time is only moderately deterministic. A deviation of only one part per million equates to a loss or gain of a microsecond every second. If allowed to continue unchecked, such tiny deviations can add up to intolerable levels quite quickly.
The granularity of timekeeping within the Windows kernel is 100 nanoseconds, or 0.1 µs. This period is called a hectonanosecond (abbreviated as 'hns'). With the introduction of the first version of Windows NT, the operating system did indeed program the PIT for 64 Hz, and added 156,250 hns (15.625 ms) per interrupt. Thus, while the system indeed kept time in hns, the actual granularity of reported time was in steps of 156,250 hns. Using the SetSystemTimeAdjustment() API, a timekeeping program can change the amount of the adjustment in order to speed up or slow down the accumulation of passing time within the computer. This process is called 'slewing' the clock, and the goal of slewing is to bring the internal time into conformance with external time without any sudden changes (called 'stepping') in either direction. For example, changing the adjustment by ±1 results in ±64 'extra' hns accumulating over a period of a second, yielding 1.0000064 seconds/second or 0.9999936 seconds/second. For smaller changes, a timekeeping program can hold the adjustment at the desired rate for less than a second. For larger changes, a timekeeping program may either change the adjustment amount to a larger deviation from the default, or simply hold the non-default value for a longer period of time. Programs needing the time of day could call GetSystemTimeAsFileTime() to retrieve the time expressed in hectonanoseconds, but with a granularity varying between 1 ms and 15.625 ms, depending on various operating system settings.
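The slewing arithmetic can be sketched as follows. This is an illustrative model of the calculation, not an actual Windows API; the function name and structure are mine:

```python
HNS_PER_SECOND = 10_000_000   # Windows keeps time in 100 ns units ("hns")
INTERRUPT_HZ = 64             # classic NT interrupt frequency
DEFAULT_ADJUSTMENT = 156_250  # hns added per interrupt at 64 Hz

def clock_rate(adjustment_hns: int) -> float:
    """Seconds of internal time accumulated per external second, given
    the per-interrupt adjustment passed to SetSystemTimeAdjustment()."""
    return adjustment_hns * INTERRUPT_HZ / HNS_PER_SECOND

print(clock_rate(DEFAULT_ADJUSTMENT))      # 1.0 -- exactly one second/second
print(clock_rate(DEFAULT_ADJUSTMENT + 1))  # ~1.0000064 -- slews forward
print(clock_rate(DEFAULT_ADJUSTMENT - 1))  # ~0.9999936 -- slews backward
```

Holding the ±1 adjustment for exactly one second accumulates ±64 hns (±6.4 µs) of correction, which is why finer corrections require holding the adjustment for less than a second.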
During that era, being able to get, set, and measure the time to within 1 ms was considered sufficient, especially since a program could force the operating system to a granularity better than the default 15.625 ms. Beginning with Vista/2008, however, Microsoft changed the internal algorithms to provide a 1 ms granularity by default, but, for backward-compatibility reasons, did not change the values reported by GetSystemTimeAdjustment() or the values accepted by SetSystemTimeAdjustment(). Unfortunately, at the same time that they virtualized the counters, they introduced a bug where the remainder of integral division operations was lost during the conversion from reported adjustments to the new internal adjustments. They also chose an interrupt frequency that prevented adding exactly the same adjustment per interrupt; a periodic fix-up was required to add back in the missing milliseconds. For example, GetSystemTimeAdjustment might report that an increment of 156001 hns was being added 64.1022 times per second, but the motherboard interrupt timer could only be programmed in integers, meaning that fewer hns accumulated every second than reported (an average loss of two ms/minute without the fix-ups). These problems persisted through Windows 7/2008 R2, making all four platforms mostly unsuitable for precision timing. The passage of time could only be adjusted by multiples of 16 due to the loss of remainders, and setting the clock to run at a particular rate did not make it run at that rate. Achieving one second/second was impossible; the clock would either be racing ahead or lagging behind. Timekeeping programs on these platforms must continuously change the adjustment value in order to average one second/second, typically over a one minute period. Much of the bad reputation suffered by Windows timing came from these platforms, which were current during the era when administrators were starting to demand accurate timekeeping to better than a millisecond. 
But only these platforms suffer the kernel bugs.
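The remainder-loss bug can be modeled abstractly. In this sketch, the divisor of 16 reflects the observed quantization of adjustments on the affected platforms; the actual kernel arithmetic is Microsoft's and is not public:

```python
def effective_adjustment(requested_hns: int, divisor: int = 16) -> int:
    """Model of the Vista-through-7 era conversion: integer division
    discards the remainder, so the clock can only be steered in steps
    of `divisor` hns, regardless of the value the program requested."""
    return (requested_hns // divisor) * divisor

print(effective_adjustment(156_250))  # 156240 -- 10 hns silently lost
print(effective_adjustment(156_255))  # 156240 -- the requested +5 is ignored
```

A timekeeping program on these platforms cannot simply request the rate it wants; it must dither between quantized rates over time so that the average works out to one second/second.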
With the advent of more sophisticated motherboard timers, more powerful CPUs, and the 'invariant' CPU timestamp counter (TSC), it became possible to let the computer continue updating its clock at a relatively low interrupt frequency, while interpolating the passage of time between each interrupt with high accuracy and precision. Before Windows 8/2012, timekeeping programs had to do their own interpolation and find some way to share the information with other programs. Since then, Microsoft has kept an interpolator in the kernel, exposed with the GetSystemTimePreciseAsFileTime() API call. The counters for the interpolated time are mapped into every process' address space, and access does not require a kernel transition, so fetching the interpolated time is nearly as fast as reading memory. (It's only nearly as fast because the kernel increments the memory-mapped counters using a lock-free mechanism, which means that reading them requires a quick loop to ensure that no portion of the counters changed mid-read. I'll come to another, more technical, reason presently.)
Microsoft's kernel interpolator relies on the best-available timer. On most modern systems, this is the TSC (as long as it's invariant); other sources are an HPET, APIC, PMTimer, or, as a last resort, the venerable PIT. The best-available timer forms the basis of the QueryPerformanceCounter() API (usually referred to as 'QPC'). Instead of blindly adding the calculated increment at each interrupt, Microsoft captures the delta between the last known QPC and the current QPC, which makes minor variations in the interrupt frequency irrelevant. At each hardware interrupt, the time of day is updated using the QPC delta multiplied by the QPC period, as adjusted by SetSystemTimeAdjustment(); the computed time delta is then added to the previously stored system time. This accounts for the linear progression of time at each tick. Between ticks, the time is interpolated by calculating how much time has passed since the last interrupt calculation, and this sum is what GetSystemTimePreciseAsFileTime() returns. (And this is the second reason I said "nearly as fast" in the previous paragraph: If QPC is not based on the TSC, a bus transition may be required to read the counters).
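A toy model of the tick-plus-interpolation scheme just described may make the mechanism clearer. All names here are my own; the kernel's real data structures differ:

```python
class InterpolatedClock:
    """Sketch of the kernel interpolator: a coarse hardware tick folds
    elapsed QPC counts into the stored system time; reads between
    ticks interpolate from the last stored value."""

    def __init__(self, qpc, qpc_period_hns):
        self._qpc = qpc                # callable returning raw QPC counts
        self._period = qpc_period_hns  # hns per count, as rate-adjusted
        self._system_time = 0          # hns accumulated so far
        self._last_qpc = qpc()

    def on_interrupt(self):
        """At each hardware tick, add (QPC delta x period) to the clock."""
        now = self._qpc()
        self._system_time += (now - self._last_qpc) * self._period
        self._last_qpc = now

    def precise_time(self):
        """Between ticks, interpolate from the last stored time."""
        return self._system_time + (self._qpc() - self._last_qpc) * self._period

# Demonstration with a deterministic fake counter:
counts = iter([0, 100, 150])
clk = InterpolatedClock(lambda: next(counts), qpc_period_hns=10)
clk.on_interrupt()         # QPC delta of 100 counts -> +1000 hns stored
print(clk.precise_time())  # 1500 hns: 1000 stored + 50 counts * 10 hns
```

Because the stored time is advanced by the *measured* QPC delta rather than a fixed increment, jitter in the interrupt arrival time washes out of the accumulated clock.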
A perfect 'clock'
A curious quirk of GetSystemTimePreciseAsFileTime() is that rapid successive queries may show a difference of ±1 hns. Microsoft's documentation says a difference of exactly 1 hns (in either direction) should be treated as a difference of zero. The best possible interpolated granularity, therefore, is ~2 hns, or 0.2 µs. In practice, the interpolator's resolution is limited either by the CPU's nominal frequency or by the frequency of the next-best available timer. Typically, this means that periods of time shorter than 200-300 nanoseconds are not measurable. A bus transition to read a motherboard timer typically adds up to 3 hns, increasing the granularity a bit more (for non-TSC based interpolations).
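A minimal helper capturing Microsoft's guidance on the ±1 hns quirk might look like this (the function name is hypothetical):

```python
def effectively_equal(t1_hns: int, t2_hns: int) -> bool:
    """Treat two GetSystemTimePreciseAsFileTime() readings as equal when
    they differ by no more than 1 hns, per Microsoft's documentation."""
    return abs(t1_hns - t2_hns) <= 1

print(effectively_equal(131_000_000_000, 131_000_000_001))  # True: ±1 hns
print(effectively_equal(131_000_000_000, 131_000_000_002))  # False: 2 hns apart
```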
From Windows 8 onward, therefore, the only quibbles are about fractions of a microsecond.
Figure 01: Windows 2012 production machine, running the Hyper-V host role with all but one guest VM quiesced
This process assumes a perfect clock that is set once and is thereafter always correct, happily to sub-microsecond levels. However, the kernel can only go partway toward achieving this goal on its own. Interrupt-driven clocks are not perfect, and timers are subject to thermal drift. Threads are subject to preemption. Time therefore drifts; those tiny parts-per-million errors add up. The clock must be set to the correct time from an external reference at startup, and then errors corrected as they accumulate. This is where third-party timekeeping programs enter the picture.
A timekeeping program needs to check with external sources periodically, and correct any drift that occurs within the computer. As of this date, the network time protocol yielding the best results on Windows is IEEE 1588-2008 (PTP version 2). PTPv2 sync packets typically arrive once per second, giving the timekeeper the chance to tweak the system's rate that often, if needed. The PTPv2 sync frequency is adjustable, from one packet every 64 seconds to 64 packets/second.
Figure 02: Same as Figure 01, but with several busy guest VMs running
In order to provide some real-life numbers to illustrate the timekeeping abilities of Windows, I've sampled several of our own production machines and reproduced the results in several tables below. Measurements are expressed as the delta (in microseconds) each machine calculates between its own current time and the correct time, as provided by a two-step PTPv2 appliance driven by GPS. The appliance we used is a Microsemi (formerly Symmetricom) S350, which specifies its own synchronization to GPS as ±100 nanoseconds. PTPv2 was set up using the default End-to-End profile, sending one sync packet per second. To manage the Windows clock, we used the Domain Time II Client (version 5.2.b.20160415) timekeeping software. We collected one sample per second, and summarized a minute's worth (60 samples) for each machine. Our convention is to use positive numbers to show how much the machine must speed up, and negative to show how much it must slow down. The choice of what to call positive or negative is arbitrary, as long as an equal but opposite change in rate is used to steer the clock.
Note the number of significant digits: Sub-microsecond (0.000000n) values are meaningful. During this sample period, the worst delta was +22.6 µs, which persisted as a gradually-decreasing delta until seven seconds later, at which time the machine was only 3.2 µs off.
Figure 03: Windows 2012 R2 guest VM
This demonstrates that a busy Hyper-V host is less stable than an idle one. Hyper-V gives priority to keeping stable time on the guests, on the assumption that the host partition's job is simply ensuring that the guests run. But even so, the host's worst delta was 124.5 µs, its best was 1 µs, and its average was 54.1 µs. Microsoft's best practices guidelines suggest that a Hyper-V host should do nothing more arduous than running guests. To test Microsoft's guidelines, we examined one of the guests on the busy host.
This particular VM is a busy web server. Even so, it shows no deltas greater than 46.9 µs, and an average of 1.8 µs. These numbers demonstrate that the guests do indeed keep time better than the host partition when the machine is busy.
Figure 04: Stock non-virtualized Windows 10E workstation
All of the examples so far are taken from Hyper-V machines, either host or guest. Non-virtualized machines perform much better. We tested with Windows 10 to demonstrate the difference.
The worst delta is only 5.8 µs, and it's a significant outlier. The average delta was 2.8 µs, and the least was 0 hns, below the measuring ability of the machine. It will perform at this level, with few if any spikes, even when heavily loaded.
Figure 05: Non-virtualized Windows 7 machine
Increasing the PTPv2 sync frequency beyond the default of one per second can improve syntonization, since the machine will have less time to drift before a change in clock rate can be implemented. The point of diminishing returns for Windows appears to be four sync packets per second; beyond that, the machine spends too much time processing packets and insufficient time steering the clock - remember that a change in rate must be held across at least one hardware interrupt in order to affect the computer's accumulation of elapsed time. PTPv2 even helps the problematic Windows platforms. A non-virtualized Windows 7 machine, fully loaded and busy (my own workstation, used while writing this article), averaged 12.2 µs delta, with the worst case being 30.4 µs. Bear in mind that the problematic Windows platforms (Vista through 2008 R2) will show more outliers. The same machine, during a four-hour period, had nearly the same average delta (12.9 µs), but showed several spikes of up to 320 µs.
Even without the assistance of a kernel-based interpolator, Windows 7 (or 2008 R2, which uses the same kernel), does a good job of staying within tolerable levels of precision timekeeping as long as it can synchronize with a local PTPv2 appliance. The same machine, using NTP instead of PTPv2, and sampling the NTP appliance once a minute, can keep only to within one millisecond at all times, and averages 520 µs (~1/2 millisecond) over any one-minute period.
Figure 06: Windows Server 2016 VM on a 2012 Hyper-V host
Windows 2016 Server, currently in pre-release v3, is Microsoft's newest operating system. It breaks with tradition somewhat: In prior releases of Windows, client and server versions shared identical kernel code, differing only in features, GUI, and performance tuning. As of Windows 10/2016, this is no longer true. Fortunately for this discussion, the timekeeping parts of the kernel are shared, and Windows 2016 performs exactly like Windows 10.
Because the 2016 machine is a VM, it doesn't fare as well as the Windows 10 machine shown above. But still, an average of 10.5 µs, with the worst outlier being 20 µs, and the best being only 0.2 µs, is nothing to sneeze at.
Table 01 summarizes expectations for timekeeping on Windows. These numbers represent average performance, as measured in our labs over a period of days with varying loads. No one instantaneous sample is necessarily within the ranges shown, but as you can see from the previous tables, averages are a fair assessment. Note that Table 01 is presented in two parts: The upper half shows behavior when using PTPv2; the lower half shows behavior using the traditional client-server protocols, NTP and DT2.
Each networked environment is different, and the results we observe in our lab or production environment may not match what you would see in your own environments, but these results show you what's both possible and reasonable in terms of Windows timing with a good third-party timekeeping program.
In summary, Windows can easily meet or exceed most high-precision timing requirements.
| Windows version | Protocol | Drift rate | Typical delta | Outliers |
|---|---|---|---|---|
| | PTP | less than 50 μs/sec | 2-9 μs* | Rare |
| | PTP | 100-200 μs/sec | 50 μs or better | Occasional |
| Windows 2008 R2 | PTP | 300-500 μs/sec | 100 μs or better | Regular |
| | DT2 or NTP | 1 ms/sec | 500 μs or better | Occasional |
| Windows 2008 R2 | DT2 or NTP | 100-1000 ms/sec | 1-100 ms | Regular and excessive |
Table 01: Various versions of Windows and their performance
* Our unloaded virginal Windows 10 test machine, with modern hardware, using PTP with power-saving disabled, showed a one-time worst-case excursion of 7 μs over three days, with an arithmetic mean of only 2 μs synchronization during the same period. These results would be extremely rare in a production environment. The numbers shown in the table above were taken either from real production machines, or machines with simulated loads.