An increasing number of network devices such as switches, capture cards and appliances, as well as network adapters, offer the ability to timestamp incoming and even outgoing packets. This ability to timestamp Ethernet traffic precisely is used widely in applications such as compliance, troubleshooting, capacity planning, out-of-band networked application performance analysis and intrusion detection and prevention. Precise Ethernet timestamps are key to knowing exactly when a networked event occurred. These timestamps are delivered with the packet with which they are associated.
Though these timestamps may appear to have nanosecond resolution, when it comes to knowing what their actual resolution and absolute accuracy are, things are less clear. Most vendors quote an advertised timestamp resolution and have a pretty good idea of the accuracy of the timestamps they place on packets relative to each other. When it comes to their absolute accuracy, however, the level of complexity increases, especially when taking into account the contribution of disciplining the oscillator that generates the timestamp from an external time reference.
As an internal project, in 2016, Metamako set out to design and implement a methodology to measure the absolute accuracy of 1 Gigabit Ethernet (1GbE) timestamps on its MetaWatch product. In early 2017, Metamako adapted this methodology to measure the absolute accuracy of MetaWatch's 10GbE timestamps and teamed up with the Securities Technology Analysis Center (STAC) to implement it as part of an independently verified, new industry standard for measuring time synchronisation accuracy: the STAC-TS (time synchronisation) suite of benchmarks.
Packet Timestamping Accuracy
The absolute accuracy of packet timestamps has two main components:
- How accurately the clock being used to apply the timestamp is synchronised to its reference
- How accurately the timestamp can be applied to each packet
Synchronising a clock to a reference clock is a complex subject in itself; however, it essentially involves the clock periodically comparing its position against the reference, and adjusting as necessary. Invariably, the oscillators will run at different rates, so the clock being synchronised will constantly have to correct itself to match the reference. To do so, it will need to see a difference between itself and the reference and apply the appropriate correction. If the reference clock is not very frequency stable, it makes it extremely difficult for the clock being synchronised to maintain its accuracy relative to the reference as it is constantly having to change its frequency to match that of the reference.
Choice of Oscillator/Clock Options
As there is a huge range of oscillator (clock) options available, varying widely in stability, the choice of oscillator is a key factor in how well a clock can synchronise to a reference. At the extremely accurate end of the scale, oscillators derive their frequency from an electronic transition frequency of an atom, such as caesium or rubidium. At the far less accurate end of the scale, we have the uncompensated crystal oscillator that derives its frequency from the mechanical resonance of a vibrating crystal of piezo-electric material, typically quartz. Between them, there are various enhancements to the crystal oscillator such as electronically compensating for temperature changes (TCXO) or keeping its temperature constant by enclosing it in an insulated micro-oven (OCXO). When synchronising clocks, the measured error over time is collected over a constantly advancing window of time known as the synchronisation time constant, the length of which will vary as a function of the relative short- and long-term stabilities of the specific clocks being synchronised. A good example of this is synchronising to the GNSS satellite clocks. These clocks are extremely frequency stable when measured over long periods of time. However, mainly due to tropospheric delay (the near-earth atmosphere adding a delay to the signal), the receiver can only measure their time to within about 100 ns in the short term. A reference clock synchronised to GNSS satellites will generally set a long enough time constant in the synchronisation algorithm between its local oscillator and the GNSS receiver to average this jitter out.
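The effect of the time constant can be illustrated with a minimal sketch. It uses a first-order low-pass filter (an exponential moving average) as a stand-in for a synchronisation algorithm; real clock servos are more elaborate, and the function name, the sample-based time constant and the synthetic ±100 ns alternating noise are all assumptions made for illustration only.

```python
def smooth_offsets(offsets_ns, time_constant_samples):
    """First-order low-pass (exponential moving average) over offset measurements.

    A longer time constant gives each new measurement less weight, so
    short-term measurement jitter is averaged out more strongly.
    """
    if time_constant_samples < 1:
        raise ValueError("time constant must be at least one sample")
    alpha = 1.0 / time_constant_samples
    estimate = 0.0
    smoothed = []
    for offset in offsets_ns:
        estimate += alpha * (offset - estimate)
        smoothed.append(estimate)
    return smoothed


# Synthetic input (not measured data): a perfectly aligned clock observed
# through measurement noise that alternates between +100 ns and -100 ns.
noisy = [100.0 if i % 2 == 0 else -100.0 for i in range(1000)]

short_tc = smooth_offsets(noisy, 2)    # short time constant: noise passes through
long_tc = smooth_offsets(noisy, 100)   # long time constant: noise largely removed
```

With this synthetic input, the estimate produced with the long time constant settles to well under 1 ns, while the short time constant still swings by tens of nanoseconds.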
Timestamp accuracy is ultimately determined by the underlying frequency of the timestamping clock, which also defines its resolution. For example, a 156.25 MHz 10GbE clock implemented in an FPGA or a 10GbE network adapter will 'tick' every 6.4 ns, so timestamps are at best accurate to this granularity or resolution. This clock can also only synchronise itself to an external reference with this level of accuracy. Timestamp accuracy is also affected by clock jitter: when a sample being timestamped arrives right on a clock boundary, which itself has a frequency jitter component, it may fall into the current or the next time quantum. The actual clock used to timestamp also plays a significant part in the ultimate accuracy. It is conceptually simplest to timestamp using the recovered Ethernet clock; however, its quality is unknown and the IEEE 802.3ae standard allows it to vary by ±100 ppm, potentially adding a not-insignificant amount of timestamp jitter.
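The arithmetic behind the 6.4 ns figure can be sketched as follows. The function names are hypothetical, the floor-to-previous-tick behaviour is an assumption about how a simple timestamping clock might quantise an arrival, and the drift helper assumes a clock left uncorrected for a full second.

```python
from fractions import Fraction

CLOCK_HZ = 156_250_000  # 10GbE timestamping clock frequency quoted in the text


def tick_period_ns(clock_hz: int = CLOCK_HZ) -> Fraction:
    """Exact period of one clock tick in nanoseconds."""
    return Fraction(1_000_000_000, clock_hz)


def quantise_ns(arrival_ns: Fraction, clock_hz: int = CLOCK_HZ) -> Fraction:
    """Timestamp an arrival with the most recent tick boundary (assumed behaviour)."""
    period = tick_period_ns(clock_hz)
    return (arrival_ns // period) * period


def drift_ns_per_second(ppm: float) -> float:
    """Time error accumulated in one second by a clock whose frequency is off by `ppm`.

    1 ppm is 1 microsecond per second, i.e. 1,000 ns per second.
    """
    return ppm * 1_000.0
```

For example, `quantise_ns(Fraction(10))` returns `Fraction(32, 5)`, that is 6.4 ns: an arrival 10 ns into the second is stamped with the tick that occurred at 6.4 ns. A clock that is 100 ppm fast and left uncorrected would gain 100,000 ns over one second.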
There are numerous ways to synchronise clocks. All of them involve measuring the difference between two or more clocks and attempting to reduce this difference as close to zero as possible by altering the frequency, the current position or both of the clock being synchronised. A precise time distribution standard in the networking industry is to transmit a pulse down a 50 Ω coaxial cable every second, with the middle of the rising edge of the pulse representing the start of a second. This is known as pulse per second (PPS). The sending clock generates the pulse precisely on the second and the receiving clock compares its arrival to its own time and adjusts as necessary. Other standards also transmit pulses and waves down coaxial cables, such as 10 MHz, which is widely used for frequency synchronisation. There are also clock synchronisation protocols running over Ethernet such as Network Time Protocol (NTP) and Precision Time Protocol (PTP).
It is important to note that when PPS or any kind of one-way time dissemination is used, the propagation delay through the coaxial cable and any distribution equipment must be compensated for. This is because it takes a finite amount of time for the signal to travel down the cable so the receiving clock will accept the pulse later than it was sent. Most clocks with PPS inputs allow a compensation value to be set to account for this delay.
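As a rough illustration of the compensation value, the sketch below estimates the propagation delay of a coaxial cable from its length. The default velocity factor of 0.66 is an assumption (a typical figure for solid-polyethylene coax); the real value depends on the cable, and in practice the delay is usually measured rather than calculated. The function names are hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def cable_delay_ns(length_m: float, velocity_factor: float = 0.66) -> float:
    """Estimated one-way propagation delay of a cable in nanoseconds.

    velocity_factor is the signal speed in the cable as a fraction of the
    speed of light; 0.66 is an assumed, typical value.
    """
    if not 0 < velocity_factor <= 1:
        raise ValueError("velocity factor must be in (0, 1]")
    return length_m / (velocity_factor * SPEED_OF_LIGHT_M_PER_S) * 1e9


def clock_offset_ns(local_time_at_pulse_ns: float, path_delay_ns: float) -> float:
    """Offset of the receiving clock once the known path delay is removed.

    The pulse left the sender exactly on the second and arrived path_delay_ns
    later, so a perfectly aligned receiver would read path_delay_ns on arrival.
    """
    return local_time_at_pulse_ns - path_delay_ns
```

Under these assumptions a 1 m cable contributes roughly 5 ns, which is already larger than the accuracies discussed in this document.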
How to Measure Packet Timestamp Accuracy
An accuracy measurement involves comparing the quantity being measured to a reference. In the case of packet timestamp accuracy, the requirement is that an asynchronous event, the arrival of a given packet, is timestamped concurrently by a system under test (SUT) and another system, and that the two generated timestamps be compared. Unless the system being compared to the SUT is also the time reference for the SUT, any timestamp comparisons are relative rather than absolute (as illustrated in Figure 01). That is, only the difference between timestamps can be measured and no conclusion can be drawn as to whether the SUT or the system it is being compared to is actually accurate. To measure the absolute accuracy of a SUT, the SUT timestamp must be measured relative to the reference clock to which it is synchronised.
There is certainly more than one way to achieve this; however, Metamako decided to leverage the fact that 10GbE ultimately gets delivered to the Ethernet physical layer from the transceiver as a differential signal pair. PPS is generally delivered down a coaxial cable as a pulse. It is therefore possible to compare the relative arrival of the electrical signals representing the start of an Ethernet frame to the rising edge of a PPS pulse, each in the time domain, and to calculate a timestamp with absolute accuracy (relative to the reference start-of-second pulse). The SUT timestamp is then compared to this reference timestamp, which allows its absolute accuracy to be determined.
To minimise time synchronisation error when measuring absolute accuracy, it is desirable to have the most stable reference clock possible. Ideally, a caesium atomic clock would be used, as caesium clocks provide stability in the 5 × 10⁻¹³ range (500 fs/s jitter). Unfortunately, they are rather costly and a rubidium atomic clock is far more affordable. Rubidium clocks exhibit stability in the 1 × 10⁻¹¹ range (10 ps/s jitter). An oscillator backed by a GNSS receiver could also be used; however, wall-time timestamps are not required for this measurement and, depending upon implementation, PPS pulses output from GNSS receivers can exhibit short-term jitter of up to 100 ns/s. A free-running rubidium atomic clock does not exhibit this level of short-term jitter and hence provides more than adequate stability to meet the precision criteria of this measurement.
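The conversion from a fractional stability figure to a time error per second is a simple multiplication, sketched below. The stability values are the ones quoted above; the function and dictionary names are hypothetical.

```python
# Fractional frequency stabilities quoted in the text
STABILITY = {"caesium": 5e-13, "rubidium": 1e-11}

GNSS_PPS_JITTER_PS = 100_000.0  # "up to 100 ns" expressed in picoseconds


def time_error_ps_per_second(fractional_stability: float) -> float:
    """Time error accumulated over one second, in picoseconds."""
    return fractional_stability * 1e12


for name, stability in STABILITY.items():
    print(f"{name}: {time_error_ps_per_second(stability):g} ps/s")
```

On these figures, the worst-case short-term jitter of a GNSS-derived PPS is about four orders of magnitude larger than the per-second error of a free-running rubidium clock.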
Figure 01: The difference between relative and absolute timestamping
Source: Metamako (2017). Accuracy of network timestamping.
Key to these measurements was an acquisition device capable of performing very accurate simultaneous timestamping of multiple electrical signals with extremely high resolution. Given that 10GbE at Layer 1 is a 5.15625 GHz carrier signal encoded with a 64b/66b line code, identifying the reference bit at the beginning of an Ethernet frame required that the frame be decoded into 66-bit blocks and the offset of the middle of the first bit of the frame from the block boundary calculated. A device capable of both capturing the fine details of the 10GbE differential signal and decoding it while preserving each block's precise location in the time domain was required. The ideal device for this task is an oscilloscope. Oscilloscopes are designed to acquire signals on multiple channels, are precisely synchronised in the time domain and are capable of extremely high sampling rates. They also optionally come with numerous protocol decodes including 64b/66b. Ultimately, the single-shot resolution of an oscilloscope is defined by its sampling rate. As a 10GbE bit is just under 97 ps wide (a serial line rate of 10.3125 Gbit/s), the Nyquist rate dictates that 10GbE be sampled at a rate of at least 20.625 GSa/s. An oscilloscope capable of meeting or, ideally, exceeding this sampling rate was therefore required.
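The bit width and minimum sampling rate follow directly from the 10GBASE-R serial line rate of 10.3125 Gbit/s (10 Gbit/s of data expanded by the 66/64 line code). A minimal sketch of that arithmetic, with hypothetical function names:

```python
LINE_RATE_BAUD = 10_312_500_000  # 10GBASE-R serial line rate: 10 Gbit/s x 66/64


def bit_width_ps() -> float:
    """Duration of one bit on the wire, in picoseconds."""
    return 1e12 / LINE_RATE_BAUD


def min_sample_rate_gsa() -> float:
    """Sampling rate of twice the line rate, in GSa/s, as used in the text."""
    return 2 * LINE_RATE_BAUD / 1e9


def samples_per_bit(sample_rate_gsa: float) -> float:
    """How many samples an oscilloscope takes during one bit."""
    return sample_rate_gsa * 1e9 / LINE_RATE_BAUD
```

At the 80 GSa/s listed for the oscilloscope in Table 01, `samples_per_bit(80.0)` gives between seven and eight samples per bit, comfortably above the two-per-bit minimum.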
By distributing the same Ethernet stream and start-of-second reference pulse concurrently to the SUT and the oscilloscope, each device can independently reference one to the other. In the case of the SUT, the start-of-second reference pulse is used to correct the SUT's timestamping clock, which timestamps each frame. The oscilloscope's acquisition buffer contains the precise temporal difference between the arrival of the start-of-second reference pulse and the arrival of the Ethernet frame (defined as the middle of the first bit immediately following the Ethernet preamble). The oscilloscope therefore provides an extremely accurate external reference with which to validate the absolute accuracy of the SUT's packet timestamps, incorporating any SUT clock synchronisation error and local clock jitter on timestamps.
When an SUT is being synchronised once per second, it will typically make clock corrections immediately after receiving each synchronisation event. Given oscilloscope acquisition buffers fill up in the order of milliseconds (or less) at high sample rates, the temptation becomes to trigger acquisition on the PPS pulse and capture as many Ethernet frames as will fit into the acquisition buffer; typically frames arriving no more than 1 ms before the pulse and 1 ms after. Though representative of the instantaneous accuracy of the SUT's timestamping ability, it tells us very little about the consistency of the SUT's timestamping accuracy between PPS pulses. It is therefore important to structure the test so that packets can be timestamped throughout the second, ideally for multiple seconds. One way of doing this is to generate a higher frequency pulse, frequency-locked to the time reference which can be used to trigger the oscilloscope to acquire in sequence mode. In sequence mode, the size of the acquisition buffer is still unchanged and will only hold a few milliseconds of packets at 10 Gbps, however it is divided into segments that allow periodic acquisition throughout the second.
An Introduction to MetaWatch
MetaWatch, combined with a suitable K-series Metamako device, is an application designed to capture, timestamp, buffer and aggregate up to 30 1/10GbE ports. MetaWatch offers a combination of features in the aggregation tap/packet broker space:
- Integrated 10GbE Layer 1 matrix switch offering port mirroring and pass-through adding the same latency as 1 m of fibre
- Full Ethernet per-port statistics
- 2 x 15:1 1/10GbE buffered aggregation via an 8 GB or 32 GB buffer
- 1 ns timestamp resolution capturing Ethernet packets
- Time synchronisation support for PTP and NTP, optionally coupled with PPS
- Support for IEEE 802.3x PAUSE frames on the aggregated egress ports allowing consuming devices to leverage the deep buffers to moderate incoming packet rate
MetaWatch supports NTP, PTP and PPS to synchronise to a timing reference. In theory, each can provide sub-nanosecond accuracy. In practice, however, 1 PPS currently provides the greatest accuracy. MetaWatch uses the device's local oscillator (VCXO) by default. Both OCXO and atomic (rubidium) clock modules are optionally available, providing more stability, particularly over longer holdover periods (periods where the reference is unavailable, most often because the reference clock itself or the path to it has failed).
|Pulse reference||SRS rubidium frequency standard with|
|TimeTech Pulse distribution unit (PDU)||1U 1:16 PPS with minimal skew and jitter|
|Keysight Waveform generator||Stable 1,000 Hz pulse frequency locked to 10 MHz from pulse reference|
|LeCroy Oscilloscope||36 GHz, 80 GSa/s, 256 MS buffer with 64b/66b serial decode, time base frequency-locked to 10 MHz from pulse reference|
|Packet source/capture||Simultaneously deliver and capture consistent line-rate packets without loss|
|Adjustable Ethernet clock source||Move packet source Ethernet clock frequency ±100 ppm from reference|
|Optical 10GbE splitter||Send the same 10GbE stream to both MetaWatch and oscilloscope with minimal skew|
|SFP+ to SMA breakout board||Convert 10GBASE-SR SFP+ transceiver output to 2 × SMA 50 Ω coax|
|10GBASE-SR transceivers||Feed the optical splitter from the packet source and deliver the Ethernet stream to MetaWatch and the breakout board|
Table 01: Components and key functionality of the test harness
The Benchmark Methodology - a summary
The purpose of this methodology was to characterise the accuracy of MetaWatch 10GbE 1 ns-resolution timestamps running on a MetaApp 32. MetaApp 32 is a 32-port device comprising Layer 1+ switching and an FPGA with the optional atomic clock module, synchronised via PPS. MetaWatch on a MetaApp 32 was therefore the SUT.
Inputs provided to MetaWatch were:
- A 10GbE SFP+ transceiver, plugged into a single port on the MetaApp 32, delivering Ethernet packets containing 64-byte Ethernet frames at line rate, each containing a 32-bit sequence number
- A PPS reference pulse delivered down a 50 Ω coaxial cable
MetaWatch outputs a stream of Ethernet packets on a 10GbE output port with Metamako's industry standard trailer containing MetaWatch's nanosecond-precise timestamp applied to each Ethernet packet.
To measure the absolute accuracy of MetaWatch's 10GbE timestamps, a test harness was assembled, designed to meet the aforementioned criteria. The contents of the test harness are listed in Table 01.
Figure 02: PPS reference pulse to SUT and oscilloscope (also showing 1,000 Hz to oscilloscope and 10 MHz to oscilloscope and waveform generator timebase inputs)
Source: Metamako (2017). Accuracy of network timestamping.
As Figure 02 illustrates, the pulse reference fed PPS to the pulse distribution unit, which re-amplified the incoming pulse and sent it out on two ports with extremely low port-to-port skew and jitter (measured at 2 ps with the oscilloscope). It also provided a matched and isolated 50 Ω impedance to both MetaWatch and the oscilloscope. To make the most of the acquisition buffer in the oscilloscope, a waveform generator was used to generate a frequency-stable 1,000 Hz pulse from the pulse reference's 10 MHz output. This 1,000 Hz pulse would trigger the oscilloscope to acquire 1,000 acquisition buffer segments every second, allowing accuracies to be measured over multiple seconds rather than over the few milliseconds that a single continuous acquisition would cover at the oscilloscope's 80 GSa/s sampling rate.
A two-way optical splitter was used to send a stream of Ethernet packets containing 64-byte Ethernet frames at line rate to MetaWatch and oscilloscope (via the SFP+ to SMA breakout board) from the packet source.
Three sets of test runs were completed. The first set had the packet source connected directly to the optical splitter. For the second and third sets, which required the packet source's Ethernet transmit clock to be run at the extremes of allowable Ethernet clock tolerances, an identical MetaApp 32 running MetaMux (an application offering ultra-low-latency packet multiplexing and aggregation) was added between the packet source and the optical splitter. In general, only network test equipment allows user control over the frequency of oscillators supplying Ethernet interface clocks. The packet source was no exception, with a fixed-frequency oscillator supplying its Ethernet interface clocks. Given the MetaApp 32 is a Metamako product, Metamako has complete control of the oscillator providing the Ethernet transmit clock, thus allowing its frequency to be adjusted extremely finely.
It was first adjusted to +100 ppm and then -100 ppm from the Ethernet 5.15625 GHz base clock frequency as measured against the pulse reference. For the test runs at +100 ppm, a second 'top up' Ethernet stream was also sent from the packet source into MetaMux to ensure that line rate was maintained despite the 100 ppm increase in Ethernet transmit clock frequency. With the Ethernet transmit clock running at this frequency extreme and receiving from a single Ethernet source, MetaMux would not otherwise have been able to send packets out at line rate.
The Complete Test Harness Calibration
Figure 03 shows the complete test harness:
- The pulse reference distributes pulses concurrently to both the SUT and the oscilloscope.
- The packet source/capture distributes Ethernet packets concurrently to both the SUT and the oscilloscope and captures the packets after they have been timestamped by the SUT.
As the oscilloscope was capable of measuring acquisition samples to ±12.5 ps, it was possible to measure and account for any skew between Ethernet frame arrival and PPS reference pulse arrival at MetaWatch and the oscilloscope extremely precisely. Calibrating the relative arrival of the PPS from the outputs from the PPS distribution unit was fairly straightforward as it involved comparing them in the time domain on different oscilloscope channels.
Figure 03: Complete test harness and system under test (MetaMux removed for clarity)
Source: Metamako (2017). Accuracy of network timestamping.
Calibrating PPS Reference Pulse Distribution
Both PPS reference pulses from the pulse distribution unit were connected to different channels of the oscilloscope. Over 1,000 pulses (seconds), the relative difference in arrival time between pulses was measured and noted.
Calibrating Ethernet Packet Distribution
Packet distribution from the packet source required a physical media converter for the oscilloscope as the standard for Ethernet used by MetaWatch was the SFP+ cage. Therefore, an SFP+ to SMA breakout board was used. It was connected to the oscilloscope via a pair of 1 m coaxial test cables. The breakout board was a passive device connecting the SFP+ differential receive (RX) pins from the 10GBASE-SR SFP+ transceiver to the coaxial test cables. The combination of the breakout board and the coaxial test cables therefore added a fixed and measurable propagation delay between the 10GBASE-SR SFP+ transceiver and oscilloscope. This propagation delay was measured and accounted for as follows with the help of an SFP+ loopback transceiver:
The final step was to measure and potentially correct for any skew in the optical splitter, the fibre patch cables and the 10GBASE-SR transceivers.
The Mechanics of the Test Runs
The goal was to compare oscilloscope versus SUT timestamps, both referenced to the PPS reference pulse from the pulse reference, intra-second and inter-second. The purpose of the 1,000 Hz pulse source was to trigger oscilloscope capture 1,000 times during each second across multiple seconds. It was determined that each run last three seconds. This duration was dictated by a combination of the acquisition buffer size of the oscilloscope and the time taken to copy the 64b/66b decoded result set from the oscilloscope for analysis. Essentially, 3,000 acquisition segments, each containing 6 to 7 Ethernet packets, were captured and correlated during each test run.
With the test harness calibrated, taking into account the propagation delays through each relevant component of the test harness, each test run comprised the following:
- Starting 10GbE packet capture
- Starting the packet source, causing it to produce Ethernet packets containing 64-byte Ethernet frames at line rate, each containing a 32-bit unique sequence number
- Immediately starting acquisition on the oscilloscope, causing it to capture a segment containing 40 kSa on the rising edge of each 1,000 Hz pulse
- Stopping capture once the three seconds had elapsed
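The segment size ties in with the figure of 6 to 7 packets per segment. The sketch below works through the arithmetic under standard Ethernet assumptions that are not spelled out in the text: an 8-byte preamble and start-of-frame delimiter, a 12-byte minimum inter-frame gap and a 10 Gbit/s data rate. The constant and function names are hypothetical.

```python
SAMPLE_RATE_SA_PER_S = 80_000_000_000  # oscilloscope sampling rate (Table 01)
SEGMENT_SAMPLES = 40_000               # samples captured per trigger
PS_PER_BYTE = 800                      # one byte at 10 Gbit/s takes 0.8 ns

# Duration of one acquisition segment in picoseconds (500 ns)
SEGMENT_PS = SEGMENT_SAMPLES * 10**12 // SAMPLE_RATE_SA_PER_S

# Assumed on-wire sizes: 8-byte preamble/SFD, 64-byte frame, 12-byte inter-frame gap
FRAME_PS = (8 + 64) * PS_PER_BYTE      # preamble through FCS
SLOT_PS = (8 + 64 + 12) * PS_PER_BYTE  # frame-to-frame spacing at line rate


def complete_frames(phase_ps: int) -> int:
    """Frames lying wholly inside one segment.

    phase_ps is the delay from the start of the segment to the start of the
    first frame that begins inside it.
    """
    last_start = SEGMENT_PS - FRAME_PS - phase_ps
    if last_start < 0:
        return 0
    return last_start // SLOT_PS + 1


counts = {complete_frames(phase) for phase in range(0, SLOT_PS, 100)}
```

Under these assumptions each segment spans 500 ns and `counts` evaluates to `{6, 7}`, depending on where the trigger falls relative to the packet stream.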
The oscilloscope captured the following channels during the run:
- the PPS reference pulses
- the 1,000 Hz pulses
- the differential 10GBASE-R Ethernet pair
Packet capture caught every resulting Ethernet frame from MetaWatch.
64b/66b Decode Packet Reassembly
A custom extraction script was written and executed on the oscilloscope. It dumped the contents of the 64b/66b decode table, with picosecond acquisition timestamps for each 66-bit block, and then:
- Extracted the precise oscilloscope acquisition time offset for each Ethernet frame.
- Validated the Ethernet frame check sequence (FCS) of each Ethernet packet. If incorrect, the Ethernet packet was discarded to avoid the possibility of a corrupted sequence number generating a false positive match.
- Extracted the sequence number from the Ethernet packet payload, coupled it with the start-of-frame acquisition time offset and wrote both to a file for post analysis.
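The FCS check and sequence number extraction could look something like the sketch below. This is not the script used on the oscilloscope. It assumes each frame is available as bytes from the destination MAC address through the FCS, and that the sequence number is a big-endian 32-bit value in the first four payload bytes; both the position and the byte order are hypothetical, as the text does not state them. The Ethernet FCS is the standard CRC-32, which `zlib.crc32` computes, stored least significant byte first.

```python
import struct
import zlib

ETH_HEADER_LEN = 14          # destination MAC + source MAC + EtherType
FCS_LEN = 4
SEQ_OFFSET = ETH_HEADER_LEN  # hypothetical: sequence number starts the payload


def append_fcs(frame_without_fcs: bytes) -> bytes:
    """Helper for building test frames: append a correct FCS."""
    return frame_without_fcs + zlib.crc32(frame_without_fcs).to_bytes(FCS_LEN, "little")


def fcs_is_valid(frame: bytes) -> bool:
    """True if the trailing four bytes match the CRC-32 of the rest of the frame."""
    if len(frame) <= FCS_LEN:
        return False
    body, fcs = frame[:-FCS_LEN], frame[-FCS_LEN:]
    return zlib.crc32(body) == int.from_bytes(fcs, "little")


def sequence_number(frame: bytes) -> int:
    """Read the (assumed big-endian) 32-bit sequence number from the payload."""
    return struct.unpack_from("!I", frame, SEQ_OFFSET)[0]


def extract(frames_with_offsets):
    """Yield (sequence number, acquisition offset) for frames that pass the FCS check."""
    for frame, offset_ps in frames_with_offsets:
        if fcs_is_valid(frame):
            yield sequence_number(frame), offset_ps
```

Discarding frames that fail the check before reading the sequence number means a bit error in the payload cannot produce a spurious match later on.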
Following the test run, an analysis script was run on all the Ethernet frames captured from the MetaWatch which extracted each sequence number from the Ethernet packets' payload and its associated nanosecond-resolution timestamps from the packet trailer added by MetaWatch into a file.
Correlating Oscilloscope Capture and Packet Capture
The previous two steps generated files containing pairs of timestamps and sequence numbers. The timestamps in the file from the packet capture device came from the MetaWatch with nanosecond resolution. The timestamps in the file from the oscilloscope were relative to the start of oscilloscope acquisition, that is, the first 1,000 Hz pulse triggering acquisition of the first segment. To correlate them, the precise offset from the start of oscilloscope acquisition to the nearest PPS reference pulse was required. This offset would then be added to each oscilloscope start-of-Ethernet-frame timestamp to align it in time with the nanosecond portion of the MetaWatch timestamp. It was obtained by having the oscilloscope measure the relative time between each PPS reference pulse and the 1,000 Hz pulse that triggered the segment containing it.
The final step involved comparing the PPS-reference-correlated timestamp of each Ethernet packet acquired by the oscilloscope to the timestamp of the corresponding MetaWatch Ethernet frame. Packets were paired by matching their sequence numbers.
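The correlation step can be sketched as follows. This is an illustration rather than the analysis script itself: the function name, the dictionary inputs and the handling of wrap-around at the second boundary are assumptions, and the calibration corrections for cable and splitter skew are left out for brevity.

```python
PS_PER_SECOND = 10**12


def correlate(scope_ps, sut_ns, acq_start_after_pps_ps):
    """Return {sequence number: SUT timestamp minus reference timestamp, in ps}.

    scope_ps: {seq: frame start, in ps since the start of oscilloscope acquisition}
    sut_ns:   {seq: nanosecond portion of the SUT timestamp}
    acq_start_after_pps_ps: time from the nearest PPS pulse to acquisition start
    """
    half = PS_PER_SECOND // 2
    deltas = {}
    for seq, t_acq_ps in scope_ps.items():
        if seq not in sut_ns:
            continue  # frame seen by only one device; nothing to compare
        # Reference time of the frame within its second
        ref_ps = (t_acq_ps + acq_start_after_pps_ps) % PS_PER_SECOND
        raw = sut_ns[seq] * 1000 - ref_ps
        # Fold into (-0.5 s, +0.5 s] so frames either side of a second boundary compare correctly
        deltas[seq] = (raw + half) % PS_PER_SECOND - half
    return deltas
```

A negative delta means the SUT timestamp is earlier than the oscilloscope's reference time for the same frame.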
Absolute Accuracy Results
As stated previously, three groups of three runs were performed:
- Three runs with the packet source's Ethernet clock frequency unmodified
- Three runs via a MetaApp 32 running MetaMux with its Ethernet clock frequency increased by 100 ppm (as measured against the rubidium frequency reference)
- Three runs via a MetaApp 32 running MetaMux with its Ethernet clock frequency reduced by 100 ppm (as measured against the rubidium frequency reference)
In each run, the oscilloscope acquired a segment containing 6 to 7 Ethernet packets every millisecond for three seconds resulting in just over 19,000 Ethernet packets acquired per run.
In analysing the results of these runs, no statistically valid differences were observed across the groups of runs confirming that MetaWatch's ability to timestamp is not influenced by the sender's Ethernet clock frequency.
Plotting 50 ps bins of the deltas between the MetaWatch timestamps (ns) and the oscilloscope timestamps (ps) across all nine runs (172,500 oscilloscope-acquired Ethernet packets) yielded the detailed distribution in Figure 04.
Summarised in numbers:
Min -3 ns
Mean -0.345 ns
Max 2 ns
50 ps bins were chosen in calculating the latency distribution because this represents the measurement error of ±25 ps obtained from the test harness calibration.
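The binning and summary statistics can be reproduced from a list of deltas with a few lines of code. The sketch below uses hypothetical function names, and the definition of "accurate to within a limit" as an absolute delta no greater than that limit is an assumption rather than the definition used in the benchmark.

```python
from collections import Counter

BIN_PS = 50  # bin width matching the measurement error of +/-25 ps


def histogram(deltas_ps):
    """Count deltas per 50 ps bin, keyed by the bin's lower edge in picoseconds."""
    return Counter((d // BIN_PS) * BIN_PS for d in deltas_ps)


def summarise(deltas_ps):
    """Minimum, mean and maximum of the deltas, in picoseconds."""
    return {
        "min_ps": min(deltas_ps),
        "mean_ps": sum(deltas_ps) / len(deltas_ps),
        "max_ps": max(deltas_ps),
    }


def share_within(deltas_ps, limit_ps):
    """Fraction of deltas whose magnitude does not exceed limit_ps (assumed definition)."""
    return sum(1 for d in deltas_ps if abs(d) <= limit_ps) / len(deltas_ps)
```

The functions expect integer picosecond deltas, such as those produced by matching oscilloscope and MetaWatch timestamps by sequence number.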
MetaWatch 0.5.2 running on MOS-0.14.0alpha3 on a MetaApp 32 (A5A) with the atomic clock module option, when disciplined from a PPS source, achieved:
- 1 ns timestamp resolution
- Absolute timestamp accuracy of -3 ns/+2 ns for the specified input port
- 42.4% of timestamps accurate to 1 ns
- 92.3% of timestamps accurate to ±1 ns
Figure 04: Cumulative plot of all nine runs (50 ps bins)
Source: Metamako (2017). Accuracy of network timestamping.
Metamako set out to design and implement a test methodology to measure the absolute accuracy of the MetaWatch 10GbE capture and timestamping application on the MetaApp 32 device to within double-digit picosecond precision. To achieve this goal, Metamako proposed test specifications to the STAC Benchmark Council, which made key enhancements and formalised the specifications as part of its STAC-TS suite.
Metamako built a test harness based on the final specifications, then conducted calibration and testing. STAC confirmed that the uncertainty in the measurement of the results was 50 picoseconds, thus achieving Metamako's accuracy goal. They also confirmed that for the capture port tested, 100% of samples were between -3 ns and +2 ns of the time reference and 92.3% of timestamps were accurate to ±1 ns.
From Metamako's perspective, these are excellent results that more than justify the hard work put into designing the MetaApp 32 hardware and the MetaWatch software. Clients can leverage MetaWatch to timestamp their packets using this solution and be confident that the timestamps are accurate.
The benchmarks for timestamp accuracy containing these and other results are available for download from www.metamako.com.
Knight, M. (2017). How to time-synchronise multiple devices running MetaWatch. Metamako Blog. Retrieved from http://blog.metamako.com/how-to-time-synchronise-multiple-devices-running-metawatch
Knight, M. (2017). 5 questions to ask before buying a Layer 1 switch. Metamako Blog. Retrieved from http://blog.metamako.com/5-questions-when-buying-layer-1-switch
Knight, M. (2017). Network Traffic Capture & Aggregation: Why buffer size is crucial. Metamako Blog. Retrieved from http://blog.metamako.com/network-traffic-capture-aggregation-why-buffer-size-crucial
Metamako (2017). MetaWatch: Capture Everything. Metamako.com. Retrieved 30 November 2017, from https://www.metamako.com/applications/metawatch-app.html
Metamako (2017). Layer 1 switches. Retrieved from https://cdn2.hubspot.net/hubfs/1986646/LaymansGuide_Layer1%20Final.pdf
Riley, W. & Howe, D. (2017). Handbook of Frequency Stability Analysis. National Institute of Standards and Technology. Retrieved from http://ws680.nist.gov/publication/get_pdf.cfm?pub_id=50505
Metamako (2017). Accuracy of network timestamping. London: Securities Technology Analysis Center. Retrieved from https://cdn2.hubspot.net/hubfs/1986646/Guides/STAC-TS%20report%20summary%20Final.pdf