Perils of PTP Monitoring

Published in Automated Trader Magazine

Regulations such as MiFID 2 are helping boost demand for precision timestamping. Though Precision Time Protocol (PTP) offers microsecond accuracy - far higher than the competing Network Time Protocol - ensuring synchronisation across networked devices is non-trivial. This article highlights the pitfalls in PTP and offers some solutions.

AUTHOR'S BIO

Jeffry Dwight is the founder and CEO of Greyware Automation Products, Inc., located in Murphy, TX. He remains the lead programmer for all of Greyware's products, including Domain Time II, the premier timekeeping software for Windows. In his spare time, he writes books and plays a variety of musical instruments.

Many organisations, particularly in the securities industry, are turning to Precision Time Protocol (PTP) for its precision and accuracy. A well-designed PTP network can keep all of the nodes within a handful of microseconds of each other, even without special hardware. With special switches and with special Network Interface Controllers (NICs) - or kernel support for socket timestamping - a network may achieve the sub-microsecond range. Increasingly tight timing requirements on regulated industries make this level of performance mandatory. Not only are firms asked to have good synchronisation, they are asked to be able to prove it, and this is where PTP falls flat on its face.

Monitoring, auditing and real-time alerting tasks range from difficult to impossible, which seems surprising until one examines the protocol in detail. PTP is a 'time distribution scheme', with a hierarchy of nodes, each responsible for following the best time source available. The best time source is called the 'master'; its followers are called 'slaves'. This is a one-way communication path: Masters provide information, slaves consume it. Proving the time on a master is fairly easy, but proving the time on a slave is not.

  • PTP excels at distributing the time with high precision and accuracy.
  • PTP cannot demonstrate that its nodes are synchronised.
  • PTP's controlling standard lies at the heart of most monitoring problems.

It is useful to keep in mind the distinction between 'in-band techniques' (using the protocol itself) and 'out-of-band techniques' (using any other method) to measure compliance. Those transitioning from Network Time Protocol (NTP) or similar protocols are not used to worrying about the distinction, because a standard implementation of NTP is both a consumer and a producer of time information, thus allowing in-band monitoring even of machines that do not ordinarily provide the time to others.

PTP's in-band monitoring abilities are severely handicapped by the protocol specification itself. In particular, we will show why PTP messages cannot be used in a manner analogous to NTP's messages. PTP's design militates against the collection of compliance data. In order to show this, we must first review the protocol itself, including tedious detail about its messages and how they are delivered. If you are already familiar with how PTP operates and the kind of messages it uses, you may skip ahead to the section 'Node Identification'.

Background

Precision Time Protocol version 2, commonly called 'PTP' (or PTPv2 when necessary to distinguish it from other versions), is controlled by IEEE under the standard 1588-2008. PTPv1 (1588-2002) is not formally obsolete, but it is not directly interoperable with v2 and has largely been abandoned. PTPv3 is still in the planning stage and does not appear poised to solve any of v2's inherent problems.

The present article is limited to PTPv2. Table numbers, section numbers or other references in this article pertain to labels as given in IEEE 1588-2008. From here on, these references are labelled with the prefix 'IEEE'. We provide these call-outs to the standard in order to avoid having to recapitulate every detail and to provide documentation of where the standard itself hinders monitoring. You do not need to have a copy at hand in order to follow this article.

Here we only address PTP communications using UDP at the transport layer of the Open Systems Interconnection (OSI) model, using IPv4 or IPv6 at the network layer. IEEE 1588-2008 can operate directly at the data link layer (normally 802.3). It can also use alternatives such as DeviceNet, ControlNet and PROFINET, but these options are outside the scope of this article.

Popular software implementations of PTP include Greyware's Domain Time for Windows, the SourceForge PTPd project (and its many derivatives, some proprietary, some open source) for Linux, TimeKeeper from FSMLabs, clients from Meinberg and dozens of others. Most are 1588-2008 compliant with respect to the defaults, but many have either proprietary extensions or idiosyncratic interpretations of the murkier aspects of the specification.

Glossary of terms

ARB arbitrary
BMC Best Master Clock
CDMA Code Division Multiple Access
CIDR Classless Inter-Domain Routing
CPU Central Processing Unit
DHCP Dynamic Host Configuration Protocol
GNSS Global Navigation Satellite System
GPS Global Positioning System
IEEE Institute of Electrical and Electronics Engineers
IP Internet Protocol
MPD Mean Path Delay
NAK not acknowledged
NIC Network Interface Card
NTP Network Time Protocol
OSI Open Systems Interconnection
PHY Physical layer (e.g. wire or glass)
PTP Precision Time Protocol
RFC Request for Comments
TAI International Atomic Time
TCP Transmission Control Protocol
TLV Type-length-value
ToD Time-of-day
TSC Time Stamp Counter
UDP User Datagram Protocol
UTC Coordinated Universal Time

General Operation

PTP networks consist of nodes, of which only one is master and the rest are slaves, passive observers or specialised hardware devices for segmenting and distributing the master's time. A reference implementation using either of the default profiles (discussed later) requires all nodes to be potential masters and to use a very specific 'best master clock' (BMC) algorithm for determining which node should be master. When the current master goes offline or downgrades its quality, perhaps due to loss of GPS signal or other fault, then the other nodes will quickly hold an election to determine which of the remaining nodes has the best quality and switch allegiance.

An appliance billing itself as a 'grandmaster' is simply a PTP node with access to a primary reference time, such as a Global Navigation Satellite System (GNSS) like Global Positioning System (GPS), and is configured in master-only mode. Grandmasters participate in the BMC, but instead of slaving to a better time source, they retreat to passive mode, ready to step in as master when needed. In practice, a robust PTP network consists of an appliance with excellent quality, a backup appliance and all other nodes configured as slave-only, but it is important to understand that PTP does not require this configuration.

The normal operation of a PTP network is not, by itself, a hindrance to monitoring. It can, however, be a problem for industries that require traceability to UTC, because the BMC algorithm does not require the selected master to have a primary time source. Advertisements of clock quality by potential masters are taken at face value and PTP is happy to let a network of nodes without any external reference auto-configure to follow the best claim, even if the selected master is wrong by seconds, minutes or days. Only monitoring can prove that a PTP network is operating correctly, within tolerance, and also traceable to UTC.

In the absence of an appliance, a software-based master can work to syntonise and synchronise the network, but will be limited to the accuracy and precision of its own time source, and the quality of the PTP timestamps will be only as good as its ability to syntonise itself with the source. If the source itself is an atomic clock or GNSS/GPS-connected unit, the distributed time will probably be within approximately half a millisecond of UTC. If the source is secondary or malfunctioning, the quality is unpredictable and traceability goes out the window.

Except for expense, this does not seem like a problem at first blush. You get an appliance or two, configure your other nodes and monitor the whole thing, right? Why is this complicated? To understand, we have to look at the individual messages PTP uses. We will call out the shortcomings of each as we examine them.

Announce Messages

A master node sends periodic Announce multicast messages, advertising its status and availability. The default interval is one Announce every two seconds.

Slaves use overheard Announces to exercise the BMC algorithm. Potential masters must also monitor the Announces to determine if they should remain slaves, remain in passive mode or begin sending out competing Announces. If an entire network of nodes is restarted, there may be many masters advertising. Within a very short period of time, though, the BMC algorithm will eliminate all but one of the masters. If the current master stops sending Announces or its Announces indicate a change in quality, all nodes re-exercise the BMC algorithm as if the entire network had just been restarted.

Announces are important not only for ensuring that the best master is selected, but for informing all other nodes, including monitoring programs, of the network's time quality. Announces include the master's time source, how many steps (hops) away it is from a primary source, its TAI-UTC offset, its timescale (epoch), upcoming leap seconds and the quality of its clock (accuracy, precision and estimated/measured syntonisation frequency with respect to its source).

PTP supports only two timescales: 'PTP' or 'ARB'. The PTP timescale, with an epoch of 1 January 1970, 00:00:00 TAI, has a known TAI-UTC offset (IEEE Sections 7.2.2 and 7.2.3). TAI is monotonic and continuous; it does not know or care about leap seconds. A master serving the PTP timescale is required to provide timestamps in TAI, and also the current TAI-UTC offset (so slaves can derive UTC).

The ARB timescale, as suggested by the name, is entirely arbitrary. It may use any epoch at all, need not serve TAI timestamps, may or may not know the TAI-UTC offset and may not be monotonic or continuous. In practice, a master advertising an arbitrary timescale normally sends UTC timestamps with a TAI-UTC offset of zero, but this is simply convention, not a requirement.
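
To make the timescale arithmetic concrete, here is a minimal Python sketch of how a slave (or a monitor) would derive UTC from a master serving the PTP timescale. The values are made up; in practice the currentUtcOffset comes from the master's Announce, and 37 seconds is the TAI-UTC offset in force since 1 January 2017.

# Sketch: deriving UTC from a PTP-timescale (TAI) timestamp.
# The currentUtcOffset would normally be taken from the master's Announce;
# 37 seconds is the TAI-UTC offset in force since 2017-01-01.
from datetime import datetime, timezone, timedelta

def ptp_to_utc(ptp_seconds, ptp_nanoseconds, current_utc_offset):
    # PTP epoch is 1 January 1970, 00:00:00 TAI; subtracting the advertised
    # TAI-UTC offset yields UTC.
    utc_seconds = ptp_seconds - current_utc_offset
    return (datetime(1970, 1, 1, tzinfo=timezone.utc)
            + timedelta(seconds=utc_seconds, microseconds=ptp_nanoseconds / 1000))

# A hypothetical Sync timestamp paired with an Announce advertising offset 37.
print(ptp_to_utc(1_700_000_000, 123_456_789, 37))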

Summary : Slaves do not send Announce messages, so the information we have available for monitoring a master's time quality is entirely missing for slaves.

Sync and Follow-Up Messages

A master node sends Sync multicast messages. A Sync message contains a timestamp that slaves can use for synchronisation. The default interval for Syncs is one per second.

If the master node is a one-step clock, the timestamp in the Sync message may be used directly. One-step clocks typically have special hardware that rewrites the packet timestamps at the moment the Sync message makes it to the PHY level, thus eliminating stack jitter and delays from Ethernet collisions. A software-based master may be one-step because it has no reliable method to measure the interval between when it issued the packet and when that packet actually transited the Network Interface Card (NIC).

Two-step clocks have some method to determine internal jitter, either using special hardware or using socket timestamping. These masters send a Follow-Up message immediately after each Sync, with a more precise timestamp that indicates what the Sync would have contained had it been able to do one-step Syncs. Slaves to a two-step master must combine information from each Sync and its Follow-Up in order to derive timestamps useful for synchronisation.

Summary : Slaves do not send Sync messages, so the time at the slave cannot be monitored by observation.

Delay Measurement Messages

A PTP slave needs to know the network's propagation delay - that is, how long a timestamp from the master took to arrive. On a regular schedule - by default once every two seconds - each slave sends a Delay Request, to which the master responds with a Delay Response. There are two types of Delay Request-Response transactions, End-to-End and Peer-to-Peer, but the distinction does not matter for this discussion.

The slave uses the timestamps it took upon departure of the request and the receipt of the reply, along with the timestamps contained in the packets themselves, to calculate the propagation delay. It uses this information in conjunction with the timestamps from the Syncs to know its delta from the master. Most implementations collect the delay measurements over time to calculate the meanPathDelay (MPD). The MPD is used, rather than any instantaneous measurement, to correct for latency in the Sync messages.
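
To make the arithmetic concrete, the sketch below shows the standard End-to-End calculation in Python (a simplified rendering, not any particular product's code): t1 is the master's Sync origin timestamp (taken from the Follow-Up for a two-step master), t2 the slave's receipt time, t3 the slave's Delay Request departure time and t4 the master's receipt time echoed back in the Delay Response.

# Sketch of the End-to-End offset/delay arithmetic (all values in seconds).
# t1: master's Sync origin timestamp (from the Follow-Up for a two-step master)
# t2: slave's local time when the Sync arrived
# t3: slave's local time when it sent the Delay Request
# t4: master's time when the Delay Request arrived (echoed in the Delay Response)

def path_delay(t1, t2, t3, t4):
    # One round trip, assumed symmetric, halved.
    return ((t2 - t1) + (t4 - t3)) / 2.0

def offset_from_master(t1, t2, mean_path_delay):
    # A positive result means the slave's clock is ahead of the master's.
    return (t2 - t1) - mean_path_delay

# Real implementations feed many such samples through spike and smoothing
# filters before reporting a meanPathDelay and offsetFromMaster.
mpd = path_delay(t1=100.000000, t2=100.000150, t3=101.000000, t4=101.000050)
print(mpd, offset_from_master(100.000000, 100.000150, mpd))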

Summary : Slaves do not respond to Delay Requests, so the propagation delay for any particular slave node cannot be measured. Slaves do not report their internal timestamps and are free to use any epoch and scale they want; even the contents of the delay measurement messages are implementation-dependent. Overheard delay measurement transactions cannot be used to know a slave's time or latency.

Management Messages

1588-2008 defines several dozen Management messages (IEEE Table 40), which can be used to query nodes for their internal states. Fortunately, using these messages can provide some of the missing slave information. Unfortunately, Management message handling is entirely optional (IEEE Section 15) and the Management messages themselves are organised more for configuration options than for node monitoring. We will look at specific Management messages in some detail later, but first we must examine how all messages are addressed and delivered, because these aspects of the standard affect the overall ability to monitor.

Management messages can help monitor nodes but have significant limitations and problems for proving the time on slaves (see section 'Management Message Hurdles').

Node Identification

Every message sent by every node on a PTP network includes the sender's identity in the message header. Most messages do not include a target; PTP was originally conceived as a multicast-only protocol where all nodes receive and process each message seen.

PTP nodes each have a clockIdentity (64 bits, normally built from the MAC address) and a portNumber (16 bits, 1-65534 allowed). The combination of clockIdentity and portNumber yields a 'portIdentity'. This portIdentity is required to be unique within the reach of the node's network. In addition, each node must have a domainNumber (8 bits, 0-127 allowed). The domainNumber is used to distinguish logically separate networks that travel over the same signaling path. 1588-2008 requires a node to ignore messages from its own portIdentity (IEEE Section 9.5.2.2) and to ignore messages with a domainNumber other than its own (IEEE Section 9.5.1). The proposed Enterprise Profile for PTP violates 1588-2008 by suggesting that slave nodes track masters in multiple domains and choose from among them. 1588-2008 itself contradicts the prohibition in special cases (IEEE Sections 11.4.3 and 15.4.1.1).

Each outgoing message from any PTP node, operating in any mode, contains its own domainNumber and portIdentity in the packet's header. This allows receiving nodes to ignore messages from different domains and to obtain a unique identifier for distinguishing other nodes based on messages overheard. The IP address of a sending node is easily discerned by the receiver, but PTP provides no mapping of IP addresses to portIdentities or vice versa.
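
By way of illustration, the Python sketch below builds a portIdentity the way most MAC-based implementations do, by expanding the 48-bit MAC into a 64-bit identity with FF:FE inserted in the middle. The MAC shown is made up, and, as discussed later, nothing guarantees that the MAC (and therefore the identity) survives a reconfiguration.

# Sketch: constructing a clockIdentity/portIdentity from a MAC address.
# The MAC shown is made up.

def clock_identity_from_mac(mac):
    # EUI-48 -> EUI-64 mapping: OUI (3 bytes) + FF FE + device bytes (3 bytes).
    octets = bytes(int(part, 16) for part in mac.split(":"))
    assert len(octets) == 6, "expected a 48-bit MAC address"
    return octets[:3] + b"\xff\xfe" + octets[3:]

def port_identity(mac, port_number):
    # portNumber must be 1-65534; 0 and 65535 are reserved.
    assert 1 <= port_number <= 65534
    return clock_identity_from_mac(mac), port_number

identity, port = port_identity("00:1b:19:12:34:56", 1)   # made-up MAC
print(identity.hex(), port)                              # 001b19fffe123456 1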

Summary : The addressing scheme used by PTP is designed for time distribution only. Multicast messages cannot be sent to a specific node's IP address. Unicast messages may be sent to a specific IP address, but the sender must already know the target's portIdentity as well as its IP address, and the recipient must be willing and able to reply.

Protocol Limitations

Transferring time-of-day (ToD) information from a master to its slaves is the primary goal of PTP. The standard defines strict algorithms for how a slave must use the Announces, Syncs and delay measurements to determine its offset, but the steering mechanism is entirely outside the scope of the specification. In fact, a node can be fully compliant with 1588-2008 without implementing a steering mechanism at all. A monitoring node, for example, may be very interested in knowing the ToD at the master but not allowed to manage its own clock. A slave node may choose not to implement delay measurement. The assumptions behind PTP are simple: The best master clock on the network assumes responsibility for sharing its time with other nodes and other nodes may do whatever they want with the information, including nothing at all.

Nodes processing a master's Announce and Sync messages must take note of the timescale and the TAI-UTC offset advertised in order to determine ToD at the master. They are not, however, required to use either the PTP timescale or UTC internally. A non-master's TAI-UTC offset is entirely its own affair and may be meaningless.

Summary : PTP only provides a mechanism for slaves to determine the time of day at a master, but not for a monitor to determine time of day at a slave. Even if the slave's internal variables were revealed, they may be idiosyncratic and meaningless to a monitor.

Protocol Message Addressing

Master-to-slave communication normally consists of unsolicited multicast Announce and Sync messages, plus responses to delay measurement requests. Announces, Syncs and delay measurement requests are not addressed to any particular node. Only a master may respond to a delay measurement request, and delay measurement responses contain a target portIdentity which is copied from the header of the delay measurement request. This extra data allows the multitude of slave nodes to distinguish which delay measurement response is meant for them. Recall that when using only multicast, each node sees every message sent by every other node, whether it's relevant or not.

Slave-to-master communication normally consists only of delay measurement requests. There are two default profiles, End-to-End [001B19000100] and Peer-to-Peer [001B19000200] specified in 1588-2008. These profiles require slaves and masters to use multicast for delay measurement. Sending Announces and Syncs by multicast is efficient, since the master need only place one outgoing message on the wire and each slave needs to see the information anyway. For delay measurement traffic, however, doing everything by multicast creates a significant amount of wasted network traffic, because only the master needs to see the request, and only the slave who sent the request needs to see the response, but all other nodes must receive, process and discard the traffic.

1588-2008 addresses this problem with the concept of unicast leases (IEEE Section 16.1). The reference implementation of unicast leasing is the Telecom Profile [0019A7000100]. A telecom slave must negotiate unicast messaging with a telecom master on a message-type by message-type basis. Announces, Syncs and delay measurement must be negotiated separately, and the slave must be programmed with the master's IP address and portIdentity using some out-of-band method, such as a configuration file. A unicast lease operates much like a Dynamic Host Configuration Protocol (DHCP) lease, in that it has an expiry period and must be renegotiated periodically. Once a lease is established, the master will send the leased messages to the leasing node by unicast until the lease expires. This requires the master to keep a table of leases and to send one copy of each relevant message to each lessee. The size of the lease table is implementation-dependent, and unicast negotiation is entirely optional. A telecom node will not send or respond to multicasts at all, and is therefore not discoverable by any PTP message. Because the lease negotiation is complex, and the master's lease table size cannot be known (potentially leaving slaves unable to obtain a lease), the Telecom Profile is typically used only in specialised environments where multicast is forbidden anyway.

In practice and outside 1588-2008's specifications for the default profiles, almost all masters capable of receiving multicast User Datagram Protocol (UDP) messages are also able to receive unicast UDP messages and in fact need not distinguish between the two. Masters that do distinguish, however, may choose to reply to unicast delay measurement requests with unicast delay measurement responses. Domain Time, in auto-detect mode, will send both unicast and multicast delay measurement requests to the selected master; if the master responds to the unicast request with a unicast response, Domain Time will thereafter use only unicast. PTPd has a similar setting, called 'hybrid mode', but does not test to determine if unicast responses are supported. If set to hybrid mode, a PTPd node will fail unless the master also supports hybrid mode. Note that it is perfectly legal for a master to 1) ignore unicast requests; 2) respond to a unicast request with a unicast response; or 3) respond to a unicast request with a multicast response.

Hybrid mode, when supported by both slave and master, significantly reduces network traffic. Only Announces and Syncs need to be multicast. The proposed Enterprise Profile for PTP (no profile identifier number yet specified) requires hybrid mode, mostly in recognition of the problem and its practical solution, but this falls outside 1588-2008's allowed operating procedures.

Summary : Multicast-only produces a significant amount of unnecessary traffic. Hybrid mode reduces traffic, but presents special problems for monitoring, as explained in the following sections.

Protocol-Related Unicast Ambiguity

A significant problem with hybrid mode is that there is no controlling specification documenting how it should work. The industry standard for most network protocols, whether UDP like NTP or TCP like HTTP, is for the server side of the connection to maintain a fixed listening target port. The requestor side of the connection creates a socket using an ephemeral source port and sends its messages to the server's well-known target port. The server sends its reply to the source IP and source port of the request. (The source port of the reply may be ephemeral as well and is not usually relevant to the transaction.) This method of replying to the source IP and source port of the request allows the server to handle many simultaneous requests and also allows the requestor to use non-privileged ports. Additionally, it permits the requestor to use a different source port for each request, thereby allowing multiple requests to be outstanding at any given time, and providing a simple means to determine whether or not the server replies within a sensible timeframe.

The well-known target port for PTP messages is either 319 or 320 (depending on message type). Nodes wanting to send messages must use the proper target port; however, they cannot depend on being able to use an ephemeral source port. A strict interpretation of 1588-2008 suggests, but does not actually specify, that all replies must also be addressed to either 319 or 320, ignoring the source port of the request. Some hardware devices and some software implementations do indeed operate this way, presumably in order to comply with 1588-2008 rather than common sense. Therefore, a requesting node must be prepared to receive replies on the ephemeral port used for the request or on port 320, with no way to predict which will occur.

When considering unicast, an additional problem emerges, as detailed in the sidebar 'Unicast Binding'. It is impossible, by definition, to run a PTP monitoring program on any machine with a node using the Telecom Profile or one of the standard profiles in hybrid mode, since the monitoring program may intercept the node's incoming unicasts and vice versa. Even if this were acceptable, a monitoring program cannot operate in an environment using the Telecom Profile unless the IP address and portIdentity of each node to be monitored are configured manually; both the target IP and target portIdentity must be known in advance. (Recall that nodes using the Telecom profile are forbidden to use multicast.)

Summary : Monitoring is only practical on a network where multicast is allowed and where nodes may be relied upon to reply to management requests. Further, if unicast is used to reduce traffic, an out-of-band proprietary mechanism is required to allow all processes to see unicast replies.

Unicast Binding

On a given computer, only one process may 'own' a given listening port. That is, the combination of listening IP address, listening port and protocol must be unique on the system for normal UDP or TCP operations. If another process binds to the same tuple and the first process has not set the exclusive flag, the new process will obtain ownership. (If the first process does set the exclusive flag, then the subsequent process will not be able to bind to the same tuple.) Incoming UDP unicasts or TCP connection attempts will only be seen by the owning process, with no notification to the prior owner that ownership has been lost.
Multicast listeners are an exception to this rule. Multiple processes may bind non-exclusively to an IP/port and join the multicast groups of interest. Incoming UDP multicasts will be seen by all sockets bound to that tuple and subscribed to that group.
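
To make the binding rules concrete, here is a minimal Python sketch of the multicast case: a non-exclusive bind to the PTP general port plus a join of the standard PTP IPv4 group 224.0.1.129. Several processes can open the port this way and all of them will see multicasts; as the rest of this sidebar explains, an incoming unicast on the same port reaches only one of them. (On Windows, the 'exclusive flag' mentioned above is SO_EXCLUSIVEADDRUSE.)

# Sketch: a non-exclusive multicast listener on the PTP general port (320).
# Binding a port below 1024 typically requires elevated privileges.
import socket
import struct

PTP_GENERAL_PORT = 320
PTP_PRIMARY_GROUP = "224.0.1.129"   # default PTP IPv4 multicast group

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)   # non-exclusive bind
sock.bind(("", PTP_GENERAL_PORT))

# Join the PTP multicast group on the default interface.
mreq = struct.pack("4s4s", socket.inet_aton(PTP_PRIMARY_GROUP),
                   socket.inet_aton("0.0.0.0"))
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

data, sender = sock.recvfrom(2048)   # multicasts arrive at every such listener
print(f"received {len(data)} bytes of PTP general traffic from {sender}")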

Consider a node using unicast for either delay measurement or Management messages. If it uses an ephemeral source port to send messages, it cannot depend on receiving the reply on that ephemeral port. It must therefore also bind to ports 319 and 320. Since a multicast reply is as likely as a unicast reply, the requestor must also join the appropriate multicast groups on 319 and 320. If it uses port 319 or 320 as its source port, the reply will come to port 319 or 320, but may be either unicast or multicast.

Two processes on the same computer, such as a PTP client and a monitoring program, may use multicast for all messages (and even see each other if multicast loopback is enabled), but will quickly run into problems if they try to use unicast. If the PTP client is using hybrid mode for delay measurement, it must have bound ports 319 and 320 either exclusively or non-exclusively. If the bind is exclusive, the monitor program cannot run at all. If the bind is non-exclusive, the delivery of a unicast response is arbitrary: it is guaranteed to be delivered to only one of the listening processes, but which one cannot be predicted. This means that a request can be sent by the client but the reply seen only by the monitor, or vice versa. The behaviour for incoming unicasts addressed to a particular target port, where more than one socket is bound non-exclusively, is undefined by TCP/IP. In practice, both Windows and Linux will usually deliver the packet to whichever process bound the port most recently, but even this behaviour is not reliable.

If PTP required unicast replies to be sent back to the requestor's source port, the problem would disappear, since each program would bind to a separate ephemeral port for its requests and each reply would go only to the requesting program. 1588-2008 does not require this and an overly-strict interpretation suggests that replying to the source port is forbidden, except for replies to 'a manager', which is not defined by the specification. Management messages are general messages and 1588-2008 D.2 says both "The UDP destination port of a multicast general message shall be 320", and "The UDP destination port of a unicast general message that is addressed to a manager shall be the UDP source port value of the PTP message to which this is a response". Some interpret 'a manager' to be any node sending a unicast management request; others may insist 'a manager' is a node whose clockType includes the MANAGEMENT_NODE bit set; the remainder either do not distinguish or just ignore the conflict. In practice, since each manufacturer is free to resolve the ambiguity in its own way, the problem is insuperable. Replies to multicast management requests will come back to port 320 by multicast; replies to unicast Management messages will come back to either the source port by unicast, port 320 by unicast or port 320 by multicast. Section D.1, however, notes that when using unicast across transparent clocks, the transparent clock itself may change the target port of a forwarded reply, leading to corruption of the mechanism.

Node Identification Problems for Monitoring Programs

IEEE Section 7.5.2.2.1 specifies how clockIdentities are to be derived, but oddly omits the question of persistence. If MAC-based clockIdentities are used but the node has multiple interfaces, the selection of which MAC to use is arbitrary. For an appliance-type node the MAC is unlikely to change and the manufacturer 'bakes in' the proper clockIdentity. But software-based nodes do not have the same assurance. Since operating systems may add, remove or reorder interfaces, NICs can be swapped out, and virtual machine MACs may change on each startup, the MAC - and therefore the clockIdentity - may change. Usually this change happens only during boot and does not really affect slave nodes, since their only requirement is to keep the same clockIdentity while running. However, most modern operating systems allow interfaces to be changed or reconfigured without rebooting.

Summary : A node's portIdentity can only be used for identification during point-in-time snapshots. There is no guarantee that a node with portIdentity X from snapshot A corresponds to the same node at snapshot B.

IP Addressing Problems for Monitoring Programs

Hardware devices are likely configured with a fixed IP address, but many software nodes may use DHCP and therefore may have a different IP at each boot or even between boots. 1588-2008 does not address the possibility of a node owning more than one IP address or belonging to multiple subnets, as is common in modern networks.

Slave nodes learn both the portIdentity and IP address of the master by inspecting the incoming UDP Announces and Syncs. They can then direct messages to the master either by unicast to the master's IP address or by multicast to the entire network.

If a known portIdentity responds from a new IP address, a monitor may reliably deduce that either the node has multiple IPs or that it has changed IPs. But if the portIdentity changes, the monitor cannot deduce that a response, even if it comes from a previously-known IP, is from the previously-known node. It may be a brand-new node. Should a monitor declare the old node offline? There is no way to tell. The lack of portIdentity persistence does not affect the operation of the protocol, but it makes keeping centralised historical logs problematic.

From the perspective of PTP's goal as a time-distribution network, this is sufficient. Masters will almost certainly have unchanging portIdentities and probably also unchanging IP addresses. A slave will ensure that whatever portIdentity it adopts will persist until the next time it restarts; and even if it changes IPs dynamically, only the slave itself needs to know its own IP address.

Summary : The combination of non-persistent portIdentities and non-persistent IP addresses presents a challenge to monitoring programs. Cross-contamination of historical records is possible and pre-configuring IP addresses and portIdentities is fruitless. The only methods monitors have for discovering the network topology are 1) passively listening to overhear multicasts sent among nodes, and 2) querying nodes using Management messages. Yet neither is sufficient.

Management Message Hurdles

The first hurdle is that IEEE Sections 9.2.2 and 15.1.1 allow fully-compliant nodes to ignore Management messages altogether. Out-of-band management and fully static configuration are both perfectly acceptable. Further, 1588-2008 does not specify that, if a node supports Management messages, it must support all of them. IEEE Section 15.3.1 says, "Management messages not accepted shall be ignored", and gives no standard for which messages should be accepted, other than to specify certain classes that should be ignored for certain types of nodes. In hindsight, perhaps the standard should have required nodes to NAK unsupported messages rather than ignoring them, but it's too late for that.

Most nodes, either hardware or software, support only a limited subset of possible Management messages, and the list is manufacturer-dependent. Therefore, a monitoring program cannot know if an unresponsive node is offline or just does not support the Management messages sent to it. Further, because not all nodes will support the same subset, it may be possible for a monitor to discover the existence of a node without being able to discover its settings or current status.

The second hurdle is the one presented by changing portIdentities and IP addresses. Unlike the time distribution and delay management exchanges, Management messages must be addressed to a specific node, even when sent by multicast. Each Management message has a field called the targetPortIdentity. When sending a management request, the requestor puts its own portIdentity into the header as usual and the target's portIdentity into the targetPortIdentity field. Regardless of whether the request is unicast or multicast, the targetPortIdentity must match the intended node's current portIdentity. 1588-2008 forbids a node from responding to management requests that are not addressed to it. In consideration that 'broadcasts' might be useful, 1588-2008 specifies two wildcards in IEEE Table 36. Either the clockIdentity or the portNumber (or both) may be wildcarded in a request. If both are wildcarded and the request is multicast, all nodes on the network may respond. Note that there is no similar wildcard for the target node's domainNumber.

Summary : Management message support is entirely optional. Even when supported overall, support for any particular Management message is implementation-dependent.

The pseudo-code for deciding whether to respond to a Management message is shown below.

if (targetClockIdentity == myClockIdentity OR targetClockIdentity == wildcard)
{
    if (targetPortNumber == myPortNumber OR targetPortNumber == wildcard)
    {
        if (targetDomain == myPTPDomain)
        {
            respond, but only if the manufacturer has implemented handling
            for the particular management message type and message action
        }
    }
}

Management messages that fail the above tests must be ignored. Since Management messages must either be addressed to all nodes or only to a specific node, a monitoring program must either 1) send all queries by multicast to all nodes and process all the replies, even if it only wants to know about a specific node; or 2) already know the portIdentity of the target node. If the monitoring program knows both the target node's portIdentity and IP address, it may send the request by unicast and hope for a unicast response. In practice, a monitoring program may send a few multicasts to the network for discovery, collect the responses and then use unicast to communicate with each node that responded to the multicast.

Multicast discovery followed by unicast querying seems reasonable, but ignores that some nodes may respond to multicast but not unicast and that either the portIdentity or the IP address of the target may change between when it is discovered by the multicast and when it is queried by unicast.

The third hurdle for a monitoring program is the lack of a wildcard for the domainNumber. Unless it only monitors one domain, the monitor must replicate its wildcard queries for each possible domain. This means that, if configured to monitor everything, the monitor would have to send 128 queries (for domains 0-127) to be sure of catching all possible listeners. Domain Time regards Management messages sent to domainNumber 255 as wildcarded and will reply. Although in violation of 1588-2008, this is safe because the reply must contain the actual domainNumber being used by the node and because domainNumber 255 is disallowed. No other product, to our knowledge, responds to a wildcard domainNumber. This helps with the multi-domain problem if all nodes are Domain Time, but does not help if some nodes are appliances or clients from other vendors.

Each logical domain must be queried separately, first to discover whether multiple domains exist on the same signaling path and second to monitor the nodes discovered.
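
In rough outline, the discovery loop this forces on a monitor looks like the Python sketch below. The helpers send_management_query and collect_replies are hypothetical stand-ins for the message construction and socket handling a real monitor would need; the all-ones wildcard values are those defined in IEEE Table 36.

# Sketch: multicast wildcard discovery repeated for every possible domain,
# because there is no wildcard for domainNumber. send_management_query() and
# collect_replies() are hypothetical helpers, not part of any real product.

WILDCARD_CLOCK_IDENTITY = b"\xff" * 8    # 'all clocks' per IEEE Table 36
WILDCARD_PORT_NUMBER = 0xFFFF            # 'all ports' per IEEE Table 36

def discover_nodes(send_management_query, collect_replies, timeout=2.0):
    nodes = {}
    for domain in range(128):            # domains 0-127, queried one by one
        send_management_query(
            management_id="PORT_DATA_SET",
            target_clock_identity=WILDCARD_CLOCK_IDENTITY,
            target_port_number=WILDCARD_PORT_NUMBER,
            domain=domain,
            multicast=True,
        )
        for reply in collect_replies(timeout):   # whatever happened to get back
            key = (reply.clock_identity, reply.port_number, domain)
            nodes[key] = {"ip": reply.source_ip, "state": reply.port_state}
    return nodes    # a point-in-time snapshot only, for the reasons given above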

An additional problem with multicast discovery of nodes is that a single multicast query will generate an unknown number of responses. Switches, routers and individual computer network stacks are free to discard UDP packets when busy. If, say, a single query generates 10,000 responses, the chances of all 10,000 of them returning to the querying node are very small. For small networks or networks whose switches and IP stacks are not busy, expecting a few hundred responses is reasonable. But it might be a different set of responses to the next query, since it is sheer chance that determines which packets get through.

Management message exchanges are not stateful, except for the implicit transaction of a request and its reply. There is no automatic retry and no way to detect the differences among a node that happens to be offline, happens to not understand a query, happens to have its reply dropped or happens to be on a (perhaps temporarily) unreachable network. All the monitor program can know is that it did not receive a reply. If the request was sent by unicast, the lack of reply could be for any of the former reasons or because the node has changed its portIdentity or IP address. The monitoring program can do a limited number of retries, but cannot risk flooding the network with requests that will never generate a reply. It must assume that the target node is offline and stop querying until it has reason to suspect the node has resumed operation.

Sending hundreds or thousands of Management message queries, especially if multicast (meaning the replies will be multicast), can quickly overwhelm a network's capacity.

Why Passive Listening is Insufficient

The possibility of discovering a network's nodes and their interrelationships by simply listening to them talk to each other is intriguing, but ultimately not useful.

First, nodes may be operating in hybrid mode, in which case only the masters can be discovered by listening passively. Even a network tap will not overcome this limitation if more than one switch is involved.

Second, a monitoring program cannot determine the offset of a slave node just by listening to its multicast delay requests. The timestamps embedded in a delay request are echoed back by the master in the delay response, but are not required to be any particular type of timestamp. The timestamps are not parsed by the master and are used by the slave to determine round-trip time. Since only the elapsed time between request and reply matters for a slave, the slave is free to use any kind of counter or timer it wants. Using something like the CPU's time stamp counter (TSC) gives excellent resolution with very low overhead. Chances are good that the slave will use either PTP timestamps or UTC timestamps, but a snooping monitor can make no assumptions and, most importantly, has no way to verify.

Last, passive listening can yield only the state of the network's masters. It may assume that any node sending delay requests is a slave, but this assumption is incorrect. Other types of nodes, including the monitor program itself, may be sending delay requests in order to calibrate with the master(s) currently online. And even if only slaves sent delay requests, we cannot tell if a slave is in an error state or is tracking. If the monitor assumes the slave is tracking, it cannot tell which master is being tracked. Assuming the entire network is using only multicast, the portIdentity and IP address of all talking nodes can be discerned, but not their relationships.

Summary : Ultimately, a monitoring program may use passive listening to discover masters and as a hint toward the discovery of other nodes. To know more, the monitor must use Management messages.

Types of Information Available

The most useful information a PTP monitor could provide is a list of all nodes, together with their current states (master, slave etc.) and their current offsets from a trusted time source. One might want to receive alerts if a master goes offline or if slaves are not tracking within tolerance. This is, alas, the most difficult information to maintain.

Ideally, a monitor should also provide auditing, a point-in-time snapshot with reference to a primary time source to show how far off each machine really is, and real-time alerting, to let administrators know between audits if a problem develops.

For masters, a monitoring program can watch the multicast Announces and Syncs and calculate (without correcting its own clock to match) the difference between the ToD on the master and the ToD on the monitoring node. It can send delay measurement requests as if it were a slave to further refine the delta. However, it cannot know the absolute ToD on the master; it can only know the difference, which depends on the accuracy and precision of its own timekeeping. It is no use simply assuming that a master claiming to be GPS-derived must be correct, because this is the very thing a monitor is striving to measure.

A visible master doesn't need to be queried, except to measure delay or to discover status when it stops sending Announces or Syncs. All the important information is contained in the overheard Announces and Syncs.

One might think that monitoring the network's masters is a pretty good trick, even if we have to trust them to self-report their states. However, once the monitor moves beyond a single network snapshot, the entire concept of monitoring a master disappears. PTP is a self-configuring network of clocks. It is normal and proper for a master to drop off in favour of another, and for slaves to change allegiance as the best master changes. If redundant hardware devices are present, only one of them will advertise as the master; the others will be in the passive state, ready to take over if needed. There is no 'master' master; there are only 'current' masters (and in a properly-configured network, only one master in any single domain). For any given master in a snapshot taken at time T1, failing to be the master at T2 is not an error condition. In fact, during network startup or at the failure of the current master, one should expect that multiple nodes will begin advertising as masters, until the best master clock algorithm convinces all but one to give up.

For slaves and other non-master nodes, assume for a moment that the monitor enjoys a perfect network with no losses and further enjoys a company of nodes that support all useful Management messages. The monitor can poll the network and produce results as of that moment. It cannot know if changes occur between polling events and it cannot gather all the needed information with a single query (yielding the possibility that a node may change status between asking its state and asking its offset). There is no combination of messages that will yield the current ToD on a non-master beyond an educated guess. A monitor may report what a slave claims to be its offset, without any way to verify that its offset corresponds with reality. And the network will not be perfect: the lack of a response to a query cannot, by itself, be used to raise an alert.

Even something as simple as identifying a slave's parent requires two management queries: One to discover the node's state and a second to discover its master. To discover the node's ToD requires knowing its state, its master and its master's ToD. This is a never-ending spiral of recursive queries, especially since network segmentation may make the slave's master visible to the slave itself, but invisible to the monitoring program. Should a monitor that discovers a slave tracking an invisible master consider the slave okay or insane? Should a change in master be considered a meaningful event?

Summary : Not all required information may be available. Even when it is, the results may be ambiguous.

Specific Management Messages

Management messages are defined in IEEE Section 15 and enumerated in IEEE Table 40.

The first one that catches the attention of someone wanting to monitor all nodes is TIME (IEEE Table 48). The query simply asks for a PTP timestamp representing the node's ToD. The format of the reply is prescribed, but not its contents. Recall that there is no requirement for nodes to use either the PTP timescale or UTC internally. The reply does not indicate the epoch or precision and contains no TAI-UTC offset. No procedure is included to measure propagation delay. IEEE Section 15.3.2.1.1 says, "In most cases, the actual precision is on the order of milliseconds or worse, depending on the source of the information used in populating the data field and on the characteristics of the network". And although IEEE Table 40 allows this message to be used to query a node's time, the discussion in IEEE Section 15 refers only to using it to set the target node's time, which no one would ever do.

Summary : Even if the TIME query's reply used a known format so its contents could be interpreted correctly, there is still an ambiguity, on the order of milliseconds, in the measurement.

The PORT_DATA_SET (IEEE Table 61) query may be multicast to the wildcard portIdentity in order to learn of other nodes. The replies will each contain a field called 'portState', indicating the current operational mode, among which the most important are master, slave, and listening. For nodes identifying themselves as slaves, the PARENT_DATA_SET query must then be sent to find out the slave-to-whom information.

Summary : PORT_DATA_SET is the only Management query to yield a node's portState. A monitor may use it both for discovery and to know each node's state.

The PARENT_DATA_SET (IEEE Table 56) reply contains a field called 'parentPortIdentity'. If the node is a slave, the parentPortIdentity specifies its current master's portIdentity (inexplicably called 'parent' only in this one instance). However, the field is either meaningless or just plain wrong if the node is not currently a master or a slave. When querying a master, the reply will contain its own clockIdentity and a portNumber of zero. When querying a slave, the reply will contain its master's clockIdentity and its master's real portNumber - but not the master's IP address or domainNumber. In any other state, however, neither the clockIdentity nor the portNumber may be trusted, as explained in the sidebar, 'Understanding parentPortIdentities'.

Summary : The PARENT_DATA_SET query yields a node's master, but only if the node is already known to be a slave. There is no way to determine from the reply whether the information is meaningful.

Understanding ParentPortIdentities

1588-2008 Section 8.2.3.2 requires that the portNumber of the parentPortIdentity be set to zero at initialisation and when becoming master (IEEE Table 13). When transitioning to the slave state, the portNumber must be set to the selected master's port number (IEEE Table 16), but, maddeningly, at no point in the standard does it say what should be done when transitioning from slave to any state other than master, such as when a slave loses contact with its current master and enters the listening state.

Domain Time chooses to revert to the initialisation state after losing its master, utilising its own clockIdentity and a portNumber of zero. PTPd behaves differently with each version. It can also populate the parentPortIdentity with the information from the master it would be following if it were a slave or was following when it was a slave. A standard-compliant node configured to be master-only, but that does not believe itself to be the best clock, will change to the passive state, but IEEE Table 15 is silent about the parentPortIdentity contents of a passive node. Some implementations record the best clock in the parentPortIdentity, including a non-zero portNumber, but not all do. See IEEE Figures 23-26 for all the gory details to determine the current state, and IEEE Tables 13-16 for how the data sets must be updated at each transition.

The TIME_PROPERTIES_DATA_SET query (IEEE Table 57) lets a monitor determine if a node is using the PTP timescale, the status of its leap flags and what it believes is the current TAI-UTC offset (if known). Unfortunately, if the node is not a master with direct access to a primary source such as GNSS/GPS, IEEE Table 16 says that a slave simply copies the chosen master's information into the dataset and is under no obligation to use those settings itself. As with PARENT_DATA_SET, the standard fails to specify when a former slave should update its values; updates are only specified for initialisation and for transitioning to slave or master. TIME_PROPERTIES_DATA_SET suffers larger problems: If the node is a master, we already know its time properties, so the query is useless; if the node is a slave, we again know the master's time properties, but not the slave's; if the node is neither master nor slave, the dataset may or may not represent a former state. Two fields from the reply are useful if we know by other means that the node is a slave: the timeSource field, which tells us if the master is using GPS, NTP, or another time source, and the timeScale field, which tells us if the master is using the PTP or ARBitrary timescale. If we assume that a slave uses the PTP timescale with the same TAI-UTC offset as its master, we can then try to interpret overheard Delay Request multicasts. But this assumption is unwarranted, because timestamps inside a Delay Request need not follow any particular format, and the slave is under no obligation to use the PTP timescale internally even if it reports that its master does. Alas, the TIME_PROPERTIES_DATA_SET response from a slave only includes its master's traceability to UTC, which is meaningless for demonstrating the provenance of the slave's time without yet another Management message query.

Summary : The TIME_PROPERTIES_DATA_SET can provide useful information, but only if the node is already known to be a slave and several unwarranted assumptions about the meaning of the returned data are made.

The CURRENT_DATA_SET query (IEEE Table 55) is perhaps the most useful. When combined with the knowledge obtained by PORT_DATA_SET that the node is a slave and with the information obtained by overhearing the master's Announces, we can establish the slave's distance from a primary time source. The CURRENT_DATA_SET reply includes a field called 'stepsRemoved', which a slave is required to set to 1 + the stepsRemoved of its master.

If the master claims its own stepsRemoved as zero and claims a timeSource of GPS or atomic clock, this is the same as an NTP stratum 1 server. By the standard's definition, the ToD contained in the master's Syncs is accurate to within the limit advertised in the master's most recent Announce (typically 100 nanoseconds for a GPS-based appliance). Additionally, the Announce includes a flag indicating whether its time is traceable to a primary source. Finally, a master always reports its offset from its source as zero, so we can only check it by comparing the time served to another reference clock.

Summary : Provided we already know a node is a slave, the CURRENT_DATA_SET provides one of the key requirements of auditing: traceability. We can prove that at the time of the query, the slave was tracking a source traceable to a primary source. Unfortunately, the same query can only provide implications of accuracy, not proof.

Known Unknowns

The CURRENT_DATA_SET reply contains only three fields: stepsRemoved (discussed above), offsetFromMaster and meanPathDelay. The last two report what the slave itself has measured as its delta and latency from its master (IEEE Sections 8.2.2.3 and 8.2.2.4). Fortunately, neither is affected by the timescale in use or by the UTC offset, and both have a specific format that must be used.

Note that this message does not include whether the node is a slave (and if so, to whom), nor does it include how recently the offset was calculated (or indeed if it is even valid at all), nor does it include whether or not the node is using this offset to steer the clock (or, if so, how successfully). The ToD at the node, therefore, may only be inferred, not deduced. The inference requires three pieces of information: Firstly, knowledge of ToD at that node's master; secondly, knowledge that the node being queried is, in fact, a slave; finally, the belief that the slave's offset, combined with the master's ToD, represents the slave's ToD. A master is required to report zeros for its offsetFromMaster and meanPathDelay, and a non-master/non-slave node may report either zeros or its last-known values. Therefore, to track the ToD at a master, a monitoring program must perform the full measurement techniques used by a slave, processing Announces, Syncs and delay measurements (but without adjusting its clock to match).
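
To spell out the inference, the Python sketch below shows the only arithmetic available to a monitor: its own estimate of the master's ToD (built from Announces, Syncs and delay measurements, exactly as a slave would build it) plus the offsetFromMaster the slave reports in its CURRENT_DATA_SET reply. The numbers are made up, and the result is only as trustworthy as the slave's self-report.

# Sketch: the ToD a monitor assigns to a slave can only be inferred.
# offsetFromMaster is defined as (time on the slave) - (time on the master),
# so a positive value means the slave believes it is running ahead.

def inferred_slave_tod(master_tod_estimate, reported_offset):
    # Trusts that (a) the node really is a slave of that master and
    # (b) its self-reported offset reflects what its clock is actually doing.
    return master_tod_estimate + reported_offset

# Example audit point: master ToD estimated at 1 700 000 000 s, slave claims +42 us.
print(f"{inferred_slave_tod(1_700_000_000.0, 42e-6):.6f}")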

Unfortunately, the actual data is implementation-specific. Each node is free to use whatever format or scale it wants to keep track of its delta. IEEE Section 4.2 calls out in excruciating detail how the values are to be calculated, but not how they are to be stored, averaged, filtered or used to steer the clock. Each node presumably knows how it's storing the data and can back-convert to the required format when answering the CURRENT_DATA_SET query, but a loss of precision (noise in the least significant digits) is possible. Additionally, although IEEE Section 15.5.3.4.1.2 says explicitly that the offsetFromMaster field "shall be the value of the currentDS.offsetFromMaster of the dataset", A.5.3 and following reemphasise the problems with determining accuracy and the implementation-dependent nature of compensating for fluctuations. The very name of the field meanPathDelay indicates that it should not represent a single measurement. Again, most implementations use a combination of spike filters, finite impulse response (FIR) filters and other statistical methods to arrive at a 'real' offsetFromMaster and 'real' meanPathDelay.

Recall that a master is required to report its own offset from its time source as zero. An important oversight in 1588-2008 becomes apparent if you think about this requirement for a minute: If the master is using anything but a directly-connected GPS or radio clock, it should report its delta from its own source instead of reporting zero, and that delta should be added to the slave's delta from its master to determine the slave's true delta from the master's source. But this information is not available, so we must assume that anyone who cares about monitoring a slave's delta to less than a handful of milliseconds is using a GPS-connected appliance as the master.

At any given audit point, provided that all required Management messages are supported, we can record that a slave was n microseconds off from GPS or another primary source. But can we really trust what the slave reports as its current delta? Unfortunately not, and the problem again rests not with the data, but with 1588-2008 itself.

Summary : The only method PTP offers to let us know a slave's ToD to sub-millisecond levels is to know its master's ToD and the slave's self-reported offset from that master. We can combine this information with the traceability information to establish an audit point. But the slave's reported offset is ambiguous.

Unknown Unknowns

Many of the ambiguities in 1588-2008 are intentional, because it only describes time distribution, not steering mechanisms or other implementation details. It was not designed for monitoring or auditing, as the following thought experiment will show.

Let's say that a slave has calculated its delta from the master and submitted this to adjtime() or adjtimex() on Linux or SetSystemTimeAdjustment() on Windows for steering purposes. Let's also say the delta was small enough to correct in less than a second. With Syncs coming in every second, the slave, if everything is perfect, should now have a delta of zero. (Of course, not everything will be perfect, but the slave can't know of problems until the next Sync.) If queried for its CURRENT_DATA_SET between Syncs, should the slave report zero because it's already fixed the clock or should it report whatever the last delta was? More realistically, if the slave collects and filters Sync data over an implementation-dependent period of time before submitting them to the kernel for correction, should it report the most recent delta, the current running average that has not yet been submitted, the most recent delta that has already been corrected, the remainder left to be submitted after the current correction completes or perhaps its performance over the last 24 hours?

There is no answer and the question is not hypothetical. Each implementation is free to report whatever it wants. A node may even comply with every part of 1588-2008 without ever trying to syntonise or synchronise, let alone succeeding.

Summary : The meaning of the CURRENT_DATA_SET reply is implementation-dependent. This is a fatal flaw in 1588-2008, but is consistent with its design goal of distributing the time without being able to measure how well that distribution is going.

Local Statistics

Most implementations keep statistics. Domain Time does so by default and for PTPd it's a compile-time option. Others keep whatever statistics they want, in whatever format they want, including none at all. One could develop a log-scraping utility to scoop up each node's statistics, but this still relies on the node's own calculations of its offsets and would need to be customised for every implementation's log format. An auditor wants to know how far the clock was off at any given instant, not how far it claims to have been off or what its average was over the last n hours or days.

Domain Time keeps one data point for each incoming Sync (or Sync/Follow-Up pair). It also rolls up the data into a period-spanning average at each timeset event, much as NTPd does in its loopstats logs. Both the individual data points and the summary data points can be collected centrally for historical records.

As of version 2.3, PTPd and most of its derivatives can keep detailed logs, depending on how the administrator compiles and configures the daemon. The format is not well-documented and no provision is made for central collection. Unfortunately, the statistics file does not include the master's time provenance.
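A log-scraping utility is easy enough to sketch, but notice how much of it rests on assumptions about the file layout. The column positions below are placeholders, not a documented format:

```python
# A log-scraping sketch under the assumption that the daemon writes one
# comma-separated record per Sync with a timestamp column and an
# offset-from-master column. Column positions are assumptions; every
# implementation (and even every build option) may differ.

import csv

def scrape_offsets(path, timestamp_col=0, offset_col=4):
    """Yield (timestamp, offset_seconds) pairs from a statistics file."""
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row or row[0].startswith("#"):
                continue  # skip comments and headers
            try:
                yield row[timestamp_col], float(row[offset_col])
            except (IndexError, ValueError):
                continue  # tolerate malformed or partial lines

# Usage: for ts, off in scrape_offsets("/var/log/ptpd.stats"): ...
```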

Summary : There is no standard for statistics on each node. It is entirely optional, and, if implemented, may use any format that strikes the developer's fancy. Only a few software suites offer centralised collection.

Local Monitoring

On each node of interest, one could compare the operating system's ToD against information obtained independently, say via NTP from a stratum 1 appliance, to determine how well PTP is steering the local clock. To be robust, such a monitor would have to emulate a program like NTPd, tracking deviations from multiple servers at regular intervals and performing all the functions of NTPd except steering the clock. (A single sample, as with SNTP, is only good to a handful of milliseconds.) The program would then have a solid basis for comparing the NTP-derived time against the machine's ToD and generating logs. To satisfy FINRA or MiFID 2, the log would have to include the NTP server's provenance along with the calculated delta. This is not terribly complicated and avoids the problems of monitoring a PTP slave node remotely.
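A minimal sketch of the idea follows, assuming a pair of reachable stratum 1 appliances; the hostnames and the 100-microsecond tolerance are placeholders. It takes a single SNTP sample per server per pass, so on its own it is only good to a few milliseconds; a production monitor would filter many samples over time, NTPd-style.

```python
# Local-monitoring sketch: compare the local ToD against independent NTP
# sources and log the result. Hostnames and tolerance are placeholders.

import socket, struct, time

NTP_TO_UNIX = 2208988800  # seconds between 1900-01-01 and 1970-01-01

def sntp_offset(server, timeout=2.0):
    """Return the estimated offset (seconds) of the server vs. the local clock."""
    pkt = b"\x23" + 47 * b"\x00"          # LI=0, VN=4, Mode=3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        t1 = time.time()
        s.sendto(pkt, (server, 123))
        data, _ = s.recvfrom(512)
        t4 = time.time()
    def ts(off):                           # decode a 64-bit NTP timestamp
        sec, frac = struct.unpack("!II", data[off:off + 8])
        return sec + frac / 2**32 - NTP_TO_UNIX
    t2, t3 = ts(32), ts(40)                # server receive / transmit times
    return ((t2 - t1) + (t3 - t4)) / 2     # standard NTP offset formula

SERVERS = ["ntp1.example.com", "ntp2.example.com"]   # placeholder appliances
TOLERANCE = 100e-6                                   # assumed 100 us budget

for srv in SERVERS:
    try:
        off = sntp_offset(srv)
        flag = "OK" if abs(off) <= TOLERANCE else "ALERT"
        print(f"{time.time():.6f} {srv} offset={off*1e6:+.1f}us {flag}")
    except OSError as exc:
        print(f"{time.time():.6f} {srv} ERROR {exc}")
```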

Since the various regulations do not really care how a machine's clock is steered (as long as the result is traceable to UTC to within tolerance), local monitoring by comparing the ToD to a UTC-traceable external clock suffices to demonstrate compliance even if the node isn't a PTP slave. The problems with PTP's Management messages and addressing ambiguities disappear entirely under this scheme; in fact, the monitor doesn't even need to communicate by PTP at all.

How often the clock needs to be checked and what level of deviation is acceptable would need to be configurable. Centralising the data collection would be a nice touch, as would raising alerts whenever a node drifts outside of tolerance.

Summary : If you are looking to roll your own solution for compliance, you could do worse than the local monitoring approach.

Centralised Monitoring

The only method supported for monitoring PTP nodes using PTP messages is a combination of passive observation (for masters) and querying other nodes using Management messages. The limitations of PTP node addressing (specifically the lack of IP-to-portIdentity mapping) mean that general queries must be sent by multicast, even if follow-ups are sent by unicast. Solving the problem of having multiple listeners on the same unicast ports is non-trivial.

The biggest problem with centralised monitoring using PTP itself is the scope of the traffic required. PTP is already a 'chatty' protocol, with multiple multicasts per second. Even if all slaves are able to use hybrid mode to cut down on multicast traffic, the unicast delay measurement packets are not free. Adding monitoring by multicast has the effect of multiplying the traffic, perhaps exponentially.

Consider a modest network of 1,000 nodes on a network where no UDP packets are ever dropped or duplicated. To scan the network, a monitor would first have to send at least one general Management request, perhaps PORT_DATA_SET, by multicast (1 multicast request, 1,000 multicast replies) in order to determine the presence of the 1,000 nodes. If the monitor can depend on unicast for everything else, which is not a given, four more messages must be sent to each node (4,000 requests, 4,000 replies). If unicast is not supported by all nodes, the monitor would have to use multicast for its four follow-up messages (4 multicast requests, 4,000 multicast replies). Keep in mind that the monitor would also have to be tracking the master's Announces and Syncs and using delay measurement messages against the master. It would also need to compare each master's time against a trusted source using some other mechanism. Let's assume unicast works and is preferred over multicast: This adds up to 4,001 requests and 5,000 replies for each 'sweep' of the network, not including the traffic for monitoring the master, all to create 1,000 records.

Now multiply this by the number of logical domains you want to monitor. Remember that although Management requests may be 'broadcast' by multicast to all-nodes/all-ports, each request is limited to one domain at a time. So if you have n logical domains, you would need to repeat the discovery process for each one.
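The arithmetic in the last two paragraphs is easy to reproduce:

```python
# Back-of-the-envelope traffic for one monitoring sweep of the network
# described above: 1,000 nodes, no loss, unicast allowed for follow-ups.

nodes = 1000
followup_queries_per_node = 4          # per the example above

discovery_requests = 1                 # one multicast PORT_DATA_SET
discovery_replies = nodes              # every node answers

followup_requests = nodes * followup_queries_per_node
followup_replies = nodes * followup_queries_per_node

print("requests per sweep:", discovery_requests + followup_requests)   # 4001
print("replies per sweep:",  discovery_replies + followup_replies)     # 5000

# Repeat the whole exercise once per logical domain being monitored.
```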

The nodes themselves are quite busy apart from handling the monitor's messages. Using the default settings, the master must multicast Announces every other second and must multicast Syncs (including Sync Follow-Ups if the master is a two-step clock) once a second. Every other node on the network must process these messages in order to exercise the BMC. Any node that is a slave will send - either unicast or multicast - a Delay Request once a second and the master will reply - either by unicast or multicast - also once a second. Syncs and Delay Requests are very sensitive to propagation delay, so the more traffic that switches and network stacks must handle, the less rigorous the entire timing network becomes. It is easy to see how monitoring even once a second, along with the existing activity, could overwhelm ordinary hardware and interfere with timing.
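For comparison, a rough estimate of the baseline PTP traffic on the same segment before any monitoring is added, assuming 999 slaves, one two-step master and the default intervals:

```python
# Rough per-second baseline for a 1,000-node segment (999 slaves, one
# two-step master) at default intervals, before monitoring traffic.

slaves = 999
announce_per_sec = 0.5                      # one Announce every other second
sync_per_sec = 1.0
followup_per_sec = 1.0                      # two-step master sends Follow-Ups
delay_req_per_sec = slaves * 1.0            # each slave, once a second
delay_resp_per_sec = slaves * 1.0           # master answers each one

total = (announce_per_sec + sync_per_sec + followup_per_sec
         + delay_req_per_sec + delay_resp_per_sec)
print(f"~{total:.0f} PTP messages per second before monitoring is added")
```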

UDP networks drop packets and create duplicates (especially if the nodes being monitored have multiple IP addresses). So our snapshot, no matter what interval we use between sweeps, will have missing information or information that needs to be processed only to be discarded. Remember, too, that if multicast is used by the monitor for its follow-up queries, all but the target node must process and discard each follow-up query.

A segregated network, using PTP boundary clocks to control the traffic flood, would be much less affected than a single segment. But boundary clocks are expensive and they can also prevent a monitor from seeing the entire network (IEEE Section 15.3.3).

Summary : Monitoring via PTP Management messages can create traffic floods. The flooding can be severe enough to exceed appliance or switch specifications and the delays may interfere with PTP slaves' ability to maintain accurate and precise time.

Storage of the results also must be considered. Let's say we monitor our 1,000-node network once every second. The result record for each node would need to contain the offset information along with the time's provenance and its delta from the reference time source, as well as the timestamp indicating when the sample was taken. We would also need success and error indicators, because not every node is going to respond every time. Conservatively, assuming we can pack all the needed information into 128 bytes, we have 128,000 bytes accumulating every second, 7,680,000/minute, 460,800,000/hour or 11,059,200,000/day, for a grand total of 77,414,400,000 bytes for a week's worth of data. If your industry requires you to keep a month's worth of data online, you'd need roughly 332 GB for a 30-day month. Call it approximately 1 TB/quarter. Modern systems can handle this load, assuming that older data is migrated to remote storage on a regular basis. But searching through even a month's data for a specific node's performance would not be trivial unless the data were kept in an indexed database rather than a binary file - and there goes our conservative estimate of 128 bytes/record. Multiply by at least ten to account for non-binary fields, indices and logs.
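The storage arithmetic, using the same assumptions of 128 bytes per record and one sweep per second:

```python
# Storage estimate for 1,000 nodes sampled once per second at 128 bytes/record.

nodes, record_bytes, sweeps_per_sec = 1000, 128, 1

per_second  = nodes * record_bytes * sweeps_per_sec     # 128,000 bytes
per_day     = per_second * 86_400                       # 11,059,200,000 bytes
per_month   = per_day * 30                              # ~332 GB (30-day month)
per_quarter = per_month * 3                             # call it ~1 TB

print(per_second, per_day, per_month / 1e9, per_quarter / 1e12)
# Indexed-database overhead (non-binary fields, indices, logs) can easily
# multiply these figures by ten or more.
```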

Summary : Keeping historical records of PTP performance may take more storage than you expect, depending on how often you collect it and how long you are required to retain it.

One last point: A master ceasing to advertise is no more an error condition than a slave going offline during a reboot or all nodes failing over to a backup appliance. Atmospheric conditions can affect the claimed time quality of GNSS/GPS-driven masters. Masters driven by code division multiple access (CDMA) track the strongest cell tower signal and change towers as conditions change, leading to different levels of quality and a re-election as often as every few minutes. Malicious or incautious operators may set up a software-based master to claim the highest priority and best time quality, thereby steering the network away from its proper servers. We highly recommend setting all nodes other than appliances to slave-only mode. Many implementations allow specifying acceptable masters, either by IP/CIDR or by master portIdentity; if your PTP clients support these methods, you should use them.

There appears to be no good way to distinguish the normal operation of PTP from a series of non-critical errors. Read that sentence again. This makes raising real-time alerts impractical.

Summary : The ultimate problem with central monitoring using PTP messages is that the protocol was not designed with the requirement for real-time (or even near-real-time) monitoring in mind.

Solutions

Non-proprietary methods currently available for monitoring PTP nodes include log-scraping/collection, local monitoring on each machine and centralised querying via Management (or proprietary) messages. All methods have inherent flaws and limits and all but local monitoring rely on slaves to report their own offsets accurately.

The forthcoming PTPv3 (no date announced) has some enhancements, especially in regard to better and more revealing statistics. But it will rely on the same Management message querying as PTPv2 for these statistics, and it will still depend on nodes to measure their own performance. One of PTPv3's proposals is for a management station to send a command to a node saying, "Report stats by multicast to the wildcard address". The node would then be responsible for keeping the management station up to date. This is a step forward, but the current proposal is so murky and complex that we find it unlikely to survive in the new standard unscathed. As it currently reads, nodes would be able to choose how often to send updates and whether they represent running averages or point-in-time samples, with no way for the management station to know what to expect. A naïve implementation would require the management station to request each type of statistic individually. The node would then happily re-multicast all the data from received Announces and Syncs, along with computed statistics of performance, deltas, delays, Allan deviations and so on, wrapped in an unknown number of type-length-values (TLVs) attached to a series of Signaling messages each time it reported. This would have the effect of multiplying traffic exponentially and is unsustainable in a network of any size. The proposal mentions transmission to a unicast address (which makes much more sense), but provides no mechanism for a management station to request unicast messages. I am sure the IEEE committee will give these matters due consideration, but without a full specification or even a timetable for delivery available, waiting for PTPv3 is not a realistic option.

In-band solutions which use PTP's messages to monitor an entire network are more complicated, but have the advantage of coming pre-packaged from multiple vendors with FINRA and MiFID 2 in mind.

Greyware's Domain Time suite already has an out-of-band mechanism in place for monitoring, auditing and real-time alerting, regardless of the protocol each client uses to obtain the time. If the clients happen to use PTP, the current master is included in the auditing data. Additionally, Greyware's PTP Monitor, bundled with its Manager/Audit Server product, does in-band, near-real-time tracking of masters. The product also provides point-in-time auditing of all other PTP nodes, across any combination of logical domains, regardless of manufacturer. It collects statistics centrally using a combination of multicast and unicast Management messages, and records events in its log. The number of nodes tracked is artificially limited to prevent flooding. PTP Monitor gives a visual display of all nodes, their current deltas, what delay mechanisms are being used, which are masters, which are slaves (and to whom) and - where available - manufacturer data such as device names, serial numbers and so forth, totalling over 30 fields of relevant information per node.

Meinberg suggested a PTP monitoring extension which attaches TLVs to existing messages. Unfortunately, it violates interoperability rules by requiring slave nodes to respond to Delay Requests and send Syncs, both of which are forbidden by 1588-2008. It also calls out IEEE Section 9.2.5 for the definition of the reply TLV's portState member, but inexplicably allocates two bytes for a one-byte field with no public documentation. Meinberg may be using this mechanism for its proprietary monitoring, but it would only work with Meinberg nodes and could cause confusion for standard-compliant nodes. Meinberg also offers a free PTP Monitoring Tool that works as a foreground-only application. It relies on libpcap (Linux) or WinPcap (Windows) for packet capture and network analysis. Unfortunately, the timestamps generated by WinPcap are unreliable and unlikely to be fixed (the last update was in 2013). Further, Meinberg's free tool is a wonderful demonstration of the problems associated with determining node status solely by eavesdropping on multicast messages. It is a nifty way to explore your network, but it requires all of your nodes to be using only multicast to work properly; it cannot identify problems, cannot run in the background as an auditor and cannot distinguish the time epoch used in Delay Requests, leading to unexpected results if a node is using UTC or a proprietary timescale internally.

Some PTP 'grandmaster' appliances keep statistics gleaned from the Delay Requests sent by their slaves. This approach is proprietary and assumes that all slaves are using the PTP timescale and putting PTP-style timestamps in the Delay Request packet. It also neglects to account for the fact that Delay Requests are not targeted; the appliance could be collecting statistics from slaves who are following other masters using a different timescale or a different TAI-UTC offset. Since delay measurement packet contents are only useful if the sent time is known (which means only the slave can make sense of what it sent and what it receives in reply), the accuracy of the collected timestamps, even if of the proper type, is subject to multi-millisecond swings from stack, switch and network jitter.

FSMLabs released a compliance product in October 2016 called TK Compliance 1.0, designed to work with their TimeKeeper 7.2. It claims to be able to monitor both TimeKeeper and non-TimeKeeper nodes and it provides extensive reporting abilities. The mechanisms used to gather the data are not published, but if it works with non-TimeKeeper nodes, it likely eavesdrops on multicasts and/or uses Management messages.

An Immodest Proposal

I suggest a request for comments (RFC) or IEEE annex to specify a new Management message query. The reply would consolidate all the information required for centralised monitoring. For example, a single multicast sent to the wildcard address using a domain of 255 could reveal the portIdentity, IP address, current state, current master (if slave), current offset and other necessary bits all in one reply packet, regardless of the responding node's domain. The replies could be unicast and the request could include a version number to allow for forward-compatibility. The work is not in creating the specification, but in implementing it. Since 1588-2008 makes all Management messages optional, each manufacturer would need to add support for at least this one query. If some nodes on your network do not support the query (most likely your appliances, since they are the hardest to update), your monitoring station ends up back at square one, sending multicast discovery packets the old way.
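As a sketch only, the reply might carry fields along these lines; every name below is hypothetical and nothing like it exists in 1588-2008:

```python
# Hypothetical layout of the consolidated monitoring reply proposed above.
# Field names and types are our own invention, not part of any standard.

from dataclasses import dataclass
from typing import Optional

@dataclass
class MonitoringReply:
    version: int                            # forward-compatibility marker from the request
    port_identity: bytes                    # clockIdentity + portNumber of the responder
    ip_address: str                         # finally ties portIdentity to a network address
    domain: int                             # responder's own domain, not the wildcard 255
    port_state: str                         # e.g. "MASTER", "SLAVE", "LISTENING"
    parent_port_identity: Optional[bytes]   # current master, if the node is a slave
    offset_from_master: float               # node's own estimate, in seconds
    mean_path_delay: float                  # node's own estimate, in seconds
```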

Conclusion

Right now, PTP is the best way to keep your network tightly synchronised, absent a GPS receiver card plugged into each machine. But proving the time accuracy to stringent industry requirements across a heterogeneous collection of nodes from different manufacturers remains a challenge without a vendor-agnostic solution.