The Gateway to Algorithmic and Automated Trading

Simple Binary Encoding for high performance market data interfaces

Published in Automated Trader Magazine Issue 44 Q1 2018

The FIX Trading Community realised that its message encoding was no longer fit-for-purpose for high-performance trading. It came up with the Simple Binary Encoding (SBE) standard to balance the needs of latency versus bandwidth utilisation. We look at how the market data interfaces at the CME Group were implemented with SBE.

About the authors

Donald Mendelson

Donald Mendelson is the owner of Silver Flash LLC, a provider of software services for capital markets. He also serves as Technical Architect for the FIX Trading Community. Donald was co-chair of the SBE Working Group and also led the development of other FIX protocols.

Fred Malabre

Fred Malabre is Senior Director, Architecture and Product Management, at CME Group. Fred was co-chair of the FIX Trading Community High Performance Working Group from 2012 to 2015; SBE was one of the binary encodings that group evaluated. Since 2012, Fred has also been co-chair of the SBE Working Group.

Nowadays, a large proportion of trading decisions are made by algorithms, not humans in trading pits or traders clicking with a mouse. Message encoding must be up to the task, along with every other aspect of algorithmic applications. To this end, the FIX organisation developed the Simple Binary Encoding (SBE) standard to balance the needs of latency versus bandwidth utilisation. CME Group saw SBE as an opportunity to speed up both its internal messages and the market data which it disseminates to the trading world.

The low latency challenge

Circumstances change in unforeseen ways, no matter the brilliance of planners. What should you do about a protocol that is wildly successful for exchanging financial information around the world and yet is not fit-for-purpose for new use cases? That is the question that challenged the FIX Trading Community regarding high-performance trading. FIX Protocol was created by the non-profit organisation 20 years ago to exchange trading information between US stock brokers. It became an industry standard as its usage extended to more asset classes, including futures and options, fixed income and other instruments. At the same time, it expanded to cover other facets of the business beyond order routing, including market data dissemination, securities reference data and post-trade processing. FIX then spread geographically from the US to Europe and Asia. FIX was able to meet all these scenarios because they were expansions of business semantics. The new challenge was high-performance trading: a technical challenge to deliver messages with latencies measured in microseconds and even nanoseconds.

To be clear, the FIX Protocol is an industry standard, not a software implementation. Implementations need to follow the standard to interoperate, and there are numerous open-source and proprietary implementations of FIX. However, the composition of a standard has a strong influence on whether high performance is attainable. Unfortunately, the original FIX protocol design created barriers that prevented achieving really low-latency performance.

ASCII Protocol doesn't cut it

To see why Simple Binary Encoding is part of the solution to the low-latency challenge, let's first take a tour of the original FIX Protocol. The first significant characteristic of the original FIX Protocol is that it is all-in-one. It specifies message encoding, rules for message exchange (session establishment and check-pointing) and business semantics, such as how order and execution quantities are managed. In the terminology of the Open Systems Interconnection conceptual model (OSI), it is the application layer, the presentation layer and the session layer, all rolled up in one big ball of wax. The problem with a monolithic protocol is that a better solution for one layer cannot be substituted without replacing the whole protocol. Separation of layers is clearly desirable, and technical standards recently created by the FIX Trading Community aim to do just that. SBE was created to replace the presentation layer, no more and no less. The application layer of FIX that is familiar to thousands of users is retained. SBE is required to be able to express existing FIX messages, but also to make it possible to optimise them for performance. (A performance session layer is also under development, but that is a discussion for another day.)

The second aspect of the original protocol that one needs to understand is why its message encoding is no longer fit-for-purpose. The first problem is that although FIX has about 20 data types, they are all encoded as human-readable ASCII text. Text is acceptable for fields such as an account identifier, but numeric data, timestamps and so forth cannot be processed directly by computers in text form. To calculate a price or time difference, text must first be converted into native binary types. Then, to put the data back in a response message, the computer must convert the binary types back to ASCII text. This is a waste of processing time, especially for time and calendar functions, and it has no place in a high-performance system. Some FIX applications spend up to 80% of their CPU budget translating between data type representations!
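As a rough illustration of that translation overhead, the Python sketch below parses two ASCII prices into integers before subtracting them, then formats the result back into text. The helper names and the fixed two-decimal scale are assumptions made for the example. With a binary encoding, the integers come straight off the wire and the arithmetic is a single subtraction.

```python
import struct

DECIMALS = 2  # assumption: prices in this example carry two decimal places

def ascii_to_ticks(text: str) -> int:
    """Parse an ASCII price such as "123.45" into integer ticks (12345)."""
    whole, _, frac = text.partition(".")
    frac = frac.ljust(DECIMALS, "0")[:DECIMALS]  # pad or truncate to the fixed scale
    sign = -1 if whole.startswith("-") else 1
    return sign * (abs(int(whole)) * 10**DECIMALS + int(frac))

def ticks_to_ascii(ticks: int) -> str:
    """Format integer ticks back into ASCII text for an outgoing tag=value field."""
    sign = "-" if ticks < 0 else ""
    whole, frac = divmod(abs(ticks), 10**DECIMALS)
    return f"{sign}{whole}.{frac:0{DECIMALS}d}"

# Text path: two parses and one format just to subtract two prices.
change_text = ticks_to_ascii(ascii_to_ticks("123.45") - ascii_to_ticks("123.40"))

# Binary path: the same prices arrive as little-endian int64 ticks and are used as-is.
wire = struct.pack("<qq", 12345, 12340)
last, previous = struct.unpack("<qq", wire)
change_ticks = last - previous
```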

Another problem with the original FIX message format, known as 'tag=value encoding' (see an example in Table 05), is that message encoding is highly variable. It was designed for a high degree of optionality. For example, 'expiration date' is an optional field on an order message, because it is only required for derivatives, not equities. FIX also has a concept of conditionally required fields. A stop price field is only required in an order message if the order type is stop or stop-limit. Moreover, field order within a message is largely non-deterministic: price then quantity is just as valid as quantity then price. Since the message layout is non-deterministic, metadata must be conveyed to identify each field in the message. The field tag is small for each field, but the combination of field tags, field delimiters and a verbose message header results in a typical message size of hundreds of bytes.
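The toy parser below, written in Python, shows why a tag=value message must carry its own metadata: the parser can only recognise a field by its tag, because the same fields may arrive in any order. The three-field message uses real FIX tags (35 MsgType, 44 Price, 38 OrderQty) but is not a complete order; a real message also has a header, many more fields and a checksum. The helper names are hypothetical.

```python
SOH = "\x01"  # FIX field delimiter (ASCII 0x01)

def parse_tag_value(msg: str) -> dict:
    """Split a tag=value message into {tag: value}; the tag is the only way to identify a field."""
    fields = {}
    for field in msg.rstrip(SOH).split(SOH):
        tag, _, value = field.partition("=")
        fields[int(tag)] = value
    return fields

def overhead_bytes(msg: str) -> int:
    """Count bytes spent on tags, '=' signs and delimiters rather than on values."""
    return sum(len(f) - len(f.partition("=")[2]) + 1 for f in msg.rstrip(SOH).split(SOH))

# The same three fields in two equally valid orders (35=MsgType, 44=Price, 38=OrderQty).
price_first = SOH.join(["35=D", "44=101.25", "38=100"]) + SOH
qty_first = SOH.join(["35=D", "38=100", "44=101.25"]) + SOH
```

Here 12 of the 22 bytes are tags, `=` signs and delimiters rather than data.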

Lastly, string fields in tag=value encoding are all of variable length. Even when the receiver of a message has an internal storage limit on string length, such as a database schema, there is no way to express that limit in tag=value encoding. All these characteristics resist performance optimisation. High variability defeats CPU optimisations such as branch prediction and memory pre-fetch. Additionally, highly variable message layouts in memory and large message sizes tend to cause cache misses that can stall a CPU. The variability also makes the encoding unsuitable for hardware implementations such as FPGAs.

Binary Type System

From the initiation of the SBE project in the FIX High Performance Working Group, we tried to learn the lessons of the past and optimise for low latency. The first important difference between SBE and tag=value encoding is that SBE uses native binary types on the wire for numeric and other quantifiable data. Each of the thirty-plus FIX datatypes has a binary field encoding defined in the SBE specification (see Table 01). The benefit is that the sender and receiver do not need to translate data representations but can use data directly off the wire. The best kind of optimisation is elimination of unnecessary work.

SBE encodings are often much more compact than their ASCII equivalents (see Table 06). For example, this is the FIX ASCII representation of Friday, 4 October 2024 at 14:17:22.944 UTC (21 bytes with millisecond precision):

20241004-14:17:22.944

Table 01: FIX datatypes in SBE

Meanwhile, the SBE wire format of the same UTC timestamp occupies only 8 bytes while delivering nanosecond precision:

004047baa145fb17 (hex, little-endian byte order)

More importantly, the SBE format is numeric, so time differences can be easily computed. In this example, the backing unsigned 64-bit integer is a count of nanoseconds in 20,000 days, 14 hours, 17 minutes and 22.944 seconds since the UNIX epoch.

SBE integers are usually encoded little-endian on the wire, since that is the native byte order of Intel and compatible processors. But to be impartial, SBE also allows big-endian byte order to be specified in a message schema.
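A minimal Python sketch of the timestamp encoding, assuming a UTCTimestamp backed by an unsigned 64-bit count of nanoseconds since the UNIX epoch, sent little-endian. Working backwards from the eight bytes shown above, they encode 14:17:22.944 UTC, so the sketch uses 944 milliseconds to reproduce them exactly. It also shows that a time difference is plain integer subtraction. The function names are hypothetical.

```python
import struct
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_ns(dt: datetime) -> int:
    """Nanoseconds since the UNIX epoch, using integer arithmetic only (no float rounding)."""
    delta = dt - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000_000 + delta.microseconds * 1_000

def encode_timestamp(dt: datetime) -> bytes:
    """uint64 little-endian: always 8 bytes on the wire."""
    return struct.pack("<Q", to_epoch_ns(dt))

def decode_timestamp(wire: bytes) -> int:
    return struct.unpack("<Q", wire)[0]

sent = encode_timestamp(datetime(2024, 10, 4, 14, 17, 22, 944_000, tzinfo=timezone.utc))
print(sent.hex())  # 004047baa145fb17

later = encode_timestamp(datetime(2024, 10, 4, 14, 17, 23, tzinfo=timezone.utc))
elapsed_ns = decode_timestamp(later) - decode_timestamp(sent)  # 56,000,000 ns = 56 ms
```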

One consequence of optimising for machine readability rather than human readability is that SBE messages must be translated whenever humans need to read a log of FIX messages for troubleshooting or auditing. However, this visualisation does not impede trading performance, since it is done after the fact or on demand.

Deterministic Message Layout

The second difference between encodings is that SBE keeps most metadata out-of-band rather than sending it on the wire with each message. A message structure is defined by a template that must be disseminated to a peer, so it knows how to interpret a message. A message template contains a list of fields in the message along with each field's identifiers and data type. A message may contain nested structures, particularly repeating groups, which are arrays of blocks of fields. The benefit of a template is that it enforces a deterministic message layout: fields always appear in the message in the same order. Also, character data can have a defined length. Variable-length strings are also supported by SBE for backwards-compatibility with existing FIX messages, but a designer can constrain usage to fixed-length fields and thus produce fixed-length messages. Another potential benefit of deterministic field positions is that an application that only needs to examine selected fields, such as an order router, can access those fields directly without having to crack the whole message. Such a deterministic message layout also lends itself to FPGA implementations.
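Here is a Python sketch of direct field access over a hypothetical fixed-length order layout (the field set and offsets are invented for illustration, not taken from any published schema). Because every field sits at a known offset, a router can read the security ID of each message with one `unpack_from` call and never touch the price or quantity.

```python
import struct

# Hypothetical fixed-length layout, little-endian with no padding (17 bytes):
# offset 0 securityId int32 | offset 4 price mantissa int64 | offset 12 qty uint32 | offset 16 side uint8
ORDER_LAYOUT = struct.Struct("<iqIB")
SECURITY_ID_AT, PRICE_AT = 0, 4

def security_id(buf: bytes, msg_offset: int = 0) -> int:
    """Read only the securityId field, straight from its fixed offset."""
    return struct.unpack_from("<i", buf, msg_offset + SECURITY_ID_AT)[0]

def price(buf: bytes, msg_offset: int = 0) -> int:
    """Read only the price mantissa, straight from its fixed offset."""
    return struct.unpack_from("<q", buf, msg_offset + PRICE_AT)[0]

# A router can route a batch of fixed-length messages by looking at 4 bytes of each.
batch = ORDER_LAYOUT.pack(42, 10125, 100, 1) + ORDER_LAYOUT.pack(7, 9950, 5, 2)
routes = [security_id(batch, off) for off in range(0, len(batch), ORDER_LAYOUT.size)]
```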

Each message transmitted in-band begins with a message header that identifies its template. This supports mixing of message types in a stream. Templates may be narrowly tuned to a specific use case. The old way of designing FIX messages was to overload a message type with many potential uses while making the numerous alternatives into optional fields. The SBE way is to design a template for each narrow scenario, such as an equity order template distinct from a futures order template. Each template results in a highly optimised message layout. Beyond predictability, messages and repeating groups may even be aligned with cache lines to prevent expensive cache misses.
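The sketch below, in Python, reads a message header and dispatches on the template ID. It assumes the standard four-field header described in the SBE specification (blockLength, templateId, schemaId and version, each a little-endian uint16); a schema may define its own header composite, and the template IDs and handlers here are hypothetical.

```python
import struct
from typing import Callable, Dict

# Assumed standard header: blockLength, templateId, schemaId, version (uint16, little-endian).
HEADER = struct.Struct("<HHHH")

def on_trade(body: memoryview) -> str:
    return f"trade ({len(body)}-byte block)"

def on_book(body: memoryview) -> str:
    return f"book ({len(body)}-byte block)"

# Hypothetical template IDs; a real schema assigns them.
HANDLERS: Dict[int, Callable[[memoryview], str]] = {1: on_trade, 2: on_book}

def dispatch(buf: bytes, offset: int = 0) -> str:
    """Pick a handler from the header's template ID; unknown templates are skipped."""
    block_length, template_id, _schema_id, _version = HEADER.unpack_from(buf, offset)
    start = offset + HEADER.size
    body = memoryview(buf)[start:start + block_length]  # zero-copy view of the root block
    handler = HANDLERS.get(template_id)
    return handler(body) if handler else f"skipped template {template_id}"
```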

An SBE message schema is expressed according to a standardised XML schema. You may be thinking that XML is not a good choice for low-latency applications, but remember that a message schema is delivered out-of-band, not at run-time of message exchange. Most implementations parse the XML only once and generate code for encoders and decoders or at least convert it to an intermediate binary representation.

A schema usually contains multiple message templates. A template, denoted by the <message> XML element, contains the layout of a single SBE message type. It contains field instances and repeating groups, denoted by the <field> and <group> XML elements respectively. A field is a unit of semantic information, such as a price, quantity or trading symbol. A field has a datatype, and its metadata includes a name and a numeric ID. FIX semantics are identified in SBE only through metadata; there are no tags at all on the wire. Experienced FIX users know the common fields; for example, Account is tag 1 and Price is tag 44. By using those traditional tags as field IDs, business knowledge is retained, and firms interact using shared concepts.

A repeating group is an array of field blocks. The number of entries in a repeating group is usually not known until run-time. For example, a market data message may contain multiple prices, corresponding to the number of levels in an order book. The number of levels is variable with the state of a market. On the wire, each repeating group has a tiny header that gives the dimensions of the group, so a decoder can make sense of it.
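A Python sketch of decoding one repeating group. The group header layout (a uint16 block length followed by a uint16 entry count) and the price level entry are assumptions for illustration; real schemas declare their own dimension encoding. The decoder advances by the block length read from the wire rather than by the entry size it expects, which lets it walk entries that carry extra fields.

```python
import struct
from typing import Iterable, List, Tuple

GROUP_HEADER = struct.Struct("<HH")  # assumed dimensions: blockLength, numInGroup
LEVEL = struct.Struct("<qi")          # hypothetical entry: price mantissa int64, size int32

def encode_levels(levels: Iterable[Tuple[int, int]]) -> bytes:
    levels = list(levels)
    entries = b"".join(LEVEL.pack(p, s) for p, s in levels)
    return GROUP_HEADER.pack(LEVEL.size, len(levels)) + entries

def decode_levels(buf: bytes, offset: int) -> Tuple[List[Tuple[int, int]], int]:
    """Return the group's entries and the offset just past the group."""
    block_length, count = GROUP_HEADER.unpack_from(buf, offset)
    offset += GROUP_HEADER.size
    levels = []
    for _ in range(count):
        levels.append(LEVEL.unpack_from(buf, offset))
        offset += block_length  # stride by the wire block length, not LEVEL.size
    return levels, offset

levels, end = decode_levels(encode_levels([(10125, 5), (10100, 7)]), 0)
```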

A wire format for a datatype may be shared by multiple messages and is expressed by reusable simple and composite types. A simple type, denoted by the <type> XML element, is backed by one primitive datatype. It can be either a scalar or an array of scalars. A composite type, denoted by the <composite> XML element, is a combination of two or more simple types, like a struct in the C language. The most common usage is to encode an exact decimal as a pair of integers, representing mantissa and exponent. Note that these are not two separate fields, but rather a small structure that backs a single field. Remember that it is a field that carries semantic information. One feature of SBE is that fields and types can be set to constants in metadata. When a value is constant, it need not be sent on the wire, since it is known to both sender and receiver. In the composite encoding for a decimal, the exponent is sometimes set to a constant, so that only the mantissa needs to be transmitted on the wire.
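The decimal composite with a constant exponent can be sketched in Python as follows. It assumes a signed 64-bit mantissa and a schema-level constant exponent of -9, so only 8 bytes travel on the wire; the helper names are hypothetical.

```python
import struct
from decimal import Decimal

MANTISSA = struct.Struct("<q")  # int64 mantissa, little-endian
EXPONENT = -9                   # constant in the schema, so it is never sent on the wire

def encode_price(value: Decimal) -> bytes:
    """Scale the value by the constant exponent and send only the integer mantissa."""
    mantissa = value.scaleb(-EXPONENT)
    if mantissa != mantissa.to_integral_value():
        raise ValueError(f"{value} is not representable with exponent {EXPONENT}")
    return MANTISSA.pack(int(mantissa))

def decode_price(buf: bytes, offset: int) -> Decimal:
    """Rebuild the exact decimal from the mantissa and the schema's constant exponent."""
    (mantissa,) = MANTISSA.unpack_from(buf, offset)
    return Decimal(mantissa).scaleb(EXPONENT)

wire = encode_price(Decimal("2750.25"))
```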

SBE also supports enumerations of code values. For example, order and execution messages may share an enumeration for order type; it needs to be defined only once and can then be referenced in multiple message templates. Any value sent on the wire that does not correspond to one of the valid values specified in the schema is invalid.
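A Python sketch of a shared enumeration, using a subset of the FIX OrdType (tag 40) code values encoded as a single character. The decoder rejects any byte that is not in the set, mirroring the rule that unlisted values are invalid; the function name is hypothetical.

```python
from enum import Enum

class OrdType(Enum):
    """Subset of FIX OrdType (tag 40) values, each encoded on the wire as one char."""
    MARKET = b"1"
    LIMIT = b"2"
    STOP = b"3"
    STOP_LIMIT = b"4"

def decode_ord_type(buf: bytes, offset: int) -> OrdType:
    """Map one wire byte to an OrdType member, rejecting codes outside the enumeration."""
    raw = buf[offset:offset + 1]
    try:
        return OrdType(raw)
    except ValueError:
        raise ValueError(f"invalid OrdType code {raw!r}") from None
```

Both an order template and an execution template could call the same `decode_ord_type`, which is the point of defining the enumeration once.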

Lastly, fields may be backed by a bitset, which is simply an array of on/off values (or true/false flags). A bitset is sent on the wire as a native integer type of size 8 to 64 bits.
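A bitset maps naturally onto Python's `enum.IntFlag`. The three flags below are hypothetical names chosen for illustration; the point is that the whole set travels as one uint8 and each flag is tested with a bitwise operation.

```python
import struct
from enum import IntFlag

class TradeFlags(IntFlag):
    """Hypothetical 8-bit set; each member is one bit of a uint8 on the wire."""
    AGGRESSOR_BUY = 1 << 0
    IMPLIED = 1 << 1
    LAST_FILL = 1 << 2

def encode_flags(flags: TradeFlags) -> bytes:
    return struct.pack("<B", flags)

def decode_flags(buf: bytes, offset: int) -> TradeFlags:
    return TradeFlags(buf[offset])

wire = encode_flags(TradeFlags.AGGRESSOR_BUY | TradeFlags.LAST_FILL)
```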

Steady-state Memory

A message encoder or decoder should stride just once through a message buffer. That is best practice. This access pattern takes advantage of memory pre-fetch in hardware. The system anticipates the next piece of memory to be accessed and fetches it ahead of a read or write operation. By not jumping back and forth in a buffer, cache misses are avoided.

Another best practice is to build encoders and decoders as flyweights over message buffers. That is, there is one encoder object and one decoder object per message type, and they are reused all day. In older FIX engines, a message object was typically constructed for each sent or received message. A high-volume FIX session can process thousands or even millions of messages per day, so the object-per-message paradigm leads to memory churn. In a garbage-collected virtual machine, that design inevitably caused lengthy garbage collection pauses. With SBE and these best practices, on the other hand, memory usage can stay close to a steady state.
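The flyweight idea is sketched below in Python over a hypothetical 12-byte trade block (price int64 at offset 0, quantity int32 at offset 8). One decoder is created up front and re-pointed at each message with `wrap()`; the field accessors read straight from the buffer. Python still allocates an int for every value it returns, so this shows the access pattern rather than the allocation-free behaviour a Java or C++ implementation can achieve.

```python
import struct

class TradeDecoder:
    """Flyweight over a hypothetical trade block: price int64 @0, qty int32 @8."""
    _PRICE = struct.Struct("<q")
    _QTY = struct.Struct("<i")
    BLOCK_LENGTH = 12

    __slots__ = ("_buf", "_offset")

    def __init__(self) -> None:
        self._buf = b""
        self._offset = 0

    def wrap(self, buf, offset: int) -> "TradeDecoder":
        """Re-point this single instance at another message; nothing is copied."""
        self._buf, self._offset = buf, offset
        return self

    @property
    def price(self) -> int:
        return self._PRICE.unpack_from(self._buf, self._offset)[0]

    @property
    def qty(self) -> int:
        return self._QTY.unpack_from(self._buf, self._offset + 8)[0]

decoder = TradeDecoder()  # created once, reused for every message
stream = b"".join(struct.pack("<qi", p, q) for p, q in [(10125, 5), (10150, 2)])
total_qty = 0
for offset in range(0, len(stream), TradeDecoder.BLOCK_LENGTH):
    total_qty += decoder.wrap(stream, offset).qty
```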

A Plan for Change

Periodically, message layouts must be changed to accommodate new requirements, such as new financial regulations or new business logic. Trading operations try to avoid risky 'big-bang' style migrations, preferring to roll out changes incrementally. Message extension was easy in the old tag=value encoding since there was no controlling template: an application that encountered a tag it did not know could simply ignore that field and continue parsing the rest of the message. With SBE, however, the metadata that describes fields and their positions arrives out-of-band, so without provisions for it in the standard, message extension would be impossible.

Listing 01: CME's SBE template for MBO with selected encoding types

SBE lays the ground rules for template changes that are backwards-compatible. New fields may only be added at the end of a block, and removing a field or changing the datatype of an existing field is disallowed. (In those cases, you must create a new template.) The SBE standard adds versioning metadata in a schema plus two small pieces of metadata on the wire to support extension. First, each message schema has an explicit version number. Each time a significant change is made, the publisher bumps the schema version number. For tracking, the schema also supports a sinceVersion attribute, analogous to the @since tag in Javadoc. On the wire, the message header conveys the schema ID and version as well as a template number, so a receiver can determine whether the message was encoded with the current schema version or a different one. Additionally, each block of fields in the message, either the message root or a repeating group, conveys its own block length. So, even if a decoder is behind the encoder's schema version, it can skip over new, unknown fields and continue parsing the remainder.
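These rules can be illustrated with a Python sketch. It assumes the four-field message header (blockLength, templateId, schemaId and version as uint16) and a hypothetical template whose version 1 appends an int32 quantity after the version 0 price. The old decoder uses the block length from the header to skip the field it does not know; the newer decoder checks the version before reading a field whose sinceVersion is 1.

```python
import struct
from typing import List, Optional

HEADER = struct.Struct("<HHHH")  # assumed: blockLength, templateId, schemaId, version
PRICE = struct.Struct("<q")      # field present since version 0
QTY = struct.Struct("<i")        # hypothetical field appended in version 1
QTY_SINCE_VERSION = 1

def read_prices_v0(buf: bytes) -> List[int]:
    """Decoder built against version 0: it knows only the price field."""
    prices, offset = [], 0
    while offset < len(buf):
        block_length, _template_id, _schema_id, _version = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        prices.append(PRICE.unpack_from(buf, offset)[0])
        offset += block_length  # skip the whole root block, including unknown trailing fields
    return prices

def read_qty(msg: bytes) -> Optional[int]:
    """Newer decoder: qty is present only if the sender's version >= its sinceVersion."""
    _block_length, _template_id, _schema_id, version = HEADER.unpack_from(msg, 0)
    if version < QTY_SINCE_VERSION:
        return None  # older sender: treat the field as absent (null)
    return QTY.unpack_from(msg, HEADER.size + PRICE.size)[0]

v0_msg = HEADER.pack(8, 1, 1, 0) + PRICE.pack(10125)
v1_msg = HEADER.pack(12, 1, 1, 1) + PRICE.pack(10150) + QTY.pack(3)
```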

Use Case: CME Group Market Data Platform

CME Group was the first major exchange to adopt SBE encoding, and it continues to add SBE encoding to additional electronic trading systems.

SBE encoding provides key benefits aligned with the requirements of CME Group's Globex electronic trading interfaces:

  • High performance encoding/decoding
  • Reasonably small bandwidth utilisation
  • Standardised field types
  • Standardised message definition
  • Flexible message structure
  • Direct data access
  • Full integration with FIX Protocol

Market Data Interface

The CME Group Market Data Platform (MDP) interface provides the 'public view' of the CME Globex market. Market participants receive real-time market data messages reflecting activity in the market; it includes information such as instrument specifications for what can be traded, a working order book up to 10 levels deep aggregated by price and/or by order, market statistics and market states.

This interface uses FIX Protocol semantics. It was migrated to SBE encoding in 2014 as part of a major upgrade to our systems with the MDP 3.0 release. Another major upgrade, released in 2017, added market-by-order data to the existing aggregated market-by-price feeds in a backward-compatible way, taking advantage of SBE encoding properties.

The main themes in designing MDP 3.0 were improving latency and transparency while allowing content updates in a backward-compatible way.

MDP 3.0 goals can be broken down into three high-level requirements:

Low-latency system

  • Raw binary format
  • Direct access to content
  • Facilitate content filtering

Rationalise message content

  • Improve visibility on matching
  • Additional business content
  • Less data content
  • Fit-for-purpose content optimisation
  • Allow future extensions for more granular content

Same business functionality

  • Market by price
  • Same book depth
  • Implied dissemination

Changes needed to the system were significant, covering all aspects of how the content is generated, encoded and disseminated. Designing message structures with SBE encoding, and decoupling business content from UDP packets, allowed these goals to be achieved.

MDP Channels and Feeds

CME Globex generates data for a wide range of asset classes and product types (futures and options). Data is disseminated per channel (usually containing data for a given asset class and product type, e.g. the "CME Globex Commodity Futures" channel). Figure 01 shows that each of these channels contains multiple feeds. From a network perspective, a feed is a UDP multicast stream.

Figure 01: CME Group MDP channel and feeds

The Incremental Feed is the main and only feed needed to get market events for a trading session. It contains the messages listed in Table 02. The Snapshot Feed is used to initialise data prior to getting incremental updates. This would be needed when joining a channel late or to close data gaps when missing messages on the incremental feed. The Instrument Feed is used to obtain the listing of all contract specifications listed on a given channel. Instrument Data is a set of static files containing detailed specifications on contracts listed across all channels.

Table 02: FIX message names

MDP Messaging Structures

In order to increase transparency on individual market events, as well as to optimise the business content, messages were restructured within SBE encoding rules.

Major changes in the message structure were:

Events generated for each input

  • Report independently on each market data event that would generate a state change within CME Globex.
  • Give full transparency on sequential events being processed, with no event 'bundling' and no market snapshots.

More granular timestamps

  • Add nanosecond granularity to timestamps (possible with SBE encoding).
  • Improve transparency on the matching processes.

Not fit-for-purpose data is not sent

  • Stop sending content that could be implied from existing content.
  • For example: Net Change and Tick Direction are not sent as they can be fully calculated from existing content.

No message sequencing

  • Sequence numbers are assigned per packet rather than per message (packets can be missed with the UDP protocol).
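Because sequencing is per packet, gap detection on the incremental feed reduces to comparing consecutive packet sequence numbers. The Python class below is a hypothetical helper, not part of any CME API; on a gap, a consumer would typically recover state from the Snapshot Feed described earlier.

```python
from typing import Optional

class GapDetector:
    """Tracks per-packet sequence numbers on one incremental feed (hypothetical helper)."""

    def __init__(self) -> None:
        self.expected: Optional[int] = None

    def on_packet(self, seq: int) -> str:
        if self.expected is None or seq == self.expected:
            self.expected = seq + 1
            return "process"
        if seq < self.expected:
            return "duplicate"  # already seen (or stale): ignore it
        missing = seq - self.expected
        self.expected = seq + 1
        return f"gap of {missing}"  # trigger recovery before trusting the book again
```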

Event Based Messaging

Each event on Globex ends up generating one or more sets of trade updates, cumulative volume updates, order book updates, statistic updates and implied order book updates. For each event on Globex, this content is sent in multiple messages and flagged to indicate the end of each content type, with each content block being optional, as shown in Figure 02.

Figure 02: CME Group MDP incremental feed content blocks

Using this structure, specific content can be easily filtered out. For example, if a consumer is interested in processing only book updates and is not interested in other business content, using a combination of direct access, end of content flag and expected content order within messages, only book updates can be parsed and processed.

Examples: End of Content Indicators

Each message is prepended with (shown in orange):

  • Message size
  • Message Schema ID

Each packet is prepended with (shown in grey):

  • Sequence Number
  • Sending Time (nanosecond precision)

Figure 03: CME Group MDP incremental feed packet structure
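A Python sketch of walking one incremental feed packet. The field widths are assumptions modelled on Figure 03: a uint32 sequence number and a uint64 nanosecond sending time per packet, and a uint16 size before each message, assumed here to count its own two bytes. Consult CME's MDP 3.0 documentation for the authoritative layout.

```python
import struct
from typing import Iterator, Tuple

PACKET_HEADER = struct.Struct("<IQ")  # assumed: sequence number uint32, sending time uint64 (ns)
MSG_SIZE = struct.Struct("<H")        # assumed: uint16 size that counts its own two bytes

def walk_packet(packet: bytes) -> Iterator[Tuple[int, int, memoryview]]:
    """Yield (sequence, sending_time, message) for each message in one UDP packet."""
    seq, sending_time = PACKET_HEADER.unpack_from(packet, 0)
    offset = PACKET_HEADER.size
    while offset < len(packet):
        (size,) = MSG_SIZE.unpack_from(packet, offset)
        if size < MSG_SIZE.size or offset + size > len(packet):
            raise ValueError(f"corrupt message size {size} at offset {offset}")
        yield seq, sending_time, memoryview(packet)[offset + MSG_SIZE.size:offset + size]
        offset += size  # jump to the next message without decoding this one
```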

Event results can span multiple packets:

Figure 04: Series of five orders coming in with no trade and no implieds

Figure 05: One order coming in, trades, statistics, implieds

Figure 06: One order coming in, many trades

EOT: End Of Trades

EOV: End Of Volumes

EOQ: End Of Quotes

EOS: End Of Statistics

EOI: End Of Implieds

EOE: End Of Event
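A sketch of the book-only filtering described earlier, in Python. The bit positions are assumptions modelled on CME's MatchEventIndicator field (confirm them against the published schema), and the `(kind, flags, payload)` tuples stand in for decoded messages. A consumer that only cares about the book can emit each event's book updates as soon as EOQ is seen, without waiting for statistics or implieds.

```python
from enum import IntFlag

class EventFlags(IntFlag):
    """End-of-content bits (positions assumed, modelled on CME's MatchEventIndicator)."""
    EOT = 1 << 0  # end of trades
    EOV = 1 << 1  # end of volumes
    EOQ = 1 << 2  # end of quotes (book updates)
    EOS = 1 << 3  # end of statistics
    EOI = 1 << 4  # end of implieds
    EOE = 1 << 7  # end of event

def completed_books(messages):
    """Yield each event's book payloads once its EOQ flag is seen; other content is ignored."""
    pending = []
    for kind, flags, payload in messages:
        if kind == "book":
            pending.append(payload)
        if flags & EventFlags.EOQ:
            yield pending
            pending = []
        if flags & EventFlags.EOE:
            pending = []  # the next message starts a new event
```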

Message encoding

Message encoding follows the SBE specification, a raw binary encoding defined by the FIX Protocol Limited (FPL) High Performance Working Group.

Key features of SBE encoding used in the CME Group implementation are:

  • Choice of little-endian for MDP 3.0. Little-endian byte order is the native order of Intel and compatible processors that dominate the server market.
  • Based on IEEE primitive data type encodings that are handled natively in hardware without unnecessary conversions
  • Extension with 'null' values that preserve deterministic message layouts
  • Complex data types are broken down into primitive data types
  • Maximise direct access to support filtering of market data messages by security ID or other attributes
  • High flexibility
  • Very low latency
  • Lower processing time for encoding/decoding
  • Content driven by a message schema
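The 'null' extension can be sketched in Python for an optional int32. It assumes the SBE default null value for a signed integer, the type's minimum (a schema may declare a different nullValue). An absent value still occupies four bytes, so the fields after it keep their offsets; the helper names are hypothetical.

```python
import struct
from typing import Optional

INT32 = struct.Struct("<i")
INT32_NULL = -(2**31)  # SBE default null for int32: the type's minimum value

def encode_opt_int32(value: Optional[int]) -> bytes:
    """Absent values are sent as the null sentinel, keeping the layout fixed."""
    if value == INT32_NULL:
        raise ValueError("value collides with the null sentinel")
    return INT32.pack(INT32_NULL if value is None else value)

def decode_opt_int32(buf: bytes, offset: int) -> Optional[int]:
    (raw,) = INT32.unpack_from(buf, offset)
    return None if raw == INT32_NULL else raw

wire = encode_opt_int32(None) + encode_opt_int32(250)  # second field still at offset 4
```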

Encoding Example

Tables 05 and 06 compare ASCII tag=value encoding of FIX messages with SBE encoding.

With SBE, a FIX message is split into a message schema and data content. Only the data content is sent over the wire; the message metadata is exchanged beforehand, when latency is not critical.

Figure 07: FIX encoded with SBE

Conclusion

Investors and traders have a lot of choices about where to participate in competitive asset classes and trading venues. The adoption of SBE is one element that has allowed CME Group to remain competitive from the perspective of traders running strategies that require very low latency. The characteristics of SBE that enable high performance (native binary representations of data types, deterministic message layouts and efficient use of memory) can benefit other users requiring low latency, even beyond the financial industry.

The design philosophy that makes these SBE benefits possible is minimalism. The standard allows business information to be conveyed concisely, with a minimum of scaffolding and bookkeeping. Fewer cycles spent on encoding and decoding messages leaves more resources for what deserves them: making trading decisions.

Further reading

CME Group (2017). MDP message schema. Retrieved from ftp://ftp.cmegroup.com/SBEFix/Production/Templates/templates_FixBinary.xml

Listing 02: Sample SBE template to encode a trade message

Table 05: Sample for a trade encoded as FIX tag=value

Table 06: Sample SBE encoded message

Appendix A: Examples of CME Globex events for MDP

Example 01: Order coming into CME Globex

An order added, modified or cancelled generates a single market data event. This market event can contain:

  • Trade updates (including implied trades on other contracts)
  • Cumulative volume updates (including implied volume updates on other contracts)
  • Order book updates
  • Statistic updates (including implied statistics on other contracts)
  • Implied order book updates (on other contracts)

Example 02: Market opens on CME Globex

In addition to orders, state changes can also generate single market data events:

  • State change
  • Trade updates, cumulative volume updates, order book updates, statistic updates, implied order book updates

Example 03: New single-leg order added in book

Example 04: New spread order added in book

Example 05: New order and trade

Appendix B: Examples of CME Globex encoded MDP messages

Below is an example from our testing environment for our latest MDP release, in mid-2017. It is based on template #43 for a book update (MDIncrementalRefreshOrderBook43).

The message schema is available online (see Further reading, CME Group, 2017).

Example 06: SBE encoded message (hexadecimal representation)

Example 07: Decoded message (FIX tag=value representation)
