Eurex introduced a new trading platform that represents a radical departure from its previous platform based on OpenVMS (Eurex Version 14.0, the final release on this technology stack). It was originally named "New Trading Architecture" (one had to wonder what the next "new" version would have been called), but Eurex sensibly renamed it T7.
T7 was introduced via a staggered transition, with the least important products (e.g. weather, property and inflation derivatives) migrating first.
This first wave was released on 3 December 2012, with several other migrations (such as single stocks options) following throughout the first quarter of 2013.
Finally, the first real benchmark products have arrived on T7. As of 8 May 2013, we have some meaningful production statistics for some highly liquid futures such as DAX index futures (FDAX) and EURO STOXX 50 futures (FESX).
We have analysed the latency of the Eurex matching network: Gateways, Matching Hosts and Market Data publishers.
An inspection of four weeks of trading gives the following values for the FESX, unconditional across the exchange:
|         | Core RTT | Gateway RTT |
|---------|----------|-------------|
| Mean    | 233.0    | 463.8       |
| Median  | 93.5     | 273.0       |
| Maximum | 2,052.0  | 2,856.0     |

Timings are in microseconds (T7 Version 1.0).
Depending on the criteria used for measurement, we can see that processing times have been shortened by a factor of 3-4. This is a decent speed-up considering the complexity of the new system. As every system architect knows, increases of an order of magnitude are always hoped for, but almost never achieved.
On top of this, while perhaps antiquated in terms of architecture, Eurex Version 14 was still considered a fast exchange by most participants, supporting an impressive daily throughput north of 100 million transactions, while its sister exchange (Xetra, currently on Version 13) supports more than 200 million daily transactions. (Note that "transactions" in this article refers to messages, not matched trades.)
To consider each of the components, we draw a network diagram of the various dataflow paths between the exchange and our trading hosts.
Figure 1: Simplified Exchange Message Route
Figure 2: Gateway Timing
We can directly observe the timing along the edges of the loop by inspecting message traffic.
In this simplified view, we are not paying attention to the various elements of redundancy that exist inside the exchange (such as A and B failover clusters) or the plethora of components at work outside of the matching engine (persistence systems, failover devices, clearing interfaces, etc.).
In addition to the timing points that are directly observable, there are two additional measuring points on the far side of the gateway (the side facing the matching engines).
These additional timing points give direct insight into how loaded the gateways are, providing an important indication of which gateways are particularly busy at any given time.
These timings are not available publicly; however, we can reverse-engineer the transit times by modelling them as four unknown edges along the left side of the loop (the sum across the entire semi-loop being known).
Figure 3: Graph Model
Figure 3 shows the graph model we use to solve for the unknown timing elements in the latency graph.
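This reverse-engineering step can be sketched as a small least-squares problem. The sketch below is illustrative only: the constraint matrix and residual times are hypothetical placeholders, not actual Eurex routes or measurements.

```python
import numpy as np

# Each row of A marks which of the four unknown gateway-side edges
# (g1..g4) a measured semi-loop traverses; b holds the measured
# semi-loop time minus the sum of its directly observable edges (µs).
# Both A and b are hypothetical placeholders for real route data.
A = np.array([
    [1, 1, 0, 0],   # inbound semi-loop crosses g1 and g2
    [0, 0, 1, 1],   # outbound semi-loop crosses g3 and g4
    [1, 1, 1, 1],   # full gateway round trip crosses all four edges
    [1, 0, 0, 1],   # another observable path crossing g1 and g4
], dtype=float)
b = np.array([60.0, 70.0, 130.0, 55.0])

# Least-squares solve for the per-edge transit times; with a
# rank-deficient but consistent system this yields the minimum-norm
# solution that reproduces every measured semi-loop sum.
edge_times, _, rank, _ = np.linalg.lstsq(A, b, rcond=None)
print(edge_times)
```

With more observable paths than unknowns, the same solve averages out measurement noise instead of merely interpolating.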
From a trader's perspective, the gateway is the bottleneck of the entire process (the real bottleneck is of course the matching engine, but we have no control over what happens beyond the gateway, so from our perspective the gateway is the de facto bottleneck). The choice of gateway is important, as sending an order to a busy gateway can make the difference between getting to the matching engine in time or not (more on this later).

When analysing network latencies in a system, it is important to establish a baseline latency in the idle state. We can look at various percentiles of the distribution, or we can look at minimum transit times along the edges. Usually the median is a good choice, as it resembles a "typical" transaction. But a percentile (of which the median is a specific case) is not a linear operator. Averages, however, are linear, so when we sum the averages along the edges of the graph, we at least get the same result as for the entire route, which aids intuition.
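The point about linearity is easy to demonstrate. The following sketch uses purely synthetic latency samples (all distribution parameters are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic edge-latency samples (µs) with the heavy right tail
# typical of exchange timings; parameters are made up.
edge_a = rng.lognormal(mean=3.0, sigma=1.0, size=100_000)
edge_b = rng.lognormal(mean=3.5, sigma=1.2, size=100_000)
route = edge_a + edge_b  # per-message total across both edges

# Means are linear: the mean of the route equals the sum of the
# per-edge means (up to floating-point error).
print(route.mean() - (edge_a.mean() + edge_b.mean()))

# Medians are not: the median of the route differs noticeably from
# the sum of the per-edge medians.
print(np.median(route) - (np.median(edge_a) + np.median(edge_b)))
```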
As we look at the base latencies along each segment of the loop, we can approximate a bare minimum time required to get in (and back out).
For inbound messages: 35+25+12 = 72 µs
For outbound messages: 50+20+20 = 90 µs
Note that the 83 µs number for market data publication is for order book updates only. If a trade is generated, the outbound messages take at least twice as long (for trades that involve many orders this can be many hundreds of microseconds).
Figure 4: Bare Minimum Transit Times
Essentially, we are looking at a bare minimum of, say, 160 microseconds. In reality, one will almost never see this number, because the system simply does not offer this degree of responsiveness at every point all the time.
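One way to derive such a floor from captured timestamps is to sum a low percentile of each edge's transit-time distribution. The sketch below uses synthetic samples centred on the edge timings quoted above; the distributional assumptions are ours, not Eurex's:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic per-edge transit samples (µs), centred on the edge
# timings quoted above; spreads are assumed for illustration.
inbound_edges = [rng.lognormal(np.log(m), 0.3, 50_000) for m in (35, 25, 12)]
outbound_edges = [rng.lognormal(np.log(m), 0.3, 50_000) for m in (50, 20, 20)]

# Baseline = sum of per-edge floors, here the 1st percentile of each
# edge as a proxy for its idle-state minimum.
inbound_floor = sum(np.percentile(e, 1) for e in inbound_edges)
outbound_floor = sum(np.percentile(e, 1) for e in outbound_edges)
print(round(inbound_floor, 1), round(outbound_floor, 1))
```

Because the per-edge floors rarely coincide on the same message, such a summed floor is a lower bound that live traffic will almost never achieve.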
Of course, this inspection is no more than a cursory glance, unconditional on any specific events such as trades being executed or the order book shifting significantly. And as any high frequency trader will tell you, the times when "interesting" things happen are what really matters. Having established the baseline above, we can start looking at other latency numbers. In particular, the distribution of messages passing through the gateways into the core is interesting, as this is ultimately the deciding factor for many opportunities.
Figure 5: Median and Average Latencies On Exchange Side
Figure 6: Queueing on Core
One of the important factors for how suitable an exchange is for high frequency trading is how quickly we can act on new information. A particularly important factor is how quickly we can react to new significant information (which is when everybody wants to act and the nature of competition really starts to kick in).
To that end, it is helpful to examine two points:
- The distribution of waiting times going into the matching engine (shown in Figure 6)
- The distribution of waiting times at the gateways (shown in Figure 7)
Figure 7: Responsiveness of Exchange System
Beginning with the matching engine, the distribution has a fat right tail. What matters is when this fat right tail occurs. If we correlate the queuing time with how long the trigger event took to process in the matching engine (an indication of trade complexity), we arrive at the essence of low-latency trading:
If information arriving is really interesting, everyone will start sending orders (building up a queue for everyone but the fastest).
We can see that there is some impact (again, sampled unconditionally). The lower rising edge of the cloud is a sure sign of correlated impact and something we need to consider when making trading decisions: how long will we potentially be queuing, and is this delay going to cause risk (think: legging risk on inter-market spreads, for example)?
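A minimal way to quantify such correlated impact is the correlation between the trigger's processing time and the subsequent queuing delay. The data below is synthetic, with the correlation injected by construction, purely to illustrate the measurement:

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic data: matching-engine processing time of the trigger
# event (µs) and the queuing delay of a subsequent order. The
# dependence is injected by construction for illustration.
proc_time = rng.lognormal(np.log(80), 0.6, 20_000)
queue_delay = rng.exponential(30, 20_000) + 0.5 * proc_time

# Correlation between trigger complexity and subsequent queuing;
# on real captures this would be computed per gateway and per session.
rho = np.corrcoef(proc_time, queue_delay)[0, 1]
print(round(rho, 2))
```

On real latency data, a rank correlation is often preferable, since Pearson correlation is sensitive to the heavy right tails.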
Looking at the overall responsiveness of the system, we can see that latency (as measured by Gateway RTT) has improved by at least a factor of 2.0 (measured by the median) or a factor of 3.5 (measured by the mean); we refer back to the table at the beginning of this article. The distribution of Gateway RTTs is also shown in Figure 7 above.
The matching engine is one component, and it seems that the new edition is, in fact, very predictable. What about the gateways? From the perspective of the trader, this is where the bottleneck is. Eurex has tried to cater to different needs by splitting the gateways into high-frequency (HF) and low-frequency (LF) varieties, with 16 HF gateways and 6 LF gateways. However, Eurex is currently considering reducing the number of HF gateways.
Here we see a significant queuing effect (see Figure 8 below). At times, orders queue for up to 10 times the average waiting time. This is the part every high frequency trader complains about, because it means that our orders had to wait before slipping into the gateway. Note that these are unconditional waiting times. When things get interesting, the queue can get very long indeed, and that is when latency matters most (unfortunately we cannot show these here, for obvious reasons).
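Flagging such congestion events from captured waiting times is straightforward; the sketch below uses synthetic, fat-tailed waiting times (not actual gateway data):

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic gateway waiting times (µs) with a fat right tail;
# real values would come from captured gateway timestamps.
waits = rng.lognormal(mean=np.log(20), sigma=1.0, size=100_000)

avg = waits.mean()
# Orders that queued for more than 10x the average waiting time.
congested = waits[waits > 10 * avg]
share = congested.size / waits.size
print(f"{share:.4%} of orders waited > 10x the average "
      f"(worst: {waits.max():.0f} µs vs average {avg:.0f} µs)")
```

Tracking this share per gateway over the trading day is one way to spot which gateways are persistently busy.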
On release, the new architecture was significantly faster than the old one (by about a factor of four). More to the point, Deutsche Börse still has room to optimise. At the time of writing we have just seen T7 Version 1.1 hit the rack shelves, and the first samples are in. Early indications are that, depending on the section, one can see optimisations ranging between 10% and 25%, bringing an overall reduction in latency of roughly 20% compared with Version 1.0. The race continues.
How much difference this will actually make to P/L of course depends first and foremost on the trading style and on what adjustments traders make to their platforms.
Users of commercial third-party trading solutions will have more problems, as these tend to be slower, so the gap widens. For the marketplace it is good news overall, however: faster response times mean a better handle on risk (execution uncertainty), which in turn reduces the price of liquidity.