The Gateway to Algorithmic and Automated Trading

Racing Photons

Published in Automated Trader Magazine Issue 14 Q3 2009

With a flurry of major announcements around SIFMA, nobody could accuse NYSE Euronext of being inactive. Automated Trader caught up with Stanley Young, CEO of NYSE Technologies and Co-CIO of NYSE Euronext, to discuss some of these announcements in more detail and in particular their implications for auto/algo traders.

What percentage of volume does automated/algorithmic trading represent for NYSE Euronext?

We don't keep formal records, but for US markets we believe it is somewhere between 50 and 60%.

Have there been any noticeable changes in the ratio between message volume and consummated orders in the last twelve months?

I would say that over the last year it's been pretty stable. However, what we are seeing overall is a general reduction in the size of trades; participants are clearly tending to break their orders up more.

We also see a higher ratio of message traffic to completed orders with new products, which is unsurprising. Understandably, traders initially feel things out to see if there is real liquidity present or whether people are just fishing. However, after a while the ratio tends to stabilise as participants in that market gain a stronger understanding of how it functions.

Do you specify a minimum acceptable ratio between message volume and completed orders?

No we do not. However, we have certain obligations on market-makers and if one of them was consistently quoting away from the best bid and offer, then we would certainly raise the matter with them, because this sort of activity serves no purpose. We wouldn't specifically stop them, but we would want an explanation as to why they were doing this.

You recently announced the launch of your V5 market data platform at SIFMA. How much of a change does that represent from your previous generation technology?

Considerable; V5 represents a complete rethink of our messaging architecture. There are various aspects to this, but one key element is the way in which we have re-architected how we manage feed handlers on top of our data fabric initiative, which is based around a very low latency messaging bus and an API we call MAMA (Middleware Agnostic Messaging API) (see Figure 1 overleaf).

This approach has delivered some serious performance improvements - around ten times more messages per second, accompanied by an 80% reduction in hardware requirements.

A critical factor in achieving this has been our close collaboration with Intel and our resulting ability to take maximum advantage of the innovative architecture of their Xeon 5500 series processors. The combination of the characteristics of that processor line with our technology re-architecting has been instrumental in delivering the performance of V5.
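
To make the 'middleware agnostic' idea concrete, the sketch below shows one way a consumer-facing layer can hide the underlying messaging bus behind a common subscription interface, so the same application code runs over whichever transport sits beneath it. It is an illustrative C++ sketch only, not the actual MAMA API; every class and method name here is hypothetical.

// Illustrative sketch of a middleware-agnostic subscription layer.
// None of these names come from the real MAMA API; they are hypothetical.
#include <functional>
#include <iostream>
#include <memory>
#include <string>

// Normalised market data update, independent of the wire format.
struct MarketDataEvent {
    std::string symbol;
    double bid = 0.0;
    double ask = 0.0;
};

using EventCallback = std::function<void(const MarketDataEvent&)>;

// Applications code against this interface; concrete implementations wrap
// whichever messaging bus (multicast, shared memory, RDMA fabric) is in use.
class Transport {
public:
    virtual ~Transport() = default;
    virtual void subscribe(const std::string& symbol, EventCallback cb) = 0;
};

// Toy in-process transport used here purely so the example runs.
class InProcessTransport : public Transport {
public:
    void subscribe(const std::string& symbol, EventCallback cb) override {
        // A real implementation would register with the bus; here we just
        // deliver one synthetic tick to exercise the callback path.
        cb(MarketDataEvent{symbol, 100.25, 100.26});
    }
};

int main() {
    std::unique_ptr<Transport> transport = std::make_unique<InProcessTransport>();
    transport->subscribe("VOD.L", [](const MarketDataEvent& ev) {
        std::cout << ev.symbol << " bid=" << ev.bid << " ask=" << ev.ask << '\n';
    });
}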

Have you also explored other chipsets and processors as part of the project?

The design of our current architecture is very specific to the Xeon 5500 series. We've certainly looked at other chipsets, but haven't seen the same degree of performance from them. (See sidebar interview with Conor Allen, Head of R&D and Core Engineering at NYSE Technologies for more about the NYSE Technologies/Intel collaboration.)

Figure 1

What were the biggest challenges you encountered with the V5 project?

The mandate we had from clients was that speed was obviously important, but so was latency variance. For them, consistent low latency was better than ultralow latency accompanied by performance spikes. The challenge for the development team was therefore to be consistently fast with a very high message throughput.

The second major challenge was to make V5 not only very fast and consistent, but also to fit it into a much smaller footprint. Overcoming this second challenge means that 'fuel consumption per trade' has dropped considerably. The commensurate reduction in physical hardware size is also important because clients deploying V5 may not have control over the location and contiguity of the footprint they occupy in a data centre or co-location facility. If you can pack more performance into each U of rack space, the chances of obtaining a contiguous footprint and optimal performance are improved.
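
The latency-variance point above is easy to make concrete: what a trading client experiences is not just the median round trip but the tail. The short C++ sketch below computes the median, 99th percentile and maximum over a set of measured latencies; the numbers are invented purely to show how a feed can look fast on average while still spiking.

// Illustrative only: p50/p99/max of measured latencies, to show why
// "consistently fast" is not the same thing as "fast on average".
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <vector>

double percentile(std::vector<double> samples, double p) {
    std::sort(samples.begin(), samples.end());
    std::size_t idx = static_cast<std::size_t>(p * (samples.size() - 1));
    return samples[idx];
}

int main() {
    // Hypothetical round-trip latencies in microseconds.
    std::vector<double> latencies = {410, 425, 418, 2900, 430, 415, 422, 3100, 419, 428};
    std::printf("p50 = %.0f us, p99 = %.0f us, max = %.0f us\n",
                percentile(latencies, 0.50),
                percentile(latencies, 0.99),
                percentile(latencies, 1.00));
    // The median looks healthy, but the spikes at 2900/3100 us are exactly
    // what a latency-sensitive client notices.
}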


How does Superfeed fit into your overall data picture?

Superfeed is data as a service, delivered as an aggregated feed that sits either in our data centre or in one of our points of presence. We take our own data and aggregate it with data from other trading venues (so we are effectively also acting as the vendor of record for other data sources). We provide this as a service via 10Gbit Ethernet in our co-location facilities to those market participants who do not wish to build their own market data platforms.

The reason we are doing this is because we believe the trading world is moving towards the point where the majority of market participants will wish to co-locate in some shape or form. Whether they do this through a bureau or their own hardware, we believe there will be a spectrum of demand from high-level trading technology companies all the way through to those requiring a straightforward compute on demand infrastructure.

Running Superfeed from the data centre means that we will be able to consolidate the feeds from multiple sources at one point and pass that data to clients' execution engines extremely quickly. We would certainly expect this to be both faster and more efficient than participants consuming and routing the data to the data centre themselves. This distinction will be particularly apparent if MTFs and dark pools choose to co-locate with us.

What about historical data?

We will certainly be offering this as well; once we have consolidated the various feeds into Superfeed we can simply pipe that into a tick database, which will give access to ticks that are individually flagged by source. We already have a partnership with OneMarketData in place in this regard, so participants will be able to back-test their algorithms in near real time.
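
As a rough illustration of what source-flagged ticks make possible, the sketch below replays a handful of ticks through a trivial strategy callback, filtered by venue. It is a generic C++ example and does not represent OneMarketData's API; the record layout and venue codes are hypothetical.

// Generic back-testing sketch: replay source-flagged ticks through a strategy.
// Nothing here represents any vendor's API; the layout is hypothetical.
#include <functional>
#include <iostream>
#include <string>
#include <vector>

struct Tick {
    long long timestamp_ns;   // exchange timestamp
    std::string source;       // venue the tick came from, e.g. "NYSE" or "ARCA"
    std::string symbol;
    double price;
    int size;
};

// Replay ticks in time order, optionally restricted to a single venue.
void replay(const std::vector<Tick>& ticks,
            const std::string& venue_filter,
            const std::function<void(const Tick&)>& on_tick) {
    for (const Tick& t : ticks) {
        if (venue_filter.empty() || t.source == venue_filter) {
            on_tick(t);
        }
    }
}

int main() {
    std::vector<Tick> history = {
        {1000, "NYSE", "IBM", 104.10, 200},
        {1005, "ARCA", "IBM", 104.11, 100},
        {1010, "NYSE", "IBM", 104.09, 300},
    };
    double last_price = 0.0;
    replay(history, "NYSE", [&](const Tick& t) {
        // Trivial "strategy": track the last price seen on the chosen venue.
        last_price = t.price;
        std::cout << t.symbol << " @ " << t.price << " from " << t.source << '\n';
    });
    std::cout << "last NYSE price = " << last_price << '\n';
}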

What is your general philosophy for technology development? Build or buy?

If you want to run world-class markets you need world-class technology. We have taken the view that to have that world-class technology you really need people internally capable of building it. Obviously we will bring in external components where we don't believe we can be best of breed, but if you can just buy the whole thing off the shelf then anyone can do it. Where's the competitive edge in that?

Particularly in areas such as matching engines and data fabric we believe we have a significant edge. We invest heavily in technology, which is why we set up NYSE Technologies as a separate entity to sell a range of technology to broker-dealers, fund managers and other exchanges. That can be anything from market data platforms, to exchange gateways, to even matching engines.

How many personnel in the technology company?

Overall, the commercial part of the group employs around 500 people, with deployment sites in Paris, London, Belfast and New York and some smaller groups in Florida and Chicago.

Are you a C++ or Java shop?

We are mostly C++. We do have some Java products but we find that for the types of processing that we do (and especially for the low latency technology) it is really C++. We obviously use Java on the front-end, but most of our messaging, exchange engines and gateways are C++.

Intel's Xeon 5500 series - why should you care?

Intel's Xeon 5500 series ushers in a number of innovations that have particular relevance to auto/algo traders keen to minimise latency. Probably the most obvious of these is QuickPath Interconnect (QPI).

For quite a while now Intel has used the Front Side Bus (FSB) as the system interconnect linking the processor core(s) and the chipset containing the memory controller and the I/O bus. The snag is that this arrangement creates a choke point, because every core's memory access has to travel over the same shared bus to a single memory controller.

By contrast, the new architecture gives each processor its own integrated memory controller and locally attached memory. The Xeon 5500 series processors use QPI for rapid connectivity to the I/O controller, and there is also a QPI link between the sockets in a dual-processor system. Total bandwidth can be up to 25 Gigabytes/second, which Intel claims is a 300% improvement on other current interconnect solutions.

When combined with I/O techniques - such as those used by NYSE Technologies that bypass the system OS and allow data to be moved directly between the memory of separate machines - trading latency can be massively reduced. Furthermore, the ability to move data directly between the dedicated memory of processors in the same box also facilitates the type of "all on one machine" trading Conor Allen refers to in his interview.
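
One practical consequence of each socket having its own locally attached memory is that latency-sensitive software benefits from being NUMA-aware: keeping a thread's working set on the memory node of the socket it runs on avoids crossing the QPI link on every access. The sketch below shows one way to do that on Linux with libnuma; it is a minimal illustration under that assumption, not a description of how NYSE Technologies does it.

// Minimal NUMA-aware allocation sketch (Linux, link with -lnuma).
// Places a buffer on the memory node local to the CPU the thread runs on.
#include <numa.h>     // numa_available, numa_node_of_cpu, numa_alloc_onnode, numa_free
#include <sched.h>    // sched_getcpu
#include <cstddef>
#include <cstdio>

int main() {
    if (numa_available() < 0) {
        std::fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }
    int cpu  = sched_getcpu();           // CPU this thread is currently running on
    int node = numa_node_of_cpu(cpu);    // memory node local to that CPU

    const std::size_t kBytes = 64 * 1024 * 1024;
    void* buffer = numa_alloc_onnode(kBytes, node);  // allocate on the local node
    if (buffer == nullptr) {
        std::fprintf(stderr, "allocation failed\n");
        return 1;
    }
    std::printf("allocated %zu bytes on NUMA node %d (cpu %d)\n", kBytes, node, cpu);

    numa_free(buffer, kBytes);
    return 0;
}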

Another potential trading benefit derives from Intel's Turbo Boost technology. This uses a controller on each processor to track the load of each core and power it down if it is being under-utilised. The capacity released in this way can be transferred to other cores, allowing their speed to be increased in 133MHz increments. This is particularly relevant in a trading environment where sudden bursts of data from one market (say, in response to an announcement) create high demand. If another core on the same machine is handling data from a relatively quiet market, the opportunity to "borrow" capacity from that second core could prove extremely valuable in minimising processing time for the active market.
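
One simple way to observe this behaviour on a Linux host is to read each core's current frequency from the cpufreq sysfs interface while some cores are busy and others idle. The sketch below does exactly that; it assumes the standard cpufreq files are exposed by the kernel.

// Read each core's current clock frequency from the Linux cpufreq sysfs
// interface. Purely an observation aid; assumes cpufreq is enabled.
#include <fstream>
#include <iostream>
#include <sstream>
#include <thread>

int main() {
    unsigned cores = std::thread::hardware_concurrency();
    for (unsigned cpu = 0; cpu < cores; ++cpu) {
        std::ostringstream path;
        path << "/sys/devices/system/cpu/cpu" << cpu << "/cpufreq/scaling_cur_freq";
        std::ifstream f(path.str());
        long khz = 0;
        if (f >> khz) {
            std::cout << "cpu" << cpu << ": " << khz / 1000.0 << " MHz\n";
        } else {
            std::cout << "cpu" << cpu << ": frequency not exposed\n";
        }
    }
}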

Do you find it difficult to hire the right kind of C++ skills?

We have observed that the majority of young programmers now emerging are primarily familiar with Java. However, we have also discovered that it is easier to retrain somebody to program in C++ than it is to take somebody really good at C++ and train them in Java. I should also point out that the team working on the most demanding areas such as low latency is extremely small. They are all people who have been with us a long time, with deep experience in areas such as building matching engines and data fabric.

Do you find that instilling a market mindset in novice programmers is a big challenge?

We take a lot of graduates into Belfast, which is where our Wombat acquisition was based. What we have discovered is that there is a very strong commercial culture there anyway, so it's not as if somebody is joining an IT department of a bank. Very much from the outset the premise is that we are a commercial IT company that lives or dies by its results. Graduates joining us very quickly understand that they have to meet not just software performance targets but also product revenue targets.

Operating system (OS) and processor preferences?

As regards OS, we are Red Hat Linux throughout, all running on Intel.

Intel has been an exceptional strategic partner for us for a very long time. Our activities tend to be on the bleeding edge of the industry, so techniques we are working on today are likely to filter through to the mainstream processor market two or three years down the line. The collaboration with Intel means that we each have a strong insight into each other's R&D, to mutual benefit.

What's your strategy as regards co-location?

We are currently constructing our own purpose-built ultralow latency facilities that we refer to as liquidity centres. At present we have data centres in Paris and London for our European markets and in Chicago and New York for the US. We intend to consolidate these into new centres in London and New Jersey.

The centre in New Jersey is on a 27-acre site and will include around 100,000 ft² of raised floor. As the CIO of NYSE I will need about 20% of this space for our own use, such as hosting space for exchange matching engines. The remaining 80% will be available to third parties for co-location and proximity hosting. The intention is that the facilities will be available to multilateral trading facilities and dark pools, should they wish to have a presence there.

Including trading venues that directly compete with NYSE Euronext?

Yes; we take the view that markets should not compete on infrastructure. If we can save the industry money by co-locating in fewer places (and thereby reducing participants' connectivity costs) then that's all to the good.

We've had more than a few readers mention the problems they have with high density power availability in data centres on the US East Coast. Have you encountered this?

We certainly have in the New York area, both in our existing centres and in some third party space we use. For that reason, we will have our own substations at the new data centres. For example, our forthcoming New Jersey facility will have two substations of its own on-site, as will the new London centre.

Another concern Automated Trader readers have been raising is that some markets are (unofficially) offering preferential market access. Some readers have claimed that in return for payment certain markets will give participants a quicker/shorter route to the matching engine in their facility. What's NYSE's position on this?

Our charging is purely based on consumption of rack space, absolutely nothing else. We might make more space available in a centre to an entity if it was a market maker providing significant liquidity across all our markets, but that is all. Everyone in our data centres has (and will continue to have) equal speed of access to the market.

In fact we believe that this whole question of preferential market access will in due course be subject to regulation in both the US and in Europe. We also feel that absolute clarity on this is essential; therefore we will be publishing our rules and methodology for space allocation to the whole industry in July, so everyone will know the position.


Is your intention to have one set of matching engines per market in the new facilities? Or are you considering a regionally distributed approach?

There will be one matching engine per market. We are moving our matching engines from Paris into the London facility so all of them across all instruments will be in one location, with the same approach applying in the USA. For example, in the UK we will have our regulated markets for France, Belgium, Portugal, Holland and so on next to the LIFFE markets, with the options and futures for those underlying instruments.

We also believe that the 'network effect' of having those engines very close to each other on the same infrastructure in the same location will lead to a massive increase in multi-product, multi-legged strategies. We obviously can't measure that as yet but we are currently discussing this with a number of high frequency trading firms who definitely see this as an opportunity.

In the longer term there is also the possibility that it will make sense for each symbol to have all its derivative products on the same matching engine, rather than just in the same centre.

What's your view of the future?

I think we are a long way from claiming that the race for the lowest possible latency is over - it still has a way to run. We are talking now in terms of nanoseconds and picoseconds, just as we talked in terms of microseconds and milliseconds three years ago. Optical switching is already a reality for us because photons are faster than electrons. Who knows? Perhaps a photon matching engine?

Another fascinating question is how the components of a liquidity centre will interact in a virtual world where you will have a combination of regulated markets, multilateral trading facilities, dark pools and buyside/sellside participants. How will that play out in terms of the whole matching paradigm?


Perfect Match

Conor Allen, Head of R&D and Core Engineering at NYSE Technologies, discusses the firm's relationship with Intel and its Xeon 5500 series processors.

How did the relationship with Intel evolve?

Wombat started working with Intel some two and a half years ago and the relationship continued to evolve after NYSE Euronext bought Wombat. Over the last eighteen months or so we've had access to engineering samples of new Intel chips for testing purposes. (For example, around a year ago we were given one of only five Nehalem engineering samples that were made available to EMEA.)

We also have access to the Intel high performance laboratory team in Russia and the opportunity to work with them has allowed us to gain a much deeper understanding of Intel's chip architectures and how best to optimise our software to take advantage of them.

For example?

We make extensive use of Remote Direct Memory Access (RDMA), which allows us to move data directly from the memory of one computer into that of another without involving either machine's OS. That is hugely more efficient than passing the data via the OS and is a technique we have used successfully with previous Intel technology. However, the close relationship with Intel has allowed us to gain an early understanding of the QuickPath Interconnect (QPI) introduced with the Xeon 5500 series - and that has allowed us to take performance to an entirely new level.
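
To give a flavour of what the data path of such an operation looks like, the fragment below sketches posting a one-sided RDMA write with the Linux verbs API (libibverbs). It is deliberately incomplete: device and queue-pair setup, connection establishment and the out-of-band exchange of the peer's remote address and rkey are all elided, and it is a generic illustration rather than NYSE Technologies' implementation.

// Sketch of posting a one-sided RDMA write with libibverbs (link with -libverbs).
// Connection setup (device open, PD/CQ/QP creation, exchanging the peer's
// remote address and rkey) is elided; only the data-path call is shown.
#include <infiniband/verbs.h>
#include <cstddef>
#include <cstdint>

bool rdma_write_once(ibv_pd* pd, ibv_qp* qp, ibv_cq* cq,
                     void* local_buf, std::size_t len,
                     std::uint64_t remote_addr, std::uint32_t remote_rkey) {
    // Register the local buffer so the NIC can DMA straight from user memory,
    // bypassing the kernel on the data path. (The remote buffer must have been
    // registered by the peer with IBV_ACCESS_REMOTE_WRITE.)
    ibv_mr* mr = ibv_reg_mr(pd, local_buf, len, IBV_ACCESS_LOCAL_WRITE);
    if (mr == nullptr) return false;

    ibv_sge sge{};
    sge.addr   = reinterpret_cast<std::uint64_t>(local_buf);
    sge.length = static_cast<std::uint32_t>(len);
    sge.lkey   = mr->lkey;

    ibv_send_wr wr{};
    ibv_send_wr* bad_wr = nullptr;
    wr.wr_id               = 1;
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.opcode              = IBV_WR_RDMA_WRITE;   // one-sided: remote CPU is not involved
    wr.send_flags          = IBV_SEND_SIGNALED;   // request a completion entry
    wr.wr.rdma.remote_addr = remote_addr;         // learned out of band from the peer
    wr.wr.rdma.rkey        = remote_rkey;

    if (ibv_post_send(qp, &wr, &bad_wr) != 0) {
        ibv_dereg_mr(mr);
        return false;
    }

    // Busy-poll the completion queue until the write completes.
    ibv_wc wc{};
    while (ibv_poll_cq(cq, 1, &wc) == 0) { /* spin */ }
    bool ok = (wc.status == IBV_WC_SUCCESS);

    ibv_dereg_mr(mr);
    return ok;
}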

So would you say that QPI has been the major benefit for you?

QPI has certainly been hugely useful to us, but other features are also a good fit with the type of work we are doing. For example, the architecture's ability to monitor each processor core for a range of values (e.g. power consumption or temperature) and then use that information to allocate spare capacity to other cores (so they can increase their clock frequency) is well-suited to the non-deterministic 'bursty' world of market data.

Which version of the Nehalem-based processors are you using?

The X5570, which has a clock speed of 2.93GHz. We like the Xeon 5500 series because its I/O architecture fits very closely with the approach we are taking. A lot of the work we do is focused on getting data into and out of the processor. Intel's I/O hub is interesting because it means we don't use any processor bandwidth to get data out of the machine.

How important is increasing the number of processor cores per machine for you?

Scaling the number of cores within the same machine and the same power envelope would definitely be welcome. There are also certain trading strategies we are seeing people wanting to deploy where they run the feed handlers, the algorithm and the market access gateway all on the same machine. More cores in that situation would allow greater trading capacity.
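
As a simple illustration of that 'all on one machine' pattern, the sketch below pins a feed-handler thread, a strategy thread and a gateway thread to their own cores on Linux. The worker bodies are placeholders; the point is only that each additional core can host another dedicated component without the components contending for CPU.

// Sketch: pinning the feed handler, strategy and gateway threads to their own
// cores on one machine (Linux). The three worker bodies are placeholders.
#include <pthread.h>
#include <sched.h>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <thread>
#include <vector>

void pin_to_core(std::thread& t, int core) {
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(core, &cpuset);
    pthread_setaffinity_np(t.native_handle(), sizeof(cpu_set_t), &cpuset);
}

int main() {
    auto busy_for_a_moment = [](const char* role) {
        // Stand-in for the real work of each component.
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
        std::cout << role << " finished on its dedicated core\n";
    };

    std::vector<std::thread> workers;
    workers.emplace_back(busy_for_a_moment, "feed handler");
    workers.emplace_back(busy_for_a_moment, "strategy");
    workers.emplace_back(busy_for_a_moment, "market access gateway");

    // One dedicated core per component; more cores means more such components
    // (or more symbols) can share the same box.
    for (std::size_t i = 0; i < workers.size(); ++i) {
        pin_to_core(workers[i], static_cast<int>(i));
    }
    for (auto& w : workers) {
        w.join();
    }
}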