Dr Usman Malik
One of the most talked-about topics in algorithmic trading is latency. Press releases from telecom firms, data providers and software vendors typically focus on themes such as 'locate next to the exchange and cut your latency to a millisecond' or 'one million messages per second throughput'. While such figures sound impressive, it is misleading to suggest that any one of these solutions is a panacea for all latency problems.
In engineering terms, the simple definition of latency is a time delay between the moment something is initiated and the moment its first effect begins. As there are many moving parts in algorithmic trading, this problem naturally arises in many stages of the process. What follows is a simple walkthrough of the main areas where this problem arises.
While most latency problems are measured in milliseconds, project latency can often be measured in years. From the first meeting to discuss what an investment bank wants from an algorithmic trading solution to the actual roll-out of that solution to its clients will typically take many years. Hence, 'go-live date' is by far the most important latency statistic in algorithmic trading services. Your own project might be held up on questions such as 'should we be using exchange or consolidated feeds?', but your competitors will not be worrying about such marginal technicalities; they will instead be selling the service, building a brand and capturing commission. One should always keep this opportunity cost in mind when deliberating over marginal latency issues.
The next source of latency lies behind the investment bank's firewall: the delay caused by internal systems. This is often one of the most fruitful areas to explore when trying to reduce overall latency, as there is typically a lot of 'legacy' code doing the number-crunching and sending orders to market. Systems that were built for cash traders in 1995 are not likely to perform well under today's loads. The same applies to data messaging software. However, many of these systems can be brought up to speed by a reassessment of the actual code, e.g. how much memory is being dedicated to each piece of software?; is it worth recompiling on the latest operating system?; and could we use an in-memory database instead of the existing solution? Internal software reviews that look into these kinds of issues nearly always produce performance improvements with little additional expense on new products.
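The first step in such a code review is simply to time the suspect routines. The sketch below is a minimal illustration in Python; the `encode_order` routine and its message format are invented for the example, standing in for whatever legacy hot path is under review:

```python
import time

def encode_order(order):
    # Hypothetical legacy routine: naive string concatenation,
    # a common hot spot in old order-handling code.
    msg = ""
    for tag, value in order.items():
        msg += str(tag) + "=" + str(value) + "|"
    return msg

def measure_latency(func, arg, runs=10_000):
    """Return the average wall-clock latency of func(arg) in microseconds."""
    start = time.perf_counter()
    for _ in range(runs):
        func(arg)
    elapsed = time.perf_counter() - start
    return elapsed / runs * 1e6

order = {35: "D", 55: "VOD.L", 38: 10_000, 44: 215.5}
avg_us = measure_latency(encode_order, order)
print(f"average encode latency: {avg_us:.1f} microseconds")
```

Timing a routine like this before and after a change (a recompile, an in-memory store, more memory per process) turns the review into a measurable exercise rather than guesswork.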
"What matters most is that you are using the telecom carrier that offers the quickest route to the exchange, …"
When tackling hardware latency, most firms decide to purchase the latest scorching-hot CPUs (central processing units) with ever-increasing clock speeds.
However, while this approach might seem the most obvious, it is not always necessary. Also important are factors such as: how many disks are in the server?; what is the RAID configuration?; and how much cache is on the RAID controller? (RAID stands for 'redundant array of inexpensive (or latterly independent) disks' and is a hardware technique employed to reduce real-time problems caused by disk failure.) Simply adding more RAM to an existing server can produce cost-effective results. Likewise, existing CPUs with a large cache memory (the memory on the CPU itself) should not always be replaced with CPUs with a faster clock speed; the larger the cache, the less time the computer spends waiting for data from RAM. All of these in-depth hardware details can affect the hardware latency of the algorithmic service. The investment bank's internal network can also be problematic. One obvious question is: how many switches and routers are involved in the route to the bank's market gateways? Each of these hardware obstacles slows the path of messages and increases the overall hardware latency inherent in the algorithmic trading platform. Finally, the communication speeds of these routers and switches need to be synchronised. If one switch is running at 100Mb but another old switch is still running at 10Mb, the internal network's latency can increase dramatically.
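The penalty from a mismatched switch can be put in numbers. Serialisation delay is the time a port needs just to clock a frame onto the wire: frame size in bits divided by link speed. A back-of-envelope sketch in Python, using a full Ethernet frame as an illustrative figure:

```python
def serialisation_delay_ms(frame_bytes, link_mbps):
    """Time to clock one frame onto the wire, in milliseconds."""
    return frame_bytes * 8 / (link_mbps * 1e6) * 1e3

FRAME = 1500  # a full Ethernet frame, in bytes

fast = serialisation_delay_ms(FRAME, 100)  # modern 100 Mb port
slow = serialisation_delay_ms(FRAME, 10)   # legacy 10 Mb port

print(f"100 Mb port: {fast:.3f} ms per frame")  # 0.120 ms
print(f" 10 Mb port: {slow:.3f} ms per frame")  # 1.200 ms
```

A single legacy 10Mb switch on the path costs ten times the per-frame delay of its 100Mb neighbours, and the penalty compounds under queuing when traffic bursts.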
Exchange data latency remains a topical subject. The user faces two choices: raw exchange data, or a consolidated feed from a data vendor.
While the raw feed will probably be quickest, there is an enormous technology overhead in maintaining these feeds for each exchange, which can also add to the hardware latency. The consolidated feed is usually very easy to use and robust. However, this reliability has latency implications caused by the aggregation and normalisation of exchange feeds at the data vendor's ticker plant. There is no definitive answer as to which is superior once robustness and overall total cost of ownership are factored in. If the algorithm can exploit the latency advantage of direct exchange feeds and cover the accompanying cost overheads, then there is clearly a competitive advantage. If not, then accepting the latency embedded in a consolidated feed is a sensible compromise.
Communication latency covers two related topics: where do you locate your algorithmic servers?; and how do you communicate with the exchanges and your clients?
Communication latency is the time taken for a packet of data to be sent by an application, to travel, and be received by another application. Hence we are concerned with how fast the algorithmic service is sending/receiving orders and data. It makes sense that the nearer you are to the exchange the quicker your order will be received. However, this comes with an important caveat. What matters most is that you are using the telecom carrier that offers the quickest route to the exchange, as this should provide the lowest latency. This issue is more important than geographical location when considering location/colocation of the algorithmic servers. The best way to establish the quickest route is to use a traceroute (a computer network tool used to determine the route taken by packets across an IP network) that will actually show how many network hops are involved in the transmission of the message. It will also give the round trip time for a message on each specific route.
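The intuition that the route matters more than raw distance can be sketched numerically. Light in fibre covers roughly 200 km per millisecond, so propagation delay is small over metro distances; per-hop switching and queuing often dominate. The per-hop figure below is an illustrative assumption, not a measured value:

```python
FIBRE_KM_PER_MS = 200.0   # light in fibre travels ~200 km per millisecond
HOP_DELAY_MS = 0.5        # assumed switching/queuing cost per network hop

def one_way_latency_ms(distance_km, hops):
    """Rough one-way latency: propagation delay plus per-hop delay."""
    return distance_km / FIBRE_KM_PER_MS + hops * HOP_DELAY_MS

# A nearby route with many hops vs a longer route with few hops.
crowded = one_way_latency_ms(distance_km=20, hops=12)
direct  = one_way_latency_ms(distance_km=100, hops=3)

print(f" 20 km, 12 hops: {crowded:.1f} ms")  # 6.1 ms
print(f"100 km,  3 hops: {direct:.1f} ms")   # 2.0 ms
```

On these assumed figures, the server five times further away still wins because its carrier offers a route with a quarter of the hops, which is exactly what a traceroute will reveal.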
There are many different methods of connecting a client to an investment bank for direct algorithmic trading/DMA. A dedicated leased line between the client and the investment bank is often the most robust and effective solution. This is usually the quickest route, and bandwidth is substantially more generous than competing third-party networks at an equivalent price (bandwidth is the number of messages that can be sent/received per second). Another popular method is to use a third-party network, typically referred to as a hub-and-spoke solution. This involves the client connecting once to a network to which the bank is also connected. While this has one obvious advantage (one connection to the network means you can connect to any other bank on the network), the big disadvantage is latency. The third-party network means that messages have to pass through many additional switches and routers, which add to the overall latency. A typical time to communicate over these networks between a London-based client and a London-based investment bank can be up to 30 milliseconds. Another problem is that bandwidth is constrained and increasingly comes at a substantial cost; this can impose a severe latency penalty during busy periods on exchanges.
"Minimising latency certainly helps, but speed is not a metric for which one will be paid a bonus at the end of the year."
Perhaps the simplest connectivity method that can provide speeds close to (or as good as) a leased line is a virtual private network (VPN): a secure Internet connection between the client and the investment bank. If the client's servers are in a telecom provider's data centre, then the resulting VPN is as good as a leased line (this is typically referred to as a 'VPN over Internet backbone'). Speeds between London and Paris can be as little as five milliseconds. Moreover, the bandwidth is uncapped, which gives the VPN a high burst rate, i.e. the capacity to send many messages per second if need be. As this solution only involves configuration of the existing firewalls, it is also the most cost-effective option.
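Bandwidth translates directly into a ceiling on message throughput: link speed in bits per second divided by message size in bits. A rough sketch, assuming a 300-byte order message (an illustrative size, not a standard):

```python
def max_messages_per_second(link_mbps, message_bytes):
    """Upper bound on messages/second a link can carry, ignoring overheads."""
    return link_mbps * 1e6 / (message_bytes * 8)

MSG = 300  # assumed average order message size, in bytes

for mbps in (2, 10, 100):
    rate = max_messages_per_second(mbps, MSG)
    print(f"{mbps:>3} Mb/s link: ~{rate:,.0f} messages/second")
```

On these assumptions a constrained 2 Mb/s third-party circuit tops out at well under a thousand messages per second, while an uncapped link has the headroom to absorb a burst; this is the burst-rate advantage in concrete terms.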
The bottom line
Ultimately, with any algorithmic solution, what matters most is results. Does the solution offer increases in execution performance and trading floor productivity? Is it enabling the bank to earn more commission from programme trading and direct customer access to the firm's algorithms? Minimising latency certainly helps, but speed is not a metric for which one will be paid a bonus at the end of the year. Hence any investment in reducing latency has to have a quantifiable effect on expected turnover.