Nowadays, to be the winner - and there are only consolation prizes for second place - traders are writing algorithms into FPGAs (Field Programmable Gate Arrays). Trades then rarely touch the server; the calculations are done in the network in under 10 microseconds.
For your primary HFT strategies this might all be true. To get trades in first, companies will shell out the massive budgets required to have algorithms embedded directly into network hardware. FPGA solution vendors such as Accelize, Enyx, Exablaze and Novasparks have created a networking arms race in these markets.
However, with traditional FPGAs - in switches or in network cards - embedding an algorithm is not a one-off cost. It must be paid again every time users want to modify, update and improve algorithms to fit new trading strategies.
In many cases, keeping the trading strategy current, updating the software and underlying algorithms, is as important as tick-to-trade latency. So, constantly running off to your network vendor of choice for bespoke updates to an FPGA is not exactly practical - and it is certainly far removed from the way that many trading firms operate (think 'DevOps').
This leaves the 99% of trades that still happen on servers running software-based algorithms. Trading rooms will still invest significant sums in low latency networking infrastructure: switches, network adaptors (NICs), transceivers and even specialist cabling. Latency is still an important issue, but it is on a different order of magnitude. Typical tick-to-trade times are between 15 and 100 microseconds.
There has been a race to eliminate as much of the latency as possible on the path outside the server. The more traditional switch vendors, namely Cisco and Juniper, have seen pressure from new entrants like Arista, Fujitsu, Mellanox and Metamako, which have a strong focus on latency. Equally, standard networking cards for servers have been replaced by supercharged NICs from the likes of Chelsio, Solarflare, Mellanox and CSPi with their Myricom product line.
THE REALITY SHIFT
These solutions can reduce network latency (i.e. traversing the network stack from Layer 1 to Layer 5) from above 20 microseconds to below 2 microseconds. An impressive reduction. But despite this figure, the network stack represents only 10 percent of the total latency required to make a trading decision. With software-based algorithms, the vast bulk of the latency comes from processing time inside the server. And it is precisely this remaining larger chunk of latency that some vendors now have in their sights.
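The arithmetic behind that point can be sketched in a few lines of Python. This is only an illustration: the 100 microsecond total, the speed-up factors and the function name `remaining_latency_us` are assumptions chosen for the example, not figures or tools from any vendor. Only the 10 percent network share comes from the text above.

```python
def remaining_latency_us(total_us, network_share,
                         network_speedup=1.0, server_speedup=1.0):
    """Tick-to-trade latency left after speeding up each part.

    total_us       -- assumed starting latency in microseconds (hypothetical)
    network_share  -- fraction of total_us spent in the network stack
    *_speedup      -- factor by which each part is made faster (1.0 = unchanged)
    """
    network_us = total_us * network_share
    server_us = total_us - network_us
    return network_us / network_speedup + server_us / server_speedup


# Hypothetical 100 us trade with the 10 percent network share quoted above.
TOTAL_US = 100.0
NETWORK_SHARE = 0.10

# A NIC that is ten times faster only touches the 10 us network slice.
faster_nic = remaining_latency_us(TOTAL_US, NETWORK_SHARE, network_speedup=10.0)

# Offloading work so the server part runs 1.5 times faster touches the 90 us slice.
faster_server = remaining_latency_us(TOTAL_US, NETWORK_SHARE, server_speedup=1.5)

print(f"10x faster network stack: {faster_nic:.1f} us remaining")
print(f"1.5x faster server path:  {faster_server:.1f} us remaining")
```

Under these assumed numbers, a tenfold improvement in the network stack saves about 9 microseconds, while a modest improvement on the server side saves about 30.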
"We are using programmable FPGAs inside the networking infrastructure to begin to offload a bigger chunk of the 100 μs server latency, without jumping all the way into implementing trading fully inside an FPGA. That costs too much, is hard to change quickly, and limits the complexity of the trading algorithm," explains Craig Lund, General Manager, CSPi High Performance Products.
The Myricom line now uses its own FPGAs within its latest generation of network adaptor cards. These provide a cost-effective way for the company to embed its decades of experience into creating NICs which compete with raw latency leaders like Chelsio and Solarflare. But within the Myricom products, not all of the FPGA chip's capacity for logic is used. There is some space left. Very valuable space.
CSPi is using this extra space to offload common trading tasks from the server into the networking cards. Trading applications run as normal on the server, and users do not need to pay to write complete algorithms onto the FPGA. However, CSPi's network adapters are bringing increasingly large parts of the common trading functionality inside the chip and are implementing this as a standard library. This dramatically cuts into the server latency, at a fraction of the cost and at far lower risk than the traditional bespoke FPGA solutions.
"Our current adapters select specific packets and send them directly to a trading application, bypassing the operating system completely," Lund continues.
"Our FPGAs perform 4-way (radio and wire) market feed A/B arbitration invisibly in hardware. Importantly, they can also capture, transmit and receive timestamps in order to measure, monitor and improve the software latency."
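For readers unfamiliar with A/B arbitration: exchanges typically publish the same sequence-numbered market data on redundant lines, and the receiver keeps whichever copy of each message arrives first. The Python sketch below shows that idea in software only. It is not CSPi's implementation, which runs in FPGA hardware; the class name `FeedArbiter`, the window size and the sample data are all hypothetical.

```python
from collections import deque


class FeedArbiter:
    """First-arrival-wins arbitration across any number of redundant lines."""

    def __init__(self, window=1024):
        self._seen = set()      # sequence numbers already delivered
        self._order = deque()   # same numbers, oldest first, for eviction
        self._window = window   # how many recent numbers to remember

    def accept(self, seq):
        """Return True for the first copy of seq, False for later duplicates."""
        if seq in self._seen:
            return False
        self._seen.add(seq)
        self._order.append(seq)
        if len(self._order) > self._window:
            # Forget the oldest number so memory stays bounded.
            self._seen.discard(self._order.popleft())
        return True


def arbitrate(arrivals, window=1024):
    """Return the (line, seq) pairs delivered, in arrival order."""
    arbiter = FeedArbiter(window)
    return [(line, seq) for line, seq in arrivals if arbiter.accept(seq)]


# Hypothetical arrival order on two lines; line A loses message 3.
arrivals = [("A", 1), ("B", 1), ("B", 2), ("A", 2),
            ("A", 4), ("B", 3), ("B", 4)]
print(arbitrate(arrivals))
```

The sketch tracks delivered sequence numbers rather than just the highest one seen, so a message that is lost on one line and arrives late on another is still passed through instead of being discarded.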
THE REALITY CHECK
This approach has the potential to turn the majority of the low latency trading market on its head. CSPi's view is that the actual latency of the NIC itself pales into insignificance against the potential savings in processing time on the server. This obviously isn't news, as it is the whole point of building bespoke FPGA trading applications. However, what CSPi is doing is offering an increasingly broad set of ready-built, tried and tested low latency networking tools at a price that is relevant for any trading application.
With the bulk of custom code still on the server, trading software developers can continue to adapt and improve algorithms as often as necessary. This makes these new generation FPGA-based NICs the perfect tool for the modern, constantly evolving world of 'DevOps' and Continuous Integration. In-built nanosecond time stamping allows 'DevOps' teams to constantly monitor and improve the latency of server-side software changes.
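As a rough illustration of how such timestamps might feed a monitoring loop, the Python sketch below turns pairs of receive and transmit timestamps into a latency summary that could be compared before and after a software release. The function names and sample values are hypothetical, and how the timestamps are actually read from the adapter is outside the scope of this sketch.

```python
import statistics


def tick_to_trade_ns(samples):
    """Latency per trade from (rx_ns, tx_ns) hardware timestamp pairs."""
    return [tx_ns - rx_ns for rx_ns, tx_ns in samples]


def summarize_us(latencies_ns):
    """Median and worst-case latency in microseconds."""
    return {
        "median_us": statistics.median(latencies_ns) / 1000,
        "max_us": max(latencies_ns) / 1000,
    }


# Hypothetical samples: nanosecond timestamps for three trades.
samples = [(1_000, 26_000), (5_000, 45_000), (9_000, 39_000)]
summary = summarize_us(tick_to_trade_ns(samples))
print(summary)
```

Running the same summary on every build gives a team a simple regression check: if the median or worst case rises after a code change, the change is a candidate for review.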
But the real benefit is Myricom's library of FPGA-based trading functions, which software developers can call on. This offers them the potential to rip tens of microseconds of latency out of their applications. Success will depend on how quickly Myricom can build out this library and whether other vendors follow suit. And, as always in finance, on whether the industry manages to create competing standards around this concept.
"We believe that bringing a library of FPGA functionality to the 99% of applications that have been left out in the cold is going to be one of the biggest networking changes trading has seen in a decade," concludes Lund. "Since FPGA designs can evolve after shipment, we plan to build out this library into a core toolkit for the trading community."