The Gateway to Algorithmic and Automated Trading

Financial Programming Greatly Accelerated

Published in Automated Trader Magazine Issue 42 Q1 2017

FPGAs are about to become a lot more attractive as the technology of choice for cutting-edge application development. Pre-compiled numerical libraries and an integrated software stack, combined with a new class of closely-coupled silicon devices, are the drivers.


Olivier Cousin

Olivier Cousin is responsible for building libraries of FPGA accelerators in the High Level Synthesis Group at Intel. Prior to that, Olivier worked as a front office quantitative developer accelerating trading platforms at RBS. He holds a Master's in Digital Electronics and Parallel Processing Systems and a Certificate in Quantitative Finance from the CQF Institute.

Stephen Weston

Stephen Weston is a Principal Engineer and Libraries Architect in the Programmable Solutions Group at Intel. Stephen has over 25 years' experience in investment banking in trading, quantitative research and risk management. He is also a visiting professor at Imperial College London and holds a PhD in Mathematical Finance.

Innovation has always been and will always be the lifeblood of financial markets. In recent years, the most important innovations have been technological. Massive increases in network bandwidth and hardware processor speeds have allowed an ever-growing proportion of trading to take place electronically rather than via voice (open outcry or telephone). This trend has been reinforced by regulatory changes that have encouraged centralised rather than bilateral clearing and exchange-based rather than over-the-counter (OTC) trading.

These developments have made the financial markets increasingly competitive and latency-sensitive, and it has become harder to develop and maintain a competitive edge in trading.

In this article, we first draw a qualitative picture of the current technology landscape. We then explore how numerical libraries for hardware, combined with an integrated software stack, offer one way to make efficient use of Field Programmable Gate Arrays (FPGAs) to turn such challenges into opportunities.

The technology landscape

The rise of latency sensitivity

The blink of a human eye takes 300 milliseconds. Exchanges and other trading platforms now find themselves dealing with customers who want to execute a trade in under 100 nanoseconds - less than one 3,000,000th of a blink.

The big data tsunami

As technology has enabled more trading to become electronic, high-frequency trading has become mainstream. For liquidity providers, electronic market making is the norm and for liquidity takers, algorithmic execution is standard. Due to these changes, quotation volumes have skyrocketed - the US equity options market alone peaks at over 25 million quotes and trades per second, compared with less than 1.5 million in 2010. Such rapid change has imposed a huge technology burden on both buy-side and sell-side firms, as well as exchanges and clearinghouses. Massive volumes of data - prices, orders and trades - have to be cleaned, stored and maintained.

The quest for advanced AI

The combination of greater data availability and faster processors to analyse that data allows the use of increasingly sophisticated techniques such as machine learning and artificial intelligence (AI). Recent news stories have highlighted the uptake of these technologies at high profile investment management firms.

The coming of age of 'hyperliquidity'

One key development that has had a fundamental impact on many electronic markets is the trend towards so-called 'hyperliquidity'. This describes a situation where the velocity, volume and variety of digital data have pushed a market's transparency and efficiency to (or very close to) their highest possible levels. According to Belt and Boudier, who coined the term in 2016, three main forces push markets towards hyperliquidity:

  • An increasing degree of standardisation via trading on exchanges using standardised contracts.
  • Greater transparency of information. This occurs due to digital dissemination at ever greater speeds, producing close to perfect information for all professional market participants.
  • The emergence of an advanced digital infrastructure, in particular, colocation facilities using high speed links. Other examples include market access via convenient and efficient APIs, efficient and transparent matching engines etc.

Many markets that are characterised by electronic trading - such as equities, foreign exchange, futures and some interest rate derivatives - have entered a de facto state of hyperliquidity. This has resulted in the dominance of trading platforms which has in turn led to a significantly reduced level of human intervention. Decision-making is increasingly initiated and managed by algorithms fed by high-speed, automated digital data feeds. Traded volume is dominated by market makers that use algorithm-based trading capabilities of varying degrees of sophistication. Trading strategies rely on combinations of speed, low latency, optimised execution and relative value-driven cross-asset trading.

FPGAs in trading

Advantages of FPGAs

In trading, the speed, low latency and minimal jitter of FPGAs make them attractive for network applications, packet inspection and handling digital data feeds. Another key advantage is that FPGAs can be totally or partially re-programmed on the fly, avoiding the need to bring down a system every time a program changes. Finally, FPGAs deliver deterministic performance regardless of load, which makes them especially useful in networks and market decoders/feed handlers.

Current limitations

FPGA usage in finance and trading has not spread far beyond the known applications in networks, packet inspection and digital feed handling, for three main reasons:

  • Most significantly, FPGA programming is difficult and many of the supporting tools have yet to reach maturity. This has resulted in long lead times before a finished algorithm is ready for deployment. Ongoing support and maintenance expertise is also scarce.
  • Second, there is a lack of an integrated software infrastructure for FPGAs that would enable CPU-familiar software engineers to easily migrate and integrate their innovations into existing software stacks.
  • Third, there is an almost total lack of libraries of pre-compiled and optimised numerical functions needed to quickly prototype any new and potentially complex trading application. This is in stark contrast with CPUs, where both native libraries (such as math.h) and advanced C/C++ libraries (such as Intel® MKL, the Math Kernel Library, and Intel® IPP, the Integrated Performance Primitives) exist. These enable fast R&D followed by an equally fast deployment of new algorithms and strategies.

Recent advancements in FPGA hardware and software are beginning to address these issues directly.

Developments in the FPGA sector

The first major developments are new CPU+FPGA devices that are now reaching the market in volume and enable tighter and more efficient coupling of the two technologies. As data is written directly into the processor cache, bypassing inefficient transactions via the standard PCIe interface, these devices provide access to higher performance and flexibility. They have been well discussed in the literature and are not the focus of this article.

Second, pre-compiled libraries of numerical functions for FPGAs are now available. Access to these libraries is simplified by an accompanying new software infrastructure enabling communication between the FPGA and the CPU running the main application. This enables developers to access the power and flexibility of an FPGA with minimal knowledge of how FPGAs work. To the authors' knowledge, this is the first extension of FPGA functionality aimed at the CPU software developer and is the subject of the remainder of this article.

Financial libraries for FPGAs

Adapted software infrastructure

Before discussing the numerical libraries for FPGA, let us first examine the entire accompanying software stack. This has been developed to provide simple and efficient access to the FPGA libraries directly from a standard C/C++ main.cpp. Figure 01 illustrates how to orchestrate the accelerators, manage the data and provide a standard interface to allow portability. The aim is to deliver convenient and fast FPGA access to the CPU software programmer, without the need to develop any FPGA code. The stack is designed to provide access to FPGA numerical functions and algorithms at several levels:

  • At the highest level, as pre-compiled library functions in a C/C++ main.cpp.
  • In the middle, as pre-compiled library functions directly accessible in an OpenCL program, which can be built upon for rapid prototyping and application delivery.
  • At the lowest level, as pre-compiled kernels, which can be used in combination with other kernels to build further libraries.
Figure 01: The software stack


FPGA numerical libraries

The newly created FinLib for FPGA, now in its first release for OpenCL, contains accelerated option pricing functions covering around 90% of exchange-traded options. FinLib can execute 3.2 billion option calculations per second using approximately 40% of an Arria 10 GX 1150 FPGA running at 300MHz, with two Black-Scholes engines per DDR4 interface and four DDR4 interfaces. Using a single Black-Scholes engine per DDR4 interface consumes only 32% of the FPGA's resources, trading fewer engines for spare capacity for other calculations.

FinLib can also generate five risk sensitivities for each option at the same time as the option price, a major performance advantage for users wishing to monitor the risk as well as the value of an options portfolio or strategy. Automatic topology generation provides standard systolic-array-style configurations to ensure optimum performance of library functions on the FPGA. Table 01 lists the functions contained in the first release of FinLib. The functions are designed to work with streaming data, such as that provided by a direct digital data feed from an exchange.

Users who wish to call the API directly can do so with calls that completely hide all FPGA functionality and OpenCL coding behind familiar C/C++ style syntax, as shown in Listing 01.

Listing 01: Sample calls to the FinLib API


The syntax has been designed to have a consistent look and feel with MKL, as indicated in the code snippets above and in Listing 02. Together, they illustrate a generic, high-level call to an FPGA-accelerated function.

Listing 02: Valuing a stream of options with input from a CSV file


More specifically, let us take the example of valuation and risk calculations for a standard Black-Scholes model. The main.cpp code required to value a stream of option contracts, taking input from a standard CSV file, is shown in Listing 02. Note that all that is required is familiar C/C++ style syntax, with a final call to instantiate the accelerator. The next step is to push the data into the designated memory on the host CPU, where the required code is once again fairly standard-looking C/C++ (Listing 03).

Listing 03: Pushing data onto host memory


The final step is to run the accelerator, then collect the results and verify their correctness as required (Listing 04). The library has been designed with data centre, server-class hardware configurations in mind. It runs on Linux CentOS, having been built with Quartus 16.1, and uses the C++ abstraction layer to enable accelerator instantiation directly in the user's code. To make evaluation of the new libraries logistically as simple as possible, users can log in via a remote connection to a dedicated server in the Intel HPC lab in Swindon, UK.

Listing 04: Collecting results


They can run the FinLib functions directly or, if preferred, use a static PCAP file to feed the accelerator locally using a Shuttle SH170R6.

Integrated machine learning appliance

Users who wish to go a stage further and begin developing machine learning algorithms for tasks such as stock selection will be able to use a new dedicated, pre-configured deep learning appliance, to be released shortly, which utilises the same software stack, automated topology generation, interface style and support infrastructure. This will enable researchers and quants to run existing deep learning models developed in popular frameworks such as Caffe and TensorFlow directly on an FPGA-enabled device. The aim is to configure and parameterise convolutional neural networks for deep learning and to have existing models running within a day. This flexibility and responsiveness is needed to bring machine learning into the same domain as traditional quant modelling approaches.


Integrated CPU+FPGA hardware combined with appropriate numerical libraries would enable a closer integration of accelerator development and would facilitate the transition of AI from the lab onto the electronic trading floor.

Further, libraries of numerical functions and algorithms for FPGAs, combined with an integrated software infrastructure, are set to deliver a level of ease of use previously only available for CPUs.

Moving forward, FinLib functionality is set to be upgraded with yield curve and default curve construction, Monte Carlo simulation and standard solvers for partial differential equations. Also planned for 2017 is StatLib, an FPGA-accelerated library of standard statistical functions.