
Get a Grid!

Published in Automated Trader Magazine Issue 07 October 2007

In the Q3 issue’s Technology Forum, our panel of experts agreed that grid computing had not yet been fully harnessed to support automated trading. So Automated Trader asked Mike Stoltz, VP, Architecture and Strategy, Financial Services at Gemstone Systems, Inc, to explain how grid computing might support a theoretical automated trading strategy.

Mike Stoltz

The Scenario

A large proprietary trading operation wishes to develop a global, automated synthetic pairs trading programme across its offices in New York, London and Singapore. In its first phase, the objective is to identify baskets of instruments that can be traded as synthetics against individual real securities. In its second phase, the intention is to extend this strategy so that both legs of the pairs trades consist of synthetic instruments.

The intention is that the strategy will involve multiple security types denominated in various currencies and that it will also operate across twenty different timeframes (ranging from one minute to daily data). Combining these factors with the need to conduct in- and out-of-sample testing for cointegration means that the computational requirements will be very significant from the outset, but will expand substantially with the second phase of the project.
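A back-of-the-envelope count gives a feel for how quickly this workload grows. The sketch below is illustrative only: the universe sizes are hypothetical (the scenario gives no figure), and it counts simple one-against-one pairs, so synthetic baskets would push the numbers higher still.

```python
from math import comb

TIMEFRAMES = 20      # from the scenario: one-minute to daily data
SAMPLE_SPLITS = 2    # one in-sample and one out-of-sample test

def cointegration_tests(instruments: int) -> int:
    """Tests needed if every unordered pair is tested on every timeframe."""
    pairs = comb(instruments, 2)
    return pairs * TIMEFRAMES * SAMPLE_SPLITS

# Hypothetical universe sizes, chosen only for illustration.
for n in (100, 500, 1000):
    print(f"{n} instruments -> {cointegration_tests(n):,} tests")
```

Because the number of pairs grows with the square of the instrument count, a tenfold increase in instruments means roughly a hundredfold increase in tests.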

Furthermore, the trading operation is making the assumption that the tradable pairs arrived at after 'out of sample' cointegration testing will not be stable (i.e. the cointegration relationship will probably decay), especially in short timeframes. Therefore, the decision has been made that development and testing must be continuous, so all calculations and testing on all possible pairs and timeframes will be updated in real time. In the case of shorter timeframes, this will result in calculations having to be performed during market hours. The intention is not to have a 'Big Bang' go live, but a gradual ramp up. Initial testing and trading will only involve a subset of the final instrument universe, but will still be conducted across all three locations.

Initial set-up

Due to the global nature of this business, it is likely that each of the three trading centres will have a replica of data from the other centres. However, keeping multiple data centres in sync is a classic data-management problem. A new generation of distributed data caches, or 'enterprise data fabrics', is well suited to addressing it. By partitioning data according to well-defined rules pertinent to the company's business use-case, the data fabric will help provide transactionality for updates typically managed by regional resource managers. Inserts of new data do not require transaction management; the new records only need a globally unique identifier.
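One way to picture such an identifier is as the three-part combination described in the glossary: the source's identity, a counter bumped on each restart of that source, and a simple sequence number. The following is a minimal sketch under that reading. The names `RecordId` and `IdGenerator` and the feed labels are hypothetical, and how the restart counter is persisted between runs is left out.

```python
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class RecordId:
    source: str    # identity of the data source (hypothetical label)
    epoch: int     # incremented each time that source is restarted
    sequence: int  # simple counter within one run of the source

class IdGenerator:
    """Hands out identifiers for one source during one run."""

    def __init__(self, source: str, epoch: int):
        self._source = source
        self._epoch = epoch
        self._counter = count(1)

    def next_id(self) -> RecordId:
        return RecordId(self._source, self._epoch, next(self._counter))

# Two sources in different regions can insert records independently.
ny = IdGenerator("NY-feed-1", epoch=7)
ldn = IdGenerator("LDN-feed-1", epoch=3)
ids = [ny.next_id(), ny.next_id(), ldn.next_id()]
print(ids)
```

No coordination between regions is needed at insert time, because two identifiers can only collide if they share the same source, the same restart counter and the same sequence number.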

Load balancing

Once the data is present at all locations through the data fabric, it is a simple task to fire off computations at any location by utilising computing resources in regions that would otherwise be idle. This is a question of 'moving the computations to the data' rather than 'moving the data to the computations'. Given the size and complexity of the kinds of data needed, the request for computation is an order of magnitude smaller than the data required to perform the calculation. Therefore, using a 'command pattern' can easily achieve a '10x speed-up' by treating the resources in each location as a computation pool and making data available to ensure all pools are of equal value to the compute grid software that is managing the load balancing.
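The idea of sending the computation to the data can be sketched as follows. This is a toy illustration, not the API of any data fabric or grid product: `SpreadCommand`, `dispatch`, the instrument names and the price series are all hypothetical, and a plain dictionary stands in for the replicated cache.

```python
import json
from dataclasses import asdict, dataclass
from statistics import mean

# Toy price history, assumed to be replicated to every region by the data fabric.
PRICES = {
    "AAA": [100 + 0.01 * i for i in range(1000)],
    "BBB": [98 + 0.01 * i for i in range(1000)],
}
REPLICAS = {region: PRICES for region in ("NY", "LDN", "SG")}

@dataclass(frozen=True)
class SpreadCommand:
    """A small, serialisable request: which calculation to run on which legs."""
    leg_a: str
    leg_b: str

    def execute(self, local_data):
        a, b = local_data[self.leg_a], local_data[self.leg_b]
        return mean(x - y for x, y in zip(a, b))

def dispatch(command, region):
    """Send only the serialised command; run it against the region's replica."""
    wire = json.dumps(asdict(command))            # what would cross the WAN
    received = SpreadCommand(**json.loads(wire))  # rebuilt at the remote site
    return received.execute(REPLICAS[region]), len(wire)

result, command_size = dispatch(SpreadCommand("AAA", "BBB"), "SG")
data_size = len(json.dumps(PRICES))  # ASCII text, so characters equal bytes
print(f"command: {command_size} bytes, data left in place: {data_size} bytes")
```

Even with this small data set, the serialised command is a few dozen bytes while the data it operates on runs to thousands, which is the asymmetry the command pattern exploits.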

Figure 1: Distributed Cache Topology

Resource allocation

Nearly every business problem lends itself to partitioning of some fashion and the allocation of resources to different elements of the calculation task is no exception. In this case, the obvious partitioning is the pairs themselves. Each compute resource can be assigned computation for a pair, or even a portion of the computation for a pair. Given that there is no cross-talk between the pairs, this is an 'embarrassingly parallel' computation opportunity. The only possible cross-talk is at the cointegration boundaries and the quantity of data required to represent it is significantly smaller than the data required for the overall computation.
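Partitioning by pair can be sketched in a few lines. In this illustration a local thread pool stands in for the grid's compute resources, the instruments and prices are made up, and the per-pair statistic (mean and standard deviation of the spread) is only a placeholder for the real cointegration tests.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical instruments and prices.
PRICES = {
    "AAA": [10.0, 10.5, 10.2, 10.8],
    "BBB": [20.0, 20.4, 20.3, 20.9],
    "CCC": [5.0, 5.1, 4.9, 5.3],
}

def analyse_pair(pair):
    """Work for one pair; it reads no other pair's data or results."""
    a, b = pair
    spread = [x - y for x, y in zip(PRICES[a], PRICES[b])]
    return pair, mean(spread), pstdev(spread)

pairs = list(combinations(sorted(PRICES), 2))

# Each pair is an independent task, so tasks can go to any free worker.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(analyse_pair, pairs))

for pair, avg, spread_sd in results:
    print(pair, round(avg, 3), round(spread_sd, 3))
```

Because `analyse_pair` shares no state between pairs, adding workers needs no change to the calculation itself, which is what makes the problem embarrassingly parallel.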

Figure 2: Distributed data partitioned between hardware elements

Network infrastructure

The grid computing solution must be supported by a network infrastructure that can provide sufficient bandwidth across all three locations as well as self-healing and redundancy capabilities. Because today's networks provide up to 150 Mbit/s of bandwidth across wide area networks (WANs), the issue is latency, not bandwidth. Latency is bounded by physical distance: signals cannot travel between two locations faster than the speed of light. So the important consideration is to use a transmission protocol that batches data as efficiently as possible and reduces the number of round-trips required to transmit that data.
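The speed-of-light bound is easy to put numbers on. The sketch below uses approximate great-circle distances between the three cities (rounded figures, assumed here for illustration) and the common rule of thumb that light in optical fibre travels at about two-thirds of its vacuum speed. Real routes are longer than great circles, so actual latency will be higher.

```python
C_VACUUM_KM_S = 299_792.458  # speed of light in a vacuum, km/s
FIBRE_FACTOR = 2 / 3         # assumed rule of thumb for optical fibre

# Approximate great-circle distances in km (assumed, rounded).
DISTANCES_KM = {
    ("New York", "London"): 5_570,
    ("London", "Singapore"): 10_850,
    ("New York", "Singapore"): 15_340,
}

def min_round_trip_ms(distance_km, speed_factor=1.0):
    """Lower bound on round-trip time for a given one-way distance."""
    one_way_s = distance_km / (C_VACUUM_KM_S * speed_factor)
    return 2 * one_way_s * 1000

for (a, b), km in DISTANCES_KM.items():
    vacuum = min_round_trip_ms(km)
    fibre = min_round_trip_ms(km, FIBRE_FACTOR)
    print(f"{a} - {b}: at least {vacuum:.0f} ms (vacuum), {fibre:.0f} ms (fibre)")
```

Since every round-trip pays this cost again, a transfer that needs ten round-trips between New York and Singapore spends on the order of a second waiting, however much bandwidth is available.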

Approaches such as sliding-window acknowledgement protocols, batching and conflation can all improve the overall latency, but there are trade-offs to consider. If the protocol holds back data for too long to achieve an optimal batch size, latency will actually grow. For this reason, a highly configurable and latency-sensitive data management protocol is required to achieve optimal transport characteristics. Again, best-of-breed enterprise data fabrics already have these features built in.
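The batching trade-off and conflation can both be sketched briefly. This is a simplified illustration rather than any product's protocol: `conflate` and `Batcher` are hypothetical names, timestamps are passed in explicitly to keep the example deterministic, and the size and delay limits are arbitrary.

```python
from dataclasses import dataclass, field
from typing import Optional

def conflate(ticks, interval=1.0):
    """Keep only the last tick per pair in each interval.

    `ticks` is an iterable of (timestamp, pair, price) in time order.
    """
    latest = {}
    for ts, pair, price in ticks:
        latest[(pair, int(ts // interval))] = (ts, pair, price)
    return sorted(latest.values())

@dataclass
class Batcher:
    max_size: int     # flush once this many messages are queued...
    max_delay: float  # ...or once the oldest has waited this long (seconds)
    _pending: list = field(default_factory=list)
    _oldest: Optional[float] = None

    def offer(self, message, now):
        """Queue a message; return a batch if a flush is due, else None."""
        if not self._pending:
            self._oldest = now
        self._pending.append(message)
        return self.poll(now)

    def poll(self, now):
        """Flush if the batch is full or its oldest message is too old."""
        if not self._pending:
            return None
        full = len(self._pending) >= self.max_size
        stale = now - self._oldest >= self.max_delay
        if full or stale:
            batch, self._pending = self._pending, []
            return batch
        return None

ticks = [(0.1, "AAA/BBB", 1.0), (0.5, "AAA/BBB", 1.1),
         (0.7, "CCC/DDD", 2.0), (1.2, "AAA/BBB", 1.2)]
print(conflate(ticks))  # the 0.1 tick is superseded by the 0.5 tick

batcher = Batcher(max_size=3, max_delay=0.05)
batcher.offer("a", now=0.00)
batcher.offer("b", now=0.01)
print(batcher.poll(now=0.06))  # flushed by age, not by size
```

The `max_delay` limit is what keeps batching from increasing latency: a half-empty batch is sent anyway once its oldest message has waited long enough.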

Server distribution and configuration

The task managers should be localised to each region to provide optimal management of nodes that are available and ready to work as well as localised scope for fail-over and reconfiguration of the compute backbone. Given that this is a data-driven computation, the data should be distributed to all of the compute pools using a combination of WAN gateway, peer-to-peer and client-server topologies. As the backbone of an enterprise data fabric, a peer-to-peer network is a virtual cluster of computers with equal ranking within the fabric that manage all data interactions. In the client-server topology, the compute grid engines are the clients and fetch their data from the peers as if they were a database. The WAN gateway is used to copy data to all of the separate data centres around the world. An appropriate mix of these three topologies will provide the highest resiliency with optimal throughput and lowest possible latency.

Figure 3: Distribution of computational tasks using client-server caching

Aggregating results

Aggregation and decision-making should be done using the earliest received results, regardless of which pool those results come from, to further reduce overall latency and increase the reliability of the computation. Even aggregation can be distributed to some extent, as long as the data fabric can present a consistent view of that aggregation to all members of the aggregation function, making even that function more resilient. If the aggregation step is arithmetically suitable or hierarchical, it may be possible to perform partial aggregations on portions of the results as they come along, and then take those partially aggregated results and do a final aggregation of them later. A hypothetical example of an arithmetically suitable aggregation might be an average of all of the results.
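For the averaging example, partial aggregation works if each pool forwards a running sum and count rather than its own average. The sketch below illustrates this with made-up results from three pools; `PartialMean` is a hypothetical helper, not part of any grid product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PartialMean:
    """A mergeable summary: enough to recover the mean, small to transmit."""
    total: float = 0.0
    count: int = 0

    def add(self, value):
        return PartialMean(self.total + value, self.count + 1)

    def merge(self, other):
        return PartialMean(self.total + other.total, self.count + other.count)

    @property
    def mean(self):
        return self.total / self.count  # undefined for an empty summary

def summarise(values):
    acc = PartialMean()
    for v in values:
        acc = acc.add(v)
    return acc

# Each pool aggregates its own results as they arrive (values are made up).
ny = summarise([1.0, 2.0, 3.0])
ldn = summarise([4.0, 5.0])
sg = summarise([6.0])

# The final aggregation merges three small summaries, in any order.
overall = ny.merge(ldn).merge(sg)
print(overall.mean)
```

Averaging the three regional averages instead would give the wrong answer here, because the pools contributed different numbers of results. Carrying the count alongside the sum is what makes the aggregation arithmetically suitable.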

Figure 4: Distribution of computational tasks between compute grid and enterprise data fabric

Emergency scenarios

To achieve high levels of resiliency, it may be appropriate to have more than one compute node subscribe to inputs for - and calculate results of - the same computation. This will ensure that the aggregation layer can easily make use of the first results to execute ultimate trading decisions and discard the extraneous results from the redundant nodes.

Utilising redundant compute nodes for each partition of the calculation and discarding extraneous results means near-optimal performance is retained even in the event that a large number of the available computation nodes become disabled. The degree of resiliency to failure can be customised based on the application's tolerance to failure. Individual node failures totalling 25 per cent of the computing capacity of a given location can be trivially managed by today's grid computing technology. Even complete failure of more than one location can be dealt with, given the appetite for 'N-way' redundancy of data management and calculations.
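The 'first result wins' behaviour of the aggregation layer can be sketched as below. This is a minimal illustration with hypothetical names (`FirstResultAggregator`, the partition and node labels); a real aggregation layer would also need to handle timeouts and nodes that never respond.

```python
class FirstResultAggregator:
    """Accepts the first result per partition and discards later duplicates."""

    def __init__(self):
        self.results = {}    # partition -> (node, value) of the first arrival
        self.discarded = 0   # redundant results that arrived too late to matter

    def submit(self, partition, node, value):
        """Return True if the result was used, False if it was redundant."""
        if partition in self.results:
            self.discarded += 1
            return False
        self.results[partition] = (node, value)
        return True

# Two nodes compute each partition; results arrive in this (made-up) order.
arrivals = [
    ("AAA/BBB", "ny-node-1", 0.42),
    ("CCC/DDD", "sg-node-4", 0.17),
    ("AAA/BBB", "ldn-node-2", 0.42),  # redundant copy, discarded
    # sg-node-4's partner for CCC/DDD has failed and never reports.
]

aggregator = FirstResultAggregator()
for partition, node, value in arrivals:
    aggregator.submit(partition, node, value)

print(aggregator.results, aggregator.discarded)
```

The failed node goes unnoticed by the aggregator: as long as one copy of each partition's result arrives, the trading decision can proceed.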

Another approach that may be appropriate with some models is to make the trading decision when 'just enough' results have come in, rather than waiting for the entire computation to complete. Depending on the model, this might mean trading as soon as the first cointegrated pair has been identified in the first timeframe and passed through out-of-sample testing, or it may need several results from several hypothetical scenarios before the trading decision is made. This is completely model-dependent, but the overriding principle is that it is possible to take advantage of the fact that some computation nodes may be closer to the aggregator, or that some portions of the computation may be faster.
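A 'just enough' rule can be expressed as a loop that stops consuming results once a threshold is met. The sketch below assumes a hypothetical threshold of two confirmed pairs and a made-up stream of results; what counts as enough is, as noted, entirely model-dependent.

```python
def decide_early(results, needed):
    """Consume results in arrival order; stop once `needed` of them pass.

    `results` yields (pair, timeframe, passed_out_of_sample) tuples.
    Returns the confirming results, or None if the stream ends first.
    """
    confirmed = []
    for pair, timeframe, passed in results:
        if passed:
            confirmed.append((pair, timeframe))
            if len(confirmed) >= needed:
                return confirmed  # decide now; later results are not awaited
    return None

def arriving_results(log):
    """Made-up result stream; records each result as it is consumed."""
    stream = [
        ("AAA/BBB", "1min", False),
        ("CCC/DDD", "1min", True),
        ("AAA/CCC", "5min", True),
        ("BBB/DDD", "daily", True),  # slowest result, never waited for
    ]
    for item in stream:
        log.append(item)
        yield item

consumed = []
decision = decide_early(arriving_results(consumed), needed=2)
print(decision, len(consumed))
```

Only three of the four results are consumed before the decision is returned, so the slowest computation does not hold up the trade.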


The most important consideration for expansion into phase two of this project is efficient distribution of data to all geographies and all compute nodes within each location. With an enterprise data grid in this kind of environment, one can properly leverage the ability to make the right data available to multiple locations consistently and coherently, while also being free to manage the compute resources efficiently and provide the desired resiliency, throughput and latency.

A computation can be scaled, partitioned and distributed highly efficiently to achieve not only better throughput and latency, but higher resiliency. By pairing grid technology and enterprise data fabric, one can solve the classic data management problem of synchronising multiple data centres.

Glossary


Aggregation layer - The software and hardware components that implement aggregation, i.e. the pulling together of the results of multiple separate calculations.

Compute resource - Typically a CPU or a core, but can be any device that can perform a computation.

Cross-talk - Where one calculation is either dependent on or affected by another calculation.

Conflation - In this context, conflation limits the amount of market data that needs to be propagated globally. Instead of sending every tick around the world, one might only send one tick per second for each pair.

Embarrassingly parallel - No essential dependency (or communication) between parallel computations being processed simultaneously on multiple computers.

Enterprise data fabric - A distributed operational data management infrastructure that sits between clustered application processes and back-end data sources, providing low latency data access, guaranteed availability of data regardless of the data source, extremely high data throughput, data sharing across multiple applications and scalable event and data distribution across multiple application nodes.

Globally unique identifier - A combination of the identity of the source of the data, a sequence number that is incremented each time that source is restarted and a simple sequence number that is guaranteed to be unique regardless of how many data sources there might be.

N-way redundancy - To be resilient to failure of two sites, data must be present at three sites. The value of 'N' is determined by the resiliency requirements of the individual organisation.

Sliding window acknowledgement protocols - Where one acknowledgement is sent for a batch of messages, to achieve better performance over a wide area network, rather than waiting for each acknowledgement before sending the next message.

Task manager - Generic term to identify the component within the infrastructure of a compute grid product that decides where a computation should be done.