Get a Grid!
Published in Automated Trader Magazine Issue 07 October 2007
In the Q3 issue’s Technology Forum, our panel of experts agreed that grid computing had not yet been fully harnessed to support automated trading. So Automated Trader asked Mike Stoltz, VP, Architecture and Strategy, Financial Services at Gemstone Systems, Inc, to explain how grid computing might support a theoretical automated trading strategy.

Mike Stoltz
The Scenario
A large proprietary trading operation wishes to develop a global,
automated synthetic pairs trading programme across its offices in
New York, London and Singapore. In its first phase, the objective
is to identify baskets of instruments that can be traded as
synthetics against individual real securities. In its second
phase, the intention is to extend this strategy so that both legs
of the pairs trades consist of synthetic instruments.
The intention is that the strategy will involve multiple security
types denominated in various currencies and that it will also
operate across twenty different timeframes (ranging from one
minute to daily data). Combining these factors with the need to
conduct in- and out-of-sample testing for cointegration means
that the computational requirements will be very significant from
the outset, but will expand substantially with the second phase
of the project.
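To give a feel for the scale, a back-of-the-envelope count of the phase-one candidates might look like the sketch below; the universe size and basket size are purely illustrative assumptions, not figures from the scenario.

    from math import comb

    instruments = 100          # assumed universe size - illustrative only
    basket_size = 3            # assumed basket of three instruments against one real security
    timeframes = 20            # from the scenario: one minute to daily
    tests_per_candidate = 2    # in-sample and out-of-sample cointegration tests

    candidates = instruments * comb(instruments - 1, basket_size)
    total_tests = candidates * timeframes * tests_per_candidate
    print(f"{candidates:,} candidate basket/security combinations")
    print(f"{total_tests:,} cointegration tests per full pass")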
Furthermore, the trading operation is making the assumption that
the tradable pairs arrived at after out-of-sample cointegration
testing will not be stable (i.e. the cointegration relationship
will probably decay), especially in short timeframes. Therefore,
the decision has been made that development and testing must be
continuous, so all calculations and testing on all possible pairs
and timeframes will be updated in real time. In the case of
shorter timeframes, this will result in calculations having to
be performed during market hours. The intention is not to have a
'Big Bang' go-live, but a gradual ramp-up. Initial testing and
trading will only involve a subset of the final instrument
universe, but will still be conducted across all three locations.
Initial set-up
Due to the global nature of this business, it is likely that each of the three trading centres will have a replica of data from the other centres. However, keeping multiple data centres in sync is a classic data-management problem. A new generation of distributed data caches, or 'enterprise data fabrics', is well suited to addressing it. By partitioning data according to well-defined rules pertinent to the company's business use-case, the data fabric will help provide transactionality for updates, which are typically managed by regional resource managers. Inserts of new data do not require transaction management; the new records only need a globally unique identifier.
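The glossary's description of a globally unique identifier suggests one simple construction. The sketch below (Python, with hypothetical field names) combines the source's identity, a restart counter and a per-source sequence number, so records inserted in any region never require coordinated, transactional identifier allocation.

    import itertools
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class RecordId:
        source_id: str    # identity of the inserting process, e.g. 'LDN-feed-03'
        incarnation: int  # incremented each time that source restarts
        sequence: int     # monotonically increasing within one incarnation

    class IdGenerator:
        def __init__(self, source_id, incarnation):
            self.source_id = source_id
            self.incarnation = incarnation
            self.counter = itertools.count()

        def next_id(self):
            return RecordId(self.source_id, self.incarnation, next(self.counter))

    # Two regions generate identifiers independently, yet they can never collide.
    ldn = IdGenerator('LDN-feed-03', incarnation=7)
    nyc = IdGenerator('NYC-feed-01', incarnation=2)
    print(ldn.next_id(), nyc.next_id())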
Load balancing
Once the data is present at all locations through the data fabric, it is a simple task to fire off computations at any location, utilising computing resources in regions that would otherwise be idle. This is a question of 'moving the computations to the data' rather than 'moving the data to the computations'. Given the size and complexity of the data involved, a request for computation is an order of magnitude smaller than the data required to perform the calculation. A 'command pattern' can therefore easily achieve a '10x speed-up': the resources in each location are treated as a computation pool, and the data fabric makes the same data available everywhere so that every pool is of equal value to the compute grid software managing the load balancing.

Figure 1: Distributed Cache Topology
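As a rough illustration of the 'command pattern' just described, the hypothetical sketch below ships a small command object (the pair and timeframe to test) to the region that already holds the relevant price history, so only the request and the result cross the WAN; the cache and the scoring routine are stand-ins, not any particular product's API.

    from dataclasses import dataclass

    # Stand-in for the local slice of the enterprise data fabric in one region.
    LOCAL_CACHE = {
        (('BASKET_A', 'XYZ'), '5min'): [101.2, 101.4, 101.1, 101.6, 101.3],
    }

    @dataclass
    class CointegrationCommand:
        pair: tuple        # e.g. ('BASKET_A', 'XYZ')
        timeframe: str     # e.g. '5min'

        def execute(self):
            # Runs in the region that already holds the data; the command itself
            # is a few bytes, orders of magnitude smaller than the price history.
            prices = LOCAL_CACHE[(self.pair, self.timeframe)]
            score = sum(prices) / len(prices)        # placeholder for the real test
            return {'pair': self.pair, 'timeframe': self.timeframe, 'score': score}

    print(CointegrationCommand(('BASKET_A', 'XYZ'), '5min').execute())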
Resource allocation
Nearly every business problem lends itself to partitioning in some fashion, and the allocation of resources to different elements of the calculation task is no exception. In this case, the obvious partitioning is by the pairs themselves. Each compute resource can be assigned the computation for a pair, or even a portion of the computation for a pair. Given that there is no cross-talk between the pairs, this is an 'embarrassingly parallel' computation opportunity. The only possible cross-talk is at the cointegration boundaries, and the quantity of data required to represent it is significantly smaller than the data required for the overall computation.

Figure 2: Distributed data partitioned between hardware elements
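A minimal sketch of this partitioning, with a single-machine multiprocessing pool standing in for the grid and a fake scoring function in place of real cointegration testing: each (pair, timeframe) combination is an independent work item and can be farmed out in any order.

    from itertools import combinations, product
    from multiprocessing import Pool

    INSTRUMENTS = ['AAA', 'BBB', 'CCC', 'DDD']        # illustrative universe
    TIMEFRAMES = ['1min', '5min', '60min', '1day']    # subset of the twenty timeframes

    def test_pair(task):
        # Hypothetical stand-in for in- and out-of-sample cointegration testing.
        (a, b), timeframe = task
        return a, b, timeframe, hash((a, b, timeframe)) % 100

    if __name__ == '__main__':
        # No cross-talk between tasks, so placement and ordering do not matter.
        tasks = list(product(combinations(INSTRUMENTS, 2), TIMEFRAMES))
        with Pool() as pool:
            results = pool.map(test_pair, tasks)
        print(len(results), 'independent computations completed')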
Network infrastructure
The grid computing solution must be supported by a network
infrastructure that can provide sufficient bandwidth across all
three locations as well as self-healing and redundancy
capabilities. Because today's networks provide up to 150 Mbit/s of
bandwidth across wide area networks (WANs), the issue is latency,
not bandwidth. Latency is bounded by physical distance: signals
cannot travel between two locations faster than the speed of
light. So the important consideration is to use a transmission
protocol that batches data as efficiently as possible and reduces
the number of round-trips required to transmit that data.
Approaches such as sliding-window acknowledgement protocols,
batching and conflation can all improve the overall latency, but
there are trade-offs to consider. If the protocol holds back data
for too long to achieve an optimal batch size, latency will
actually grow. For this reason, a highly configurable,
latency-sensitive data management protocol is required to achieve
optimal transport characteristics. Again, best-of-breed
enterprise data fabrics already have these features built in.
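To illustrate the trade-off, the hypothetical sketch below conflates ticks per pair and flushes a batch either when it is full or when a maximum hold time expires, so data is never held back indefinitely in pursuit of an ideal batch size; the thresholds are arbitrary example values.

    import time

    class ConflatingBatcher:
        def __init__(self, max_batch=50, max_hold_seconds=0.05):
            self.max_batch = max_batch        # flush once this many distinct pairs are queued...
            self.max_hold = max_hold_seconds  # ...or once the oldest queued tick is this old
            self.pending = {}                 # pair -> latest tick (older ticks are conflated away)
            self.oldest = None

        def add(self, pair, tick):
            if self.oldest is None:
                self.oldest = time.monotonic()
            self.pending[pair] = tick
            if len(self.pending) >= self.max_batch or self.age() >= self.max_hold:
                return self.flush()           # one WAN message instead of many small ones
            return None

        def age(self):
            return 0.0 if self.oldest is None else time.monotonic() - self.oldest

        def flush(self):
            batch, self.pending, self.oldest = self.pending, {}, None
            return batch

    batcher = ConflatingBatcher(max_batch=2)
    batcher.add(('AAA', 'BBB'), 101.2)
    print(batcher.add(('CCC', 'DDD'), 99.7))  # second distinct pair triggers a flush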
Server distribution and configuration
The task managers should be localised to each region to provide optimal management of the nodes that are available and ready to work, as well as localised scope for fail-over and reconfiguration of the compute backbone. Given that this is a data-driven computation, the data should be distributed to all of the compute pools using a combination of WAN gateway, peer-to-peer and client-server topologies. As the backbone of an enterprise data fabric, a peer-to-peer network is a virtual cluster of computers, each with equal ranking within the fabric, that together manage all data interactions. In the client-server topology, the compute grid engines are the clients and fetch their data from the peers as if they were a database. The WAN gateway is used to copy data to the separate data centres around the world. An appropriate mix of these three topologies will provide the highest resiliency with optimal throughput and the lowest possible latency.

Figure 3: Distribution of computational tasks using client-server caching
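One way to picture the mix of topologies is as a per-region configuration. The sketch below is purely illustrative (none of the names are drawn from a real product): it simply records which peers form each region's data fabric cluster, which WAN gateways replicate to the other regions, and which compute grid engines connect as clients.

    from dataclasses import dataclass, field

    @dataclass
    class RegionConfig:
        name: str
        peers: list = field(default_factory=list)         # peer-to-peer data fabric members
        gateways: list = field(default_factory=list)      # WAN gateways to the other regions
        grid_clients: list = field(default_factory=list)  # compute engines, clients of the peers

    REGIONS = [
        RegionConfig('New York',  peers=['ny-peer-1', 'ny-peer-2'],
                     gateways=['to-London', 'to-Singapore'],
                     grid_clients=[f'ny-engine-{i}' for i in range(1, 9)]),
        RegionConfig('London',    peers=['ldn-peer-1', 'ldn-peer-2'],
                     gateways=['to-NewYork', 'to-Singapore'],
                     grid_clients=[f'ldn-engine-{i}' for i in range(1, 9)]),
        RegionConfig('Singapore', peers=['sgp-peer-1', 'sgp-peer-2'],
                     gateways=['to-NewYork', 'to-London'],
                     grid_clients=[f'sgp-engine-{i}' for i in range(1, 9)]),
    ]

    for region in REGIONS:
        print(region.name, '-', len(region.peers), 'peers,', len(region.grid_clients), 'engines')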
Aggregating results
Aggregation and decision-making should be done using the earliest-received results, regardless of which pool those results come from, to further reduce overall latency and increase the reliability of the computation. Even aggregation can be distributed to some extent, as long as the data fabric can present a consistent view of that aggregation to all members of the aggregation function, making that function more resilient as well. If the aggregation step is arithmetically suitable or hierarchical, it may be possible to perform partial aggregations on portions of the results as they arrive, and then combine those partial aggregates in a final aggregation later. A hypothetical example of an arithmetically suitable aggregation is an average of all of the results.
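Concretely, each pool could report a partial (sum, count) for its share of the results, and the final step merges whatever partials have arrived, in any order. A minimal sketch:

    # Each region reduces its own results to a partial (sum, count)...
    partials = {
        'New York':  (512.4, 120),
        'London':    (498.7, 118),
        'Singapore': (505.1, 121),
    }

    # ...and the final aggregation merges the partials, regardless of arrival order.
    total = sum(s for s, _ in partials.values())
    count = sum(n for _, n in partials.values())
    print('global average:', total / count)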

Figure 4: Distribution of computational tasks between compute grid and enterprise data fabric
Emergency scenarios
To achieve high levels of resiliency, it may be appropriate to
have more than one compute node subscribe to inputs for - and
calculate results of - the same computation. This will ensure
that the aggregation layer can easily make use of the first
results to execute ultimate trading decisions and discard the
extraneous results from the redundant nodes.
Utilising redundant compute nodes for each partition of the
calculation and discarding extraneous results means near-optimal
performance is retained even in the event that a large number of
the available computation nodes become disabled. The degree of
resiliency to failure can be customised based on the
application's tolerance to failure. Individual node failures
totalling 25 per cent of the computing capacity of a given
location can be trivially managed by today's grid computing
technology. Even complete failure of more than one location can
be dealt with, given the appetite for 'N-way' redundancy of data
management and calculations.
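A minimal sketch of 'first result wins' redundancy, using Python's standard concurrent.futures purely as an illustration of the pattern: the same partition is submitted to two workers, the earliest answer is used and the redundant one is discarded.

    import random, time
    from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

    def compute_partition(partition, node):
        # Hypothetical stand-in for one partition's cointegration calculation.
        time.sleep(random.uniform(0.01, 0.1))    # nodes finish at different times
        return partition, node

    with ThreadPoolExecutor(max_workers=4) as pool:
        # Submit the same partition redundantly to two 'nodes'.
        futures = [pool.submit(compute_partition, 'pair-42', n) for n in ('node-A', 'node-B')]
        done, not_done = wait(futures, return_when=FIRST_COMPLETED)
        first = next(iter(done)).result()        # use the earliest result...
        for f in not_done:
            f.cancel()                           # ...best effort; a late duplicate is simply ignored
        print('aggregating:', first)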
Another approach that may be appropriate with some models is to
make the trading decision when 'just enough' results have come
in, rather than waiting for the entire computation to complete.
Depending on the model, this might mean trading as soon as the
first cointegrated pair has been identified in the first
timeframe and has passed out-of-sample testing, or it may require
results from several hypothetical scenarios before the trading
decision is made. This is completely model-dependent, but
the overriding principle is that it is possible to take advantage
of the fact that some computation nodes may be closer to the
aggregator, or that some portions of the computation may be
faster.
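A sketch of the 'just enough' approach, again with concurrent.futures standing in for the grid: the aggregator consumes results as they complete and stops waiting once a model-dependent threshold of tradable signals has been reached; the threshold and the test itself are placeholders.

    import random, time
    from concurrent.futures import ThreadPoolExecutor, as_completed

    def test_candidate(pair):
        # Hypothetical in- and out-of-sample test; returns whether the pair is tradable.
        time.sleep(random.uniform(0.01, 0.05))
        return pair, random.random() > 0.7

    ENOUGH = 3                                   # model-dependent threshold
    candidates = [f'pair-{i}' for i in range(100)]
    signals = []

    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(test_candidate, c) for c in candidates]
        for future in as_completed(futures):
            pair, tradable = future.result()
            if tradable:
                signals.append(pair)
            if len(signals) >= ENOUGH:           # decide without waiting for the rest
                break

    print('trading decision based on:', signals)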
Extensibility
The most important consideration for expansion into phase two
of this project is efficient distribution of data to all
geographies and all compute nodes within each location. With an
enterprise data fabric in this kind of environment, one can
properly leverage the ability to make the right data available to
multiple locations consistently and coherently, while also being
free to manage the compute resources efficiently and provide the
desired resiliency, throughput and latency.
A computation can be scaled, partitioned and distributed highly
efficiently to achieve not only better throughput and latency,
but also higher resiliency. By pairing grid technology with an enterprise
data fabric, one can solve the classic data management problem of
synchronising multiple data centres.
Glossary
Aggregation layer - The software and hardware components that implement aggregation, i.e. the pulling together of the results of multiple separate calculations.
Compute resource - Typically a CPU or a core, but can be any device that can perform a computation.
Cross-talk - Where one calculation is either dependent on or affected by another calculation.
Conflation - In this context, conflation limits the amount of market data that needs to be propagated globally. Instead of sending every tick around the world, one might only send one tick per second for each pair.
Embarrassingly parallel - No essential dependency (or communication) between parallel computations being processed simultaneously on multiple computers.
Enterprise data fabric - A distributed operational data management infrastructure that sits between clustered application processes and back-end data sources, providing low latency data access, guaranteed availability of data regardless of the data source, extremely high data throughput, data sharing across multiple applications and scalable event and data distribution across multiple application nodes.
Globally unique identifier - A combination of the identity of the source of the data, a counter that is incremented each time that source is restarted, and a simple sequence number; together these guarantee uniqueness regardless of how many data sources there might be.
N-way redundancy - To be resilient to failure of two sites, data must be present at three sites. The value of 'N' is determined by the resiliency requirements of the individual organisation.
Sliding window acknowledgement protocols - Where one acknowledgement is sent for a batch of messages, to achieve better performance over a wide area network, rather than waiting for each acknowledgement before sending the next message.
Task manager - Generic term to identify the component within the infrastructure of a compute grid product that decides where a computation should be done.