Hard and Fast?
Published in Automated Trader Magazine Issue 12 Q1 2009
This is an extended version of the Tech Forum that appeared in the Q1 2009 edition of Automated Trader. It includes an additional interviewee and expanded answers from all interviewees on the latest techniques for hardware and networking infrastructures.
- With:
- Pat Aughavin, Senior Business Development Director, Financial Services, AMD
- Vincent Berkhout, Client Engagement Director, COLT
- Michael Cooper, Head of Product Technology, BT Global Financial Services
- Andrew Graham, IT Architect, Financial Markets, IBM UK
- Shawn McAllister, Vice President, Architecture, Solace Systems
- Parm Sangha, Business Development Manager - Financial Services Industry Solutions, Cisco Systems
- Geno Valente, Vice President, Marketing and Sales, XtremeData, Inc
- Nigel Woodward, Head of Financial Services, Europe, Intel
What developments in processing capabilities should firms adopt to support algo/auto trading?
Aughavin: While many companies are investigating
parallel programming, they are proceeding methodically because it
can be difficult to maintain and support. However, companies
recognise the potential of accelerated computing and how it can
reduce power consumption and ease infrastructure complexity. In
recent months, select companies have launched accelerated
computing initiatives which are specifically designed to help
technology partners deliver open, flexible and scalable silicon
designs. These solutions can significantly boost performance in
compute-intensive applications. A key part of such solutions is a
stable platform which will help foster dynamic development,
enabling technological differentiation that is not economically
disruptive at a time when accelerated computing is moving to the
mainstream.
Cooper: Alternatives to traditional horizontal scaling and technology upgrade approaches are beginning to emerge that
address complex event processing (CEP), capacity and performance
requirements. Network-attached compute appliances seek to address
processing capacity and performance by offloading processing from
existing systems to an optimised appliance. Additionally, some of
these appliances mitigate the overheads frequently incurred with
platform and technology upgrades by minimising systems
modifications and application development. In addition to meeting
existing application performance requirements, these appliances can service multiple systems while providing significant
scalability and capacity for growth. They will also address other
issues like power consumption and cooling requirements, are
relatively straightforward to deploy and can prolong the life of
the existing systems estate. As a consequence they enable new
approaches to be developed and new functionality to be supported
that would not have been feasible on existing platforms.
Graham: The need to
analyse applications to ensure software is designed to exploit
multi-core/multi-threaded technology safely is ever more
important. A balanced solution stack must always be considered;
the old adage that fixing one bottleneck will only move it to
another part of the system still holds true. That said, emerging
technologies include:
• Offload engines/accelerators to perform XML
transformations, security processing acceleration, FIX/FIXML
acceleration, market data feeds optimisation, TCP offload engines
(TOEs) and hardware devices such as graphics processing units
(GPUs), field programmable gate arrays (FPGAs),
application-specific integrated circuits (ASICs) and cell
broadband engines;
• Accelerated migration of applications from 32 to 64 bit
hardware and operating systems, to exploit in-memory databases
and larger histories;
• Streaming/event-based technologies with the option to perform more complex processing, often blended with column orientation over row orientation, are gaining significant traction;
• The need for predictive quality of service across the architecture, driving real-time Java solutions, real-time extensions to Linux, and dedicated or highly-managed networks; and
• Daemonless low-latency middleware that exploits true
multicast networking.
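Graham's last point - middleware that exploits true multicast - can be illustrated with a minimal sketch in Python, using only the standard socket module. The multicast group, port and payload layout below are illustrative assumptions; production middleware layers reliability, gap detection and topic management on top of the raw transport.

    # Minimal sketch: one-to-many tick distribution over raw IP multicast.
    # Group, port and payload layout are illustrative assumptions only.
    import socket
    import struct

    GROUP, PORT = "239.1.1.1", 5000        # hypothetical multicast group/port

    def publish(symbol, price):
        """Send one tick to every subscriber joined to the group."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        sock.sendto(struct.pack("!8sd", symbol.encode(), price), (GROUP, PORT))

    def subscribe():
        """Join the group and print ticks as they arrive."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", PORT))
        mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
        while True:
            symbol, price = struct.unpack("!8sd", sock.recv(16))
            print(symbol.rstrip(b"\x00").decode(), price)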

Parm Sangha, Cisco Systems.
"The organisation needs to be able to monitor data and
message flow in order to determine where bottlenecks may be
occurring and what to do about them."
McAllister: Hardware infrastructure solutions combine the best of network and content-processing hardware advances to accelerate data delivery, routing and transformation in support of algorithmic and other applications. These solutions use FPGA, ASIC and network processor-based systems to move content processing into silicon which improves uptime and delivery rates by parallelising processing and eliminating the unpredictability of software on servers. With data volumes increasing exponentially and buy- and sell-side firms struggling with increased complexity, latency and unpredictability in their software infrastructures, hardware solutions can deliver an order of magnitude greater throughput while guaranteeing ultra-low, consistent latency.
Sangha: Improving processing capabilities
requires a combination of investing in faster processors and
making existing server investments work harder through server and
application virtualisation and intelligent routing of workload
between servers via a high performance trading (HPT) network. As
data traverses the different components of a trading platform -
including market data delivery, order routing and execution - the
HPT infrastructure should not only provide the lowest-latency interconnect at each component but, at the same time, allow server CPUs to dedicate more capacity to the application by offloading network traffic processing to the switching fabric. Applications can also take advantage of multi-core CPUs when the underlying operating system is designed to support virtualisation and different processes are assigned to different CPU cores.
Valente: CPUs are not getting faster and twice as many
cores do not make the system twice as fast. FPGAs are still
getting faster every generation and doubling in size every 18
months, so the performance gap will only widen. Unfortunately, there is a very fine line between the time it takes to develop, test and deploy new algorithms, and the latency/performance benefits that one can get with exotic technologies. Tomorrow's successful approaches must be multi-threaded in nature, but capable of being targeted to any technology at compile time (e.g. quad-core now, octo-core or FPGAs later).
New API layers are helping developers move from technology to
technology faster, regardless of the original or future target
platform.
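Valente's point about code that is multi-threaded but retargetable is easy to sketch in software. The example below, a hypothetical CPU-bound scoring job, discovers the core count at run time rather than baking it in, so the same code spreads across quad-core or octo-core machines unchanged; the score() function and instrument list are illustrative assumptions.

    # Minimal sketch: decompose a CPU-bound job across however many cores
    # exist at run time, rather than hard-wiring the code to one core count.
    # score() and the instrument list are illustrative assumptions.
    import os
    from concurrent.futures import ProcessPoolExecutor

    def score(instrument):
        """Stand-in for a compute-heavy model evaluation."""
        return sum((hash(instrument) % 97) ** 0.5 for _ in range(100_000))

    def score_all(instruments):
        workers = os.cpu_count() or 1            # discovered at run time
        with ProcessPoolExecutor(max_workers=workers) as pool:
            return dict(zip(instruments, pool.map(score, instruments)))

    if __name__ == "__main__":                   # required for process pools
        print(score_all(["VOD.L", "BARC.L", "HSBA.L", "BP.L"]))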
Woodward: Processors are now available in
single-, dual- and multi-core versions with multi-sockets. Both
the operating system and the application software have to be
designed to take advantage of this processor layer. Also, various
acceleration technologies embedded in the hardware can increase
the performance of I/O-sensitive applications. Generally, the focus should be on newer infrastructure technologies and tuning, e.g. 10-gigabit Ethernet networking and Infiniband.

Geno Valente, XtremeData, Inc
"CPUs are not getting faster and twice as many cores do not make the system twice as fast."
How should firms handle both the increased diversity and sheer quantity of market data in algo/auto trading?
Cooper: Basic connectivity is
clearly an issue: more sources require more connectivity.
Frequent changes and systems test acceptance procedures
complicate the situation: for example, how do you validate new
data and test systems integration? Seeking to address the problem
through point-to-point connectivity will lead to scalability and
integration issues, high costs and inefficient use of
infrastructure, and is an inflexible approach that precludes
rapid adoption, testing and validation. Consumers of multiple
data sources need to identify service providers who can provide
connectivity (with the appropriate performance attributes) to
multiple sources and are able to provide flexible solutions that
support rapid connection to new sources, new applications and new
data services.
Graham: Consideration of an enterprise metadata
model for structured and unstructured data is important in terms
of management and exploitation by numerous applications and
people. Consolidation of feed adapters and exploitation of
multicast technologies will help alleviate some overheads.
Edge-of-domain performant technologies that filter the signal
from the noise would help too.
Data caching and data grid technologies that handle the
replication of data to many nodes are worth considering, feeding
in-memory databases and specialist compute engines. Systems are
being developed which are an execution platform for
user-developed applications that ingest, filter, analyse and
correlate potentially massive volumes of continuous data streams.
They support the composition of new applications in the form of
stream-processing graphs that can be created on the fly, mapped
to a variety of hardware configurations and adapted as requests
come and go, and as relative priorities shift. Systems are being
designed to acquire, analyse, interpret and organise continuous
streams on a single processing node, and scale to
high-performance clusters of hundreds of processing nodes in
order to extract knowledge and information from potentially
enormous volumes and varieties of continuous data streams.
McAllister: Hardware infrastructure can improve
throughput and deliver low, predictable latency allowing firms to
easily scale as data rates scale. Additional hardware can perform
tasks such as data inspection, filtering, routing,
transformation, compression and security to allow algorithms to
include market data, algorithmic news, research data and more at
rates 100 or more times faster than software and servers. Today,
firms typically have a separate infrastructure for their market
data, their reference data and their back office SOA-style
systems. Hardware solutions allow all classes of service to be
combined under a single API without impacting performance on the
highest end systems. Many parallel infrastructures can be
consolidated into a single infrastructure with better performance
and reliability than the stove-piped systems we have today.
Sangha: This is not just a matter of simply keeping pace but of competitive advantage. While processing power is important, it is vital that the network infrastructure is designed to support these ever-increasing volumes of data. Low latency, high throughput and deterministic behaviour are key attributes across the entire trading cycle. Consolidated information feeds provide key insight for algorithmic trading engines and human traders alike, but in the automated trading environment now common in equities and derivatives, fast, direct feeds with latency measured in milliseconds and microseconds are key. Applications like market data feed-handlers need to be matched to the appropriate networking technology and built upon the most appropriate processor and operating system. Vendors such as Cisco, Reuters, Sun and Intel have joined forces to ensure an integrated approach is tested and ready to deploy. High-performance 'cells' that are optimised for just such a demanding environment can sit alongside the standard infrastructure, where the 'need for speed' is less acute.
Woodward: Firms should look at new market data technologies, e.g. compression using the FIX FAST protocol or CEP engines to handle and analyse various data forms. Data caches such as Gemstone, Gigaspaces and Tangosol can store and feed real-time data to CEP engines. In addition, dashboard and analytical tools exist for analysing market data, such as Xenomorph and Kx Systems. New approaches to storage include offline archives from major vendors such as EMC and NetApp, and from niche players such as Copan.
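The real FAST specification uses templates, presence maps and stop-bit encoding; as a rough illustration of the underlying idea only - sending deltas against the previous message rather than full values - consider this sketch, whose field names and tick layout are assumptions:

    # Illustrative only: the idea behind FAST-style compression is to send
    # just the fields that changed since the prior tick. This is NOT the
    # real FAST wire format; field names are assumptions.

    def encode(prev, curr):
        """Return only the fields that differ from the previous tick."""
        return {k: v for k, v in curr.items() if prev.get(k) != v}

    def decode(prev, delta):
        """Rebuild the full tick from the previous tick plus the delta."""
        full = dict(prev)
        full.update(delta)
        return full

    prev = {"symbol": "VOD.L", "bid": 135.10, "ask": 135.15, "size": 5000}
    curr = {"symbol": "VOD.L", "bid": 135.12, "ask": 135.15, "size": 5000}

    delta = encode(prev, curr)          # {'bid': 135.12} - far fewer bytes
    assert decode(prev, delta) == curr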
How are hardware/networking firms addressing the lead-time lag in clients' purchasing cycles?
Berkhout: Early engagement ensures that integral
parts of network design, such as proximity requirements, can be
incorporated at the outset. For example, rather than connecting a
server farm directly to the nearest exchange, it could be
connected directly to the data centre hosting both the proximity
services and an exchange. This avoids unnecessary reroutes
through the metro network.
Graham: Each firm has its own buying cycle and
of course there will be a lag between procurement and deployment.
However, these lags are reducing and there is pressure on firms
to compete against each other using technology deployment as a competitive weapon. By providing a constantly-updating
stream of technology options, vendors can address the
requirements according to the individual firm's cycle.
McAllister: The best way for hardware-based
solution providers to deal with deployment lag is to provide a
modular industry or de facto standards-based solution that can
evolve. For example, a customer may deploy gigabit Ethernet or
Infiniband while planning to migrate towards 10 gigabit Ethernet.
The hardware solution must provide customers with a path to
deploy on one network infrastructure today, and adapt to changes
as they are adopted by each customer. Similarly, allowing
applications written to industry standard messaging APIs to run
unchanged over hardware instead of software will eliminate the
time-consuming step of retooling and retesting applications when
increasing capacity. Hardware products use a modular design so
that clients can choose whether they want messaging, persistence,
transformation or event-processing by simply installing new
blades into an expandable hardware chassis.
Sangha: I don't think that financial sector
companies are implementing out-of-date technology. Financial
services companies invest in IT that meets their needs. If their
need is for speed then they will be looking at the latest,
fastest processors and networking equipment. If speed is not
their primary need then they may well wait for new technology to
prove itself and become more affordable before investing. This time lag doesn't make the technology out of date; rather, it means the investment is fit for purpose.
Woodward: At Intel, we have opened our Low
Latency Lab to enable tuning and proofs of concept on the latest technologies. This can fast-track adoption by enabling the testing of new combinations of infrastructure elements. While this will not necessarily
accelerate adoption times, it gives a low-risk path to testing
the new technologies more quickly than environments can be
provisioned inside the firm.
How can firms optimise hardware deployment to overcome problems with power-to-rack ratios at popular co-location centres?
Aughavin: Virtualisation is one of the emerging
trends which firms are using to run multiple systems and
applications on the same server, the benefits being simpler
deployment, an elegant scalable architecture and a more efficient
use of computing resources. One approach is to deploy architectures that circumvent front-side bus bottlenecks, enabling efficient partitioning and memory access to and from the processing cores.
Berkhout: While some co-location centres have
been chosen as an immediate solution for low-latency trading, in
the medium term financial institutions will need to be more
selective over their chosen sites, locations and the power
options available to them. One solution is 'long lining', i.e.
dedicated connectivity to alternative locations nearby with more
power and, equally important, ample cooling.
Cooper: The adoption of new systems with lower
power usage and reduced cooling requirements is certainly one
approach. In addition, consideration should be given to the use
of appliances that provide functionality that can be used to
support multiple systems, e.g. compute, I/O and storage
appliances.
Graham: Mixing workloads within a rack will help
balance the distribution of power input/heat output within a rack
and across the data centre - often this requires organisational
change since workloads are often arranged by line of business.
New techniques including robot-assisted approaches can model and
analyse the thermal distribution within a data centre to enable a
more informed distribution of workloads. Blade-based solutions
with more efficient power supply arrangements over rack-mounted
servers are also worth considering; tie this in with active power
management software that controls the hardware in real time in
response to workload demands. To raise utilisation rates and
hence attempt to reduce the overall data centre footprint,
virtualisation technologies have a place for some workloads,
whether through hardware or software hypervisor implementations.
For 'hotspots', rear-door water cooling can help address the cooling dimension, with around 50 per cent of the rack's heat being removed through the low-pressure system in the door.
McAllister: A single redundant pair of hardware-based
nodes in a content infrastructure can handle the workload of
20-60 equivalent general purpose servers running software
middleware. Software solutions have widely variable latency
characteristics as volumes increase, leading many firms to deploy
infrastructures that can handle five or more times market peaks.
This can literally mean hundreds or thousands of servers that are
very lightly loaded under normal market conditions, just to
assure reasonable performance during trading spikes. Hardware
solutions can handle many more connections, greater throughput
and perform consistently as loading limits are approached.
Additionally, a single hardware infrastructure can combine the
requirements of low-latency middleware and persistent messaging
for tasks such as order routing without impact on performance.
Sangha: Virtualisation is delivering power
consumption as well as performance benefits. Driving existing
hardware harder through virtualisation reduces power consumption
by reducing the need to turn on or invest in extra servers to
meet service level agreements during trading spikes. Firms can
benefit from a reduced footprint in their data centres, as well as reduced power and HVAC (heating, ventilation & air conditioning) requirements.
Woodward: The energy consumption of processors is
reducing dramatically. Firms should pressurise their co-location
vendors on the range of technical configurations being offered
and data centre design to ensure optimum service. They should
also tune their applications at the time of relocation, making
sure performance is optimised, rather than simply re-hosting
existing software. For example, simple use of processor-based acceleration has been shown to improve FIX throughput, and code optimisation is usually expected to contribute gains of at least five per cent, reflected in both the speed and scale of business processes and in energy consumption.

Pat Aughavin, AMD
"Virtualisation is one of the emerging trends which firms are using to run multiple systems and applications on the same server, …"
Is parallel processing fundamental to facilitating algo/auto trading? If so, what technologies should firms deploy?
Aughavin: Yes, and many of the technologies that companies should be looking to deploy, such as FPGAs, GPGPU and multi-core processing, are being evaluated. From a GPU
perspective, the bottlenecks are mainly in data transfer and
latency rather than computation. Until recently, GPU products
were not suitable for data-intensive tasks. However, with newer
hardware the latency (i.e. transfer to GPU and back) can be
mostly hidden, meaning that the acceleration from GPU-based
searches and other algorithms will become more apparent very
soon. The GPU will excel at more computationally-intensive tasks.
As software tools and programming standards emerge GPU-based
applications will grow in number and quality, and the cost of
adoption should shrink further.
Graham: Yes, the runway for relying on
single-threaded speed jumps within general purpose CPUs is
running out, driving the need to consider parallel models to
exploit multi-core (and multi-threaded) technologies and/or
specialist hardware devices. All these devices typically offer
higher performance for their chip area, consume much less power
per computation than general purpose processors and are highly
efficient in addressing a narrow range of tasks. However, most
are expensive to programme because the skills needed are rare,
they lack mature application development tooling and they have
extremely limited ISV support. Parallel technologies also mean that event determinism, race conditions and thread safety must be considered. And with specialist hardware solutions there is always the need to balance the cost of implementing and managing exotic technologies against more general-purpose solutions. The gaming industry is also driving
innovation, so the technology it is considering should be
investigated - such technology also has good economies of scale.
Gaming is also driving changes to the Linux kernel that can be
exploited in financial applications.
McAllister: As chip-manufacturing technology has
reached the point where clock rates within a given execution
context can no longer easily be increased, the only way to solve
a problem faster is to break it into smaller problems and solve
each in parallel. If implemented in the right technology,
parallel processing allows increasingly complex algorithms to be
performed without latency penalty. With FPGAs, for example,
creating new parallel execution contexts to handle additional
work is a very natural thing to do. However, with parallel
processing comes the need for inter-context communication and
synchronisation. This is where software solutions based on
general purpose CPUs continue to be challenged in high-end
performance requirements, since much more time needs to be spent
on scheduling, synchronisation and event management. FPGAs and
network processors have integrated, purpose-built hardware
assists to deal with such functions which have made them a
popular choice in networking layers for years.
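McAllister's description of breaking a problem into pieces, solving them in parallel and then paying a synchronisation cost maps onto the familiar fan-out/fan-in pattern. A minimal software sketch, with a VWAP-style aggregation and trade data chosen purely for illustration:

    # Fan-out/fan-in sketch: split a job into N pieces, solve in parallel,
    # then synchronise to collect the partial results. The VWAP aggregation
    # and the trade data are illustrative assumptions.
    from multiprocessing import Process, Queue

    def partial_sum(trades, out):
        """Each worker reduces its own slice, then reports once."""
        notional = sum(px * qty for px, qty in trades)
        quantity = sum(qty for _, qty in trades)
        out.put((notional, quantity))            # inter-context communication

    def parallel_vwap(trades, workers=4):
        chunks = [trades[i::workers] for i in range(workers)]
        out = Queue()
        procs = [Process(target=partial_sum, args=(c, out)) for c in chunks]
        for p in procs:
            p.start()
        totals = [out.get() for _ in procs]      # synchronisation point
        for p in procs:
            p.join()
        return sum(n for n, _ in totals) / sum(q for _, q in totals)

    if __name__ == "__main__":
        trades = [(100.0 + i * 0.01, 100 + i) for i in range(10_000)]
        print(round(parallel_vwap(trades), 4))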
Valente: Exotic, market-specific technologies
will have a short-term niche, but are usually beaten by next
generation x86 CPUs or FPGAs. Both of these technologies are at
the forefront of the process curve (45 and 65nm) and can be
leveraged in many different markets to keep volumes up and costs
down. Trading companies need to make sure that their investment
is warranted and has longevity. FPGAs and x86 CPUs offer both.
Woodward: Not necessarily. FPGAs are specialist
proprietary technologies which can be used for specific workloads
with high results. Is it possible to support all applications on
these? It's unlikely, due to the cost of redevelopment. More
likely, niche functions will run on FPGAs, while mainstream
functions will run on ever faster core processors. In the future,
we will likely see development environments in which code is
developed to be deployed on the appropriate infrastructure -
removing the dedicated tight coupling that exists today.
How should buy-side firms that use in-house automated trading
models and algorithms optimise their hardware infrastructure to
achieve low latency?
Cooper: For consistently achieving the best
possible end-to-end latency, there are two key themes: (i)
achieving the best possible results for data forwarding and
processing in terms of absolute latency; and (ii) achieving
results consistently with variance only as a consequence of
inherent variables, e.g. packet size variation and elements
outside of the buy-side firm's control. The process of
optimisation is iterative and needs to be founded on fundamental
data identifying different domains (network and systems for
example). In principle, optimal latency can be obtained through
the reduction of introduced delay (in terms of end-point to
end-point device and component forwarding), transit component
optimisation (ensuring devices have sufficient resources to
process data with no delay, e.g. network switching without
queuing) and addressing sources of variance. Ultimately, each
component must be optimised for the lowest possible latency, but
invariably there will be components for which optimisation is
uneconomic or not feasible.
McAllister: FPGA and ASIC-based messaging is
specifically designed to eliminate common infrastructure latency
problems created by operating system garbage collection and
context switching, thereby providing consistent ultra-low latency
even at the most demanding peaks. Furthermore, hardware can
optionally perform rules-based analysis of complex content to
expand the range of algorithms beyond simple price activity to
include news, market alerts and research notes. Hardware content infrastructure offloads CPU-intensive tasks - such as content filtering, routing and transformation - from the algorithmic applications, so they can focus on running their own unique algorithms faster.
Sangha: The biggest challenge for companies
running operations in house is understanding latency across the
whole trading cycle - applications, servers, network and multiple
venues. The organisation needs to be able to monitor data and
message flow in order to determine where bottlenecks may be
occurring and what to do about them. Buy-side firms need metrics
to distinguish the levels of service that sell-side firms and
venues are really providing. By monitoring and measuring latency
early in the cycle, firms can make better decisions about which
network service and which market, intermediary or counterparty to
select for routing trade orders.
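One way to make Sangha's point concrete: timestamp each order on the way out and its acknowledgement on the way back, then summarise the distribution rather than just the average, since it is the tail that hurts during spikes. A minimal sketch; the sample figures are invented purely to show the calculation:

    # Sketch: summarise ack latency per venue; sample figures are invented
    # purely to show the calculation, not real measurements.
    import statistics

    def summarise(latencies_us):
        ordered = sorted(latencies_us)
        pct = lambda p: ordered[min(len(ordered) - 1, int(p * len(ordered)))]
        return {
            "count": len(ordered),
            "mean_us": round(statistics.mean(ordered), 1),
            "p50_us": pct(0.50),
            "p99_us": pct(0.99),       # tail latency is what hurts at peaks
            "max_us": ordered[-1],
        }

    venue_a = [420, 455, 430, 410, 2900, 445, 415, 460, 425, 435]
    venue_b = [510, 505, 515, 520, 500, 512, 508, 518, 509, 506]
    print("A", summarise(venue_a))     # lower median, much worse tail
    print("B", summarise(venue_b))     # higher median, tight and predictable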
Valente: Typically, protocol handlers and
streaming databases are large contributors to latency. Solutions
that target these spaces should offer minimal disruption, like a
full-blown FIX or FIX FAST offload engine running in an in-socket
accelerator or PCI-express card. This would allow for
acceleration of the trader's existing infrastructure, without
having to change everything.
Woodward: Probably by using packaged offerings
tailored to their size and business strategies. Firms that run
stat arb strategies will have higher tech requirements, but less
performance-sensitive firms can get tech-enabled at a lower cost.
Traditionally, buy-side vendors are not leading technology
exponents and as such buy-side firms might not get, or be able to
afford, the best advice. If one wants the best one must scour the
showrooms and do the necessary research.

Nigel Woodward, Intel
"Firms should pressurise colocation vendors on the range of technical configurations being offered …"
What hardware-based strategies should buy-side firms adopt to store and access execution data as effectively as possible?
Graham: Investment in performant and reliable
data technologies is imperative. Storage area networks (SANs) are more competitively priced nowadays and storage is always getting
cheaper, so the associated benefits in performance, reliability,
thin provisioning, storage virtualisation, remote backup and
disaster recovery options are worth investing in. The largest
bottleneck may be the current computational paradigm where data
is created, stored and then analysed. A way forward may be to
create and analyse data, but then only store a subset. Shifts
such as this may well play a key role in shaping sophisticated
analytical environments to come, where real-time data mining can
play an increasing role in analytics, risk management and trade
execution.
McAllister: Storage technology evolution has
reduced the costs and improved the availability of massive
volumes of data. While SAN-based storage is a popular choice for
high availability and rapid data lookup, you will generally find
a wide spectrum of architectures in buy-side firms. Customer
choices are influenced by which configuration works best with the
database or data-caching solution at the layer above. For
high-speed assured data movement (for example order routing),
persisting to physical disk media creates many latency and
throughput challenges that have left transactional data rates far
behind the messaging rates of non-persistent applications like
market data. Hardware specifically designed to provide assured
delivery is finally unlocking these limits with 100 per cent failsafe architectures that are not slowed to the speed of even the fastest disk write. In-transit messages are persisted to dual
redundant caches and only the data that requires long-term
storage is written to disk. Battery-backed RAM ensures that
memory-cached messages cannot be lost, even in the event of a
power loss to the redundant pair.
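The write-behind principle McAllister describes can be sketched in software: acknowledge a message once it sits in two independent in-memory replicas and let a background thread journal it to disk, so the acknowledgement never waits on disk I/O. In the hardware products the replicas are battery-backed RAM on redundant blades; the class below is only an illustration of the flow, with names and message format assumed.

    # Software sketch of assured delivery with write-behind persistence.
    # Names, message format and the single-process design are assumptions.
    import queue
    import threading

    class AssuredDelivery:
        def __init__(self, journal_path):
            self.primary, self.mirror = [], []   # stand-ins for the caches
            self._to_disk = queue.Queue()
            threading.Thread(target=self._flush, args=(journal_path,),
                             daemon=True).start()

        def publish(self, message):
            self.primary.append(message)         # replica 1
            self.mirror.append(message)          # replica 2
            self._to_disk.put(message)           # write-behind, asynchronous
            return True                          # ack: safe in two places

        def _flush(self, journal_path):
            with open(journal_path, "a") as journal:
                while True:
                    journal.write(self._to_disk.get() + "\n")
                    journal.flush()

    bus = AssuredDelivery("orders.journal")
    bus.publish("NewOrder|VOD.L|BUY|10000|135.12")   # returns before disk I/O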
Sangha: No matter what storage technology companies are investing in - and we would recommend making storage an integral part of the network with storage area networks - it is unlikely that they will be able to store all their information in one place, with information spread across trading partners and exchanges. Financial institutions need to be able to bring all this information together in a timely and efficient way. This is where storage virtualisation comes in, providing a way of mapping and managing storage across the enterprise and third parties, while also maximising usage of existing internal storage resources.
Valente: Analysing and searching terabytes of data is complicated and slow, especially when table joins, sorts and groups are involved. Storing data is simple - using, analysing and retrieving it is a different story. In addition to just storing it, standard database languages like SQL need to be supported and it needs to work on commodity hardware, so that IT professionals will support it. Specific storage/SQL appliances are starting to show up in the marketplace. These are hardware-accelerated very large database (VLDB) appliances that can process SQL queries at a rate of 1TB of data in about a minute.
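The kind of join/group-by workload Valente has in mind can be expressed in ordinary SQL; the point of the appliances is that the same query runs against terabytes rather than a toy table. A sketch using an in-memory SQLite database, with schema and rows invented for illustration:

    # Execution-quality query in plain SQL; schema and rows are invented.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE executions (order_id TEXT, symbol TEXT, px REAL, qty INT);
        CREATE TABLE benchmarks (symbol TEXT, arrival_px REAL);
        INSERT INTO executions VALUES ('1','VOD.L',135.14,4000),
                                      ('2','VOD.L',135.18,6000),
                                      ('3','BP.L', 512.30,2500);
        INSERT INTO benchmarks VALUES ('VOD.L',135.10), ('BP.L',512.50);
    """)
    rows = db.execute("""
        SELECT e.symbol,
               SUM(e.px * e.qty) / SUM(e.qty)                 AS vwap,
               SUM(e.px * e.qty) / SUM(e.qty) - b.arrival_px  AS slippage
        FROM executions e JOIN benchmarks b ON b.symbol = e.symbol
        GROUP BY e.symbol, b.arrival_px
        ORDER BY slippage DESC
    """).fetchall()
    for symbol, vwap, slippage in rows:
        print(symbol, round(vwap, 4), round(slippage, 4))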

Andrew Graham, IBM
"… the runway for relying on single-threaded speed jumps within general purpose CPUs is running out, …"
Are the sell side's biggest hardware challenges on the processing side or on the networking side?
Berkhout: Having initially concentrated on
processing power and system optimisation, many firms have
recently realised that further latency reductions can be achieved
only if they focus on the network as well. To get a good
understanding of latency introduced by the network layer,
consider three prime contributors to latency:
• Serialisation delay is incurred converting information into packets or a bit stream - constrained by packet size and available bandwidth - and can lead to buffering. To eradicate excessive buffering, we recommend sufficiently dimensioned end-equipment and bandwidth;
• Switching delay is caused by hops across the network and
the processing power of routers and switches. This delay is
inherent to packet-switched networks and latency can be improved
through labelled path switching. The preferred option would be
connection-oriented networks close to the optical or transmission
layer, avoiding switching altogether;
• Propagation delay is virtually constant in optical networks. It is determined by the speed of light and the refractive index of the glass (which slows light relative to a vacuum), and is a linear function of the distance travelled - see the worked example after this list. One way of ensuring the shortest physical
routes is for metro fibres to be spliced directly between end
sites and not routed through (multiple) exchanges. A proximity
solution would virtually eliminate propagation delay for
market-makers focused on a single exchange, but can present
challenges for cross-asset or multi-market arbitrage models.
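To put numbers on the propagation-delay point: light in silica fibre travels at roughly the vacuum speed divided by the refractive index, about 204,000 km/s, or roughly 5 microseconds per kilometre one way. A worked example, assuming a refractive index of about 1.47:

    # Worked example: propagation delay grows linearly with fibre distance,
    # at roughly 5 microseconds per kilometre. n = 1.47 is an assumption
    # for silica fibre.
    C_VACUUM_KM_S = 299_792                       # speed of light in vacuum
    N_FIBRE = 1.47                                # refractive index of fibre
    SPEED_IN_FIBRE = C_VACUUM_KM_S / N_FIBRE      # ~204,000 km/s

    def one_way_delay_us(route_km):
        return route_km / SPEED_IN_FIBRE * 1_000_000

    for km in (0.5, 10, 80, 350):    # in-building, metro, regional, long haul
        print(f"{km:>6} km  ->  {one_way_delay_us(km):8.1f} us one way")
    # 0.5 km is about 2.5 us; 350 km is about 1,700 us - which is why
    # proximity hosting and directly spliced metro fibre matter so much.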
Cooper: Both represent significant challenges
and organisations will have different priorities. On the network
side (i) market structure and evolving trading models present
challenges in terms of increases in the number of destinations,
integration and shorter lead-time-to-connect expectations, while
(ii) network performance continues to be the focus of attention
in the context of scalability and increasingly rigorous latency
expectations. Consolidation of connectivity requirements through
service providers that offer multiple connections across a single
communications infrastructure should address some of the
flexibility requirements. Network performance - in terms of low
latency criteria, the end-to-end deterministic forwarding in
microseconds with little or no variance - requires precise
engineering of all components on all forwarding paths.
The problem is compounded by the need to instrument the network
at a commensurate level of granularity in order to capture and
understand network behaviour - a requirement that introduces new products and demands the development of new skills and supporting practices. Different approaches include over-provisioning of bandwidth, the adoption of very high-performance devices and systems, and an iterative analysis and optimisation cycle that draws on information from a growing set of groups and bodies providing specialised benchmarking and development services.

Vincent Berkhout, COLT
"… further latency reductions can be achieved only if they focus on the network as well."
Graham: With data volumes and the number of data
sources ever increasing, the full architectural stack will be
under pressure. Some implementations are currently using lower
network bandwidth (100Mb links) as a throttle to enable
processing to occur more reliably elsewhere - this then impacts
the competitiveness of the overall platform. The complete
architecture has to be considered to determine whether a
scale-out design could be used for example to enable consumption
of full data volumes. By filtering the signal from the noise
earlier in the data lifecycle, it may be possible to consume the
full data channel and process it in real time. Newer streaming
frameworks allow filtering, aggregation, correlation and
enrichment that can scale to thousands of individual physical
processing elements, effectively using low-latency multicast
technologies to distribute the problem.
New middleware can provide a high-throughput, low-latency
transport fabric designed for one-to-many data delivery,
many-to-many data, or point-to-point exchange in a
publish/subscribe fashion. This technology exploits the physical
IP multicast infrastructure to ensure scalable resource
conservation and timely information distribution.

Michael Cooper, BT Global Financial Services
"Network performance … requires precise engineering
of all components on all forwarding paths."
McAllister: In high-performance distributed applications, the challenges are a combination of both. Hardware-based middleware solutions use FPGA, ASIC and network processor technology to perform sophisticated processing at extremely high message rates with very low, predictable latency. For example, sophisticated, hardware-based content routing ensures that only the content of interest is sent to a given application. This in turn reduces both bandwidth demands in the network and processing demands on the subscriber. Such sophisticated routing also removes the need for publishing applications to perform content routing, or to deploy special content-routing add-on software services that need to be managed and scaled independently. Hardware solutions can also transform data into a format convenient for the receiving application, again reducing processing demands. TCP offload engines along with zero-copy APIs further offload communication processing from host CPUs to purpose-built hardware, which both increases networking performance and frees CPU cycles for the application.
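The routing decision McAllister describes - deliver a message only to the applications whose subscriptions match its content - looks like this in a software sketch. The slash-delimited topic syntax with '*' wildcards and the subscriber names are assumptions; the hardware makes the same decision at line rate in silicon.

    # Sketch of content/topic routing: each message goes only to matching
    # subscribers. Topic syntax and subscriber names are assumptions.
    def matches(subscription, topic):
        sub_parts, topic_parts = subscription.split("/"), topic.split("/")
        if len(sub_parts) != len(topic_parts):
            return False
        return all(s == "*" or s == t for s, t in zip(sub_parts, topic_parts))

    subscribers = {
        "algo_engine": "equities/LSE/VOD.L",     # one instrument only
        "risk_system": "equities/LSE/*",         # everything on one venue
        "news_parser": "news/*/VOD.L",           # news for one name
    }

    def route(topic):
        return [name for name, sub in subscribers.items() if matches(sub, topic)]

    print(route("equities/LSE/VOD.L"))   # ['algo_engine', 'risk_system']
    print(route("equities/LSE/BP.L"))    # ['risk_system']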
Sangha: It's about having complete visibility of
processing capacity usage and networking performance; the success
of every trade depends on both factors working in tandem. The
continuing huge growth in market data and order volumes is
demanding huge investment in sell-side firms' data centres. These
data centres are rapidly being filled with ever-increasing
numbers of servers to drive the consolidation of market data and
handle the increasing demands of algorithm processing and the
growing complexity of risk modelling. Low latency is critical in
many areas and computational power is key in risk calculations,
so good data-centre design is critical. The application, compute
and network components must act cohesively for best
price/performance. Network bandwidth is increasing within the
data centres with Infiniband and 10 gigabit Ethernet now becoming
common as firms demand the highest performance from their
systems. External links to execution venues become key here,
where the proximity of a firm's trading and execution platform
can dictate who wins and who loses in the 'need for speed'. With
this in mind, the design of the data centre, physical distance
and type of network connectivity to exchanges -as well as the
option of co-locating some of the firm's systems at the exchange
or in a service provider's facility - is all part of the mix.
Woodward: The issues divide between application code,
networking infrastructure and hardware platform. These
combinations will differ depending on the functions being
performed, e.g. between low-latency trading and high-volume
settlement. Often the code is old, and/or is running on a heavy
operating system that is not multi-threaded and thus unable to
take advantage of multi-processors. Double digit gains in
performance and latency are now commonplace from enhancements in
the infrastructure, e.g. I/O acceleration, Infiniband,
multi-threading, tuning of operating systems, etc. It is all
about ROI from effort expended. Trading performance is paramount
and any gains are competitive advantage and so can be justified;
elsewhere, a judgement call has to be made between cost -
capital, resources and disruption - and the return.
What infrastructure changes must sell-side firms undertake to optimise powerful new servers, e.g. those with 10-gigabit Ethernet cards?
Berkhout: We're seeing 10-gigabit Ethernet
services deployed more widely and price pressure on high-volume
10-gigabit Ethernet services. One of the drivers is that hardware
prices for interface cards and switches have come down
significantly. There is also increased demand for Fibre Channel, with two- and four-gigabit Fibre Channel gaining greater market share.
Cooper: Failure to consider the whole end-to-end topology and infrastructure will lead to sub-optimal latency and latency variation. Consideration should certainly be given to
increasing throughput and the inherent advantages of higher speed
interfaces like 10-gigabit Ethernet, but new technologies like
Infiniband should also be considered in the data centre
environment. These technologies do offer performance advantages,
but full optimisation has implications for application and
systems design and development.
Similarly the adoption of very high-speed interfaces and
platforms that utilise them has a knock-on effect on the
underlying network infrastructure, e.g. switching models and
capability along with external data centre communications. The
effect of high volumes of data being presented to sub-optimal
switching and routing infrastructures needs to be assessed.
Graham: If the software can exploit parallel techniques, Infiniband must be considered - for example, by writing software that can directly exploit lower-latency protocols. As the ability to accurately measure latency across the architecture becomes more important, time-syncing technology should also be considered, such as the use of Stratum-x NTP clocks to ensure consistent enterprise-wide time-stamping.
Through virtualisation of I/O devices, networking and storage can be abstracted to a degree, provided any latency implications are considered. Packet sizes should also be optimised for the workload in question. Finally, as servers become
more powerful, consideration of the data storage and distribution
must be maintained to avoid potential 'starvation' of the
processors and other issues that may result from an unbalanced
system.
McAllister: With the popularity of increased
bandwidth capacity (via 10-gigabit Ethernet interfaces) and
increased application processing capacity (via more and faster
multi-core processors), servers need to be fed by a
hardware-based content infrastructure at the messaging layer.
These applications demand much higher message rates and lower
consistent latency for market data and for order execution. Such
high processing capacity devices need to be served by an even
higher capacity messaging infrastructure - otherwise, processing
cycles are wasted either waiting for data or repetitively
filtering and transforming data. These issues work against the
intended purpose of investing in high-performance servers and
leave performance bottlenecks unchanged. Coupling a
hardware-based content infrastructure with more powerful
application servers uses the right tools for the right jobs and
eliminates waste in end-to-end latency and throughput limits.
Sangha: Many firms are now looking beyond the
traditional 10/100 and gigabit Ethernet connectivity that forms
the basis of most data centres. We've seen Infiniband being used
by firms where an extremely low-latency interconnect is required
between servers. Market data infrastructure and algo-trading farms are typical applications of this alternative to Ethernet, as is the construction of high-performance computing clusters or grids. Infiniband uses specific network interface
cards and drivers for the servers that wish to connect to this
high-speed, low-latency interconnect. In reality, this is not an
issue and many firms see the benefit of this technology on
existing servers and applications with little if any need for
major systems changes. Additionally, if a firm wishes to optimise
its trading applications by modifying some of the code,
Infiniband can offer even better performance as well as removing
up to 70 per cent of the load on the servers that can be caused
by more typical Ethernet technology. Ten-gigabit Ethernet can
offer some of the throughput advantages of Infiniband, but as of
today is less deterministic in providing the consistent low
levels of latency that automated trading demands.
For sell-side firms needing to overhaul/replace legacy systems,
how best to optimise existing infrastructure and build a more
scalable one?
Berkhout: For ultra-low latency requirements,
hosting in the same building is the preferred way forward.
Scalability is achieved by default as network costs are
insignificant for primary feeds within the building. Only a few
carriers are well positioned as they provide the combination of
both data centre space and network services.
Graham: Abstraction technologies that isolate
layers of the system facilitate virtualisation where appropriate.
Splitting compute nodes and persistent data nodes enables good
separation of concerns too.
Writing software to support 64-bit and safe multi-threading will
allow the applications to exploit future hardware. Stress testing
of existing systems to profile and identify bottlenecks and hot
spots in end-to-end processes will also become increasingly
important.
McAllister: The critical factor is to choose an
infrastructure that can offload processing from legacy systems,
thereby extending their life. Hardware infrastructure provides
processing headroom for the future along with an increasing
performance curve. For example, many trading floors today
distribute market data through Ethernet multicast which requires
each application to receive, inspect and accept or discard each
message. Typically, the ratio of discarded messages to accepted
messages is very high which means a heavy CPU load for
non-productive work. As message rates increase, more and more
time is spent filtering messages, meaning less and less time for
application processing. Hardware-based middleware solutions use
network processor and FPGA technology to perform message
filtering before messages reach the application. Hardware can
also ease transition between legacy and modern systems by
reformatting data as it is delivered - from the format of the old
application to the format of the new. These approaches allow
communication between old and new applications to be evolutionary
rather than disruptive, with no performance impact.
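The in-flight reformatting McAllister mentions - translating messages from the legacy application's format into the one the new application expects - can be sketched as a simple transform. The pipe-delimited legacy layout and the target field names below are assumptions:

    # Sketch: reformat a legacy pipe-delimited execution report into the
    # structure a newer system expects, so neither side has to change.
    # The legacy layout and target field names are assumptions.
    import json

    LEGACY_FIELDS = ("exec_id", "symbol", "side", "qty", "px")

    def reformat(legacy_record):
        report = dict(zip(LEGACY_FIELDS, legacy_record.strip().split("|")))
        return json.dumps({
            "executionId": report["exec_id"],
            "instrument":  report["symbol"],
            "side":        "BUY" if report["side"] == "1" else "SELL",
            "quantity":    int(report["qty"]),
            "price":       float(report["px"]),
        })

    print(reformat("EX123|VOD.L|1|10000|135.12"))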
Is the server farm the most appropriate model for exchanges
looking to expand capacity?
Berkhout: Traditionally, server farms are
deployed in a scenario with a primary and secondary location.
These have typically been owned locations and we see a trend
towards leveraging external server farms to complement these.
This spreads risk and deals with fluctuations in capacity demand.
Graham: There are trade-offs to this approach.
Overall system architecture becomes more complex due to the need
to build high availability and recoverability into the solution.
Usually this means additional hardware in the form of primary and
secondary servers. Management of a large server farm introduces a
host of issues that ultimately threaten system stability and
reliability.
One alternative would be a two-tier system comprising high-speed server components at the front end and a robust database server to provide high availability. This approach
applies the appropriate server technology that best suits the
opposing needs of low latency and high reliability. The advantage
is the creation of an exchange solution with both high
performance and proven reliability. Another alternative is the
adoption of dynamic resource reallocation. In fast markets, the
ability for humans to respond to market spikes and allocate
additional capacity is diminished. Server technology exists today
that has the capability to dynamically reallocate resources.
McAllister: The best price performance comes down to the
message volume an infrastructure can sustain with the least
complexity, for the least amount of expense. Server farms that
run software are typically complex to manage, consume large
amounts of data centre space and struggle to scale to the
high-volume requirements of exchanges today. A hardware-based
infrastructure provides an integrated platform with very predictable behaviour and eliminates the performance challenges introduced by the interaction between software and operating systems. This allows multiple orders of magnitude better performance with ultra-low, consistent latency, which is essential to exchanges as the source of large-scale information feeds.
Woodward: The key choices around the use of server farms are between horizontal and vertical scaling, and between deployment of proprietary or commodity industry-standard technologies, especially as there are questions over the reliability of the latter. The debate is swinging towards
horizontal scaling. This gives access to lower cost, industry
standard hardware, and operational risk is managed by designing
resilience into the infrastructure.
At the switches and routers level, what should exchanges be doing
to improve response times?
Berkhout: We recommend running low and ultra-low
latency applications as close to the optical layer as possible,
where commercially viable, and with the fewest protocol
conversions. In practice, deployments are a compromise between
ease of configuration, ease of management and support versus
dedicated connections and will vary per application and user.
McAllister: Higher-capacity links such as 10-gigabit Ethernet and cut-through switches can be used to reduce communication latency and thus distributed information-sharing latency. However, this latency is already in the single-microsecond range. An important infrastructure factor in
reducing order acknowledgement latency is the performance of the
persistent messaging systems used between the various stages of
order execution in some venues. Persistent messaging is typically
performed using rotating disks which are orders of magnitude
slower than RAM and processor speeds. Hardware-accelerated
persistent messaging resolves this challenge to ensure that
messages can never be lost, supported message rates are very high
and latency is consistently ultra low. This simple infrastructure
change dramatically improves response times, especially at peak
trading hours, because it provides the excess processing capacity
and stability only possible with hardware implementations.