# Statistical Arbitrage: Algorithmic Trading Insights and Techniques Chapter 2 Statistical Arbitrage

Much of what happens can conveniently be thought of as
random variation, but sometimes hidden within the
variation are important signals that could warn us of
problems or alert us to opportunities.

-Box on Quality and Discovery, G.E.P. Box

## 2.1 INTRODUCTION

The pair trading scheme was elaborated in several directions beginning with research pursued in Tartaglia's group. As the analysis techniques used became more sophisticated and the models deployed more technical, so the sobriquet by which the discipline became known was elaborated. The term "statistical arbitrage" was first used in the early 1990s.

Statistical arbitrage approaches range from the vanilla pairs trading scheme of old to sophisticated, dynamic, nonlinear models employing techniques including neural networks, wavelets, and fractals. Just about any pattern-matching technology from statistics, physics, and mathematics has been tried, tested, and, in a lot of cases, abandoned.

Later developments combined trading experience, further empirical observation, experimental analysis, and theoretical insight from engineering and physics (fields as diverse as high-energy particle physics and fluid dynamics, employing mathematical techniques from probability theory to differential and difference equations). With so much intellectual energy active in research, the label "pairs trading" seemed inadequate. Too mundane. Dowdy, even. "Statistical arbitrage" was invented, curiously, despite the lack of statisticians involved in, and of statistical content in, much of the work.

## 2.2 NOISE MODELS

The first rules divined for trading pairs were plain mathematical expressions of the description of the visual appearance of the spread. For a spread like the CAL-AMR spread in Figure 2.1, which ranges from −\$2 to \$6, a simple, effective rule is to enter the spread bet when the spread is \$4 and unwind the bet when it is \$0.

We deliberately use the term rules rather than model because there is no attempt at elaboration of a process to explain the observed behavior, but simply a description of salient patterns. That is not to diminish the validity of the rules but to characterize the early work accurately. As the record shows, the rules were fantastically profitable for several years.

Applying the \$4-\$0 rule to the CAL-AMR spread, there is a single trade in the calendar years 2002 and 2003. If this looks like money for practically no effort, that is the astonishing situation Tartaglia discovered in 1985, writ large across thousands of stock pairs.
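The \$4-\$0 rule can be sketched in a few lines of Python. This is a minimal illustration rather than the original implementation: the function name `rule1_trades`, the convention that the bet profits by the amount the spread narrows, and the synthetic spread values are all assumptions made for the example, not data from Figure 2.1.

```python
from typing import List, Tuple


def rule1_trades(
    spread: List[float], entry: float = 4.0, exit_level: float = 0.0
) -> List[Tuple[int, int, float]]:
    """Return (entry_index, exit_index, profit_per_unit) for each completed bet.

    Assumed convention: the bet is placed when the spread is at or above
    `entry` and unwound when it is at or below `exit_level`; profit is the
    amount by which the spread narrowed between those two points.
    """
    trades = []
    entry_idx = None  # index at which the open bet was placed, if any
    for i, s in enumerate(spread):
        if entry_idx is None:
            if s >= entry:
                entry_idx = i
        elif s <= exit_level:
            trades.append((entry_idx, i, spread[entry_idx] - s))
            entry_idx = None
    return trades


# Synthetic spread values in dollars (hypothetical, for illustration only).
spread = [1.0, 3.0, 4.5, 6.0, 3.0, 1.0, -0.5, 2.0, 3.5]
trades = rule1_trades(spread)
print(trades)
```

On this made-up series the rule produces one completed bet, entered at the first observation at or above \$4 and unwound at the first observation at or below \$0. Note that the bet sits through the move up to \$6 without reacting, which is the behavior the elaborations below try to improve on.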

Alternatives, elaborations, and generalizations jump off the page as one looks at the spread and considers that first, seductively simple rule. Two such elaborations are:

• Make the reverse bet, too.
• Make repeated bets at staged entry points.

## 2.2.1 Reverse Bets

Why sit out the second half of 2002 while the spread is increasing from its narrow point toward the identified entry point of \$4? Why not bet on that movement? In a variant of the commodity traders' "turtle trade," Rule 1 was quickly replaced with Rule 2, which replaced the exit condition, "unwind the bet when the spread is \$0," with a reversal, "reverse the long and short positions." Now a position was always held, waiting on the spread to increase from a low value or to decline from a high value.
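A small sketch may make the difference from Rule 1 concrete. It is an illustration under stated assumptions, not the historical implementation: the name `rule2_positions`, the encoding of positions as -1 (betting the spread narrows) and +1 (betting it widens), and the synthetic spread values are hypothetical.

```python
from typing import List


def rule2_positions(
    spread: List[float], high: float = 4.0, low: float = 0.0
) -> List[int]:
    """Return the position held after each observation.

    Assumed encoding: -1 bets on the spread declining from a high value,
    +1 bets on it increasing from a low value, 0 means no bet placed yet.
    """
    position = 0
    positions = []
    for s in spread:
        if s >= high:
            position = -1  # spread is wide: bet on narrowing
        elif s <= low:
            position = 1   # spread is narrow: reverse, bet on widening
        positions.append(position)
    return positions


# Synthetic spread values in dollars (hypothetical, for illustration only).
spread = [1.0, 4.5, 6.0, 3.0, -0.5, 2.0, 4.2, 1.0]
positions = rule2_positions(spread)
print(positions)
```

After the first entry the position is never flat: reaching the low level flips the bet instead of closing it.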

## 2.2.2 Multiple Bets

In the first quarter of 2002 the CAL-AMR spread varies over a \$6 range from a high of \$7 to a low of \$1. Bets placed according to Rule 1 (and Rule 2) experience substantial mark-to-market gains and losses but do not capture any of that commotion. Since the spread increases and decreases over days and weeks, meandering around the trend that eventually leads to shrinkage to zero and bet exit (Rule 1) or reversal (Rule 2), why not try to capture some of that movement?

This single illustration demonstrates in blinding clarity the massive opportunity that lay before Tartaglia's group in 1985, an era when spreads routinely varied over an even wider range than exhibited in the examples in this chapter.
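One plausible reading of staged entry points can be sketched as follows. The details are assumptions: the entry levels of \$4 and \$6 are the ones quoted in the calibration discussion, while the choice to add one unit per level crossed, to hold all units until the exit level, and the name `staged_units` are hypothetical and may differ from the rule as originally traded.

```python
from typing import List, Sequence


def staged_units(
    spread: List[float],
    entries: Sequence[float] = (4.0, 6.0),
    exit_level: float = 0.0,
) -> List[int]:
    """Return the number of bet units held after each observation.

    Assumption: one unit is added the first time the spread reaches each
    entry level, and every unit is unwound together at `exit_level`.
    """
    units = 0
    history = []
    for s in spread:
        if s <= exit_level:
            units = 0  # unwind everything
        else:
            crossed = sum(1 for level in entries if s >= level)
            units = max(units, crossed)  # never reduce before the exit
        history.append(units)
    return history


# Synthetic spread values in dollars (hypothetical, for illustration only).
spread = [1.0, 4.5, 6.5, 5.0, 3.0, -0.2, 4.1]
units = staged_units(spread)
print(units)
```

The second unit is entered at a wider spread than the first, so it captures part of the movement that a single bet would only experience as a mark-to-market swing.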

## 2.2.3 Rule Calibration

As soon as one extends the analysis beyond a single pair, or examines a longer history of a single pair, the problem of calibration arises. Figure 2.2 shows another pair of price histories, now for the single year 2000. Figure 2.3 shows the corresponding spread.

Wow! We should have shown that example earlier. The spread varies over a \$20 range, three times the opportunity of the CAL-AMR example examined in Figure 2.1. But right there in that rich opportunity lies the first difficulty for Rules 1-3: The previously derived calibration is useless here. Applying it would create two trades for Rule 3, entering when the spread exceeded \$4 and \$6 in January. Significant stress would quickly ensue as the spread increased to over \$20 by July. Losses would still be on the books at the end of the year. Clearly we will have to determine a different calibration for any of Rules 1-3. Equally clearly, the basic form of the rules will work just fine.

For the second example (Figure 2.2) the spread range is \$3 to \$22. The 20 percent margin calibration gives trade entry and exit values of \$18 and \$7, respectively. Applying Rule 1 with this automatic calibration yields a profitable trade in 2000. That desirable outcome stands in stark contrast to the silly application of the calibration from the first example (entry at \$4 and \$6 and unwind at \$0, as eyeballed from Figure 2.1) to the spread in Figure 2.2, which leads to a nauseating mark-to-market loss.
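The arithmetic behind those entry and exit values can be checked with a short sketch. The interpretation is an assumption: here the 20 percent margin is taken to mean that the entry level sits 20 percent of the observed range below the maximum and the exit level 20 percent of the range above the minimum. The function name `margin_calibration` is hypothetical. Under that reading, a \$3 to \$22 range gives values that round to the \$18 and \$7 quoted above.

```python
from typing import Tuple


def margin_calibration(
    spread_min: float, spread_max: float, margin: float = 0.20
) -> Tuple[float, float]:
    """Return (entry, exit) levels set `margin` of the range inside the extremes.

    Assumed interpretation of the 20 percent margin calibration:
    entry = max - margin * range, exit = min + margin * range.
    """
    span = spread_max - spread_min
    entry = spread_max - margin * span
    exit_level = spread_min + margin * span
    return entry, exit_level


# Spread range for the second example, as stated in the text.
entry, exit_level = margin_calibration(3.0, 22.0)
print(round(entry), round(exit_level))
```

Because the levels are derived from the observed range instead of being eyeballed from a chart, the same function can be applied to any pair without manual adjustment.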