Utopian Quantopian?

Published in Automated Trader Magazine Issue 30 Q3 2013

Quantitative strategy building and backtesting platform? Using an accessible, popular yet powerful programming language? Paper trading, plus (in due course) live deployment of strategies? Free!?! Automated Trader's founder, Andy Webb, and the Wrecking Crew check out whether Quantopian really is all it seems.

The Wrecking Crew have in their various roles been playing with financial software for rather a lot of man years. But curiously none of them has ever evaluated a backtesting and trading platform that uses Python for coding trading models. Numerous proprietary scripting/macro languages, C#, C/C++, VB, VBA, Pascal (one of our older members), Cobol (a liar/fantasist) - yes. Python - no.

That alone was a pretty good initial excuse for taking a look at Quantopian, but the fact that it is cloud-based was an additional attraction. Cloud computing is one of those areas where the Wrecking Crew's various little intellectual differences turn decidedly sectarian. Opinions are (to put it mildly) somewhat polarised, so for the very few of us with no particular axe to grind, the prospect of witnessing a major dust up was undeniably appealing.

What's included?

The basic premise of Quantopian is that users can (at no charge) write, back test and ultimately trade their models via a web browser. The open source platform that Quantopian has built to enable all this is called Zipline and is written in Python, as well as using Python for the coding of trading models. This simplifies the platform's access to other permitted Python modules, of which there are currently 21 (other than Zipline itself).

Access to these standard Python modules adds a considerable depth of functionality to Quantopian across a broad spectrum.

Among others, this includes:

• High performance looping (itertools)

• Array processing (numpy)

• Data structures (pandas)

• Sorting (heapq)

• Maths, statistical, scientific and engineering (math, cmath, statsmodels, scipy)

• Time and time zones (datetime, pytz)

• Machine learning (sklearn)

For the stat arb fraternity, the only significant omission we spotted was an econometrics module that included tests for properties such as cointegration. While the Augmented Dickey-Fuller test is available in the statsmodels module, it only appears to use an implementation with critical value tables suitable for observed as opposed to estimated time series (see http://www.automatedtrader.net/articles/software-review/142233/cointegration-assume-nothing--check-everything for more on this). Someone has made a start on coding the Johansen framework over on GitHub following the implementation of LeSage's Spatial Econometrics toolbox for MATLAB and this is intended for inclusion in a future release of statsmodels. However, the last update to the code appears to have been ten months ago, so to avoid further delay Quantopian are considering incorporating the necessary code directly rather than waiting for it to appear in statsmodels.

Getting started

Although Python is more demanding to use than some of the simpler proprietary scripting languages out there, it's unlikely to present much of a hurdle to most traders with a bit of determination. Quantopian does a nice job of streamlining this further, so it's possible to implement a simple long/short momentum model (which is the basic sample trading algorithm provided on the platform) with a grand total of two functions and twelve brief lines of code.

Trading algorithms in Quantopian only require the use of two methods:

initialize - creates an empty and initialized Python dictionary called context. The dictionary has been enhanced so dot notation can be used to access properties as well as the more commonplace bracket notation. initialize is used to define such things as the securities that will be used in the back test and position sizes. So, for example...

...uses Quantopian's sid method to create a context object with the attribute ibm.

handle_data - this is called whenever a market event relating to any securities referenced by the trading model takes place. It references the same context object created by initialize as well as a data dictionary of the securities defined for use in the algorithm. So, for example...

...accesses the standard deviation property of the ibm attribute created above and puts it in the variable standev. A nice touch is that Quantopian goes beyond the conventional open, high, low, close, volume etc here by also making transforms, such as VWAP, standard deviation, returns and moving average available as built in properties, rather than having to call additional functions to calculate them.
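As a rough illustration of the two-function pattern described above, the following self-contained sketch mimics it outside Quantopian. The Context and Bar classes, the stub sid function and the identifier 1001 are all stand-ins invented for this example; they are assumptions about the general shape of the pattern, not the platform's actual implementation.

```python
# Stand-ins so the sketch runs outside Quantopian; none of this is the
# platform's real code.


class Context(dict):
    """Dictionary whose keys can also be read and written with dot notation."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self[name] = value


class Bar:
    """Hypothetical per-security data exposing a transform as a property."""

    def __init__(self, closes):
        self.closes = closes
        self.price = closes[-1]

    @property
    def stddev(self):
        # Population standard deviation of the stored closing prices.
        mean = sum(self.closes) / len(self.closes)
        variance = sum((c - mean) ** 2 for c in self.closes) / len(self.closes)
        return variance ** 0.5


def sid(security_id):
    """Stand-in for the sid method: here it simply returns the identifier."""
    return security_id


def initialize(context):
    # 1001 is a made-up identifier, not IBM's real number in any database.
    context.ibm = sid(1001)
    context.max_shares = 100


def handle_data(context, data):
    # Read a built-in transform as a property of the security's data.
    standev = data[context.ibm].stddev
    return standev


context = Context()
initialize(context)
data = {context.ibm: Bar([10.0, 12.0, 14.0])}  # invented closing prices
standev = handle_data(context, data)
```

Both context.ibm and context["ibm"] refer to the same entry here, which is the dot-versus-bracket convenience mentioned above.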

All code editing is done in the browser, which for those attached to their IDEs is actually a lot less painful than it sounds. There's an auto complete feature, which not only accesses properties/methods as in a conventional IDE, but also taps a symbol database, so after you type "sid(" and then start typing the name of a security, a drop down list appears. Select the desired security name from this and Quantopian automatically converts it to the correct symbol number in its database. There's also a handy tooltip function, which in addition to its usual help functionality also allows you to mouse over a symbol number to see the name of the security to which it relates.

Though we eventually found a workaround, one thing that did puzzle the Wrecking Crew was that we couldn't find any search and replace capability in the Quantopian code editor. For shorter segments of code, this isn't much of an issue, but for more complex models it is a real pain to have to use the browser's search function and then manually change every instance of a variable's name or line of code. In fact even this doesn't work reliably: we found that in both Chrome and Firefox the browser's search function would sometimes fail to find key words that were actually present. For example, the text "date" was only found when it occurred as part of a comment, but not in executable code.

At first we thought we could get around this by using one of the Search and Replace plug ins available for Chrome and Firefox, but while this found all instances of a string and replaced them, it created a range of other bizarre problems. In Chrome, as soon as we did the search and replace, the editor locked up completely. Firefox was even more bizarre: it changed the code visually, but not from the perspective of the syntax checker. For example, we changed the assignment and name of the Apple symbol variable in the sample algorithm - context.aapl - to context.ge but the debugger still threw a fit over the absence of context.aapl even though it no longer appeared in the code.

Ultimately we found that the best solution was to copy and paste all the code out to a plain text editor, make the desired search and replace changes and then copy and paste the whole lot back into Quantopian's editor. We tried this with a couple of Python developers' editors too, which means you also get access to features such as syntax highlighting and auto-indenting (should you want to do more extensive editing outside Quantopian). However, this approach obviously sacrifices Quantopian's handy auto complete feature.

Debugging and backtesting

The review team liked the general layout of the Quantopian development environment, which has the code on the left hand side of the browser window, with the quick back test above the debugging info and logs on the right hand side (see Figure 1).

Figure 1

Once your code is complete, clicking the Build Algorithm button at the top left of the window (or pressing Ctrl-B) conducts a number of syntax and error checks and then flags any errors found in the Build Errors tab. In this tab in Figure 1 you can see the consequences of a number of deliberate errors we introduced to the Quantopian sample script.

If the build process does not trigger any errors, then a quick back test on daily data is also run automatically at the same time. If you don't wish to use the default account size and test period settings for this, these need to be tweaked first at the top of the quick back test window.

Figure 2 shows the results of a quick back test run on General Electric. As mentioned earlier, quick back tests run on daily data, but for a more granular evaluation a full back test runs on one minute data.

Figure 2

Doing a full back test for a year's worth of General Electric data took one minute and 51 seconds (see Figure 3). While some might regard this as slow, it's actually highly respectable because of the way in which Quantopian conducts its back tests. The vital distinction is that unlike many back testing platforms it actually does a full bar by bar data replay, as opposed to calculating all values (in this case VWAP based momentum thresholds) in one step and then comparing those with prices for trade triggers. While the Quantopian way of doing things might appear laborious, it has the enormous advantage of avoiding peek ahead issues. These can easily arise when using conventional back testing approaches that pre-calculate all values. More than one member of the Wrecking Crew has achieved temporary fantasy millionairedom in this way by developing algorithms that have entries on the open of a bar that are triggered by a value based on the closing price of the same bar.
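The peek ahead trap described above can be shown with a toy comparison. This is a minimal sketch using invented prices and two hypothetical functions, not a reproduction of how Quantopian or any other platform runs its back tests: the first function decides at the open using the same bar's close, while the second replays bar by bar and only consults bars that have already completed.

```python
# Each bar is (open, close). Prices are invented for illustration.
bars = [
    (100.0, 102.0),
    (102.0, 101.0),
    (101.0, 104.0),
    (104.0, 103.0),
    (103.0, 105.0),
]


def peeking_backtest(bars):
    """Flawed: buys at the open whenever the SAME bar closes higher."""
    pnl = 0.0
    for open_, close in bars:
        if close > open_:  # uses information not yet known at the open
            pnl += close - open_
    return pnl


def replay_backtest(bars):
    """Bar-by-bar replay: each decision only sees bars that have completed."""
    pnl = 0.0
    previous = None
    for open_, close in bars:
        # Trade this bar only if the PREVIOUS bar closed above its open.
        if previous is not None and previous[1] > previous[0]:
            pnl += close - open_  # enter at this open, exit at this close
        previous = (open_, close)
    return pnl


peeking_pnl = peeking_backtest(bars)
replay_pnl = replay_backtest(bars)
print(peeking_pnl, replay_pnl)
```

On these made-up bars the peeking version never loses, because it only ever trades bars it already knows will rise, whereas the replay version loses money on the same data.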

Figure 3

As Figure 3 shows, a full back test also provides a lot more in terms of performance metrics, as well as far more granular transaction data. In addition to providing summary metrics for the whole back test period, individual breakdowns by period are also available (see Figure 4).

Figure 4

In order to obtain some visual insight into key variables in the trading algorithm, it's also possible to plot them as part of the back test display using the record method. For instance, adding the following code snippet to the handle_data function in the Quantopian sample momentum strategy displays the closing price of General Electric together with the strategy's VWAP-based short/long entry levels.

Time frames and data windows

Quantopian doesn't always follow the conventions used by many other backtesting platforms, with two obvious examples being time frames and data windows. In neither case is this in any way a problem; it's just a matter of being aware that Quantopian does some things rather differently.

One thing that does take some getting used to with Quantopian is that daily and one minute bars are currently the only available default time frame options. If you're looking for a handy drop down list from which to pick 10 or 30 minute bars with which to back test then you're out of luck. Though this functionality may be added in the future, it isn't a major issue in the meantime as it's possible to accomplish the same result by adding a few lines of code.

First, import (reference) the pytz world time zone module and then use this to define your desired time zone:

Then use this new time zone variable as an input to Quantopian's built in get_datetime method:

The output from that can then be used in a conditional statement...

...that checks whether the remainder of dividing the current time in minutes by five is zero, and if it is, to trigger a function that captures the desired price (or other value) for the period...

...which can then be used within a trigger for trade entries/exits etc.
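The steps above can be sketched in a self-contained form. This is only an approximation of the idea: get_datetime_stub is a hypothetical stand-in for Quantopian's get_datetime, a fixed UTC-5 offset from the standard library stands in for a pytz time zone (so there is no daylight saving handling), and the one-minute bars are invented.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-in for a pytz zone such as US/Eastern (no DST handling).
EASTERN = timezone(timedelta(hours=-5), "EST")


def get_datetime_stub(utc_time, tz):
    """Hypothetical stand-in for get_datetime: convert a UTC bar time to tz."""
    return utc_time.astimezone(tz)


def sample_every_n_minutes(minute_bars, n=5, tz=EASTERN):
    """Keep the price of each one-minute bar whose local minute divides by n."""
    sampled = []
    for utc_time, price in minute_bars:
        local_time = get_datetime_stub(utc_time, tz)
        if local_time.minute % n == 0:  # remainder zero: start of a new period
            sampled.append((local_time.strftime("%H:%M"), price))
    return sampled


# Twelve invented one-minute bars starting at 14:30 UTC (09:30 in the zone).
start = datetime(2013, 3, 1, 14, 30, tzinfo=timezone.utc)
minute_bars = [(start + timedelta(minutes=i), 100.0 + i) for i in range(12)]

five_minute_prices = sample_every_n_minutes(minute_bars)
print(five_minute_prices)
```

Changing n to 10 or 30 gives the longer bar lengths mentioned earlier, at the cost of a few extra lines compared with picking them from a drop down list.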

A similar individualistic situation applies to look back periods in Quantopian. If you're looking for a drop down menu that lets you choose a look back period from a list to use with a technical indicator or other function, then again you're out of luck. But again, there's a simple alternative way of doing this in Quantopian with the assistance of a helper facility called batch_transform. This captures a trailing data window of a length that you specify. For example, adding the following...

...to the initialize function at the start of your algorithm enables you to automatically convert default single row data output to a multi row data panel. So a function that collects individual closing prices...

...can be called by another function...

...that because it invokes the context function also implicitly calls the batch_transform facility, and converts this to a data panel with five rows (as specified by "window_length=5" in the first code clip above). The result of sending the output of this to Quantopian's log window can be seen in Figure 5, which shows the four most recent trailing data windows for the three stocks for which closing data was requested (code for this not shown).
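The general idea of a trailing data window can be sketched with a small decorator. To be clear, trailing_window below is a hypothetical helper written for this illustration and is not Quantopian's batch_transform; the symbol and prices are invented, and the sketch simply assumes that each call delivers one new row and that the wrapped function should see the most recent five.

```python
from collections import deque
from functools import wraps


def trailing_window(window_length):
    """Hypothetical decorator: pass the wrapped function the last N rows."""

    def decorator(func):
        window = deque(maxlen=window_length)  # oldest row drops off when full

        @wraps(func)
        def wrapper(row):
            window.append(row)
            if len(window) < window_length:
                return None  # not enough history yet
            return func(list(window))

        return wrapper

    return decorator


@trailing_window(window_length=5)
def get_closes(rows):
    """Receives five rows at a time; each row maps symbol -> closing price."""
    return [row["GE"] for row in rows]


# Feed seven invented daily closes, one row per call.
results = [get_closes({"GE": 20.0 + day}) for day in range(7)]
print(results)
```

The first four calls return None because the window has not filled; after that each call returns the five most recent closes, with the window sliding forward by one row at a time.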

Real world

A significant issue with back testing is how easy it is to fool yourself into thinking that you've developed something robust when you haven't. Often this occurs through inadvertent over fitting, or through omitting factors that are likely to occur in live trading. Quantopian has some useful functionality to help you avoid these situations.

It is all too easy to inadvertently over fit a model, such as by applying and tuning a reversal algorithm to a handful of stocks that happen to be trading sideways. Then when you apply the same model in live trading to a different (or perhaps even the same) small set of stocks that take off on a strong trend - oops...

Quantopian helps you sidestep this issue with a handy tool for selecting a testing universe of stocks based upon dollar volume (trading volume of a stock multiplied by its price). The desired universe is defined in the initialize function by specifying from which percentiles in the DollarVolumeUniverse global universe to draw the stocks.

Figure 5

For instance, this...

...defines a universe of just the top 0.1% of the global universe by dollar volume. The dollar volume values and rankings of the global universe are updated quarterly. When this happens, the platform invokes a number of rules intended to maintain the integrity of any back tests active at that point in time. For example, if a stock would normally have fallen out of a user-defined universe due to a change in its dollar volume, it will actually be kept in the universe if it is currently in an open trade.
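To make the percentile selection concrete, here is a minimal sketch of choosing a universe by dollar volume. The function name, its parameters and the five stocks are all invented for illustration, and the percentile is simply a stock's position in the ascending ranking; Quantopian's own ranking rules may well differ.

```python
def dollar_volume_universe(stocks, floor_percentile, ceiling_percentile):
    """Return symbols whose dollar-volume percentile lies in the given band.

    stocks maps symbol -> (price, share volume). A stock's percentile is its
    position in the ascending dollar-volume ranking, scaled to 0-100.
    """
    dollar_volume = {s: price * volume for s, (price, volume) in stocks.items()}
    ranked = sorted(dollar_volume, key=dollar_volume.get)
    count = len(ranked)
    selected = []
    for position, symbol in enumerate(ranked):
        percentile = 100.0 * position / count
        if floor_percentile <= percentile < ceiling_percentile:
            selected.append(symbol)
    return selected


# Invented symbols, prices and volumes.
stocks = {
    "AAA": (10.0, 1000),
    "BBB": (50.0, 5000),
    "CCC": (5.0, 200),
    "DDD": (100.0, 10000),
    "EEE": (20.0, 300),
}

top_fifth = dollar_volume_universe(stocks, 80.0, 100.0)
top_two_fifths = dollar_volume_universe(stocks, 60.0, 100.0)
print(top_fifth, top_two_fifths)
```

With a universe of thousands of stocks, a floor of 99.9 and a ceiling of 100 would correspond to the top 0.1% band discussed above.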

In addition to helping avoid the issue of biased selection, this functionality has a number of other possible uses, one of which links to another of Quantopian's real world features: the handling of slippage. The platform offers two standard models for slippage: fixed and volume share. The former is used to reflect anticipated bid/offer spreads and Quantopian adds/deducts half of whatever you specify to any purchase/sale. The latter lets you specify the maximum percentage of volume per bar your order can represent, together with a price impact constant. This constant is then multiplied by the square of the maximum permitted percentage of volume to arrive at a percentage slippage value.

However, Quantopian also allows the creation of custom slippage models, which could include inputs such as the stock's ranking in the dollar volume universe to develop a universal and flexible liquidity-centric slippage model.
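The arithmetic of the two standard slippage models, as described above, can be written out in a few lines. This is a sketch of the description in this review rather than of the platform's code: the function and parameter names are hypothetical, the slippage value is treated as a fraction of price, and the inputs are invented.

```python
def fixed_slippage_price(price, spread, is_buy):
    """Add half the specified spread to a purchase, deduct half from a sale."""
    half_spread = spread / 2.0
    return price + half_spread if is_buy else price - half_spread


def volume_share_slippage_price(price, volume_limit, price_impact, is_buy):
    """Apply slippage of price_impact times volume_limit squared.

    volume_limit is the maximum fraction of a bar's volume an order may take,
    for example 0.25 for 25%. The result is treated as a fraction of price.
    """
    slippage = price_impact * volume_limit ** 2
    return price * (1.0 + slippage) if is_buy else price * (1.0 - slippage)


# Invented inputs: a 2 cent spread, then a 25% volume limit with impact 0.1.
buy_fixed = fixed_slippage_price(50.0, 0.02, is_buy=True)
sell_fixed = fixed_slippage_price(50.0, 0.02, is_buy=False)
buy_volume_share = volume_share_slippage_price(100.0, 0.25, 0.1, is_buy=True)
print(buy_fixed, sell_fixed, buy_volume_share)
```

A custom model along the lines suggested above could follow the same shape, for instance by scaling price_impact according to a stock's dollar volume ranking.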

Data

Something that will attract many traders and quants is that Quantopian looks after the data (well a good chunk of it anyway). Its historical database contains one minute and daily data for 15,000 US equity market instruments ranging across high/mid/low caps, ETFs and ADRs over the past 11 years that are updated daily. The data are automatically adjusted for mergers and splits, but (in another nice real life touch) dividends are treated as events and automatically taken into account when running back tests (though they aren't as yet available via the API). For instance, if you have a short position open on the close prior to an ex dividend date, then this is taken as a deduction to cover the real world obligation of paying dividends to the lender of the stock.

The fact that Quantopian provides data management for an already decent universe of instruments is obviously a major time saver. While this universe is likely to expand over time, inevitably it can't cover everything. Apart from certain market data it also doesn't cover items such as economic data or analysts' forecasts. However, it does provide a way of loading such data in CSV format from external sources via HTTP using the fetch_csv method (generally referred to as Fetcher). This currently supports two types of time series: those that are specific to a security (such as corporate news events) and those that are not (such as national inflation data).

If you aren't creating the CSV source file yourself, Fetcher enables the modification of external data after it has been loaded from a third party source. This can either be done immediately or after the data has been sorted by date.
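The two modification points mentioned above (immediately after loading, and after sorting by date) can be illustrated with a small stand-in loader. Everything here is an assumption made for the sketch: load_csv, pre_sort_hook and post_sort_hook are hypothetical names, the data is read from a string rather than over HTTP, and the figures are invented. It is not Fetcher's actual interface.

```python
import csv
import io


def load_csv(text, pre_sort_hook=None, post_sort_hook=None, date_column="date"):
    """Parse CSV text, with optional hooks before and after the date sort."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if pre_sort_hook is not None:
        rows = pre_sort_hook(rows)  # e.g. clean up third party quirks
    rows.sort(key=lambda row: row[date_column])  # ISO dates sort as strings
    if post_sort_hook is not None:
        rows = post_sort_hook(rows)  # e.g. derive values that need date order
    return rows


def drop_missing(rows):
    """Discard rows with a missing reading and convert the rest to floats."""
    return [dict(row, cpi=float(row["cpi"])) for row in rows if row["cpi"] != "n/a"]


def add_change(rows):
    """Add the change from the previous (earlier) row; None for the first."""
    previous = None
    for row in rows:
        row["change"] = None if previous is None else row["cpi"] - previous
        previous = row["cpi"]
    return rows


# Invented, deliberately unsorted index readings with one missing value.
raw = "date,cpi\n2013-03-01,232.8\n2013-01-01,230.3\n2013-02-01,n/a\n"
rows = load_csv(raw, pre_sort_hook=drop_missing, post_sort_hook=add_change)
print(rows)
```

The cleaning step belongs before the sort because it does not depend on order, while the change calculation only makes sense once the rows are in date order.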

Optimisation, paper/live trading and progression

At present Quantopian doesn't provide parameter optimisation built into its IDE. It is definitely intended for the future, but has obvious implications for a hosted application (umpteen people all attempting a huge parameter sweep in an exhaustive optimisation doesn't do wonders for overall performance).

Figure 8

However, because Zipline is open source, it is possible to do this in the meantime using the same back testing platform. While we didn't test this, instructions for how to do this via clustering in the cloud are provided on the Quantopian blog.

While our review process focused on Quantopian's back testing facilities, it's worth mentioning that it has recently launched paper trading capabilities (which we didn't test) for one algorithm per user (a limit that will be increased in the future). Beta testing of live trading (which will ultimately be a paid for service) using Interactive Brokers is also currently underway with a group of existing users. This beta allows both live trading with real money as well as trading within Interactive Brokers' own paper trading accounts.

From reading the various blog and forum posts, it becomes clear that Quantopian is evolving swiftly. It's also interesting to observe how much new functionality is being driven by requests from users and that while this inevitably results in a to do list from hell they do seem to plug through it. They're also refreshingly upfront when the implementation of new functionality doesn't quite go to plan (such as the planned incorporation of the Python version of the TA-Lib technical analysis library).

The elephant in the room

The one thing that some traders/quants will always see as a negative about Quantopian is also its greatest positive: the fact that it's hosted. Quantopian is obviously well aware of this and goes to considerable lengths to try and reassure those concerned about leakage of intellectual property. Nevertheless, some will undoubtedly still see this as too much of a risk and therefore refuse to use the platform.

However, there are a number of ways around this issue for those concerned about the security/confidentiality of their trading models. One, already mentioned, is to download and run Zipline yourself, but this obviously loses the benefit of having a managed data service. The other is to abstract all parts of an algorithm regarded as particularly sensitive, run them offline, and then upload their output via Fetcher.

Nevertheless, that still leaves the matter of ensuring that the data used for the offline processing exactly matches the data used by Quantopian online (note to Quantopian: data sales opportunity here).

The odd niggle

As we kicked, tweaked and prodded Quantopian we inevitably came across a few "If only" and "Why did they?" moments, as well as the odd incidence of unresponsiveness from the platform. The last point seemed to be more of an issue in Chrome and at its most extreme resulted in Figure 6. On other occasions it happened when clicking to save cloned algorithms (you can clone any published algorithms the creator has chosen to share on Quantopian), when repeated clicks on a dialog button did nothing for ages and then suddenly created 18 clones.

Conclusion

BUT this sort of glitch needs to be seen in the context of two extremely important points:

• Quantopian is free...

• ...and is also continuously improving and expanding

Without naming names, the Wrecking Crew mentioned several commercial backtesting platforms costing thousands that they regarded as significantly inferior to Quantopian as it stands today. Given its rate of progress, it seems likely that the number of these surpassed competitors will continue to grow.

All told, we couldn't help being severely impressed with Quantopian. Yes there's the odd gap (optimisation) and eccentricity (Figure 8) but it offers an extremely potent back testing platform that uses a powerful yet relatively unintimidating programming language. It also includes a significant amount of functionality that brings the process of back testing closer to the reality of live trading, which is no small achievement. And by the way, did we mention that it's free?