First, an admission: the sheer scale of MATLAB and the space limitations of print mean that this review can examine only a small subset of the product's functionality. In response to reader requests, the focus here will be on MATLAB's suitability as a platform for automated trading model development and how quickly one is likely to become productive when using it for this purpose.
Ease versus power
One of the biggest challenges when devising any programming environment is striking the right balance between ease of use and power. Development environments for trading models are no exception; on the one hand users need to be productive as soon as possible, but on the other they do not want to find their creativity subsequently constrained by limitations imposed by the initial need for simplicity.
Historically, many such development environments have evolved as extensions of charting and technical analysis packages. One common shortcoming of these products is that they are often linear in their handling of time series. They may be perfectly competent at a single instrument level or perhaps at calculating the correlation between two securities, but anyone intending to construct synthetic pairs from a matrix of thousands of instruments is going to be disappointed.
That isn't in any way a criticism of such products; they don't pretend to more than they do. However, from our conversations with readers it has become apparent that traders who previously used only a relatively small number of models to support their manual decision making are planning to diversify into automated trading. In doing so they are looking to expand their repertoire; rather than just using a fairly narrow subset of tools usually based on technical analysis, they are keen to explore a wider range of statistical/econometric techniques. As a result, we have seen a sharp increase in the number of reader queries about products such as MATLAB and S+, and their open source counterparts such as Octave and R.
So where does MATLAB fit on this ease/power spectrum? Initially, it's easy to succumb to a panic attack due to the sheer scale of the software. In addition to the core MATLAB program, nearly a hundred toolboxes and other specialist add-ons are also available offering literally thousands of functions. To a new user accustomed to some of the charting-derived environments mentioned in the previous section, just getting a grasp of the basic MATLAB command line syntax might seem daunting.
Happily, this isn't the case for the very simple reason that the MATLAB help process (it's much more than just help files) is outstandingly good. Off and on I've used Octave, which is an open source equivalent to MATLAB, and the documentation and help for that is perfectly respectable, though even with more complex functions there seems to be an underlying assumption that you already understand all the nuts and bolts. To be fair, Octave is open source not commercial, so the contributors to the project have limited time and resources.
Having said all that, the MATLAB documentation and help files are exceptional. Contextual help and useful code fragments abound, but they are only part of the picture. Every MATLAB toolbox comes with a selection of sample projects included in the help files. Also included are links to webinars by MATLAB staff that explore the various facets of the software and provide sample files, all heavily commented and often including suggestions and tweaks for possible alternative deployment. In addition, The MathWorks runs a series of free one-day MATLAB financial seminars globally in various languages. Finally, there is MATLAB Central, the online community for MATLAB users with hundreds of finance-related scripts and functions available for free download.
Apart from the depth of information available from these sources, an additional bonus is their general ethos. In many cases they do not assume that you are already intimately acquainted with the function or concept you are looking up. For example, the help files for the Wavelet Toolbox include a lucid introduction to wavelets - not just in terms of what they are, but also as regards their possible application.
Interestingly, quite a few contributors to MATLAB Central appear to think the same way. As just one example, I was surprised to see that someone had taken the trouble to write and upload a well-documented introductory function file for the Kalman filter. OK, to seasoned quants this is all just one big yawn, but to traders looking to up their game by diversifying into new areas of research using a new tool, this mindset is a significant help in getting up to speed and actually becoming productive.
The fundamental starting point in MATLAB is the command line, a similar concept to the DOS prompt or Linux shell. While there is nothing to stop you issuing individual commands from here for everything, in practice most users will write their most common command series in script files and run these from the command line instead. In addition to invoking scripts and functions, the command line can also be used to perform everything from simple arithmetic to large scale matrix operations.
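As a rough illustration of that range, a short command line session might look like the sketch below. The variable names and values are invented for the purpose and are not taken from the figures. Leaving off the trailing semicolon would make MATLAB echo each result immediately.

```matlab
% Simple arithmetic typed straight at the prompt
variable = 1;                 % a scalar, visible afterwards in the variable window
total = 3 * (4 + 5) / 2;      % ordinary operator precedence applies

% Small matrix operations use the same syntax
A = [1 2; 3 4];               % 2-by-2 matrix, rows separated by semicolons
b = [5; 6];                   % column vector
x = A \ b;                    % solves the linear system A*x = b
C = A * A';                   % matrix product of A with its transpose

disp(total)
disp(x)
disp(C)
```

The same lines could equally be saved in a script file and run by typing the script's name at the prompt.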
The default MATLAB workspace (see Figure 1) has the command line at its centre. To its left is the current directory listing, to its right are the variable window and the command line window history. The variable window is similar in concept to the Locals window in Excel's VBA editor and allows you to see all the variables and their values for your current session. (In Figure 1, I have declared a variable called 'variable' at the command line and assigned it a value of one, as shown in the variable window.)
The command line window history does what it says on the tin, by allowing you to double click a former command to (re)execute it immediately. It also allows you to invoke a couple of handy utilities from a right-click menu - the M-File builder and the Profiler. The M-File builder takes a command or series of commands and automatically builds them into a MATLAB script. So if you have been debugging a set of commands individually and wish to compile them into a single script it's just a case of Ctrl- or Shift-selecting the commands in the command line window history pane and clicking Create M-File. The M-File builder is also accessible via a check box in various other MATLAB windows, such as the Import Wizard (see 'Acquiring data' below).
The Profiler is used to improve the performance of your MATLAB code (M-Code) by timing the execution of its various elements, to assist in identifying infelicities such as unnecessary function calls. The Profiler works in conjunction with a utility called M-Lint, which in addition to highlighting outright errors will also identify code that is inefficient, such as unused input arguments or variables. If you are using parallel processing to speed things up, there is also a Parallel Profiler that shows how much time each processing session takes in evaluating each function and also how much time it spends communicating or waiting for communications with other sessions.
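The Profiler can also be driven from the command line rather than the right-click menu. The following is a minimal sketch, assuming a deliberately contrived workload (the loop and the variable names are made up for illustration) so that there is something to time.

```matlab
profile on                          % start collecting timing data

x = randn(2000, 1);                 % made-up input data

% A deliberately loop-heavy calculation...
acc = 0;
for k = 1:numel(x)
    acc = acc + sqrt(abs(x(k)));
end

% ...and the equivalent single vectorised call
total = sum(sqrt(abs(x)));

profile off                         % stop collecting
stats = profile('info');            % structure holding the collected timings

% In an interactive session, 'profile viewer' opens the graphical report.
```

Comparing the time attributed to the loop with that of the single `sum` call is the kind of check the Profiler is intended for.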
Though the command line sits at the core of MATLAB, the program also includes plenty of GUI-driven elements, such as the financial charting and optimal capital allocation tools shown in Figures 2 and 3. Underlying each of these GUIs is an M-File; for example, a segment of the file for the optimal capital allocation tool is shown in Figure 4.
The opportunity here is that it is relatively trivial to build your own GUIs to front your own M-Files using the GUIDE tool shown in Figure 5a. As you add components to your GUI and save it, the associated M-File is automatically updated. For example, in Figure 5b the segment of code added in response to the 'TEST' command button in Figure 5a is highlighted.
Figure 6 shows a simple GUI built for the purposes of this review; it invokes several functions that in turn initiate a series of calls to CQG's API to retrieve either time or constant volume bars for a security. Though not shown in the GUI, it automatically saves the data retrieved to a text file and names it based on a combination of the current date/time and data entered into the GUI. Alternatively it would be fairly simple to expand the GUI to give the user a choice of outputs, such as assigning the data to a variable or writing it to a larger database.
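The file-naming step described above can be sketched as follows. This is not the code behind Figure 6: the symbol, bar type and bar values are placeholders, the CQG API calls are omitted entirely, and the file is written to the system temporary directory purely for illustration.

```matlab
% Hypothetical values standing in for what the user typed into the GUI
symbol  = 'CLE';
barType = 'time';

% Placeholder bars (open, high, low, close), standing in for retrieved data
bars = [50.10 50.40 49.90 50.25;
        50.25 50.60 50.20 50.55];

% Build a file name from the GUI inputs plus the current date and time
stamp = datestr(now, 'yyyymmdd_HHMMSS');
fname = sprintf('%s_%s_%s.txt', symbol, barType, stamp);
fpath = fullfile(tempdir, fname);

% Write one comma-separated line per bar
fid = fopen(fpath, 'w');
fprintf(fid, '%.2f,%.2f,%.2f,%.2f\n', bars');   % transpose so each row prints as one line
fclose(fid);

disp(fname)
```

Offering the alternative outputs mentioned above would then be a matter of branching on a GUI control before the write step.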
At the coal face - MATLAB users' opinions
Miles Kumaresan, Managing Director and Head of Quantitative Trading at TransMarket Group Ltd
The group I head at TransMarket makes extensive use of MATLAB in designing and testing trading models. Personally, I've been using it for about eight years and found getting to grips with it straightforward. You don't need huge programming expertise to become productive with MATLAB; if you are competent in VBA then that will take you a long way with MATLAB.
Though it isn't unique to MATLAB, its use of vector notation is particularly useful given our focus on time series data. You can apply a transformation to an entire time series in a single operation, without having to resort to loops. As a result, you can achieve a considerable amount in MATLAB with a single line of code.
Memory management is robust in that MATLAB can easily handle large data sets without instability. Given that we may often be working with tick data sets running into hundreds of thousands of rows and very large sparse matrices, that facility is obviously important.
However, while MATLAB's memory management is good, we find it slow when reading or writing data to disk. Data in plain text ASCII format that might only take five seconds to analyse once in memory can easily take twenty seconds to load. Java or C++ would read that data from disk in a fraction of the time.
Unless you are operating at the absolute bleeding edge of high frequency trading, MATLAB is fast enough for most purposes. We compile some of our MATLAB files as C++ DLLs and we find that we have managed to get the latency of calling those libraries below one millisecond. While that would still be too slow for the very highest frequency trading, it is perfectly adequate for most trading activity.
David Knox, CEO, I-TRADERS
We've been using MATLAB for about six months and so far we've been impressed. You can broadly split our usage into two areas, general R & D of trading strategies/indicators and real time publication of those to clients on the Web. The latter isn't fully live as yet, but we've been testing for a while and like the way MATLAB makes it easy to distribute data and models to those who don't actually have the application themselves. When compared with the alternative of testing something and then having to recode it from scratch in another language, MATLAB's ability to quickly and automatically compile M-Files into a distributable format is a real time saver.
While our main MATLAB user here had used it before joining us, other personnel have found it easy to pick up. As a result, we've been pleased with the productivity possible; for us it seems to strike exactly the right balance between usability and power.
The large number of MATLAB users worldwide is a major plus for us. We frequently find that the basic functions we need have already been written by somebody else and we can build on those as necessary when developing our proprietary algorithms. We've tried out a few of the open source toolboxes that are MATLAB-compatible and found they work very well, but without all the support/documentation you get with the MATLAB toolboxes it does admittedly take a bit longer to get up to speed.
Acquiring data
One of the most fundamental tasks when building, testing and deploying financial models is obviously data acquisition. The MATLAB Datafeed Toolbox provides support for a number of popular data vendors, including Bloomberg, Reuters MDS and Thomson Datastream. Other data vendors and execution platforms have taken various alternative approaches to connecting to MATLAB. Some, such as RTS Realtime Systems and 4th Story, are members of MATLAB's third party vendor program (see sidebar 'MATLAB Third-Party Products & Services').
CQG has written and made available a selection of M-Files that interface directly with the CQG API and allow both historical and real time data to be retrieved into MATLAB. In addition, a further set of M-Files makes it possible for trading models running in MATLAB to route orders back out through the CQG API to various partner brokers/clearers.
If you want to import your own static data in a variety of common formats, MATLAB makes it pretty straightforward. Figure 7a shows the first step in its Import Wizard when importing a file of data in Excel format (note the 'Generate M-Code' check box at the bottom). By default, the wizard creates variables for the data on the basis of a preview (highlighted by the blue rectangle), which detects the column headers and numeric data separately. The disadvantage with this is that you end up with the data in a rather amorphous mass, which doesn't exactly help with manipulation. Figure 7b shows the solution, which is to choose the other radio button. This creates a separate vector for each column (more on MATLAB's handling of vectors and matrices follows below) and names the assigned variable after the column heading; as a result each individual time series is readily accessible.
Another advantage of separate vectors is that the min/max values for each series are visible in the variables window. Figure 8 shows an abortive attempt at a candlestick chart, with the reason for this error highlighted in red above; namely, that the data is corrupt and contains large negative numbers.
Incidentally, while MATLAB's Import Wizard is very effective, there are a few things to be aware of. Those accustomed to using conventional date formats in historic data files will need to remember to convert them to serial dates first or the Import Wizard will not recognise them as a separate vector. I also found that when importing data from Excel files MATLAB spawned multiple Excel processes in the background (see the segment from the Windows Task Manager in Figure 9). The snag was that these processes did not terminate when the wizard completed, so when importing a succession of Excel files I began encountering out of memory errors and discovered I had some thirty Excel processes running that had locked up nearly two gigs of memory.
An alternative way of importing data from Excel files is to use the xlsread function at the MATLAB command line, which worked. This only kicked off one background Excel process, but again this process did not terminate once the import completed and had to be killed manually.
Neither of these problems is a MATLAB bug; I discovered that they are in fact caused by certain other applications that may be running on the same machine (Google Desktop Search is a common culprit). When MATLAB spawns the first Excel process, these other applications immediately attach themselves to it and won't release it. As long as one is aware that this can happen, it's not a show stopper because the superfluous processes can be terminated manually. However, given the ubiquity of data imports from Excel files, it is something worth remembering.
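For readers unfamiliar with the serial dates mentioned above, the sketch below shows what they look like once the data is inside MATLAB. It assumes dates held as text in dd/mm/yyyy form; the sample values are made up. A serial date is simply a day count, with any intraday time carried as a fraction of a day.

```matlab
% Made-up dates in a conventional text format
dateText = {'02/03/2009'; '03/03/2009'; '04/03/2009'};

% Convert to serial dates (in the format string, lower-case mm is the month)
serial = datenum(dateText, 'dd/mm/yyyy');

% An intraday time stamp: the integer part is the day, the fraction the time
stamp    = datenum('02/03/2009 14:30:00', 'dd/mm/yyyy HH:MM:SS');
dayPart  = floor(stamp);
timePart = stamp - dayPart;          % fraction of a day since midnight

% Convert back to text for display
backToText = datestr(serial, 'yyyy-mm-dd');
disp(backToText)
```

Because serial dates are plain numbers, consecutive calendar days differ by exactly one, which makes sorting and differencing straightforward.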
Something long appreciated by MATLAB users across all industry segments has been its dexterity when manipulating vectors and matrices. Its ability to conduct complex transformations on even large data sets means that exploring the most intricate and large scale inter-market relationships is possible.
Even though MATLAB programs are interpreted not compiled, its facility with vectors is fundamental to its performance. MATLAB vectors and matrices can be manipulated most effectively when they are stored in columns in contiguous blocks of RAM. As a result any code you write that takes advantage of this behaviour will be more efficient - in many cases, by a large margin. This is particularly important for those migrating from other programs that are not tuned for vector/matrix performance and who might therefore be naturally inclined to write code that uses loops.
Loops will of course work in MATLAB, but should be avoided if a "vectorised" way of achieving the same result is available. For example, it is possible to loop through a dataset to perform an operation on the data points individually and create a new vector variable to hold the transformed data. However, at each iteration MATLAB will be adding individual elements to a vector that it is resizing on the fly. Because a MATLAB vector occupies a single contiguous block of memory, each resize can force MATLAB to allocate a larger block and copy the existing elements across, and it is this repeated allocation and copying that makes the approach inefficient. The alternative (and far more efficient) way to achieve the same result is, wherever possible, to size the output vector variable before writing to it.
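The three approaches can be compared side by side. This is a sketch only, using a randomly generated price series and simple one-period returns as the transformation; the variable names are illustrative.

```matlab
% Made-up, strictly positive price series
n = 10000;
prices = 100 * exp(cumsum(0.01 * randn(n, 1)));

% 1. Growing the output vector inside the loop (works, but is the slow way)
retGrow = [];
for k = 2:n
    retGrow(end + 1, 1) = prices(k) / prices(k - 1) - 1;
end

% 2. Sizing the output vector first, then filling it in
retPre = zeros(n - 1, 1);
for k = 2:n
    retPre(k - 1) = prices(k) / prices(k - 1) - 1;
end

% 3. Vectorised: one line, no loop
retVec = prices(2:end) ./ prices(1:end - 1) - 1;

% All three give the same returns
disp(max(abs(retVec - retGrow)))
disp(max(abs(retVec - retPre)))
```

Wrapping each variant in `tic` and `toc` is an easy way to see the difference in run time on your own machine.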
MATLAB has many specific functions intended to make the use of looping redundant and thereby increase code efficiency. A classic example is re-dimensioning matrices. Assume you have a 10,000 row by 100 column matrix and you wish (for whatever reason) to re-dimension it to 1,000 by 1,000. You can write a loop that will (slowly) accomplish this, or you can use the purpose-built MATLAB reshape function, which will do it nearly instantaneously. As your data sets increase in size, the benefits of thinking and programming in MATLAB vector terms obviously increase.
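The reshape example above takes a single call. In this sketch the matrix is filled with random numbers purely as stand-in data.

```matlab
A = rand(10000, 100);          % stand-in data: 10,000 rows by 100 columns
B = reshape(A, 1000, 1000);    % same 1,000,000 elements, new dimensions

disp(size(B))
```

Note that reshape keeps the elements in column order, so the first 1,000 values of the first column of A become the first column of B, the next 1,000 the second column, and so on.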
On that note, one of the most important factors that determine the maximum data set size that MATLAB can handle is the process limit (the maximum virtual memory a process or application can address). This is a function of the operating system and varies considerably. A standard 32-bit XP or Vista set up running 32-bit MATLAB is limited to 2GB. At the other end of the spectrum, 64-bit MATLAB running on a 64-bit OS will stretch to 8TB - which will hopefully cover most eventualities.
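To put the 2GB figure in context, here is some back-of-envelope arithmetic, assuming the data is held as MATLAB's default 8-byte double precision values. The result is a theoretical ceiling only; MATLAB itself and every other variable in the session share the same address space, so the usable figure will be lower.

```matlab
bytesPerDouble = 8;                                   % size of one double precision value
limitBytes     = 2 * 2^30;                            % 2GB process limit

maxElements  = limitBytes / bytesPerDouble;           % upper bound on doubles in memory
maxRows100   = floor(maxElements / 100);              % rows of a 100-column matrix

fprintf('Doubles in 2GB: %d\n', maxElements);
fprintf('Rows of a 100-column matrix: %d\n', maxRows100);
```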
MATLAB Third-Party Products & Services
MATLAB's Third-Party Products & Services program includes a number of specialist providers. Here, two such providers talk about their reasons for coupling their products with MATLAB.
John Melonakos, CEO, AccelerEyes
Our product, AccelerEyes Jacket, solves a key problem my three co-founders and I faced as PhD candidates processing large datasets using MATLAB. During our graduate studies, we found the limitations of conventional CPUs were resulting in slow calculation times for our algorithms, sometimes taking days to run. To resolve that problem for our own use, the four of us started to collaborate on a method for running MATLAB code on a graphics processing unit (GPU). This collaboration then evolved into a commercial product called Jacket, a full production version of which was launched in January of this year.
We chose NVIDIA's Tesla GPU and CUDA environment as our target platform for Jacket as it was clearly the most functional and robust option. The GPU power computing market is still relatively young and the few competing options were clearly too immature or too low level.
We've had tremendous support from both The MathWorks and NVIDIA in developing Jacket, which has appreciably reduced our time to market. Our objective has been to screen MATLAB users from the GPU environment so that running Jacket is transparent for them and they can just focus on their core research. All they need to use are four MATLAB GPU functions that allow them to transfer MATLAB calculations to and from an NVIDIA GPU to benefit from its parallel processing power and thereby considerably reduce their overall processing times.
Obviously there are a lot of MATLAB users in finance and many of them have a need for parallel processing. As a result we've seen significant interest from the finance sector in GPU computing. In addition to the performance increase, many of them are also attracted by some of the other associated benefits - such as lower power and rack space requirements than conventional CPU computing.
Anthony Tassone, VP Algorithmic Trading Solutions, RTS Realtime Systems
Our decision to partner with MATLAB was driven by the direction in which our clients were (and still are) going. An increasing number of them have been looking to separate out their analytics from their execution. This is primarily in the interests of efficiency; a trading engine has a mixture of tasks to perform, some of which are extremely time sensitive and some less so. For example, you don't typically need to know the gamma of an option a hundred times a second. As a result, you don't need or want to put the calculation for that gamma into your execution algorithm, as it will unnecessarily slow the overall trading process up and impact performance.
Therefore we have focused on establishing a bidirectional link between RTS Tango and MATLAB that facilitates the optimal exchange of data and instructions. For instance, MATLAB and Tango servers can sit alongside each other in a data centre and the Tango execution server can grab the calculated values it needs from the MATLAB server only when it needs them.
Most things you can do in the Tango client you can do in the MATLAB client, including coding trading strategies, so you have the flexibility to always execute the right tasks/code in the most appropriate place. In addition, it is possible to control Tango from MATLAB, so trading strategies can be turned on/off and new strategies compiled on the fly. This extends beyond individual markets, so users can also build risk management applications that control multiple Tango instances across a range of different trading venues.
The overall objective is that traders and quants can focus on developing ideas and models in MATLAB and then be able to leverage them on Tango. In an ideal world they will need to spend as little time as possible thinking about efficient execution (which they should be able to take for granted) and instead concentrate on adding value through exploring new trading ideas.
As mentioned earlier, a large selection of toolboxes and add-ons is available for MATLAB. There simply isn't space to deal with them all here, so only elements of those toolboxes likely to be of use to someone migrating to MATLAB from a more traditional analytics platform are covered. Therefore if you're burning to know Automated Trader's take on the Bioinformatics Toolbox, our apologies for any disappointment.
For those migrating from charting-based development platforms, the Financial Toolbox offers some familiar territory. The Financial Time Series GUI provides simple charting functionality and can also apply a variety of analytics (including common technical analysis studies) to displayed data. There are also various utilities for handling and converting dates, which work well. Useful - from bitter experience, one always seems to spend too much time converting between date formats and splitting out intraday time stamps; anything that eases this tedious drudgery is welcome.
The toolbox has some handy utilities to deal with the time consuming side of data housekeeping. Among many others, these include time frame rescaling, interpolation techniques for missing data values, differencing and data subset extraction (handy for in/out of sample testing). The descriptive statistics provided cover all the usual items, such as min, max, mean, covariance, standard deviation, correlation etc. Where appropriate, the calculations will automatically ignore not-a-number values (NaNs).
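To make these housekeeping tasks concrete, the sketch below performs a few of them on a tiny made-up series. It deliberately uses only core MATLAB functions rather than the Financial Toolbox's own utilities, so it shows the ideas involved and not the toolbox's interface.

```matlab
% Made-up price series with two missing values
p = [100; 101; NaN; 103; 104; NaN; 106; 107];
t = (1:numel(p))';                       % simple time index

% Descriptive statistic that skips the NaNs
ok = ~isnan(p);
meanPrice = mean(p(ok));

% Fill the gaps by linear interpolation between neighbouring points
pFilled = interp1(t(ok), p(ok), t, 'linear');

% First differences of the filled series
d = diff(pFilled);

% Split into in-sample (first 75%) and out-of-sample (remainder) subsets
nIn       = floor(0.75 * numel(pFilled));
inSample  = pFilled(1:nIn);
outSample = pFilled(nIn + 1:end);

disp(meanPrice)
disp(pFilled')
```

Whether linear interpolation is an appropriate way to fill gaps depends on the data; it is used here only because it is the simplest case to follow.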
The investment performance metrics in the Financial Toolbox include standard items like Sharpe ratio and risk adjusted return. However, a quick search of MATLAB Central reveals downloadable code for a variety of other metrics, including Sortino and Omega ratios. The Econometrics Toolbox offers more than sixty functions ranging from GARCH to multiple time series modelling to price/return utilities. While some of the functionality may be of more interest to economists than those building trading models, those looking to test for stationarity of pairs and synthetic pairs have their bases covered by a selection of Augmented Dickey-Fuller and Phillips-Perron unit root tests.
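For anyone wanting to see what such metrics involve before downloading anything, here is a minimal sketch of Sharpe and Sortino ratios in core MATLAB. It uses the simplest textbook definitions (per period, not annualised, with a zero risk-free rate and a zero minimum acceptable return). Those choices are assumptions made for illustration, and the code is not the implementation used by the Financial Toolbox or by any MATLAB Central contribution.

```matlab
% Made-up per-period returns
r = [0.01; -0.02; 0.03; 0.00; 0.02];

riskFree = 0;      % assumed risk-free rate per period
minAccept = 0;     % assumed minimum acceptable return per period

% Sharpe ratio: mean excess return over the sample standard deviation
excess = r - riskFree;
sharpeRatio = mean(excess) / std(excess);

% Sortino ratio: mean excess return over downside deviation, where only
% returns below the minimum acceptable return contribute to the risk term
shortfall = min(r - minAccept, 0);
downsideDev = sqrt(mean(shortfall .^ 2));
sortinoRatio = mean(r - minAccept) / downsideDev;

fprintf('Sharpe: %.4f  Sortino: %.4f\n', sharpeRatio, sortinoRatio);
```

Definitions of downside deviation vary between sources, so any figure produced this way should be checked against the convention used by whatever it is being compared with.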
The Statistics Toolbox packs in several hundred functions ranging across probability distributions, regression analysis and hypothesis tests. As with the Econometrics Toolbox, an appreciable proportion of these will probably remain unused by many traders, but others such as analysis of distribution will be useful when evaluating model performance stats.
Several other toolboxes will be of relevance for those looking to extend their trading techniques. For example, the Optimization Toolbox has obvious application when testing model parameters and the Wavelet Toolbox for building filter applications for time series.
Get the trades out
One gripe we have heard from readers already using MATLAB is that while it delivers well as a development and testing environment, it doesn't make the final connection as a trade execution tool. While that may be strictly true, it is a little like complaining that your Ferrari is no good at hauling concrete blocks; it has no pretensions to do this and was never intended to do so.
In practice this perceived gap is already being filled by a growing number of providers. Ernie Chan's article over the page illustrates one MATLAB user's approach to this. At the same time, MATLAB's third party provider program (see sidebar 'MATLAB Third-Party Products & Services') has encouraged trading platforms such as RTS Realtime Systems to closely integrate their products. As mentioned earlier, CQG has also made code libraries available that allow users to quickly establish data and trade execution connections between MATLAB and CQG. This trend obviously benefits end users and is only likely to accelerate as other data and trade execution providers wake up to the potential of connecting their applications to MATLAB.
Where these connecting vendors offer charting and trading platforms of the type described in the 'Ease versus power' section above, there is potentially an additional productivity benefit for those looking to use MATLAB. For traders intending to explore trading strategies with a more quantitative bias, a well-integrated combination of traditional chart based products and MATLAB is potentially ideal in terms of shorter time to productivity. Existing proprietary study or function outputs can be exported straight into MATLAB, rather than the studies or functions themselves having to be recoded for MATLAB at the outset. In the long term, recoding may be the more efficient route, but if sufficient interoperability is available, it isn't immediately mandatory.
In addition, the integrated charting interface of traditional software is typically good at allowing users to 'eyeball debug' logic errors. (For example, a limit order profit target inadvertently set twenty big figures away from the intended level.) Again, this is something that is perfectly reproducible in programs such as MATLAB, but if interoperability means that it doesn't have to be immediately recoded from scratch then there is both a comfort and speed benefit.
From a 'How quickly can I do anything useful?' perspective, MATLAB looks good. While one could spend a lifetime exploring its every last esoteric nook and cranny, it is easy to get the basics up and running. Apart from the excellent documentation, webinars, seminars and sample code already mentioned, its basic method of operation is simple to grasp. Anyone who has experience with VBA or the macro languages used by many charting-based trading platforms should start being productive within a week. Those without such experience shouldn't take much longer. Even if you can't find the answer to a specific question in the help files, the size of the user community makes it reasonably likely that someone has already written the trading-related function you need (or one you can adapt) and published it on MATLAB Central or any one of the numerous other MATLAB-focused sites on the Web.
MATLAB pricing is not what you might call trivial, but given the level of functionality available it certainly isn't unreasonable either (see Figure 10). The prices in Figure 10 are for perpetual licences, but if you want support and upgrades after one year you have to subscribe to The MathWorks Software Maintenance Service. This isn't exorbitant; for example, a one year subscription for MATLAB (without any toolboxes) is GBP252 and most toolbox annual subscriptions are GBP69.
So is MATLAB good value? From the perspective and intended demographic of this review, I think the answer has to be a resounding yes. OK, the price of the core MATLAB application isn't exactly chickenfeed, but you do get a lot of functionality. The same applies to the toolboxes, but the slight snag from the perspective of someone expanding their trading methods and using MATLAB for the first time is deciding which toolboxes are likely to be relevant at the outset.
One obvious solution is to take a one month free trial. Another is to try the free toolbox from www.spatial-econometrics.com, which also contains equivalent functions to those in the MATLAB Statistics and Financial Toolboxes. As it is written in M-Code, it is MATLAB-compatible - though the help/support on offer is understandably on a smaller scale.
Whichever route you take, the fact remains that MATLAB has the potential to support just about anything you could conceivably wish to achieve in terms of building and testing (and through third party vendors, also deploying) trading models. We like it…