Is ABC really as easy as 123? The world of NLP, according to StreamBase

Published in Automated Trader Magazine Issue 27 Q4 2012

Since this edition of Automated Trader is exploring the fast-evolving world of unstructured data, we decided to spend some time with complex event processing group StreamBase. Adam Cox asks their chief technology officer, Richard Tibbetts, how far the industry has come in adopting Natural Language Processing techniques in algorithms, and what the future holds.

Richard Tibbetts

Adam: Let's start by talking about some of the more interesting developments you've seen in the last few years in terms of tackling unstructured data as they relate to financial players.

Richard: First is the naïve or obvious exercise in using sentiment indicators to drive intraday and real-time trading. That was the initial noise in the market. But people have backed off from that, having realised that off-the-shelf sentiment technology is often not enough to drive real trading strategies, at least those simply based on news sentiment. My favourite example is that you have a news story about General Electric layoffs. Whether the tone of that story is positive or negative, it actually doesn't impact how the market's going to move. What you're actually looking for is something a bit more factual or a bit more specialised.

The first thing we saw is that people realised - and people have been realising over time - that you need to be more sophisticated in how you process unstructured data than just looking at straight sentiment. One of the things we've seen is people adopting Natural Language Processing-type technologies and building their own categorisation and annotation on top of the news data. That has been an ongoing trend, but people are starting to realise that you need to create something with your own secret sauce to succeed.

Adam: In other words, so much has to do not just with how many hot and cold words it has and how it triggers an algorithm, but also what the market expectations are and the context you might not get from the data itself. Is that what you're talking about?

Richard: What we found is there's a whole spectrum of different trading strategies matching different goals. The simplest thing to understand and the most straightforward method is actually a lot like what a traditional human trader would do - acting on news. You see this most strikingly with news data coming from key financial indicators. If you're going to write a trading strategy based on a key financial indicator, you don't do it based on the sentiment of an article in the New York Times. You do it based on having set an expectation for the jobs numbers and then seeing what the actual numbers are, and then driving your trading strategy based on whether it met or exceeded your expectations.
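
As a rough illustration of the expectation-versus-actual approach Richard describes, the sketch below classifies a released figure against a pre-set expectation. The `Expectation` class, the `surprise_signal` function, the tolerance band and the sample figures are all hypothetical names and values chosen for illustration; nothing here is taken from StreamBase, and the mapping from a surprise to an actual order is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Expectation:
    """A pre-set expectation for one indicator (hypothetical structure)."""
    indicator: str
    expected: float
    tolerance: float  # band within which a release counts as "in line"


def surprise_signal(exp: Expectation, actual: float) -> str:
    """Classify a release as 'above', 'below' or 'in_line' versus expectation."""
    diff = actual - exp.expected
    if diff > exp.tolerance:
        return "above"
    if diff < -exp.tolerance:
        return "below"
    return "in_line"


# Illustrative figures only, not real data.
jobs = Expectation(indicator="jobs_number", expected=150_000, tolerance=10_000)
signal = surprise_signal(jobs, actual=175_000)
```

How a strategy reacts to each of the three outcomes is a separate decision; the point is only that the input is a comparison against an expectation, not the tone of an article.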

Adam: With economic indicators, that's been done for years and that's not actually unstructured data. The main information vendors provide it even faster in structured form than in unstructured form.

Richard: Exactly, but it's a good example to understand. What's one of those trading strategies people are accomplishing with unstructured data? Well, actually, they're structuring it in that same way. They have an expectation of a CEO departure, or an expectation of layoffs, or expectations of a particular earnings number associated with a new product launch. Whatever the expectation is, you build a recogniser that is customised to look at news articles and pick out the specific quantitative information that you're expecting to see.
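
To make the idea of picking specific quantitative information out of a news item concrete, here is a deliberately simple sketch that pulls a layoff count from a headline with a regular expression. The pattern, the function name `extract_layoff_count` and the sample headlines are assumptions made for illustration; a production recogniser of the kind discussed here would be considerably more elaborate.

```python
import re
from typing import Optional

# Hypothetical pattern: a layoff verb, an optional hedge word,
# a number, then a noun such as "jobs" or "workers".
LAYOFF_PATTERN = re.compile(
    r"\b(?:cut|cuts|cutting|lay off|lays off|laying off|eliminate|eliminates)\s+"
    r"(?:about\s+|up to\s+|some\s+)?"
    r"(\d{1,3}(?:,\d{3})*|\d+)\s+(?:jobs|workers|employees|positions)\b",
    re.IGNORECASE,
)


def extract_layoff_count(text: str) -> Optional[int]:
    """Return the number of jobs mentioned in a layoff phrase, or None."""
    match = LAYOFF_PATTERN.search(text)
    if match is None:
        return None
    # Strip thousands separators before converting to an integer.
    return int(match.group(1).replace(",", ""))


# Invented headlines, for illustration only.
count = extract_layoff_count("General Electric to cut 1,200 jobs at turbine unit")
no_count = extract_layoff_count("General Electric names new finance chief")
```

The extracted number can then be compared with whatever layoff figure the trader expected, in the same way as a structured economic release.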

Adam: So there's a risk that Company X will fire its CEO. You believe it will have an impact on the share price and you set up your system to capture that. Are people building in comfort factors in terms of who the source is?

Richard: One of the things is, if you're going to build a recogniser for this sort of thing, you wouldn't want to build a completely generic recogniser for CEO departures and then run it on the whole web. What you'd actually do is build up a corpus of articles about executive departures from major news outlets that you happen to have in your data feed, and train your recogniser based on those. So you'd know how Bloomberg writes this and how Reuters writes this.
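
One way to picture training a recogniser on a corpus drawn from your own feed is a small bag-of-words Naive Bayes classifier. This is a swapped-in, textbook technique used purely as an illustration; the interview does not say which model firms actually use. The class name, the labels and the six training headlines below are invented.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesRecogniser:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of articles
        self.vocab = set()

    def train(self, labelled_articles):
        for text, label in labelled_articles:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented corpus standing in for articles from the outlets in your feed.
corpus = [
    ("Chief executive steps down after board review", "departure"),
    ("CEO resigns amid strategy dispute", "departure"),
    ("Company says chief executive to leave at year end", "departure"),
    ("Company raises quarterly dividend", "other"),
    ("Firm opens new plant in Ohio", "other"),
    ("Shares rise after product launch", "other"),
]
recogniser = NaiveBayesRecogniser()
recogniser.train(corpus)
label = recogniser.predict("Bank CEO resigns after board pressure")
```

Because the model only learns the wording it has seen, a corpus built from the specific outlets in the feed reflects how those outlets phrase such stories, which is the point Richard makes about knowing how each source writes.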

Adam: Are people having much success?

Richard: We have seen firms that are having success. The tricky thing with this is, it's not a fire-and-forget strategy. So everybody's always looking for the algo trading strategy that you just set up and let run and never have to babysit. Whereas this kind of strategy requires a human to actually constantly work on new hypotheses about executive departures or layoffs, or whatever the characteristic is, and then set up a real-time strategy based on those.

Adam: Is there an appetite for people who want these off-the-cuff ones, who every week do something different and are programming them almost on the fly?

The remainder of this article is only available to Paid Subscribers
