In 1958, a German-born computer scientist at IBM wrote about his ideas for a tool that would make sense of incoming text based on word patterns. A little more than half a century later, the social media giant Twitter struck a deal to provide its entire archive to the US Library of Congress.
The two events are major milestones for the algorithmic trading community, points on a chart that depicts the market's remarkable progress towards being able to make sense of unthinkable amounts of information in the blink of an eye. The first event represented the starting point in a race to create text-reading machines; the second marked a development that would ultimately allow those machines to make use of billions of online conversations. Along the way, market participants, technologists and information providers have been engaged in a mad scramble to create systems that can consistently outperform the very best human traders. The results of the past several years suggest the scramble is paying off.
Until recently, much of the focus has been on making sense of news.
"News is the classic big data problem," said Brian Rooney, global business manager for Bloomberg core news products. "You've got this flood of incredibly valuable information that starts at least largely unstructured, and the great art and value is in structuring the content, to make it easier for both machines and for humans to make sense of, and to ultimately act on."
But increasingly, market participants are adding social data to their models and vendors are looking at better ways to handle it. Should quants be worried about Beyoncé and Justin Bieber? Probably not much. But for the next generation of quants, the Twitter deal could make all the difference when it comes to building robust, cutting-edge models.
Gnip, a niche company in the middle of the Twitter library deal, entered into arrangements to pipe Twitter data wherever it was wanted. That meant billions of tweets about virtually anything happening around the world could be archived, normalised and used for backtesting.
"They asked us to partner with them to help the Library of Congress manage that data," said Seth McGuire, director of business development at Gnip. "So we have the full Twitter historical corpus. We've spent a lot of time over the past year working with it to normalise it, clean it, manage it and create an infrastructure that allows use of it."
Social media firms, it turned out, had been focusing on immediate communication and had given far less thought to capturing, categorising and packaging all of the content they were generating.
Who is using all this data and what are they doing with it? The answers to those questions cast a spotlight on one of the most dynamic and fast-growing areas in the financial markets.