Ain't life a glitch?

Published in Automated Trader Magazine Issue 31 Q4 2013

Automated Trader investigates the various responses, both from the industry and regulators, to a spate of technological mishaps. Adam Cox reports.

The word seemed to take hold in the 1950s, during what was then known as the space race. Some say it came from the German glitschen ("slip" or "slither"). Others contend it stems from the Yiddish gletshn ("slide" or "skid"). Still others say the real origin is unknown.

Whatever the etymology, the word "glitch" is used an awful lot these days - at least in the world of financial trading. A search on Google Trends shows that "market glitch" as a search term spiked in May 2010 (no prizes for guessing what caused that), and since then has been a fairly common search, with the occasional mini-spike such as after the Knight Capital meltdown in August 2012 and the NASDAQ disaster a year later. Prior to 2009, the term barely registered in the Google Trends metrics.

The NASDAQ debacle - which knocked out the exchange for three hours - represented the biggest exchange glitch in modern history, but it was by no means the only one. In recent months, a number of major exchanges have experienced one kind of glitch or another which halted trading. Some were small, like the one which hit Eurex. Others were more severe, like the NASDAQ affair, which caused some 2,700 securities to stop trading. Taken together, they've dented investor confidence and raised concern among regulators.

But what can be done? Does the solution lie in better technology or is it a question of better crisis management and communication? Whenever mishaps happen, everybody talks about "lessons to be learned", but there is plenty of debate about what those lessons really are. And should regulators act as taskmasters, forcing the issue through? So far, regulators have trod a fine line. They've convened forums, both after Knight and after NASDAQ. And they've even made demands (such as the Securities and Exchange Commission's order to exchange heads discussed below). But as yet there has been little talk of regulators taking on a more activist role in defining standards for exchange system technology - a point many market participants no doubt welcome.

If there is one thing that seems certain, it's that there will be more problems.

"As sure as computers and programs have had technical glitches in the past, I believe there will be glitches in the future," said Commodity Futures Trading Commission (CFTC) Chairman Gary Gensler addressing a recent meeting of the CFTC's Technology Advisory Committee. "That's just the nature of the reality we're in, and thus, I think we have to look to risk controls and system safeguards to protect markets when such glitches inevitably occur again."

Mack Gill, the new chief executive at exchange technology group MillenniumIT, agrees that the heat is definitely on. "If anything, I feel that the bar has been raised for everybody on testing - whether you're a technology provider or an exchange or trading venue - and just ensuring visibility of that as an operator," Gill told Automated Trader.

But before launching into a gripe-fest about all the woes on exchanges of late, it may be worth putting things in historical perspective.

Bob Algeo, a co-founder of the Algeo Group LLC, has been in the market since 1967.

"I was on the AMEX floor during the Crash of 1987. I remember that well, when in October of '87, options would open for trading and then, due to an order influx, they'd stop trading. And then 15 minutes later they'd start trading again... And maybe they stopped trading five times as the market crashed," he said. "That doesn't happen anymore."

Overall, Algeo said, he thinks the exchanges work hard and do a good job: "They're pretty efficient. I would give them an A-minus."

But he also recognises that when things go wrong - particularly if it concerns a market that is only tradable on one venue - it can lead to big problems for investors: "You're a slave to their efficiency. So if the market stops, if you have a position that really needs some immediate action, you've got a problem. You can go to futures, you can do other things, you can trade similar contracts somewhere, but you have a problem. You might make money or you might lose money, depending on what happens over those hours."

The regulatory response

The old adage states that an ounce of prevention is worth a pound of cure. But in the case of technology glitches, while there can be a tremendous amount of investment on the "prevention" side, it's widely acknowledged that ultimately prevention can only go so far. In other words, there is no way to build software systems that don't break down. The SEC heard as much in the aftermath of the Knight meltdown, when it brought in a series of experts to talk about how to address technology failures. One of the most outspoken at that meeting was someone with no real financial experience. Dr. Nancy Leveson, professor of aeronautics at MIT, said the only way to think about technical systems was to expect that software would eventually encounter bugs: "While I'm not suggesting that anyone shouldn't use the highest standards, it's not going to be enough," she told the group. "I wish it were. It's not."

The task for venues and regulators, therefore, is as much about responding to crises as about trying to prevent them. "It's a matter of making sure that clearinghouses, trading venues, and data repositories are robust and resilient enough so that when somebody has a glitch or fails that we ensure that the central functioning of the markets continue," Gensler said.

The "electronification" of markets has been dramatic. The CFTC recently noted that more than 90% of futures trading is now electronic. The trend has brought massive benefits in terms of speed and efficiency, as well as the fact that these trades now always have a paper trail (an ironic term in this context). But it has also brought its share of headaches. In the US, where the glitches have made the biggest headlines, officials have not been slow to respond to the latest problems.

Not long after the NASDAQ failure, the SEC called a meeting with exchange heads as well as officials from DTCC, FINRA and the Options Clearing Corporation. SEC Chairman Mary Jo White told exchanges in mid-September to draw up "comprehensive action plans" for ensuring their systems were robust. They had two months to identify "concrete measures" that would address areas where system resilience could be improved. "The investing public deserves no less," White said.

Exchanges were instructed to:

  • Provide plans that address standards for securities information processors (SIPs)
  • Provide assessments of the robustness of other critical infrastructure systems
  • Provide SIP plan and/or rule amendments on the communication of regulatory halts
  • Review rules relating to trade break processes and procedures to reopen trading after halts, and provide rule amendments as needed
  • Provide rule amendments to implement "kill switches" in the event of technological failures, and consider other potential risk mitigation mechanisms

The tone of the meeting was said to be constructive, a noteworthy point considering that NYSE and NASDAQ had been at odds over the cause of the NASDAQ failure.

Meanwhile, in a more holistic move that looks at all aspects of automated trading, the CFTC that same week issued a concept release, asking for extensive feedback to help it formulate safeguards for automated trading.

It has already taken some steps. In June 2012, it adopted final rules for designated contract markets (or DCMs, a technical term that includes exchanges) including requirements that they establish and maintain risk control mechanisms to prevent and reduce the potential for price distortions and market disruptions. Many of the requirements focus on avoiding disruptive trading, rather than technological problems. These include trading pauses and halts under conditions prescribed by the DCM. The final rules also impose risk control requirements on exchanges that provide direct market access.

The CFTC is of course very interested in how exchanges can or should react to technological malfunctions. In automated trading systems (ATSs), the speed of processes has necessarily shifted risk management functions to parallel automated risk management systems acting with equal speed, the CFTC has noted. In its concept release, it considers the role of manual processes in order to shut down systems that are malfunctioning.

"In automated trading, humans design and test ATSs, establish decision criteria, manage implementation, and intervene when technology systems fail. ATS designers must identify the range of market conditions that an ATS could reasonably face, and determine the range of permissible responses by the ATS to each condition," the release said.

"ATS operators, in turn, must be prepared to intervene when market conditions are outside of an ATS's design parameters, when an ATS's trading strategy must be modified, or when an ATS appears to be malfunctioning and must be shut down. Rapid decisions must be made while simultaneously digesting large quantities of information regarding multiple, fast-moving markets."

The CFTC has mooted the idea of ATS monitoring and supervision standards, as well as pre-established crisis management protocols. These, it said, "could help ensure that human supervisors intervene quickly when ATSs experience degraded performance, and that supervision staff have both the authority and knowledge to intervene as required...

"In addition, the Commission believes that change management standards that are beneficial to ATSs could also be applied to trading platforms to help prevent operational or programming errors in that element of the automated trading environment."

Another area the regulator is interested in exploring further: execution throttles, which prevent an algorithm from exceeding its expected message rate or rate of execution, and when tripped, can alert monitors at both the exchange and the trading firm. "Such alerts can facilitate rapid detection of malfunctioning algorithms. Depending on the nature of the malfunction, execution throttles may also reduce the damage and monetary losses caused by the disruptive algorithm during the time when it is being investigated."
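To make the throttle idea concrete, here is a minimal sketch of a message-rate limit that trips and raises an alert. It is an illustration only, not the CFTC's specification or any venue's implementation: the class name, the sliding-window approach, the `on_trip` alert hook and the manual `reset` are all assumptions made for the example.

```python
from collections import deque


class ExecutionThrottle:
    """Sliding-window message-rate limit for one algorithm (illustrative only)."""

    def __init__(self, max_messages, window_seconds, on_trip):
        self.max_messages = max_messages      # expected ceiling per window
        self.window_seconds = window_seconds  # length of the sliding window
        self.on_trip = on_trip                # hypothetical hook that alerts monitors
        self._sent = deque()                  # timestamps of accepted messages
        self.tripped = False

    def allow(self, now):
        """Return True if a message at time `now` (in seconds) may be sent."""
        if self.tripped:
            return False
        # Discard timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) >= self.max_messages:
            # Rate exceeded: block further messages and alert the monitors.
            self.tripped = True
            self.on_trip(now, len(self._sent))
            return False
        self._sent.append(now)
        return True

    def reset(self):
        """Re-enable the algorithm once a human supervisor has investigated."""
        self._sent.clear()
        self.tripped = False


alerts = []
throttle = ExecutionThrottle(3, 1.0, lambda t, n: alerts.append((t, n)))
results = [throttle.allow(t) for t in (0.0, 0.1, 0.2, 0.3, 2.0)]
```

In this sketch the throttle stays tripped until someone resets it, which reflects the release's emphasis on human supervisors stepping in. A production design might instead queue or delay messages rather than block them outright.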

A failure to communicate

Whatever the exchanges come up with, and whatever the regulators require, one area that has come in for some of the sharpest criticism has been how exchanges communicate.

The New York Times, citing people briefed on the matter, noted that NASDAQ did not immediately notify the SEC when the August 22 problem surfaced. It did open a conference call with its own employees and with employees from other exchanges soon after the problem was detected, and the call lasted through the entire shutdown. But the newspaper said one person on the call reported that NASDAQ provided few details on what had happened and what it was doing to deal with the problem.

Mary Jo White, SEC

Mary Jo White of the SEC has called on exchanges to take action and review their systems and procedures. Still, many in the market favour self-policing since it's clearly in the exchanges' own interests to address their problems.

Robert Greifeld, chief executive of the exchange, appeared on CNBC the next day explaining what happened and defending the way the exchange communicated. NASDAQ separately became embroiled in a dispute with NYSE as to what caused the glitch and Greifeld's comments characterised the issue as one caused by elements outside of NASDAQ's control.

"What happens is, we have a data feed which consolidates the trading for 13 exchanges. We do that for the industry. That had a problem, and as soon as we saw that had a problem we had a fundamental concern. We knew professional traders had access to individual trading feeds but the traditional long investor, retail investor, now didn't have the same information. Because of that we halted the market," he said.

Greifeld brushed off the communication criticisms and said what NASDAQ needed to get better at was what he called "defensive driving", which meant reacting better when things did go wrong, as they inevitably would. "There will always be issues within the ecosystem. We don't live in a monopolistic world today. We have a lot of competitors, 13 exchanges."

Asked about whether it would be better to have an additional pipe, Greifeld said he couldn't argue with the idea of more than one consolidator. "But I do want to highlight the fact that in the world we live in, defensive driving is an important skill. We will get better at it. We spent a lot of time and effort coming up with scenarios where other things happen outside of our control and how we respond to it and this was an example of that."

The idea of redundancy, notably, was highlighted in the SEC response.

No shortage of advice

There is no shortage of people ready to offer advice on how to build better systems. In the post-Knight SEC roundtable, Jonathan Ross, chief technology officer of GETCO (which later acquired Knight), offered three ways of minimising the impact of the errors that would inevitably come:

  • To the extent possible, systems should be independent from other systems to limit the potential for a problem to cascade
  • Making smaller, incremental changes to a system to reduce the magnitude of any errors and make it easier to mitigate the impact of such errors if they do occur
  • Using multiple, overlapping levels of preventive or protective risk controls that each look at a system independently
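The third point, overlapping controls that each look at a system independently, can be sketched roughly as follows. This is a hypothetical illustration, not GETCO's design: the `Order` fields, the three checks and their default limits are all assumed for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    symbol: str
    quantity: int
    price: float
    reference_price: float  # e.g. last traded price (assumed input)


# Each check judges the order on its own and shares no state with the others,
# so a fault in one control cannot silently disable the rest.
def max_quantity_check(order, limit=10_000):
    return order.quantity <= limit


def max_notional_check(order, limit=1_000_000.0):
    return order.quantity * order.price <= limit


def price_collar_check(order, band=0.05):
    return abs(order.price - order.reference_price) <= band * order.reference_price


CHECKS = (max_quantity_check, max_notional_check, price_collar_check)


def failed_checks(order, checks=CHECKS):
    """Run every control and return the names of those the order fails."""
    return [check.__name__ for check in checks if not check(order)]


ok_order = Order("XYZ", 100, 10.0, 10.0)
big_order = Order("XYZ", 9_000, 200.0, 199.0)
print(failed_checks(ok_order))   # no control objects
print(failed_checks(big_order))  # only the notional limit objects
```

Every control runs even after one has failed, so the operator sees all the reasons an order was rejected rather than just the first.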

In that same roundtable, Jamil Nazarali, head of Citadel Execution Services, offered detailed advice for exchanges, but the focus at that time was about how exchanges and regulators could help prevent or minimise the impact of errors by market participants such as Knight, not on how they could ensure their own system integrity.

Gill of MillenniumIT said the underlying technology may be a huge factor, but he pointed to a wider array of issues beyond the nuts and bolts. This included how different firms worked together.

"You have to rate a venue based on its operational excellence. And the technology is part of that," he said. "But then on a wider basis, given the huge increase in the number of participants, given the level of speed that we're seeing, I think that what we need to have is a much wider industry focus on STP."

STP - or straight-through-processing - is a term that's been around for so long that it's almost lost its luster, he said: "But in fact, you need to look at industry-wide STP, and make sure all the market participants are talking to each other. We're testing together and we're making sure that we're much more resilient overall."

Gill said having systems with the requisite uptime and resiliency ultimately came down to engineering: "It's as simple and as complex as that. It's about solid engineering, and that's both in the software side and also when you look into the blur that we have now in the industry between software and hardware, tuning and programming."

What does Gill make of the argument that sometimes there's been too much of a rush to market?

"I have an inherent bias towards good software engineering. And there's a lot of software that's developed out there and it's not all the same quality," he said. "You have to have absolutely top-notch expertise to optimise, to make sure it works and to make sure it continues to work. So in terms of the rush to market, yes, I think sometimes there's perhaps undue pressure to get things out there. Knock on wood, the industry will have fewer issues than we've seen over the past year."

The SEC's White, for one, will be expecting more than knocking on wood. But while Gill was hardly suggesting firms rely on chance, there appears to be no question that bad luck can play a part in a marketplace that grows more complex by the day.

Resilience and reliability will no doubt become an increasingly important differentiator for venues. But so will crisis-response.

A catalogue of errors

November 2010

Untested code changes implemented by a US stock exchange operator resulted in errors within its trading platforms. The platforms overfilled orders in more than 1,000 stocks, resulting in $773 million of unwanted trading activity. (CFTC)

March 2012

A software problem on BATS Global Markets, whose software had undergone testing, led to a disruption of the exchange's own IPO. The glitch caused opening orders for ticker symbols beginning within a certain letter range to become inaccessible on the platform. Once the system failed, circuit breakers were triggered and erroneous trades were cancelled. (CFTC)

May 2012

Facebook's IPO experienced significant problems as a result of technical errors on NASDAQ OMX Group's US exchange. Many customer orders from both institutional and retail buyers were unfilled for hours or were never filled at all, while other customers ended up buying more shares than they had intended.

November 2012

The NYSE suffered an outage in a new matching system that caused its share of volumes to fall about 50% due to problems transitioning to a new matching engine called "universal trading platform". Brokers routed orders through other exchanges, from BATS and NASDAQ to Direct Edge and NYSE's own Arca exchange. (Forbes)

July 2013

Federal prosecutors charged five men with hacking, including a security breach against NASDAQ. The US Attorney's office in Manhattan announced indictments against a man charging he hacked servers used by NASDAQ from November 2008 through October 2010. It said he installed malicious software that let him and others execute commands to delete, change or steal data. (Reuters)

Aug 22, 2013

A software glitch on NASDAQ led to a three-hour stoppage that forced some 2,700 securities to cease trading. NASDAQ's feed providing prices for stocks and ETFs failed after suffering connection problems with a NYSE exchange, revealing a single point of failure in a system that has been criticised for being too broadly dispersed across dozens of trading venues. (CFTC)

Aug 26, 2013

A technical glitch halted trading on Eurex for about an hour due to an incorrect time synchronisation. Eurex owner Deutsche Börse said trading was halted "in order to protect the integrity of the market".

Aug 27, 2013

The Chicago Board Options Exchange issued a number of notices, one saying "some users may see delays in complex order entry", another saying its futures platform had problems with a limited number of cancelled trades, and one saying the futures exchange would switch to a backup data centre for 200-400 milliseconds, leading it to pull market quotes from the system. A spokeswoman said trading was never interrupted and just 10 trades had to be cancelled due to problems. (Wall Street Journal)

Sept 4, 2013

Another embarrassment for NASDAQ as it suffered a six-minute outage due to a problem with a feed that broadcasts price quotes for some securities. The issue lasted from 11:35 am to 11:41 am Eastern time, according to a NASDAQ trading notice, and was fixed by 12:05. (WSJ)