The workflow was pretty straightforward: You execute your equity trades on various exchanges via FIX (using different brokers), you receive your fills and they go into a database.
At the end of the day, you reconcile your activity against what your various execution brokers report, to see whether their records match what you think you executed. Finally, the trades are sent downstream for clearing and settlement.
This process had been running for hundreds of days without any need for modification. Generally things worked smoothly. Or almost smoothly. Occasionally, some of the brokers summarized their quantities incorrectly on a security or two (e.g. showing the quantity executed to buy in AAPL as 33,150 instead of the true quantity of 33,550). This was presumably due to different systems with varying degrees of 'ancientness' that worked the order and then handled the fill information separately. Thus they had trouble propagating information from one system into another on occasion. Equity trading in the US was a particularly complicated maze and the brokers would need to consolidate execution information from a hundred different venues. Thus, it was inevitable that occasionally something got lost or arrived late. It didn't happen very often but it did happen. Maybe once every two or three weeks.
It always turned out that the quantity shown by the broker was wrong, proving the old adage that the customer is always right. The established practice at some point simply became that as long as it was only a single security that was off by a common order quantity, such as 100 or 200 shares, it was considered to be a problem at the broker. This sort of arrogance is pretty normal in finance. It's never us, it's them.
One fine spring day it happened again. A single security was showing that the firm had executed 200 shares more than what the executing broker was showing for the day.
Security    Firm      Executing Broker
INTC        19,943    19,743
SPY         15,070    15,070
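The end-of-day comparison behind a table like this can be sketched in a few lines. This is only an illustration, not the firm's actual reconciliation code: the function name `reconcile` and the dictionary layout are assumptions, and the quantities are the ones from the table above.

```python
def reconcile(firm_totals, broker_totals):
    """Return {symbol: firm_qty - broker_qty} for every symbol that differs.

    Both arguments are hypothetical {symbol: total executed quantity} maps.
    """
    breaks = {}
    for symbol in sorted(set(firm_totals) | set(broker_totals)):
        diff = firm_totals.get(symbol, 0) - broker_totals.get(symbol, 0)
        if diff != 0:
            breaks[symbol] = diff
    return breaks


firm = {"INTC": 19943, "SPY": 15070}
broker = {"INTC": 19743, "SPY": 15070}

breaks = reconcile(firm, broker)
print(breaks)  # {'INTC': 200}
```

Note that the output only says that the two sides differ by 200 shares. It says nothing about which side is right.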
Assuming that this was another one of those cases where "they are wrong", the decision was made to assume that the trade was real and the broker was missing it, as had been the case dozens of times before.
The trades went down the water slide for clearing and settlement.
Soon after, downstream processes started alerting middle-office personnel that somehow there was a mismatch. This led to further investigations and also flagged up possible compliance issues. As there were only 19,743 shares in inventory at the firm, it meant that the additional sale of 200 would have been a short sale, with all its associated requirements. At this stage people started roping in the compliance team just in case.
Looking at the trades in detail, it looked like there was a single execution of 200 shares that was missing from the broker. It was the same price and same quantity as some other executions, but there were many 200 share fills with this price (about 40 of them, in fact).
After running post-mortems on a number of logs, it was eventually established what had gone wrong. Upon an intra-session restart of the FIX connection with the broker, the 200 share sale in INTC somehow got re-sent. This kind of single trade duplication had apparently never happened before and has not been observed since.
Now, why did it not get flagged as a duplicate right away? For starters, not every trading engine checks incoming executions for duplicates (this one did), nor is this necessarily a trivial thing to do. Most platforms, including trading venues, do not guarantee unique execution identifiers across the entire universe they trade. Reliable duplicate detection therefore depends on combining a number of more or less well-specified fields into a single unique key.
Even though the duplicate trade had the same execution ID as the original trade, it was not caught. Neither the trading engine nor the database detected it as a duplicate because other identifying fields were either missing or slightly different, allowing the trade to slip through.
FIX is a terrible protocol, and systems that process FIX messages are usually just as bad. So exactly where that trade came from is a mystery that is never going to be solved. (Nor would it be a good use of resources to spend many days or weeks trying to solve it.) Better procedures are the answer.
There are a few things that went wrong as far as best practice goes:
The assumption that because the broker has been wrong in the past, they must be wrong now.
Duplicate detection relied on a large combination of fields to create what would constitute a unique reference. A field that is empty/null or 'unknown' in the duplicate breaks this kind of detection. 'Softer' heuristics would be better.
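To make the second point concrete, here is a minimal sketch of a strict composite key next to a 'softer' check. Everything in it is an assumption for illustration: the field names, the sample values (price, IDs and timestamp are made up) and the rule itself, which treats a missing field as "unknown" rather than "different". It is not how the engine in this story worked.

```python
# Hypothetical field names; a real engine would map these from FIX tags.
KEY_FIELDS = ("exec_id", "order_id", "symbol", "side", "qty", "price", "transact_time")


def strict_key(fill):
    """Composite key: any missing (None) field changes the key."""
    return tuple(fill.get(name) for name in KEY_FIELDS)


def is_soft_duplicate(candidate, seen):
    """Same execution ID and no populated field that contradicts the earlier fill."""
    exec_id = candidate.get("exec_id")
    if exec_id is None or exec_id != seen.get("exec_id"):
        return False
    for name in KEY_FIELDS:
        a, b = candidate.get(name), seen.get(name)
        if a is not None and b is not None and a != b:
            return False  # a real conflict, not just a missing value
    return True


# Made-up sample data: the re-sent fill arrives with two fields missing.
original = {"exec_id": "X123", "order_id": "O77", "symbol": "INTC", "side": "SELL",
            "qty": 200, "price": 31.25, "transact_time": "14:02:11"}
resent = dict(original, order_id=None, transact_time=None)

strict_match = strict_key(resent) == strict_key(original)
soft_match = is_soft_duplicate(resent, original)
print(strict_match, soft_match)  # False True
```

A soft match like this is probably better used to flag a fill for human review than to drop it silently, since it does not help with fields that arrive slightly different rather than missing.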
What went right:
Complete FIX logs were available for inspection and replay.
Detection of the excess trade quantity (200 shares) occurred further downstream.
Compliance implications were recognized.
Eventually, after many hours of investigation, it was recognized that the trade was indeed merely a flitting shadow of another trade that came down the pipe. The quantities did match, nothing got missed, there was no short sale violation and all was well.
I'M HALF THE MARKET. LITERALLY.
It is every software developer's nightmare: Someone changes the data format that they use and they don't tell you. And/or you don't notice.
Now, in most cases (which is also the best case) your application simply crashes with a loud bang (or a more quiet one if you have decent exception handling).
And then there are the cases where things don't crash, and instead you are simply processing the wrong data and passing that along to the next layer of processing. And they don't notice. And then it goes to a trader. And he still doesn't notice. And then it gets transformed to an order and goes to the market. And you still don't notice. But Mr. Market, he obviously notices.
A market making firm had precisely this issue when the CME re-worked its currency futures to accommodate smaller tick sizes, starting with MXN/USD, then JPY/USD and eventually ending up with EUR/USD futures.
Despite multiple notices going out in the weeks leading up to the change in June 2015 in JPY/USD, nobody noticed that their internal algorithm for converting Globex prices into spot prices relied on their own algorithm for converting floating point prices into their own integer format for pricing and distribution. Starting with this line:
int priceInteger = priceUnadjusted / (scalingFactor * tickSize);
This works, and always will give you integer prices. Except that it won't give you prices that look anything like normal prices if the tick size for a future is not a power of 10 (e.g. 10, 1, 0.1). But it makes a number of things easier and faster, so this practice is in fact not as strange as it sounds.
If the JPY/USD futures price coming from the feed is 0.008532 and the scaling factor is 0.000001, then the integer price is:
8532 = 0.008532 / (0.000001 * 1)
So far, so good. With the old tick size of 1, the integer price is 8532, which maps straight back to 0.008532. With the new tick size of 0.5, however, the same futures price becomes 17064:
17064 = 0.008532 / (0.000001 * 0.5)
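The two calculations above can be reproduced with a small sketch. The function name is made up, and it deliberately uses round() instead of the integer truncation in the original line, because binary floating point can leave the quotient a hair below the whole number. That substitution is my assumption, not something taken from the firm's code.

```python
def to_integer_price(price_unadjusted, scaling_factor, tick_size):
    """Convert a floating point futures price into the integer format.

    Uses round() rather than truncation so that a quotient such as
    8531.999999 does not silently lose a tick.
    """
    return round(price_unadjusted / (scaling_factor * tick_size))


old_tick = to_integer_price(0.008532, 0.000001, 1.0)
new_tick = to_integer_price(0.008532, 0.000001, 0.5)
print(old_tick, new_tick)  # 8532 17064
```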
Now we have a small integer price that we can throw around the network.
In spot F/X we don't trade JPY/USD, we trade USD/JPY. So we have to take the reciprocal of that price:
priceSpotQuotationEquivalent = 1/(scalingFactor * priceInteger * tickSize);
As an example:
117.2058 = 1/(0.000001 * 17064 * 0.5)
Now that we have a spot quotation equivalent price, we subtract the forward points until IMM expiration and we are done.
Doing the reverse, that is going back from an actual spot price to a futures price, we add the forward points, invert the price and we have got our futures price:
priceFuture = (1 / scalingFactor)/(priceSpotQuotationEquivalent * tickSize);
Except that the actual code in place was the following line:
priceFuture = (1 / scalingFactor) / priceSpotQuotationEquivalent * tickSize;
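Same thing... Or is it? Division and multiplication have equal precedence and are evaluated left to right. Without the parentheses, the result is divided by the spot price and then multiplied by the tick size, instead of being divided by it.
Instead of 17064, the code produced: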
4266 = (1 / 0.000001) / 117.2058 * 0.5
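A big difference! The result is a quarter of the correct integer price of 17064 and, read as a futures price of 0.004266, half the actual market price. We end up with a price that is more than 100 handles off-market. In spot terms: 234.41 compared to 117.21.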
This error in the programming logic was there for a long time. Except it never appeared because tick size was always 1.0, so the multiplication made no difference. Until the tick size changed. It is a subtle error but one with large consequences.
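A small sketch shows why the bug stayed hidden for so long. The two functions mirror the intended formula and the line that was actually in place; the Python function names are mine, and the inputs are the example values used above.

```python
def spot_to_future_intended(spot, scaling_factor, tick_size):
    # Divide by (spot * tick_size), as in the formula with parentheses.
    return (1 / scaling_factor) / (spot * tick_size)


def spot_to_future_as_coded(spot, scaling_factor, tick_size):
    # Left-to-right evaluation: divide by spot, then MULTIPLY by tick_size.
    return (1 / scaling_factor) / spot * tick_size


SCALING_FACTOR = 0.000001
SPOT = 117.2058

results = {}
for tick_size in (1.0, 0.5):
    intended = round(spot_to_future_intended(SPOT, SCALING_FACTOR, tick_size))
    as_coded = round(spot_to_future_as_coded(SPOT, SCALING_FACTOR, tick_size))
    results[tick_size] = (intended, as_coded)
    print(tick_size, intended, as_coded)
# 1.0 8532 8532
# 0.5 17064 4266
```

The two versions differ by a factor of tick size squared, which is exactly 1 for a tick size of 1.0 and 0.25 for a tick size of 0.5.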
As a result, the application tried to send sell orders into JPY/USD futures at 4266 (0.004266) as the code for converting back into futures prices worked fine. As the actual market was more like 8532 (0.008532) this meant that sell orders were coming in at half the price of the actual bid/ask.
Fortunately, Globex rejected all the sell orders for violating the price reasonability checks of the system (also known as "Price Banding" in Globex-speak). The buy orders would have gone through, but would never have had a chance to execute. Of course, if the order was part of an attempted arbitrage (which some of these were), you just ended up with a big long position in USD/JPY spot…
What went wrong:
No unit tests were in place, or if there were, they did not cover enough test cases
No diagnostics/health checks on data (for this particular data)
Nobody noticed until the trading venue started rejecting orders en masse
What went right:
Nothing went right apart from the fact that it got fixed eventually!
Globex price reasonability checks prevented the worst.
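For what it is worth, both of the missing safeguards are cheap to sketch. The first is a round-trip test over several tick sizes (futures integer price to spot and back again), the second a crude in-house price band applied before an order leaves the building. All names and the 5% band width are assumptions for illustration. Neither is meant to reproduce the Globex price banding logic.

```python
def future_to_spot(price_integer, scaling_factor, tick_size):
    return 1 / (scaling_factor * price_integer * tick_size)


def spot_to_future(spot, scaling_factor, tick_size):
    return (1 / scaling_factor) / (spot * tick_size)


def round_trip_error(price_integer, scaling_factor, tick_size):
    """How far the integer price drifts after converting to spot and back."""
    spot = future_to_spot(price_integer, scaling_factor, tick_size)
    return abs(spot_to_future(spot, scaling_factor, tick_size) - price_integer)


def within_band(order_price, reference_price, max_deviation=0.05):
    """Hypothetical pre-trade check: is the order within 5% of a reference price?"""
    return abs(order_price - reference_price) / reference_price <= max_deviation


# Unit test idea: the round trip should hold for every tick size, not just 1.0.
worst_error = max(round_trip_error(17064, 0.000001, t) for t in (1.0, 0.5, 0.25))

# Health check idea: the off-market sell price from the story fails the band.
bad_order_ok = within_band(0.004266, 0.008532)
good_order_ok = within_band(0.008531, 0.008532)
print(worst_error < 0.5, bad_order_ok, good_order_ok)  # True False True
```

Run against the line that was actually in production, the round trip at a tick size of 0.5 would have come back as 4266 instead of 17064, and the test would have failed long before the exchange had to do the rejecting.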