No storage for you
Storage space is expensive. Sure, it's cheaper now than it was 25 years ago (if we measure it in terms of currency units, say USD, per gigabyte) but then again we also produce and archive a lot more data. We also have more sophisticated means of protecting this data through resiliency and redundancy, so if anything, the complexity of just archiving data has certainly gone up.
In the enterprise space, storage is (usually) centralized in a... wait for it... centralized location. This makes some sense as all the back-up and redundancy functions are then able to provide economies of scale at one point. Of course, administering the storage becomes much easier, as there is a single point of contact and someone who is supposed to have all the insights into the requirements needed.
The disadvantage of this setup is obviously bureaucracy. Request forms need to be filled in and sent to the central storage coordinator (or central IT administration generally). Specifications must be drafted and approved by various layers of middle management until eventually, after much back and forth between the masses, new space is allocated.
In some cases this can take months. In other cases it never happens.
Most crucially, however: there is no intimate connection between the people who supposedly manage the data and the people who actually need it. It's hard to develop an understanding of the importance of some data when you're not the one it means a lot to.
This story begins on a normal business day a few years ago. It takes place in the technology division that supports trading inside a large German bank. One of the system administrators noticed a particular server that he hadn't looked at in a while. This server was collecting market data for the entire German equity exchange: Xetra. It seemed to be an older server as by now it had been storing several years of data. The files were neatly archived and compressed, one file for each day. This was a German bank storing German market data after all. There must be order, jawohl mein Herr.
As mentioned, the server was a few years old... and by now the unavoidable was happening: it was running out of disk space. The sysadmin noticed this and poked around a little. Diligently he informed his superior, who in turn started requesting new storage to be assigned or a new server to be installed.
What follows is the transcript of this conversation:
Sysadmin: This server over here has almost no disk space left.
Supervisor: What does it do?
Sysadmin: It looks like it archives some kind of exchange data. We have one log file for each day.
Supervisor: When will it run out of space?
Sysadmin: It's just a guess, but maybe in two weeks' time?
Supervisor: Let me call someone on the trading side.
Supervisor: Hey, we have some server here that is filling up with data from Xetra.
Trader: What kind of data?
Supervisor: I think it's a log of market data and things like that. I'm not really sure.
Trader: I've not heard of it. If we want to look at historical data we just use Bloomberg.
Supervisor: So you guys don't use this data in trading?
Trader: I don't think so, but let me check with someone else...
[5 minutes later]
Trader: No, doesn't look like we use it. Like I said, we just use Bloomberg for looking up historical data.
Supervisor: Okay, thanks.
At this stage, and despite the trader's reaction, neither the supervisor nor the sysadmin wants to delete the data. Not that they are not brave men, no, they realize that the server has been running for several years. This probably means that there is some purpose to this seemingly random collection of data. And nobody in a bank wants to make decisions anyway, let alone any that lead to data loss. So they decide to petition the central procurement division for more storage.
Supervisor: Hello there. We are running out of storage space on a server and we would like to either upgrade the hard drive storage or provision a new server with additional capacity.
Procurement Guy: Oi. That's currently tricky. We have one of those clampdowns on IT spending at the moment. Does it affect the business?
Supervisor: I'm not sure. These are log files of market data for the German stock exchange.
Procurement Guy: For what time period?
Supervisor: The last four years until now, basically.
Procurement Guy: And nobody uses it?
Supervisor: The traders say they just use Bloomberg.
Procurement Guy: Well, in that case I can definitely say that we are going to deny this request. If it doesn't affect critical business functions, we cannot spend money. It's that simple.
Supervisor: Sigh... okay.
What follows then is the inevitable destruction of many years of high-resolution market data. The supervisor explains to the sysadmin that their request for more storage capacity has been denied and that they'll need to delete the logs. With a pang of anxiety - familiar to anyone who has ever condemned years of logs to the virtual abyss - the sysadmin executes the deletion.
The story doesn't end here of course. A bunch of files that nobody was using are expunged. Big deal, you say.
The experienced reader can probably guess what happens next. Six months later the bank decides to really get into this whole 'high-frequency trading' business. A bunch of heads of departments decide to sit together and hammer out a plan. High-frequency trading requires data, they know that much. The person in charge of sourcing the market data eventually discovers that there is a server collecting market data for Xetra. He ends up talking to our hapless sysadmin from earlier.
Project Manager: Guten Tag. I'm looking for some data that we've been supposedly collecting on Xetra.
Sysadmin: Go on...
Project Manager: We are trying to consolidate all of the bank's high-frequency data. There supposedly exists a server that collects market data for Xetra. Are you familiar with this server?
Sysadmin: I have bad news for you... this data was deleted six months ago. Central IT Procurement denied the acquisition of more storage space and to prevent the system from filling up the entire hard drive... we... well, we deleted it.
Project Manager: You're joking, right?
Sysadmin: No, sir. I wish I was, but that data is gone.
The bank ended up having to re-acquire the entire data set from the exchange itself, at a cost of about a quarter of a million EUR (while the exchange was sympathetic to their 'accident'... business is still business).
What went wrong:
What went right:
Nowadays, trading is a pretty global business. While you might not always have to trade something in places like Indonesia, Chile or Russia, you usually can do so without too much of a problem. Of course this assumes that you are prepared to pay eye-watering fees and commissions.
This story begins across the Bosphorus in Istanbul. There is a relatively active equity trading marketplace there called 'Borsa Istanbul'. It's pretty typical as far as trading venues go: you can buy stocks, bonds and derivatives to your heart's content. It is not big, trading maybe a handful of billion USD per day, but hey, it all adds up.
As a global bank, you are sooner or later going to end up dealing with trades from Turkey because, well... there is customer demand for it. Trading Turkish equities isn't really any different from trading equities in any other country. They too have ISINs, SEDOLs and so on. From the bank's perspective, once the trades are executed they need to be booked. In this case that happened by pulling in a list of trades in a file at the end of the day.
This .csv file looked like this:
2015-04-13; ADEL.IS; TRAADELW91T1; SATIN; 100; 20.40;...
2015-04-13; ADEL.IS; TRAADELW91T1; SATIN; 300; 20.60;...
2015-04-13; ADEL.IS; TRAADELW91T1; SATMAK; 500; 20.50;...
2015-04-13; THYAO.IS; TRATHYAO91M5; SATIN; 1200; 5.10;...
2015-04-13; THYAO.IS; TRATHYAO91M5; SATMAK; 600; 5.05;...
2015-04-13; THYAO.IS; TRATHYAO91M5; SATMAK; 100; 5.05;...
2015-04-13; THYAO.IS; TRATHYAO91M5; SATMAK; 800; 5.15;...
Pretty standard trade file so far. It contains a trade date, a RIC, an ISIN code, something described as SATIN and SATMAK, a quantity and a price. This should be easy enough to parse.
Apart from the self-explanatory fields, it's important to note that 'Satin' in Turkish means 'Buy' or 'Purchase', and 'Satmak' means 'Sell'.
A Turkish developer set to work to make sure these files got parsed into the global clearing system back-end for the bank. That should be straightforward as the clearing system had a very standardized, simple trade type that looked something like this:
public class EquityTrade
{
    public DateTime TradeDate;
    public string ISIN;
    public string Side;
    public decimal Quantity;
    public decimal Price;
    public DateTime SettlementDate;
    /* a bunch of other fields that we can ignore */
}
So far, so good. It's not how I would structure it, but it's good enough.
Now the developer starts simply populating this structure with the fields from the above .csv file:
// Column order in the file: date, RIC, ISIN, side, quantity, price
newTrade.TradeDate = ParseDateTime(fields[0]);
newTrade.ISIN = fields[2];
newTrade.Side = fields[3];
newTrade.Quantity = ParseDecimal(fields[4]);
newTrade.Price = ParseDecimal(fields[5]);
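The snippet above relies on a fields array and on ParseDateTime and ParseDecimal helpers that the story doesn't show. As a rough sketch only, and not the bank's actual code, they might look something like the following. The class name TradeLineParser and the method SplitFields are made up for illustration, and the helper bodies are assumptions. The one deliberate choice is parsing with the invariant culture, so that '20.40' reads the same on every machine, including one configured for Turkish, where the decimal separator is a comma.

```csharp
using System;
using System.Globalization;

public static class TradeLineParser
{
    // Hypothetical helper: splits one semicolon-separated line and
    // trims the padding that follows each ';' in the file.
    public static string[] SplitFields(string line)
    {
        string[] fields = line.Split(';');
        for (int i = 0; i < fields.Length; i++)
        {
            fields[i] = fields[i].Trim();
        }
        return fields;
    }

    // Assumed stand-in for ParseDateTime: accepts only the yyyy-MM-dd
    // form seen in the sample file.
    public static DateTime ParseDateTime(string text)
    {
        return DateTime.ParseExact(text, "yyyy-MM-dd", CultureInfo.InvariantCulture);
    }

    // Assumed stand-in for ParseDecimal: '.' is always the decimal
    // separator, whatever the culture of the machine running the import.
    public static decimal ParseDecimal(string text)
    {
        return decimal.Parse(text, NumberStyles.Number, CultureInfo.InvariantCulture);
    }
}
```

Calling TradeLineParser.SplitFields on the first sample line (with the elided trailing columns left off) yields six trimmed strings, which can then be handed to the two parse helpers.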
Due to the use of localized strings ('Satin/Satmak') things are going to get interesting when this trade gets persisted to the database. Here is what that code looks like:
// Check first character of Side field
if (newTrade.Side[0] == 'B') {
    // It's a buy.
    newDatabaseRecord.Side = 1;
} else if (newTrade.Side[0] == 'S') {
    // It's a sell.
    newDatabaseRecord.Side = 2;
} else {
    // Unexpected side encountered.
    throw new ArgumentException("Encountered Unknown 'Side' Element: " + newTrade.Side);
}
So this code checks for the presence of either a 'B' or an 'S' (for Buy and Sell). Unfortunately, our Turkish developer has populated this with their Turkish equivalents: 'Satin' and 'Satmak'. And both of them start with 'S'.
The result was that every single buy trade on the Borsa Istanbul was converted to a sell trade. Reconciliation problems and customer complaints followed for the next week until the problem was eventually addressed by making sure that fields in Turkish were translated to English before going into the system. For highly standardized trade data, the source shouldn't have been in the local language (in this case Turkish) in the first place. But in true financial services fashion, the bank chose to address the symptom, not the underlying cause.
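One way to guard against this class of bug is to match the whole side token against an explicit list instead of inspecting its first character. What follows is a minimal sketch under assumptions, not what the bank actually did: the TradeSide enum, the SideMapper class and its lookup table are hypothetical names, and the values 1 and 2 simply mirror the database codes used in the persistence code above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enum; the numeric values mirror the database codes 1 and 2.
public enum TradeSide { Buy = 1, Sell = 2 }

public static class SideMapper
{
    // Every accepted token is spelled out in full, so two tokens that
    // share a first letter can never be confused with each other.
    // Ordinal comparison keeps the lookup independent of the machine's culture.
    private static readonly Dictionary<string, TradeSide> KnownSides =
        new Dictionary<string, TradeSide>(StringComparer.OrdinalIgnoreCase)
        {
            { "BUY", TradeSide.Buy },
            { "SELL", TradeSide.Sell },
            { "SATIN", TradeSide.Buy },   // Turkish token from the trade file
            { "SATMAK", TradeSide.Sell }, // Turkish token from the trade file
        };

    public static TradeSide Parse(string rawSide)
    {
        if (rawSide == null)
        {
            throw new ArgumentNullException(nameof(rawSide));
        }

        // Trim the padding that follows each ';' in the file.
        string token = rawSide.Trim();

        TradeSide side;
        if (KnownSides.TryGetValue(token, out side))
        {
            return side;
        }

        // Anything not on the list is rejected rather than guessed at.
        throw new ArgumentException("Encountered Unknown 'Side' Element: " + rawSide);
    }
}
```

With this table, SideMapper.Parse(" SATIN") gives TradeSide.Buy and SideMapper.Parse("SATMAK") gives TradeSide.Sell, while a token the table doesn't know raises an ArgumentException instead of being booked on the wrong side.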
What went wrong:
What went right: