"Before I got into this, I didn't realise how long I would spend preparing my data versus how long I would spend doing the cool algorithm part."
That's what one client told Tanya Morton, an application engineering manager at MathWorks. Morton said that it is precisely because data cleansing can be so time-consuming and require so much fiddling that many companies should probably be thinking about investing more to improve their processes. In other words, spending a lot of time and energy on data cleaning quickly becomes costly, so an investment upfront can save money later.
The causes of faulty data are many, from a hodgepodge of different systems, to relying on the wrong tools for data warehousing, to the culture of the business itself. The methods for ensuring you do have clean data are similarly numerous. Technology has made it easier to spot rogue or missing data, but data specialists say that if you want clean data, there is ultimately no way to avoid wading in and getting your hands dirty.
"People think that with all the high technology available now, how difficult can it be to clean the data? And the answer is extremely," said Simon Garland, chief strategist at Kx Systems. "And it's not what people want to hear. It's laborious and of course it's expensive. They keep hoping there's some shortcut."
Whatever the causes and whatever the solutions, consultants and data experts agree that the cost of not putting enough effort into cleaning your data can be substantial. Whether the data is for back-testing a new model, for transaction cost analysis or for satisfying compliance and regulatory requirements, data cleanliness is paramount, and insufficient attention to it can cost you money.