On how to do a little bit of data management
I’ve been fascinated by some of the reports that have been coming out of the Revenue and Customs “Lost Data” affair. In particular, the claim that so much data was lost because they couldn’t afford to send just the bits that the Audit Office wanted strikes me, as someone who used to work in the data industry, as ludicrous.
Let me explain. The Audit Office wanted a file with just a few simple records for each person on the database. HMRC said that it was much too expensive to do that and it would need to send the whole file. But that doesn’t make sense. Imagine, if you will, that all the records that HMRC hold are on a piece of paper several miles long. On the top row of the piece of paper is a list of the different pieces of information – Name, Address, NI Number, Parent etc. Then, underneath that, are the 27 million-odd records. The whole thing looks like a very long scroll with lots of rows of data, with each relevant bit for each person in the column underneath the particular heading.
Now, the Audit Office says “Just send us the first three columns”. In terms of data manipulation this is easy-peasy. I say this as someone who, in my career before vicardom, at one point ran the retail credit-scoring database for a major high-street bank, and then in my consultancy days handled and manipulated the data of customers as varied as Barclaycard, Coutts & Co, Littlewoods and Ford Finance. If one of my clients had come and said “We need just these bits of data from the database”, I would simply have sat down at my desk, set up the extract (a two-minute job) and then sat back and waited for the result.
A data extract from such a database is the equivalent of the computer simply sitting down and copying all the details we want from the long data scroll mentioned above onto a new piece of paper. For a human it takes ages to do this, but for a computer it’s practically instantaneous. In fact, an even better analogy would be that it’s like taking our long scroll and simply tearing off the columns of data we want (though of course in the case of data extraction all the old data stays where it is).
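To make the scroll analogy concrete, here is a minimal sketch of that kind of extract in Python. It assumes the scroll is a plain comma-separated text file with the headings on the first row; I have no idea what format HMRC’s file was actually in, so treat that as an assumption. The function name `extract_columns` and the sample rows are made up purely for illustration.

```python
import csv
import io


def extract_columns(source, dest, n_columns=3):
    """Copy the first n_columns of every row from source to dest.

    source and dest are open text files (or file-like objects).
    The first row is assumed to be the headings; it is copied too.
    Returns the number of data rows written, not counting the headings.
    """
    reader = csv.reader(source)
    writer = csv.writer(dest, lineterminator="\n")
    data_rows = 0
    for index, row in enumerate(reader):
        # "Tear off" the wanted columns; the source itself is left untouched.
        writer.writerow(row[:n_columns])
        if index > 0:
            data_rows += 1
    return data_rows


# Made-up sample standing in for the miles-long scroll.
SAMPLE = (
    "Name,Address,NI Number,Parent\n"
    "A. Example,1 Sample Street,AA000000A,P. Example\n"
    "B. Example,2 Sample Street,BB000000B,Q. Example\n"
)

extract = io.StringIO()
count = extract_columns(io.StringIO(SAMPLE), extract)
print(count, "records extracted")
print(extract.getvalue())
```

The loop reads one row at a time and writes it straight back out, so the size of the file only affects how long the job runs, not how much memory it needs. The setup is the same whether there are two records or many millions.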
When I was at the Bank we did a full data extract on our 11 million+ customers every month. That included a whole series of calculations, sending the data via FTP to various locations, receiving appended data back, matching that new data back into our database, and updating all the systems so that the data that was “2 months old” now became “3 months old”, and so on. It took us a whole weekend. But that was in 1996, when we were running a 250GB server with hundreds of RAID array hard drives to do the storage, our memory was just creeping into the 1GB area (and was b****dy expensive – hundreds and thousands of pounds) and we were running on the latest Intel Pentium I chips. By the time I moved into the consultancy just a couple of years later I was using a single PC to do the equivalent work overnight. These days the laptop I’m typing this on has more hard drive space, more memory and a much faster processor than the beast of a box (it filled a whole room) that I used a decade ago at the bank. To extract 25 million records from a file on this PC would probably take a few hours max. And cost-wise, the work is 5 minutes of setup and 10 minutes at the end to encrypt the data.
We need to ask the Government to explain exactly what the extract that the Audit Office wanted involved (not the data items, but where they were sourced from, how many different files and so on), so that we can know how easy or hard a job it actually was and why it wasn’t done because it “cost too much”.
And this, by the way (“it cost too much”), is exactly the reason I trust Tesco and First Direct Bank with my data and not the Government. With a commercial organisation, the risks of mishandling data are too great to contemplate, so huge amounts of money and management time are spent on making sure that data is safe and secure. I have never worked in a mass-customer commercial environment where that is not true. For the government, however, with cost-cutting everywhere, if the money to be spent on a job (which, as I’ve explained, isn’t in the least bit complicated) is too much, security and safety seem to go out of the window.
Which leads me to my final point. In a business, if your commercial unit loses data it’s because your management of data processes wasn’t good enough, so you resign. In government it seems that if your department makes a monumental mistake because you didn’t spend enough money to make sure things were safe and secure, you carry on regardless.
More government centralisation anybody?