Here is an interview with Steve Sarsfield, data quality evangelist and author of The Data Governance Imperative.
Ajay- Describe your early career to the present point. At what point did you decide to specialize or focus on data quality and data governance? What were the causes for it?
Steve- When I was growing up, not many normal people had aspirations of becoming data management professionals. Back in those days, we had aspirations to be NFL wide receivers, writers, engineers, and lawyers. Data management careers tend to find you.
My career path has wandered through technical support, technical writing and managing editor roles, consulting, and product management for Lotus Development. I’ve been working for the past nine years at a major data quality vendor – the longest job I’ve had to date. The good news is that this latest gig has given me a chance to meet with a LOT of people who have been implementing data quality and data governance projects.
When you get involved with the projects, you’ll begin to realize the power it has. You begin to love data governance for the efficiencies it brings, and for the impact it will have on your organization as it becomes more competitive.
Ajay- Some people think data quality is a boring job and data governance is an abstract philosophy. How would you interest a young high school/college student, with the right aptitude, in taking up a business intelligence career and staying focused on it?
Steve- In my opinion, if you promote a geeky view of data governance, the message will tend to fall flat. If there’s one thing I have written most about, it is bridging the gap between technology and business. Those who succeed in this field now and in the future will be people who are a bit of a jack-of-all-trades.
You need to be a good technologist, critical thinker, marketer, and strategist, and you need to use those skills every day to succeed. Leadership skills are also important, especially if you are trying to bootstrap a data governance program at your corporation. Those job attributes are not boring; they are challenging and exciting.
In terms of being persuasive about getting involved in a data career, it’s clear that data is not likely to decrease in volume in the coming years, quite the contrary, so your job will have a reasonable amount of security. Nor will there be less of a need in the future for developing accurate business metrics from the data.
In my book, I talk about the fact that the decision of a corporation to move toward data governance is really a choice between optimism and fear. Your company must decide either to be haunted by a never-ending vision that there will only be more data, more mergers and more complexity in the years to come, or to take charge for a more hopeful future that will bring more opportunity, more efficiency and a more agile working environment. When you choose data governance as a career, you choose to provide that optimism for your employer.
Ajay- What are the salient points in your book, The Data Governance Imperative? Do you think data governance is an idea whose time has come?
Steve- The book is about the increasing importance of data to a business. As your company collects more and more data about customers, products, suppliers, transactions and billing, it becomes more difficult to accurately maintain that information without a centralized approach and a team devoted to the data management mission.
The book comes from discussions with folks in the business who are trying to get a data governance program started in their corporation. They are the data champions who “get it”, but have yet to convince their management that data is crucial to the success of the company.
The fact is, there are metrics you can follow, processes that you can put in place, conversations that you can have, and technology that you can implement in order to make your managers and co-workers see the importance of data governance. We know this because it has worked for so many companies who are far more advanced in managing their data than most.
The most evolved companies have support from executive management and the entire company to define reusable processes for data governance, with a center of excellence formed around it. Much of the book is about garnering support and setting up the processes to prove enterprise data’s importance. Only when you do that will your company evolve its data governance strategy.
Ajay- Garbage Data In and Garbage Data Analysis Out. What percentage of a BI installation budget goes to input data quality at the data entry center? What kind of budget would you like it to be?
Steve- I’m sure this varies depending upon many factors, including the number of sources, age and quality of the source data, etc. Anecdotally, the percentage of budget five years ago was near zero. You really only saw realization of the problem LATE in the project, after the first data warehouse loading occurred. What has happened over the years is that we’ve gotten a lot smarter about this, perhaps as a result of our past failures. In the past, if the data worked well in the source systems it was assumed that it would work in the target.
A lot of those projects failed because the team incorrectly scoped the project with regard to the data integration. Today we have the wisdom and experience to know that this assumption is not true. In order to really assess our needs for data quality, we know we need to profile the data as one of the first tasks in the process. This will help us create a more accurate timeline and budget and assure management that we know what we’re doing with regard to data integration and business intelligence.
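As an editorial illustration of what "profiling the data" can mean in practice, here is a minimal sketch in Python. It is not the tooling Steve refers to; the function names (`profile_column`, `profile_table`) and the sample rows are hypothetical and invented for this example. It simply counts blanks, distinct values, and the most common values per column, which is roughly the kind of early evidence that helps scope a data integration effort.

```python
from collections import Counter


def profile_column(values):
    """Summarize one column: row count, blanks, distinct values, top values."""
    # Trim surrounding whitespace on strings so " MA " and "MA" count as one value.
    cleaned = [v.strip() if isinstance(v, str) else v for v in values]
    non_blank = [v for v in cleaned if v not in (None, "")]
    counts = Counter(non_blank)
    return {
        "rows": len(cleaned),
        "blank": len(cleaned) - len(non_blank),
        "distinct": len(counts),
        "top": counts.most_common(3),
    }


def profile_table(rows):
    """Profile every column of a list of dict records (keys taken from the first row)."""
    columns = list(rows[0].keys()) if rows else []
    return {col: profile_column([r.get(col) for r in rows]) for col in columns}


# Hypothetical sample records, made up for illustration only.
sample_rows = [
    {"name": "John Smith", "state": "MA"},
    {"name": "Judy Smith", "state": "ma"},
    {"name": "", "state": "Mass."},
    {"name": "George Smith", "state": None},
]

report = profile_table(sample_rows)
for column, stats in report.items():
    print(column, stats)
```

Even on four rows, the report surfaces a blank name and three different spellings of one state, which is the sort of finding that is cheaper to learn before the first warehouse load than after it.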
Ajay- Do you think Federal Governments can focus stimulus spending smarter with better input data quality?
Steve- Believe it or not, I’m encouraged by the US Government’s plan on data quality. To varying degrees, Presidents Clinton, Bush and Obama have all supported plans for greater transparency and openness. To accomplish that, you have to govern data. In Washington, many government agencies now have a Chief Information Officer. The government is recruiting leading universities like MIT to work toward better data governance in government. The sheer number of databases even within a single US government agency will be a huge challenge, but the direction is good.
This year’s MIT Information Quality Symposium, for example, had a very solid government track with speakers from the Army, Air Force, Department of Defense, EPA, HUD, and National Institutes of Health, to name just a few.
Outside the US, it gets even cloudier. There are governments ahead of the US, like the UK and Germany, and those that still need to catch up.
Ajay- Name some actual anecdotes in which 1) bad data quality led to disaster 2) good data quality gave great insights
Steve- There are certainly plenty of typical examples, but I always like the unusual ones, like:
A major motorcycle manufacturer used data quality tools to pull out nicknames from their customer records. Many of the names they had acquired for their prospect list were from motorcycle events and contests where the entries were, shall we say, colorful. The name fields contained data like “John the Mad Dog Smith” or “Frank Motor-head Jones”. The client used the tool to separate the name from the nickname, making it a more valuable marketing list.
One major utility company used data quality tools to identify and record notations on meter-reader records that were important to keep for operational uses, but not in the customer billing record. Upon analysis of the data, the company noticed random text like “LDIY” and “MOR” along with the customer records. After some work with the business users, they figured out that LDIY meant “Large Dog in Yard”, which was particularly important for meter readers. MOR meant “Meter in Right”, which was also valuable. The readers were given their own notes field, so that they could maintain the integrity of the name and address while also keeping this valuable data. It probably saved a lot of meter readers from dog bite situations.
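The utility example can be sketched in a few lines of Python. This is an editorial illustration, not the vendor tool used in the project: the function name `split_notes` and the sample records are hypothetical, and the code list contains only the two notations mentioned in the interview, with their meanings as given there.

```python
# Notation codes and meanings as quoted in the interview; a real list would be
# built with the business users after profiling the data.
KNOWN_NOTES = {"LDIY": "Large Dog in Yard", "MOR": "Meter in Right"}


def split_notes(name_field):
    """Return (clean_name, notes): known notation codes are moved out of the name."""
    kept, notes = [], []
    for token in name_field.split():
        # Ignore trailing punctuation when matching, e.g. "MOR," matches "MOR".
        code = token.strip(".,;")
        if code in KNOWN_NOTES:
            notes.append(code)
        else:
            kept.append(token)
    return " ".join(kept), notes


# Hypothetical records, made up for illustration only.
for raw in ["John Smith LDIY", "Judy Smith MOR, LDIY", "George Smith"]:
    name, notes = split_notes(raw)
    print(name, "|", [KNOWN_NOTES[n] for n in notes])
```

Matching is deliberately exact and case-sensitive, so a surname such as "Mor" is left alone. If the source records are stored entirely in upper case, that safeguard disappears and a human review step would be needed.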
Financial organizations have used data quality tools to separate items like “John and Judy Smith/221453789 ITF George Smith”. The organization wanted to consider this type of record as three separate records “John Smith” and “Judy Smith” and “George Smith” with obvious linkage between the individuals. This type of data is actually quite common on mainframe migrations.
A food manufacturer standardizes and cleanses ingredient names to get better control of manufacturing costs. In data from their worldwide manufacturing plants, an ingredient might be “carrots”, “chopped frozen carrots”, “frozen carrots, chopped”, “chopped carrots, frozen” and so on. (Not to mention all the possible abbreviations for the words carrots, chopped and frozen.) Without standardization of these ingredients, there was really no way to tell how many carrots the company purchased worldwide.
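To make the carrot example concrete, here is a minimal sketch, assuming a simple approach in which each name is lower-cased, abbreviations are expanded, and the words are sorted so that word order and punctuation no longer matter. The abbreviation table, the function name `standardize_ingredient`, and the purchase quantities are all hypothetical; the manufacturer's actual rules are not described in the interview.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table, invented for illustration only.
ABBREVIATIONS = {"chpd": "chopped", "frz": "frozen", "crts": "carrots"}


def standardize_ingredient(raw):
    """Map an ingredient description to an order-independent standard key."""
    tokens = re.findall(r"[a-z]+", raw.lower())          # drop punctuation and case
    expanded = [ABBREVIATIONS.get(t, t) for t in tokens]  # expand known abbreviations
    return " ".join(sorted(set(expanded)))                # ignore word order and repeats


# Made-up purchase lines (description, quantity) from different plants.
purchases = [
    ("chopped frozen carrots", 120),
    ("frozen carrots, chopped", 80),
    ("Chpd Crts, Frz", 50),
]

totals = defaultdict(int)
for description, quantity in purchases:
    totals[standardize_ingredient(description)] += quantity

print(dict(totals))
```

Note that plain “carrots” maps to a different key than “chopped frozen carrots” under this sketch. Whether those should be treated as the same purchased ingredient is a business decision, not something the code can settle.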
There was no bargaining leverage with the carrot supplier, or with any of the other ingredient suppliers, until the data was fixed.
In terms of disasters, I’d recommend the IAIDQ’s web site, IQ Trainwrecks (http://www.iqtrainwrecks.com/). The IAIDQ does a great job and I contribute when I can.
Ajay- What are the essential 5 things a CEO should ask his CTO to ensure good data quality in an enterprise?
Steve- What a great question. I can think of more than five, but let’s start with:
1) What is poor quality data costing us?
This should inspire your CTO to go out and seek problem areas in partnership with the business and ways to improve processes.
2) Do I have to make decisions on gut-feel, or should I trust the business intelligence you give our employees? What confidence level do you have in our BI?
The CEO should be confident in the metrics delivered with BI and he should make sure the CTO has the same concerns.
3) Are we in compliance with all laws regarding our governance of data?
CEOs are often culpable for non-compliance, so he/she should be concerned about any laws that govern the company’s industry. Even in unregulated industries, organizations must comply with spam laws and “do not mail” laws for marketing.
4) Are you working across business units to work towards data governance, or is data quality done in silos?
When possible, data quality should be a reusable process that can be implemented in a similar manner across business units.
5) Do you have the access to data you need?
The CEO should understand if any office politics are getting in the way of ensuring data quality and this question opens the door to that discussion.
Ajay- What does Steve Sarsfield do when not writing blogs and books?
Steve- These days, when I’m not thinking about data or my blog, I’m thinking about my fantasy football team and the upcoming season. I’ve got a ticket to the New England Patriots opening game vs the Buffalo Bills and I’m looking forward to it. On the weekends, you may find me playing a game of Mafia Wars on Facebook or cooking up a big pot of chili for the family.
Steve Sarsfield is a data governance business expert, speaker, author of The Data Governance Imperative (at http://www.itgovernance.co.uk/products/2446), and blogger at http://data-governance.blogspot.com/. He is a product marketing professional at a major data quality vendor. He was a guest speaker at the MIT Information Quality Symposium (July 2007 and July 2008), at the International Association for Information and Data Quality (IAIDQ) Symposium (December 2006), and at the SAP CRM 2006 summit.