Interview Dylan Jones DataQualityPro.com

Here is an interview with Dylan Jones, the founder/editor of DataQualityPro.com, the site to go to for anything related to data quality discussions. Dylan is a great, charming person and in this interview talks candidly about his views.

Ajay: Describe your career in science and in business intelligence. How would you convince young students to take more maths and science courses for scientific careers?

Dylan: My main education for the profession was a degree in Information Technology and Software Development. No surprises what my first job entailed – software development for an IT company!

That role took me straight into the trials and tribulations of business intelligence and data quality. After a couple of years I went freelance and have pretty much worked for myself ever since. There has been a constant thread of data quality, business intelligence and data migration throughout my career which culminated in me setting up the more recent social media initiatives to try and pull professionals together in this space.

In all honesty, I’m probably the worst person to give career advice Ajay as I’m a hopeless dreamer. I’ve never really structured my career. I fell into data quality early on and it has led me to work in some wonderful places and with some great people, largely by accident and fate.

I have a simple philosophy, do what you love doing. I’m incredibly lucky to wake up every day with an absolute passion for what I do. In the past, whenever I have found myself working in a situation that I find soul destroying (and in our profession that can happen regularly) I move on to something new.

So, my advice for people starting out would be to first question what makes them happy in life. Don’t simply follow the herd. The internet has totally transformed the rules of the game in terms of finding an outlet for your skills so follow your heart, not conventional wisdom.

That said, I think there are some core skills that will always provide a springboard. Maths is obviously one of those skills that can open many doors but I would also advise people to learn about marketing, sales and other business fundamentals. From a business intelligence perspective it really adds an attractive dimension to your skills if you can link technical ability with a deeper understanding of how businesses operate.

Ajay- You are a top expert and publisher on BI topics. Tell us something about:

a) http://www.datamigrationpro.com/

b) http://www.dataqualitypro.com/

c) Involvement with the DataFlux community of experts

d) Your latest venture http://www.dqvote.com

Dylan- Data Migration Pro was my first foray into the social media space. I realised that very few people were talking about the challenges and techniques of data migration. On average, large organisations implement around 4 migration projects a year and most end in failure. A lot of this is due to a lack of awareness. Having worked for so long in this space I felt it was time to create a social media site to bring the wider community together. So we now have forums, regular articles, tools and techniques on the site with about 1400 members worldwide plus lots of plans in the pipeline for 2010.

Data Quality Pro followed on from the success of Data Migration Pro and our speed of growth really demonstrates how important data quality is right now. Again, awareness of the basic techniques and best-practices is key. I think many organisations are really starting to recognise the importance of better data quality management practices so a lot of our focus is on giving people practical advice and tools to get started. We are a community publishing platform, I do write regularly but we’ve always had a significant community contribution from expert practitioners and authors.

I didn’t just want to take a corporate viewpoint with these communities. As a result they are very much focused on the individual. That is why we post so many features on how to promote your skills, search for work, gain personal skills and generally get ahead in the profession. Data Quality Pro has just under 2,000 members and about 6,000 regular visitors a month so it demonstrates just how many people are really committed to learning about this discipline as it impacts practically every part of the business. I also think it is an excellent career choice as so many projects are dependent on good quality data there will always be demand.

The DataFlux community of experts is a great resource that I’ve actually admired for some time. I am a big fan of Jill Dyche who used to write on the community and of course there is a great line-up on there now with experts like David Loshin, Joyce Norris-Montanari and Mike Ferguson so I was delighted to be invited to participate. DataFlux have sponsored our sites from the very beginning and without their support we wouldn’t have grown to our current size. So although I’m vendor independent, it’s great to be sharing my thoughts and ideas with people who visit their site.

DQVote.com is a relatively new initiative. I noticed that there was some great data quality content being linked through platforms like Twitter but it would essentially become hard to find after several days. Also, there was no way for the community to vote on what content they found especially useful. DQVote.com allows people to promote their own content but also to vote and share other useful data quality articles, blogs, presentations, videos, tutorials – anything that adds value to the data quality community. It is also a great springboard for emerging data quality bloggers and publishers of useful content.

Ajay- Do you think BI projects can be more successful if we reward data entry people, or at least pay more for better quality data rather than ask them to fill in database tables as fast as they can? Especially in offshore call centres.

Dylan- Data entry is a pet frustration of mine. I regularly visit companies who are investing hundreds of thousands of pounds in data quality technology and consultants but nothing in grass-roots education and cultural change. They would rather create cleansing factories than resolve the issues at source.

So, yes I completely agree, the reward system has to change. I personally suffer from this all the time – call centre staff record incorrect or incomplete information about my service or account and it leads to billing errors, service problems, annoyance and eventually lost business. Call centre staff are not to blame, they are simply rewarded on the volume of customer service calls they can make, they are not encouraged to enter good quality data. The fault ultimately lies with the corporations that use these services and I don’t think offshore or onshore makes a difference. I’ve witnessed terrible data quality in-house also. The key is to have service level agreements on what quality of data is acceptable. I also think a reward structure as opposed to a penalty structure can be a much more progressive way of improving the quality of call-centre data.

Ajay- What are the top 5 things that summarize your views on Business Intelligence? Assume you are speaking to a class of freshman statisticians.

Dylan- Business intelligence is wholly dependent on data quality. Accessibility, timeliness, accuracy, completeness, duplication – data quality dimensions like these can dramatically change the value of business intelligence to the organisation. Take nothing for granted with data, assume nothing. I have never, ever, assessed a dataset in a large business that did not have some serious data defects that were impacting decision making.

As statisticians, they therefore possess the tools to help organisations discover and measure these defects. They can find ways to continuously improve and ensure that future decisions are based on reliable data.

I would also add that business intelligence is not just about technology, it is about interpreting data to determine trends that will enable a company to improve their competitive advantage. Statistics are important but freshmen must also understand how organisations really create value for their customers.

My advice is to therefore step away from the tools and learn how the business operates on the ground. Really listen to workers and customers as they can bring the data to life. You will be able to create far more accurate dashboards and reports of where the issues and opportunities lie within a business if you immerse yourself with the people who create the data and the senior management who depend on the quality of your business intelligence platforms.
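Editor's illustration (not part of the interview): two of the data quality dimensions Dylan lists above, completeness and duplication, can be measured with very little code. The sketch below is a minimal example in plain Python. The function names, the sample records and the choice to normalise by trimming and lower-casing are assumptions made for illustration only, not a description of any particular tool.

```python
def completeness(rows, field):
    """Share of rows where `field` is present and non-blank."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if str(row.get(field) or "").strip())
    return filled / len(rows)


def duplicate_rate(rows, key_fields):
    """Share of rows that repeat an earlier row's normalised key."""
    if not rows:
        return 0.0
    seen = set()
    duplicates = 0
    for row in rows:
        # Normalise by trimming whitespace and lower-casing (an assumption;
        # real matching rules are usually far more elaborate).
        key = tuple(str(row.get(f) or "").strip().lower() for f in key_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates / len(rows)


# Hypothetical sample data, invented for this example.
customers = [
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": "ann lee ", "email": "ANN@example.com"},
    {"name": "Bob Ray", "email": ""},
    {"name": "Cy Dow", "email": None},
]

print(completeness(customers, "email"))               # 0.5
print(duplicate_rate(customers, ["name", "email"]))   # 0.25
```

Tracking figures like these over time is one simple way to turn "assume nothing" into a measurable baseline.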

Ajay- Which software have you personally coded or implemented. Which one did you like the best and why?

Dylan- I’ve used most of the BI and DQ tools out there, all have strengths and weaknesses so it is very subjective. I have my favourites but I try to remain vendor neutral so I’ll have to gracefully decline on this one Ajay!

However, I did build a data profiling and data quality assessment tool several years ago. To be honest, that is the tool I like best because it had a range of features I still haven’t seen implemented in any other tool. If I ever get the chance, and if no other vendor comes up with the same concept, I may yet take it to market. For now though, two young kids, two communities and a 12 hour day mean it is something of a pipedream.

Ajay- What does Dylan Jones do when not helping to improve the data quality of the world?

Dylan- I’ve recently had another baby boy so kids take up most of whatever free time I have left. When we do get a break though I like to head to my home town and just hang out on the beach or go up into the mountains. I love travelling and as I effectively work completely online now, we’re really trying to figure out a way of combining travel and work.

Biography-

Dylan Jones is the founder and editor of Data Quality Pro and Data Migration Pro, the leading online expert community resources. Since the early nineties he has been helping large organisations tackle major information management challenges. He now devotes his time to fostering greater awareness, community and education in the fields of data quality and data migration via the use of social media channels. Dylan can be contacted via his profile page at http://www.dataqualitypro.com/data-quality-dylan-jones/ or at http://www.twitter.com/dataqualitypro

Best of Decision Stats- Modeling and Text Mining Part3

Here are some of the top articles by number of views in an area I love: modeling and text mining.

1) Karl Rexer – Rexer Analytics

http://www.decisionstats.com/2009/06/09/interview-karl-rexer-rexer-analytics/

Karl produces one of the most respected surveys that captures emerging trends in data mining and technology. Karl was also one of the most enthusiastic people I have interviewed- and I am thankful for his help in getting me some more interviews.

2) Gregory Piatetsky-Shapiro

One of the earliest and easily the best Knowledge Discoverer of all time, Gregory produces http://www.kdnuggets.com, and its newsletter is easily the must-read newsletter to be on. Gregory was doing data mining while the Google boys were still debating whether to drop out of Stanford or not.

The Top Decisionstats Articles -Part 2 Business Intelligence and Data Quality

I am a self-confessed novice at business intelligence. I understand the broad concepts, reporting tools, and definitely forecasting tools. But the whole-systems view still baffles me. Fortunately I have been learning from some of the best writers in this field. Here, in order of circulation, are the top Business Intelligence articles.

Business Intelligence


1) Jill Dyche

http://www.decisionstats.com/2009/06/30/interview-jill-dyche-baseline-consulting/

Jill is a fabulously wise and experienced person with a great writing style. Her answers were some of the most educative I have seen in BI writing.

2) Peter Thomas

http://www.decisionstats.com/2009/07/02/peter-james-thomas-bi/

The best of British BI is epitomized by Peter Thomas, and he is truly a European giant when it comes to the field. His worst weakness is a tendency to disappear when Test cricket is around, but that is eminently understandable. I can relate to the cricket as well.

3) Karen Lopez

http://www.decisionstats.com/2009/07/28/interview-karen-lopez/

Karen gives an excellent insight on creating mock ups or data models before actual implementation. She has worked on it for three decades and her wisdom is clearly visible here.

Data Quality

Data quality is such an overlooked and easy-to-fix issue that I believe any BI vendor that builds the best, most robust data quality architecture will gain the maximum Pareto-like benefits. Curiously, competing BI vendors will often compete on price, graphics appeal and so on, but the simple Garbage In, Garbage Out rule is something they should consider. The Data Quality interviews gave me an important tutorial in these aspects of data management.

1) Jim Harris

http://www.decisionstats.com/tag/jim-harris/

Jim is a one-man army when it comes to evangelizing data quality, and his OCDQ blog is widely read and cited.

2) Steve Sarsfield

http://www.decisionstats.com/2009/08/13/interview-steve-sarsfield-author-the-data-governance-imperative/

His excellent book is the one must-read item that people in cost-cutting corporations should buy, especially if they are considering going down the Davenport "competing on analytics" route.

( To be continued- Part 3 Modeling and Text Mining

Part 4 Social Media

Part 5 Humour and Poetry )

The Top DecisionStats Articles -Part 1 Analytics

I was just looking at my web analytics numbers and we seem to have crossed some milestones.

The site has now gotten more than 50,000 views since being launched in Dec 2007.

Thank you everyone for your help in this. More importantly the quality of comments has been fabulous. Since I am out of ideas for the rest of the week- here is a best of posts collection.
Here are some of the most popular articles as measured by number of page views. I have personal favourites as well, but these are ranked purely by page views.

Top 5 Interviews

1) Interviews with SAS Institute leaders- I have generally found great professionalism from SAS Institute people. This is surprising because, coming from an open source background, I often see SAS looked upon as a big brother. I find that more of a perception and less of a reality as the company continues to innovate.

a) with John Sall, founder, SAS Institute- This is really the biggest interview I did in terms of the person involved. To my surprise (I wasn’t expecting John to say yes) the interview was really frank, and it came very fast. The answers seem to be written by John himself.

Quote- Quantitative fields can be fairly resistant to recession- John Sall.

http://www.decisionstats.com/2009/07/28/interview-john-sall-jmp/

b) Interview with Anne Milley, Director, Product Marketing, SAS Institute- This is a favourite because it came very soon after the NYTimes article on R etc. One of my personal opinions is that the difference between great and good leaders is often that great leaders are humble enough to learn and then build on their strengths. It ran in two parts, and I really appreciated the in-depth answers that Anne wrote.

Quotes-

Analytics continues to be our middle name.

Customers vote with the cheque book.


Interview Gregory Piatetsky KDNuggets.com

Here is an interview with Gregory Piatetsky, founder and editor of KDnuggets (www.KDnuggets.com), one of the oldest and biggest independent industry websites for data mining and analytics.

Ajay- Please describe your career in science and the many challenges and rewards that came with it. Name any scientific research, degrees, teaching etc.


Gregory- I was born in Moscow, Russia and went to a top math high school in Moscow. A unique challenge for me was that my father was one of the leading mathematicians in the Soviet Union. While I liked math (and still do), I quickly realized while still in high school that I would never be as good as my father, and that a math career was not for me.

Fortunately, I discovered computers and really liked the process of programming and solving applied problems. At that time (late 1970s) computers were not very popular and it was not clear that one could make a career in computers. However, I was very lucky that I was able to pursue what I liked and find demand for my skills.

I got my MS in 1979 and PhD in 1984 in Computer Science from New York University.
I was interested in AI (perhaps thanks to a lot of science fiction I read as a kid), but found a job in databases, so I was looking for ways to combine them.

In 1984 I joined GTE Labs where I worked on research in databases and AI, and in 1989 started the first project on Knowledge Discovery in data. To help convince my management that there would be a demand for this thing called “data mining” (GTE management did not see much future for it), I also organized an AAAI workshop on the topic.

I thought “data mining” was not a sexy enough name, and so I called it “Knowledge Discovery in Data”, or KDD. Since 1989, I have been working on KDD and data mining in all aspects – more on my page www.kdnuggets.com/gps.html

Ajay- How would you encourage a young science entrepreneur in this recession?

Gregory- Many great companies were started or grew in a recession, e.g.
http://www.insidecrm.com/features/businesses-started-slump-111108/

Recession may be compared to a brush fire which removes dead wood and allows new trees to grow.

Ajay- What prompted you to set up KD Nuggets? Any reasons for the name (kNowledge Discovery Nuggets). Describe some key milestones in this iconic website for data mining people.

Gregory- After the third KDD workshop in 1993 I started a newsletter to connect about 50 people who attended the workshop and possibly others who were interested in data mining and KDD. The idea was that it would have short items or “nuggets” of information. Also, at that time a popular metaphor for data miners was gold miners who were looking for gold “nuggets”. So, I wanted a newsletter with “nuggets” – short, valuable items about Knowledge Discovery. Thus, the name KDnuggets.

In 1994 I created a website on data mining at GTE and in 1997, after I left GTE, I moved it to the current domain name www.kdnuggets.com.

In 1999, I was working for a startup which provided data mining services to the financial industry. However, because of Y2K issues, all banks etc. froze their systems in the second half of 1999, and we had very little work (and our salaries were reduced as well). I decided that I would try to get some ads and was able to get companies like SPSS and Megaputer to advertise.

Since 2001, I have been an independent consultant and KDnuggets is only part of what I am doing. I also do data mining consulting, and actively participate in SIGKDD (Director 1998-2005, Chair 2005-2009).

Some people think that KDnuggets is a large company, with publisher, webmaster, editor, ad salesperson, billing dept, etc. KDnuggets indeed has all these functions, but it is all me and my two cats.

Ajay- I am impressed by the fact that KDnuggets is almost a dictionary or encyclopedia for data mining. But apart from advertising you have not been totally commercial: many features of your newsletter remain ad free, you still maintain a minimalistic look and you do not take sponsorship aligned with one big vendor. What is your vision for KDnuggets in the years to come to keep it truly independent?

Gregory- My vision for KDnuggets is to be a comprehensive resource for the data mining community, and I really enjoyed maintaining such a resource for the first 7-8 years completely non-commercially. However, when I became self-employed, I could not do KDnuggets without any income, so I selectively introduced ads, and only those which are relevant to data mining.

I like to think of KDnuggets as a Craigslist for the data mining community.

I certainly realize the importance of social media and Web 2.0 (interested people can follow my tweets at twitter.com/kdnuggets) and plan to add more social features to KDnuggets.

Still, just as Wikipedia and Facebook do not make the New York Times obsolete, I think there is room and need for an edited website, especially for such a nerdy and not very social group as data miners.

Ajay- What is the worst mistake or error you have made in writing and publishing? What is the biggest triumph or high moment in the history of KDnuggets?

Gregory- My biggest mistake is probably in choosing the name kdnuggets – in retrospect, I could have used a shorter and easier to spell domain name, but in 1997 I never expected that I would still be publishing www.KDnuggets.com 12 years later.

Ajay- Who are your favourite data mining students (having known so many people)? What qualities do you think set a data mining person apart from those in other sciences?

Gregory- I was only an adjunct professor for a short time, so I did not really have data mining students, but I was privileged enough to know many current data mining leaders when they were students.  Among more recent students, I am very impressed with Jure Leskovec, who just finished his PhD and got the best KDD dissertation award.

Ajay- What does Gregory Piatetsky do for fun when he is not informing the world on analytics and knowledge discovery.

Gregory- I enjoy travelling with my family, and in the summer I like biking and windsurfing.
I also read a lot, and am currently in the middle of reading Proust (which I periodically dilute with other, lighter books).

Ajay- What is your favourite blog and website to read? Any plans to visit India?

Gregory- I visit many blogs on www.kdnuggets.com/websites/blogs.html

and I like especially
– Matthew Hurst blog: Data Mining: Text Mining, Visualization, and Social Media
– Occam’s Razor by Avinash Kaushik, examining web analytics.
– Juice Analytics, blogging about analytics and visualization
– Geeking with Greg, exploring the future of personalized information.

I also like your website decisionstats.com and plan to visit it more frequently.

I have visited many countries, but not yet India – waiting for the right occasion!

Biography

(http://www.kdnuggets.com/gps.html)

Gregory Piatetsky-Shapiro, Ph.D. is the President of KDnuggets, which provides research and consulting services in the areas of data mining, web mining, and business analytics. Gregory is considered to be one of the founders of the data mining and knowledge discovery field. Gregory edited or co-edited many collections on data mining and knowledge discovery, including two best-selling books: Knowledge Discovery in Databases (AAAI/MIT Press, 1991) and Advances in Knowledge Discovery in Databases (AAAI/MIT Press, 1996), and has over 60 publications in the areas of data mining, artificial intelligence and database research.

Gregory is the founder of the Knowledge Discovery in Databases (KDD) conference series. He organized and chaired the first three Knowledge Discovery in Databases (KDD) workshops in 1989, 1991, and 1993. He then served as the Chair of the KDD Steering Committee and guided the conversion of the KDD workshops into leading international conferences on data mining. He was also the General Chair of the KDD-98 conference.

Interview Tasso Argyros CTO Aster Data Systems

Here is an interview with Tasso Argyros, the CTO and co-founder of Aster Data Systems (www.asterdata.com). Aster Data Systems makes one of the first DBMSs to tightly integrate SQL with MapReduce.

Ajay- The number of maths and science students the world over is facing a major decline. What would you recommend to young students to get careers in science?

[TA]- My father is a professor of Mathematics and I spent a lot of my college time studying advanced math. What I would say to new students is that Math is not a way to get a job, it’s a way to learn how to think. As such, a Math education can lead to success in any discipline that requires intellectual abilities. As long as they take the time to specialize at some point – via postgraduate education or a job where they can learn a new discipline from smart people – they won’t regret the investment.

Ajay- Describe your career in Science particularly your time at Stanford. What made you think of starting up Asterdata. How important is it for a team rather than an individual to begin startups. Could you describe the startup moment when your team came together.

[TA]- While at Stanford I became very familiar with the world of startups through my advisor, David Cheriton (who was an angel investor in VMware and Google and the founder of two successful companies). My research was about processing large amounts of data on large, low-cost computer farms. A year into my research it became obvious that this approach had huge processing power advantages and was superior to anything else I could see in the marketplace. I then happened to meet my other two co-founders, Mayank Bawa and George Candea, who were looking at a similar technical problem from the database and reliability perspectives, respectively.

I distinctly remember George walking into my office one day (I barely knew him back then) and saying “I want to talk to you about startups and the future” – the rest has become history.

Ajay- How would you describe your product Aster nCluster Cloud Edition to somebody who does not know anything beyond traditional server/data warehouse technologies? Could you rate it against some known vendors and give a price point: at what level of usage does the Total Cost of Ownership of Asterdata become cheaper than, say, an Oracle, SAP or Microsoft data warehousing solution?

[TA]- Aster allows businesses to reduce the data analytics TCO in two interesting ways. First, it has a much lower hardware cost than any traditional DW technology because of its use of commodity servers or cloud infrastructure like Amazon EC2. Secondly, Aster has implemented a lot of innovations that simplify the (previously tedious and expensive) management of the system, which includes scaling the system elastically up or down as needed, so customers are not paying for capacity they don’t need at a given point in time.

But cutting costs is one side of the equation; what makes me even more excited is the ability to make a business more profitable, competitive and efficient through analyzing more data at greater depth. We have customers that have cut their costs and increased their customers and revenue by using Aster to analyze their valuable (and usually underutilized) data. If you have data – and you think you’re not taking full advantage of it – Aster can help.

Ajay- I have always had this one favourite question. When can I analyze 100 gigabytes of data using just a browser and some statistical software like R, or the advanced forecasting software that is available? Describe some of Asterdata’s work in enhancing the analytical capabilities of big data.

Can I run R (free, open source) on an on-demand basis on an Asterdata solution? How much would it cost me to crunch 100 GB of data and make segmentations and models with, say, 50 hours of processing time per month?

[TA]- One of the big innovations that Aster offers is to allow analytical applications like R to be embedded in the database via our SQL/MapReduce framework. We actually have customers right now that are using R to do advanced analytics over terabytes of data. 100GB is actually on the lower end of what our software can enable and as such the cost would not be significant.
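Editor's illustration (not part of the interview): for readers who have not met MapReduce, the pattern Tasso refers to can be sketched in a few lines of plain Python. This is only a single-machine toy showing the map, shuffle and reduce steps. It is not Aster's SQL/MapReduce syntax or API, and every name and record in it is hypothetical.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to each record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def spend_by_segment(record):
    # Hypothetical mapper: emit one (segment, amount) pair per order.
    yield record["segment"], record["amount"]


def total(_key, values):
    # Hypothetical reducer: sum the amounts for one segment.
    return sum(values)


# Invented sample data for the example.
orders = [
    {"segment": "retail", "amount": 10},
    {"segment": "web", "amount": 5},
    {"segment": "retail", "amount": 7},
]

result = reduce_phase(shuffle(map_phase(orders, spend_by_segment)), total)
print(result)  # {'retail': 17, 'web': 5}
```

In a real cluster the map and reduce steps run in parallel on many machines and the shuffle moves data between them; the toy above only shows the shape of the computation.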

Ajay- What do people at Asterdata do when not making complex software?

[TA]- A lot of Asterites love to travel around the world – we are, after all, a very diverse company. We also love coffee, Indian food as well as international and US sports like soccer, cricket, cycling, and football!

Ajay- Name some competing products to Asterdata and where Asterdata products are more suitable from a TCO viewpoint. Name specific areas where you would not recommend your own products.

[TA]- We go up against products like Oracle Database, Teradata and IBM DB2. If you need to do analytics over 100s of GBs or terabytes of data, our price/performance ratio would be orders of magnitude better.

Ajay- How do you convince well-known and experienced VCs like Sequoia Capital to invest in a start-up? (e.g. I could do with some financing for server costs)

[TA]- You need to convince Sequoia of three things: (a) that the market you’re going after is very large (in the billions of dollars, if you’re successful); (b) that your team is the best set of people that could ever come together to solve the particular problem you’re trying to solve; and (c) that the technology you’ve developed gives you an “unfair advantage” over incumbents or new market entrants. Most importantly, you have to smile a lot! :)

Biography

About Tasso:

Tasso (Tassos) Argyros is the CTO and co-founder of Aster Data Systems, where he is responsible for all product and engineering operations of the company. Tasso was recently recognized as one of BusinessWeek’s Best Young Tech Entrepreneurs for 2009 and was an SAP fellow at the Stanford Computer Science department. Prior to Aster, Tasso was pursuing a Ph.D. in the Stanford Distributed Systems Group with a focus on designing cluster architectures for fast, parallel data processing using large farms of commodity servers. He holds an MSc in Computer Science from Stanford University and a Diploma in Computer and Electrical Engineering from the Technical University of Athens.

About Aster:

Aster Data Systems is a proven leader in high-performance database systems for data warehousing and analytics – the first DBMS to tightly integrate SQL with MapReduce – providing deep insights on data analyzed on clusters of low-cost commodity hardware.

The Aster nCluster database cost-effectively powers frontline analytic applications for companies such as MySpace, aCerno (an Akamai company), and ShareThis. Running on low-cost off-the-shelf hardware, and providing ‘hands-free’ administration, Aster enables enterprises to meet their data warehousing needs within their budget.

Aster is headquartered in San Carlos, California and is backed by Sequoia Capital, JAFCO Ventures, IVP, Cambrian Ventures, and First-Round Capital, as well as industry visionaries including David Cheriton, Rajeev Motwani and Ron Conway.

Interview Steve Sarsfield Author The Data Governance Imperative

Here is an interview with Steve Sarsfield, data quality evangelist and author of The Data Governance Imperative.


Ajay- Describe your early career to the present point. At what point did you decide to specialize or focus on data quality and data governance? What were the causes for it?


Steve- When I was growing up, not many normal people had aspirations of becoming data management professionals. Back in those days, we had aspirations to be NFL wide receivers, writers, engineers, and lawyers. Data management careers tend to find you.

My career path has wandered through technical support, technical writing and managing editor roles, consulting, and product management for Lotus Development. I’ve been working for the past nine years at a major data quality vendor – the longest job I’ve had to date. The good news is that this latest gig has given me a chance to meet with a LOT of people who have been implementing data quality and data governance projects.

When you get involved with the projects, you’ll begin to realize the power it has. You begin to love data governance for the efficiencies it brings, and for the impact it will have on your organization as it becomes more competitive.


Ajay- Some people think data quality is a boring job and data governance is an abstract philosophy. How would you interest a young high school /college student, with the right aptitude, in taking a business intelligence career and be focused on it.


Steve- In my opinion, if you promote a geeky view of data governance the message will tend to fall flat. If there’s one thing I have written most about, it is bridging the gap between technology and business. Those who succeed in this field now and in the future will be people who are a bit of a jack-of-all-trades.

You need to be a good technologist, critical thinker, marketer, and strategist, and you need to use those skills every day to succeed. Leadership skills are also important, especially if you are trying to bootstrap a data governance program at your corporation. Those job attributes are not boring, they are challenging and exciting.

In terms of being persuasive about getting involved in a data career, it’s clear that data is not likely to decrease in volume in the coming years, quite the contrary, so your job will have a reasonable amount of security.  Nor will there be less of a need in the future for developing accurate business metrics from the data.

In my book, I talk about the fact that the decision of a corporation to move toward data governance is really a choice between optimism and fear. Your company must decide to either be haunted by a never-ending vision that there will only be more data, more mergers and more complexity in the years to come, or take charge of a more hopeful future that will bring more opportunity, more efficiency and a more agile working environment. When you choose data governance as a career, you choose to provide that optimism for your employer.


Ajay- What are the salient points in your book The Data Governance Imperative? Do you think data governance is an idea whose time has come?


Steve- The book is about the increasing importance of data to a business. As your company collects more and more data about customers, products, suppliers, transactions and billing, it becomes more difficult to accurately maintain that information without a centralized approach and a team devoted to the data management mission.

The book comes from discussions with folks in the business who are trying to get a data governance program started in their corporation.  They are the data champions who “get it”, but are yet to convince their management that data is crucial to the success of the company.

The fact is, there are metrics you can follow, processes that you can put in place, conversations that you can have, and technology that you can implement in order to make your managers and co-workers see the importance of data governance.  We know this because it has worked for so many companies who are far more advanced in managing their data than most.

The most evolved companies will have support from executive management and the entire company to define reusable processes for data governance, with a center of excellence formed around it. Much of the book is about garnering support and setting up the processes to prove enterprise data’s importance. Only when you do that will your company evolve its data governance strategy.


Ajay- Garbage Data In and Garbage Data Analysis Out. What percentage of a BI installation budget goes to input data quality at the data entry center? What kind of budget would you like it to be?


Steve- I’m sure this varies depending upon many factors, including the number of sources, age and quality of the source data, etc. Anecdotally, the percentage of budget five years ago was near zero. You really only saw realization of the problem LATE in the project, after the first data warehouse loading occurred. What has happened over the years is that we’ve gotten a lot smarter about this, perhaps as a result of our past failures. In the past, if the data worked well in the source systems it was assumed that it would work in the target.

A lot of those projects failed because the team incorrectly scoped the project with regard to the data integration. Today we have the wisdom and experience to know that data which works in the source systems will not necessarily work in the target. In order to really assess our needs for data quality, we know we need to profile the data as one of the first tasks in the process. This will help us create a more accurate timeline and budget and assure management that we know what we’re doing with regard to data integration and business intelligence.
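
As a rough illustration of what profiling a source column can involve, here is a minimal Python sketch. The function name profile_column and the sample values are hypothetical, not taken from the book or from any vendor tool. It simply counts missing entries and distinct spellings, so that problems surface before the first warehouse load rather than after it.

```python
from collections import Counter

def profile_column(values):
    """Summarize one column: row count, missing count, distinct count, top values."""
    total = len(values)
    # Treat None and blank or whitespace-only strings as missing.
    present = [v.strip() for v in values if v is not None and v.strip() != ""]
    counts = Counter(present)
    return {
        "rows": total,
        "missing": total - len(present),
        "distinct": len(counts),
        "top_values": counts.most_common(3),
    }

# Hypothetical sample: a "state" column pulled from a source system.
states = ["MA", "ma", "Mass.", "MA", "", None, "NY", "MA "]
report = profile_column(states)
print(report)
```

Even on this tiny sample the report shows two missing values and three different spellings of the same state, which is the kind of finding that changes a project's timeline and budget.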


Ajay- Do you think Federal Governments can focus stimulus spending smarter with better input data quality?


Steve- Believe it or not, I’m encouraged by the US Government’s plan on data quality. To varying degrees, Presidents Clinton, Bush and Obama have all supported plans for greater transparency and openness. To accomplish that, you have to govern data. In Washington, many government agencies now have a Chief Information Officer. The government is recruiting leading universities like MIT to work toward better data governance in government. The sheer number of databases even within a single US government agency will be a huge challenge, but the direction is good.

This year’s MIT Information Quality Symposium, for example, had a very solid government track with speakers from the Army, Air Force, Department of Defense, EPA, HUD, and National Institutes of Health, to name just a few.

Outside the US, it gets even cloudier. There are governments ahead of the US, like the UK and Germany, and those that still need to catch up.


Ajay- Name some actual anecdotes in which 1) bad data quality led to disaster 2) good data quality gave great insights


Steve- There are certainly plenty of typical examples, but I always like the unusual ones, like:

A major motorcycle manufacturer used data quality tools to pull out nicknames from their customer records. Many of the names they had acquired for their prospect list were from motorcycle events and contests where the entries were, shall we say, colorful. The name fields contained data like “John the Mad Dog Smith” or “Frank Motor-head Jones”. The client used the tool to separate the name from the nickname, making it a more valuable marketing list.

One major utility company used data quality tools to identify and record notations on meter-reader records that were important to keep for operational uses, but not in the customer billing record. Upon analysis of the data, the company noticed random text like “LDIY” and “MOR” along with the customer records. After some work with the business users, they figured out that LDIY meant “Large Dog in Yard”, which was particularly important for meter readers. MOR meant “Meter in Right”, which was also valuable. The readers were given their own notes field, so that they could maintain the integrity of the name and address while also keeping this valuable data. It probably saved a lot of meter readers from dog bite situations.

Financial organizations have used data quality tools to separate items like “John and Judy Smith/221453789 ITF George Smith”. The organization wanted to consider this type of record as three separate records “John Smith” and “Judy Smith” and “George Smith” with obvious linkage between the individuals. This type of data is actually quite common on mainframe migrations.

A food manufacturer standardizes and cleanses ingredient names to get better control of manufacturing costs. In data from their worldwide manufacturing plants, an ingredient might be “carrots”, “chopped frozen carrots”, “frozen carrots, chopped”, “chopped carrots, frozen” and so on. (Not to mention all the possible abbreviations for the words carrots, chopped and frozen.) Without standardization of these ingredients, there was really no way to tell how many carrots the company purchased worldwide.

There was no bargaining leverage with the carrot supplier, and all the other ingredient suppliers, until the data was fixed.

In terms of disasters, I’d recommend the IAIDQ’s web site, IQ Trainwrecks (http://www.iqtrainwrecks.com/). The IAIDQ does a great job and I contribute when I can.
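
The ingredient standardization in the food manufacturer anecdote above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the MODIFIERS word list, the base_ingredient function and the purchase quantities are all hypothetical, and a commercial data quality tool would also handle abbreviations, misspellings and other languages.

```python
import re
from collections import defaultdict

# Hypothetical preparation words; a real project would maintain a much fuller list.
MODIFIERS = {"chopped", "frozen"}

def base_ingredient(name):
    """Lowercase, drop punctuation and preparation words, keep the core ingredient."""
    tokens = re.findall(r"[a-z]+", name.lower())
    core = [t for t in tokens if t not in MODIFIERS]
    # Sort so that word order in the original entry does not matter.
    return " ".join(sorted(core))

# Hypothetical purchase records: (ingredient name as entered, kilograms).
purchases = [
    ("carrots", 100),
    ("chopped frozen carrots", 250),
    ("frozen carrots, chopped", 75),
    ("Chopped Carrots, Frozen", 50),
]

# Roll up quantities under the standardized ingredient name.
totals = defaultdict(int)
for name, kg in purchases:
    totals[base_ingredient(name)] += kg

print(dict(totals))
```

All four spellings roll up under the single key "carrots", so the total purchased can finally be counted in one place.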


Ajay- What are the essential 5 things a CEO should ask his CTO to ensure good data quality in an enterprise.


Steve- What a great question. I can think of more than five, but let’s start with:


1) What is poor quality data costing us?
This should inspire your CTO to go out and seek problem areas in partnership with the business and ways to improve processes.

2) Do I have to make decisions on gut-feel, or should I trust the business intelligence you give our employees?  What confidence level do you have in our BI?

The CEO should be confident in the metrics delivered with BI and he should make sure the CTO has the same concerns.

3) Are we in compliance with all laws regarding our governance of data?

CEOs are often culpable for non-compliance, so he/she should be concerned about any laws that govern the company’s industry. Even in unregulated industries, organizations must comply with spam laws and “do not mail” laws for marketing.

4) Are you working across business units to work towards data governance, or is data quality done in silos?

When possible, data quality should be a reusable process that can be implemented in a similar manner across business units.

5) Do you have the access to data you need?

The CEO should understand if any office politics are getting in the way of ensuring data quality and this question opens the door to that discussion.

Ajay- What does Steve Sarsfield do when not writing blogs and books.


Steve- These days, when I’m not thinking about data or my blog, I’m thinking about my fantasy football team and the upcoming season. I’ve got a ticket to the New England Patriots opening game vs the Buffalo Bills and I’m looking forward to it. On the weekends, you may find me playing a game of Mafia Wars on Facebook or cooking up a big pot of chili for the family.


Biography-


Steve Sarsfield is a data governance business expert, speaker, author of The Data Governance Imperative (at http://www.itgovernance.co.uk/products/2446) and blogger at http://data-governance.blogspot.com/. He is a product marketing professional at a major data quality vendor. He was a guest speaker at the MIT Information Quality Symposium (July 2007 and July 2008), at the International Association for Information and Data Quality (IAIDQ) Symposium (December 2006) and at the SAP CRM 2006 summit.