Interview John Sall Founder JMP/SAS Institute

Here is an interview with John Sall, inventor of SAS and JMP and co-founder and co-owner of SAS Institute, the largest independent business intelligence and analytics software firm. In a freewheeling and exclusive interview, John talks about the long journey within SAS and his experiences in helping make JMP the data visualization software of choice.
JMP is perfect for anyone who wants to do exploratory data analysis and modeling in a visual and interactive way – John Sall


Ajay- Describe your early science career. How would you encourage today’s generation to take up science and math careers?

John- I was a history major in college, but I graduated into a weak job market. So I went to graduate school and discovered statistics and computer science to be very captivating. Of course, I grew up in the moon-race science generation and was always a science enthusiast.

Ajay- Archimedes leapt out the bath shouting “Eureka” when he discovered his principle. Could you describe a “Eureka” moment while creating the SAS language when you and Jim Goodnight were working on it?

John- I think that the moments of discovery were more like “Oh, we were idiots” as we kept having to rewrite much of the product to handle emerging environments, like CMS, minicomputers, bitmap workstations, personal computers, Windows, client-server, and now the cloud. Several of the rewrites even changed the language it was implemented in. But making the commitment to evolve led to an amazing sequence of growth that is still going on after 35 years.

Ajay- Describe the origins of JMP. What specific market segments does the latest release of JMP target?

John- JMP emerged from a recognition of two things: size and GUI. SAS’ enterprise footprint was too big a commitment for some potential users, and we needed a product to really take advantage of graphical interactivity. It was a little later that JMP started being dedicated more to the needs of engineering and science users, who are most of our current customers.

Ajay- What other non-SAS Institute software do you admire or have you worked with? Which areas is JMP best suited for? For which areas would you recommend software other than JMP to customers?

John- My favorite software was the Metrowerks CodeWarrior development environment. Sadly, it was abandoned among various Macintosh transitions, and now we are stuck with the open-source GCC and Xcode. It’s free, but it’s not as good.

JMP is perfect for anyone who wants to do exploratory data analysis and modeling in a visual and interactive way. This is something organizations of all kinds want to do. For analytics beyond what JMP can do, I recommend SAS, which has unparalleled breadth, depth and power in its analytic methods.

Ajay- I have yet to hear of a big academic push for JMP distribution in Asia. Are there any plans to distribute JMP for free or at very discounted prices to academic institutions in countries like India and China, or even in the USA?

John- We are increasing our investment in supporting academic institutions, but it has not been an area of strength for us. Professors seem to want the package they learned long ago, the language that is free or the spreadsheet program their business students already have. JMP’s customers do tell us that they wish the universities would train their prospective future employees in JMP, but the universities haven’t been hearing them. Fortunately, JMP is easy enough to pick up after you enter the work world. JMP does substantially discount prices for academic users.

Ajay- What are your views on tech offshoring, given the recession in the United States?

John- As you know, our products are mostly made in the USA, but we do have growing R&D operations in Pune and Beijing that have been performing very well. Even when the software is authored in the US, considerable work happens in each country to localize, customize and support our local users, and this will only increase as we become more service-oriented. In this recession, JMP has still been growing steadily.

Ajay- What advice would you give to young graduates in this recession? How does learning JMP enhance their prospects of getting a job?

John- Quantitative fields have been fairly resistant to the recession. North Carolina State University, near the SAS campus, even has a Master of Science in Analytics <http://analytics.ncsu.edu/> to get people job-ready. JMP experience certainly helps get jobs at our major customers.

Ajay- What does John Sall do in his free time, when not creating world-class companies or groovy statistical discovery software?

John- I lead the JMP division, which has been a fairly small part of a large software company (SAS), but JMP is becoming bigger than the whole company was when JMP was started. In my spare time, I go to meetings and travel with the Nature Conservancy <http://www.nature.org/>, North Carolina State University <http://ncsu.edu/>, WWF <http://wwf.org/>, CARE <http://www.care.org/> and several other nonprofit organizations that my wife or I work with.

Official Biography

John Sall is a co-founder and Executive Vice President of SAS, the world’s largest privately held software company. He also leads the JMP business division, which creates interactive and highly visual data analysis software for the desktop.

Sall joined Jim Goodnight and two others in 1976 to establish SAS. He designed, developed and documented many of the earliest analytical procedures for Base SAS® software and was the initial author of SAS/ETS® software and SAS/IML®. He also led the R&D effort that produced SAS/OR®, SAS/QC® and Version 6 of Base SAS.

Sall was elected a Fellow of the American Statistical Association in 1998 and has held several positions in the association’s Statistical Computing section. He serves on the board of The Nature Conservancy, reflecting his strong interest in international conservation and environmental issues. He also is a member of the North Carolina State University (NCSU) Board of Trustees. In 1997, Sall and his wife, Ginger, contributed to the founding of Cary Academy, an independent college preparatory day school for students in grades 6 through 12.

Sall received a bachelor’s degree in history from Beloit College in Beloit, WI, and a master’s degree in economics from Northern Illinois University in DeKalb, IL. He studied graduate-level statistics at NCSU, which awarded him an honorary doctorate in 2003.

About JMP-

Originally nicknamed John’s Macintosh Program, JMP is a leading statistical data visualization program. Researchers and engineers – whose jobs didn’t revolve solely around statistical analysis – needed an easy-to-use and affordable stats program. A new software product, today known as JMP®, was launched in 1989 to dynamically link statistical analysis with the graphical capabilities of Macintosh computers. Now running on all platforms, JMP continues to play an important role in modeling processes across industries as a desktop data visualization tool. It also provides a visual interface to SAS in an expanding line of solutions that includes SAS Visual BI and SAS Visual Data Discovery. Sall remains the lead architect for JMP.

Citation- http://www.sas.com/presscenter/bios/jsall.html

Ajay- I am thankful to John and his marketing communication specialist Arati for this interview. With an increasing focus on data to drive more rational decision making, SAS remains an interesting company to watch in the era of mega-vendors, and any SAS Institute deal or alliance will make potential investment bankers as well as newer customers drool. For previous interviews and coverage of SAS please use www.decisionstats.com/tag/sas

Interview Jim Harris Data Quality Expert OCDQ Blog

Here is an interview with one of the chief evangelists for data quality in the field of Business Intelligence, Jim Harris, who writes the renowned blog at http://www.ocdqblog.com/. I asked Jim about his experiences in the field with data quality messing up big-budget BI projects, and some tips and methodologies to avoid them.

No one likes to feel blamed for causing or failing to fix the data quality problems – Jim Harris, Data Quality Expert.


Ajay- Why the name OCDQ? What drives your passion for data quality? Name any anecdotes where bad data quality really messed up a big BI project.

Jim Harris – Ever since I was a child, I have had an obsessive-compulsive personality. If you asked my professional colleagues to describe my work ethic, many would immediately respond: “Jim is obsessive-compulsive about data quality…but in a good way!” Therefore, when evaluating the short list of what to name my blog, it was not surprising to anyone that Obsessive-Compulsive Data Quality (OCDQ) was what I chose.

On a project for a financial services company, a critical data source was applications received by mail or phone for a variety of insurance products. These applications were manually entered by data entry clerks. Social security number was a required field and the data entry application had been designed to only allow valid values. Therefore, no one was concerned about the data quality of this field – it had to be populated and only valid values were accepted.

When a report was generated to estimate how many customers were interested in multiple insurance products by looking at the count of applications per social security number, it appeared as if a small number of customers were interested in not only every insurance product the company offered, but also thousands of policies within the same product type. More confusion was introduced when the report added the customer name field, which showed that this small number of highly interested customers had hundreds of different names. The problem was finally traced back to data entry.

Many insurance applications were received without a social security number. The data entry clerks were compensated, in part, based on the number of applications they entered per hour. In order to process the incomplete applications, the data entry clerks entered their own social security number.
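As a side note, this is exactly the kind of defect a simple profiling query would have surfaced. Below is a minimal, hypothetical Python sketch (pandas assumed; all column names and values are illustrative, not from the actual project) of a check that counts applications and distinct customer names per social security number – an SSN shared by many differently named “customers” is the red flag in Jim’s story.

```python
import pandas as pd

# Toy applications table; in the real project this would come from the
# data entry system. Column names here are assumptions for illustration.
applications = pd.DataFrame({
    "ssn":           ["111-22-3333"] * 4 + ["444-55-6666", "777-88-9999"],
    "customer_name": ["A Smith", "B Jones", "C Brown", "D White",
                      "E Green", "F Black"],
})

# Profile each SSN: how many applications, how many distinct names?
profile = applications.groupby("ssn").agg(
    n_applications=("customer_name", "size"),
    n_distinct_names=("customer_name", "nunique"),
)

# One SSN attached to many applications under many different names is
# not an eager customer -- it is likely a data entry clerk's own SSN.
suspicious = profile[(profile["n_applications"] > 2) &
                     (profile["n_distinct_names"] > 1)]
print(suspicious)
```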

On a project for a telecommunications company, multiple data sources were being consolidated into a new billing system. Concerns about postal address quality required the use of validation software to cleanse the billing address. No one was concerned about the telephone number field – after all, how could a telecommunications company have a data quality problem with telephone number?

However, when reports were run against the new billing system, a high percentage of records had a missing telephone number. The problem was that many of the data sources originated from legacy systems that only recently added a telephone number field. Previously, the telephone number was entered into the last line of the billing address.

New records entered into these legacy systems did start using the telephone number field, but the older records already in the system were not updated. During the consolidation process, the telephone number field was mapped directly from source to target and the postal validation software deleted the telephone number from the cleansed billing address.
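Again as an illustration only (not the project’s actual code), two cheap safeguards would have caught this: profiling field completeness per source system, and sweeping the last address line for phone numbers before the postal cleansing deletes them. A hedged pandas sketch, with all names invented:

```python
import pandas as pd

records = pd.DataFrame({
    "source":       ["legacy_a", "legacy_a", "new_sys"],
    "phone":        [None, None, "919-555-0100"],  # field added late in legacy_a
    "address_last": ["100 Main St", "555-867-5309", "200 Oak Ave"],
})

# 1. Completeness by source: legacy_a shows 100% missing, new_sys 0%.
print(records.groupby("source")["phone"].apply(lambda s: s.isna().mean()))

# 2. Recover numbers parked in the address *before* validation removes them.
phone_pat = r"(\b\d{3}[-.]?\d{3}[-.]?\d{4}\b)"
recovered = records["address_last"].str.extract(phone_pat)[0]
records["phone"] = records["phone"].fillna(recovered)
print(records)
```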

Ajay- Data Quality – Garbage in, Garbage out for a project. What percentage of a BI project do you think gets allocated to input data quality? What percentage of final output is affected by the normalized errors?

Jim Harris- I know that Gartner has reported that 25% of critical data within large businesses is somehow inaccurate or incomplete and that 50% of implementations fail due to a lack of attention to data quality issues.

The most common reason for this lack of attention is that people doubt that data quality problems could be prevalent in their systems. This “data denial” is not necessarily a matter of blissful ignorance, but is often a natural self-defense mechanism from the data owners on the business side and/or the application owners on the technical side.

No one likes to feel blamed for causing or failing to fix the data quality problems.

All projects should allocate time and resources for performing a data quality assessment, which provides a much needed reality check for the perceptions and assumptions about the quality of the data. A data quality assessment can help with many tasks including verifying metadata, preparing meaningful questions for subject matter experts, understanding how data is being used, and most importantly – evaluating the ROI of data quality improvements. Building data quality monitoring functionality into the applications that support business processes provides the ability to measure the effect that poor data quality can have on decision-critical information.
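To make the monitoring idea concrete, here is a minimal sketch of a recurring data quality metric, assuming pandas and a hypothetical dict mapping each monitored field to a validity predicate (nothing here is from any specific product):

```python
import pandas as pd

def dq_metrics(df: pd.DataFrame, rules: dict) -> pd.DataFrame:
    """Completeness and validity rates for each monitored field."""
    rows = []
    for field, is_valid in rules.items():
        present = df[field].notna()
        rows.append({
            "field": field,
            "completeness": present.mean(),
            # Validity is measured only over populated values.
            "validity": is_valid(df.loc[present, field]).mean(),
        })
    return pd.DataFrame(rows)

# Example: SSNs must match the pattern and not be an obvious placeholder.
df = pd.DataFrame({"ssn": ["111-22-3333", None, "000-00-0000"]})
rules = {"ssn": lambda s: s.str.match(r"\d{3}-\d{2}-\d{4}")
                          & (s != "000-00-0000")}
print(dq_metrics(df, rules))
```

Run on a schedule against the systems that support the business processes, these rates become trend lines that provide the measurement Jim describes.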

Ajay- Companies talk of paradigms like Kaizen, Six Sigma and LEAN for eliminating waste and defects. What technique would you recommend for a company just about to start a major BI project for a standard ETL and reporting project to keep data aligned and clean?

Jim Harris- I am a big advocate for methodology and best practices and the paradigms you mentioned do provide excellent frameworks that can be helpful. However, I freely admit that I have never been formally trained or certified in any of them. I have worked on projects where they have been attempted and have seen varying degrees of success in their implementation. Six Sigma is the one that I am most familiar with, especially the DMAIC framework.

However, a general problem that I have with most frameworks is their tendency to adopt a one-size-fits-all strategy, which I believe is an approach that is doomed to fail. Any implemented framework must be customized to adapt to an organization’s unique culture. In part, this is necessary because implementing changes of any kind will be met with initial resistance, but an attempt at forcing a one-size-fits-all approach almost sends a message to the organization that everything they are currently doing is wrong, which will of course only increase the resistance to change.

Starting with a framework as a reference provides best practices and recommended options of what has worked for other organizations. The framework should be reviewed to determine what can best be learned from it and to select what will work in the current environment and what simply won’t. This doesn’t mean that the selected components of the framework will be implemented simultaneously. All change comes gradually and the selected components will most likely be implemented in phases.

Fundamentally, all change starts with changing people’s minds. And to do that effectively, the starting point has to be improving communication and encouraging open dialogue. This means more of listening to what people throughout the organization have to say and less of just telling them what to do. Keeping data aligned and clean requires getting people aligned and communicating.

Ajay- What methods and habits would you recommend to young analysts starting in the BI field for a quality checklist?

Jim Harris- I always make two recommendations.

First, never make assumptions about the data. I don’t care how well the business requirements document is written or how pretty the data model looks or how narrowly your particular role on the project has been defined. There is simply no substitute for looking at the data.

Second, don’t be afraid to ask questions or admit when you don’t know the answers. The only difference between a young analyst just starting out and an expert is that the expert has already made and learned from all the mistakes caused by being afraid to ask questions or admitting when you don’t know the answers.

Ajay- What does Jim Harris do to have quality time when not at work?

Jim- Since I enjoy what I do for a living so much, it sometimes seems impossible to disengage from work and make quality time for myself. I have also become hopelessly addicted to social media and spend far too much time on Twitter and Facebook. I have also always spent too much of my free time watching television and movies. I do try to read as much as I can, but I have so many stacks of unread books in my house that I could probably open my own book store. True quality time typically requires the elimination of all technology by going walking, hiking or mountain biking. I do bring my mobile phone in case of emergencies, but I turn it off before I leave.

Biography-

Jim Harris is the Blogger-in-Chief at Obsessive-Compulsive Data Quality (OCDQ), an independent blog offering a vendor-neutral perspective on data quality.

He is an independent consultant, speaker, writer and blogger with over 15 years of professional services and application development experience in data quality (DQ) and business intelligence (BI).

Jim has worked with Global 500 companies in finance, brokerage, banking, insurance, healthcare, pharmaceuticals, manufacturing, retail, telecommunications, and utilities. Jim also has a long history with the product that is now known as IBM InfoSphere QualityStage. Additionally, he has some experience with Informatica Data Quality and DataFlux dfPower Studio.

Jim can be followed at twitter.com/ocdqblog and contacted at http://www.ocdqblog.com/contact/


Interview Eric Siegel, PhD President Prediction Impact

An interview with Eric Siegel, Ph.D., President of Prediction Impact, Inc. and founding chair of Predictive Analytics World.

Ajay- What does this round of Predictive Analytics World have which was not there in the edition earlier in the year?

Eric- Predictive Analytics World (pawcon.com) – Oct 20-21 in DC delivers a fresh set of 25 vendor-neutral presentations across verticals employing predictive analytics, such as banking, financial services, e-commerce, education, healthcare, high technology, insurance, non-profits, publishing, retail and telecommunications.

PAW features keynote speaker, Stephen Baker, author of The Numerati and Senior writer at BusinessWeek.  His keynote is described at www.predictiveanalyticsworld.com/dc/2009/agenda.php#day2-2

A strong representation of leading enterprises have signed up to tell their stories — speakers will present how predictive analytics is applied at Aflac, AT&T Bell South, Amway, The Coca-Cola Company, Financial Times, Hewlett-Packard, IRS, National Center for Dropout Prevention, The National Rifle Association, The New York Times, Optus (Australian telecom), PREMIER Bankcard, Reed Elsevier, Sprint-Nextel, Sunrise Communications (Switzerland), Target, US Bank, U.S. Department of Defense, Zurich — plus special examples from Anheuser-Busch, Disney, HSBC, Pfizer, Social Security Administration, WestWind Foundation and others.

To see the entire agenda at a glance: www.predictiveanalyticsworld.com/dc/2009/agenda_overview.php

We’ve added a third workshop, offered the day before (Oct 19), “Hands-On Predictive Analytics.” There’s no better way to dive in than operating real predictive modeling software yourself – hands-on. For more info: www.predictiveanalyticsworld.com/dc/2009/handson_predictive_analytics.php

Ajay- What do academics, corporations and data miners gain from this conference? List four bullet points for the specific gains.

Eric- A. First, PAW’s experienced speakers provide the “how to” of predictive analytics. PAW is a unique conference in its focus on the commercial deployment of predictive analytics, rather than research and development. The core analytical technology is established and proven, valuable as-is without additional R&D — but that doesn’t mean it’s a “cakewalk” to employ it successfully to ensure value is attained.  Challenges include data requirements and preparation, integration of predictive models and their scores into existing organizational systems and processes, tracking and evaluating performance, etc. There’s no better way to hone your skills and cultivate an informed plan for your organization’s efforts than hearing how other organizations did it.

B. Second, PAW covers the latest state-of-the-art methods produced by research labs, and how they provide value in commercial deployment. This October, almost all sessions in Track 2 are at the Expert/Practitioner-level.  Advanced topics include ensemble models, uplift modeling (incremental modeling), model scoring with cloud computing, predictive text analytics, social network analysis, and more.

PAW’s pre- and post-conference workshops round out the learning opportunities. In addition to the hands-on workshop mentioned above, there is a course covering core methods, “The Best and the Worst of Predictive Analytics: Predictive Modeling Methods and Common Data Mining Mistakes” (www.predictiveanalyticsworld.com/dc/2009/predictive_modeling_methods.php) and a business-level seminar on decision automation and support, “Putting Predictive Analytics to Work” (www.predictiveanalyticsworld.com/dc/2009/predictive_analytics_work.php).

C. Third, the leading predictive analytics software vendors and consulting firms are present at PAW as sponsors and exhibitors, available to provide demos and answer your questions. What do the predictive analytics solutions do, how do they compare, and which is best for you? PAW is the one-stop shop for selecting the tool or solution most suited to address your needs.

D. Fourth, PAW provides a unique, focused opportunity to network with colleagues and establish valuable contacts in the predictive analytics industry.  Mingle, connect and hang out with professionals facing similar challenges (coffee breaks, meals, and at the reception).

Ajay- How do you balance the interests of various competing softwares and companies who sponsor such event?

Eric- As a vendor-neutral event, PAW’s core program of 25 sessions is booked exclusively with enterprise practitioners, thought leaders and adopters, with no predictive analytics software vendors speaking or co-presenting. These sessions provide substantive content with take-aways which provide value that’s independent of any particular software solution — no product pitches!  Beyond these 25 sessions are three short sponsor sessions that are demarcated as such, and, despite being branded, generally prove to be quite substantive as well.  In this way, PAW delivers a high quality, unbiased program.

Supplementing this vendor-neutral program, the room right next door has an expo where attendees have all the access to software and solution vendors they could want (cf. my answer to the prior question regarding software vendors, above).

Here are a couple more PAW links:

For informative PAW event updates:
www.predictiveanalyticsworld.com/notifications.php

To sign up for the PAW group on LinkedIn, see:
www.linkedin.com/e/gis/1005097

Ajay- Describe your career in science, including the research you specialize in. How would you motivate students today to go for science careers?

Eric- Well, first off, my work as a predictive analytics consultant, instructor and conference chair is in the application of established technology, rather than the research and development of new or improved methods.

But the Ph.D. next to my name reveals my secret past as an “academic”. Pure research is something I really enjoyed and I kind of feel like I had a brain transplant in order to change to “real world work”. I’m glad I made the change, although I see good sides to both types of work (really, they’re like two entirely different lifestyles).

In my research I focused on core predictive modeling methods. The ability for a computer to automatically learn from experience (data really is recorded experience, after all), is the best thing since sliced bread. Ever since I realized, as a kid, that space travel would in fact be a huge pain in the neck, nothing in science has ever seemed nearly as exciting.

Predictive analytics is an endeavor in machine learning. A predictive model is the encoding of a set of rules or patterns or regularities at some level. The model is the thing output by automated, number-crunchin’ analysis and, therefore, is the thing “learned” from the “experience” (data).  The “magic” here is the ability of these methods to find a model that performs not only over the historical data on your disk drive, but that will perform equally well for tomorrow’s new situations. That ability to generalize from the data at hand means the system has actually learned something.

And indeed the ability to learn and apply what’s been learned turns out to provide plenty of business value, as I imagined back in the lab.  The output of a predictive model is a predictive score for each individual customer or prospect.  The score in turn directly informs the business decision to be taken with that individual customer (to contact or not to contact; with which offer to contact, etc.) – business intelligence just doesn’t get more actionable than that.
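As a toy illustration of that train-then-score loop (entirely invented data; scikit-learn assumed, and nothing here is PAW- or vendor-specific): fit a model on historical records, check it on held-out data to confirm it generalizes beyond the data at hand, then score new prospects so the scores can drive the contact decision.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                          # historical customer attributes
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)   # responded to offer?

# Hold out data to check the model performs beyond "the data on your disk drive".
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))

# Score tomorrow's prospects; each score informs the per-customer decision.
prospects = rng.normal(size=(5, 3))
scores = model.predict_proba(prospects)[:, 1]
contact = scores > 0.5                                   # e.g. contact only likely responders
print(list(zip(scores.round(2), contact)))
```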

For the impending student, I’d first point out the difference between applied science and research science. Research science is fun in that you have the luxury of abstraction and are usually fairly removed from the need to prove near-term industrial applicability. Applied science is fun for the opposite reason: The tangle of challenges, although less abstract and in that sense more mundane, are the only thing between you and getting the great ideas of the world to actually work, come to fruition, and have an irrefutable impact.

Ajay- What are the top five conferences in analytics and data mining in the world, in your opinion, including PAW?

Eric- KDD – The leading event for research and development of the core methods behind the commercial deployments covered at PAW (“Knowledge Discovery and Data Mining”).

ICML – Another long-standing research conference on machine learning (core data mining).

eMetrics.org – For online marketing optimization and web analytics

Text Analytics Summit – Text mining can leverage “unstructured data” (text) to augment predictive analytics; the chair of this conference is speaking at PAW on just that topic: www.predictiveanalyticsworld.com/dc/2009/agenda.php#day2-15

Predictive Analytics World, the business-focused event for predictive analytics professionals, managers and commercial practitioners – focused on the commercial deployment of predictive analytics: pawcon.com

Ajay- Would PAW 2009 have video archives, videos or podcasts for people not able to attend on site?

Eric- While the PAW conferences emphasize in-person participation, we are in the planning stages for future webcasts and other online content. PAW’s “Predictive Analytics Guide” has a growing list of online resources: www.predictiveanalyticsworld.com/predictive_analytics.php

Ajay- How do you think social media marketing can help in these conferences?

Eric- Like most events, PAW leverages social media to spread the word.

But perhaps most pertinent is the other way around: predictive analytics can help social media by increasing relevancy, dynamically selecting the content to which each reader or viewer is most likely to respond.

Ajay- Do you have any plans to take PAW more international? Any plans for a PAW journal for trainings and papers?

Eric- We’re in discussions on these topics, but for now I can only say, stay tuned!

Biography

The president of Prediction Impact, Inc., Eric Siegel is an expert in predictive analytics and data mining and a former computer science professor at Columbia University, where he won awards for teaching, including graduate-level courses in machine learning and intelligent systems – the academic terms for predictive analytics. He has published 13 papers in data mining research and computer science education, has served on 10 conference program committees, has chaired a AAAI Symposium held at MIT, and is the founding chair of Predictive Analytics World.

For more on Predictive Analytics World-

Predictive Analytics World Conference
October 20-21, 2009, Washington, DC
www.predictiveanalyticsworld.com
LinkedIn Group: www.linkedin.com/e/gis/1005097

Interview Gary D. Miner Author and Professor

Here is an interview with Gary Miner, PhD, who has been in the data mining business for almost 30 years and is a pioneer in healthcare studies pertaining to Alzheimer’s disease. He is also co-author of the “Handbook of Statistical Analysis and Data Mining Applications”. Gary writes on how he has seen data mining change over the years, on health care applications as well as his book, and quotes from his experience.


Ajay- Describe your career in science starting from college till today. How would you interest young students in science careers today in the midst of the recession?

Gary – I knew that I wanted to be in “Science” even before college days, taking all the science and math courses I could in high school. This continued in undergraduate college years at a private college [Hamline University, St. Paul, Minnesota…older than the State of Minnesota, founded in 1854, and had the first Medical School, later “sold” to the University of Minnesota] as a Biology and Chemistry major, with a minor in education. From there I did an M.S. conducting a “Physiological genetics research project”, and then a Ph.D. at another institution where I worked on Genetic Polymorphisms of Mouse blood enzymes. So through all of this, I had to use statistics to analyze the data. My M.S. was analyzed before the time of even “electronic calculators”, so I used, if you can believe this, a “hand cranked calculator”, rented, one summer to analyze my M.S. dataset. By the time my Ph.D. thesis data was being analyzed, electronic calculators were available, but the big main-frame computers were on college campuses, so I punched the data into CARDS, walked down the hill to the computing center, dropped off the stack of cards, to come back the next day to get “reams of output” on large paper [about 15” by 18”, folded in a stack, if anyone remembers those days …]. I then spent about 30 years doing medical research in academic environments with the emphasis on genetics, biochemistry, and proteomics in the areas of mental illness and Alzheimer’s Disease, which became my main area of study, publishing the first book in 1989 on the GENETICS OF ALZHEIMER’S DISEASE.

Today, in my “semi-retirement careers”, one side-line outreach is working with medical residents on their research projects, which I’ve been doing for about 7 or 8 years now. This involves design of the research project, data collection, and most importantly “effective and accurate” analysis of the datasets. I find this a way I can reach out to the younger generation to interest them not only in “science”, but in doing “science correctly”. As you probably know, we are in the arena of the “Dumbing of America”; anti-science, if you wish. I’ve seen this happening for at least 30 years, during the 1980’s, 1990’s, and continuing into this Century. Even the medical residents I get to work with each year have been going “downhill” yearly in their ability to “problem solve”. I believe this is an effect of this “dumbing of America”.

There are several books coming out on this Dumbing of America this summer; one the first week of June, another on July 12, and another in September [see the attached PPT for slides with the covers of these 3 books]. It is a real problem, as Americans over the past few decades have moved towards “wanting simple answers”, and most things in the “real world”, e.g. reality, are not simple…that’s where Science comes in.

A recent 2008 study done by the School of Public Health at Ohio University showed that up to 88% of the published scientific papers in a top respected cancer journal either used statistics INCORRECTLY and/or the CONCLUSION was INCORRECT. When my wife and I both did Post-Docs in Psychiatric Epidemiology in 1980-82, basically doing an MPH, the first words out of the mouth of the “Biostats – Epidemiology” professor in the first lecture to the incoming MPH students were “We might as well throw out most of the medical research literature of the past 25 years, as it has either not been designed correctly or statistics have been used incorrectly”!!! ……That caught my attention. And following medical research [and medicine in general] I can tell you that “not much has changed in the past 25 years since then”, and thus that puts us “50 years behind in medical research” and medicine. ANALOGY: If some of our major companies, that are successfully using predictive analytics to organize and efficiently run their organizations, took on the “mode of operation” of medicine and medical research, they’d be “bankrupt in 6 months” …. That’s what I tell my students.

Ajay- Describe some of the exciting things data mining can do to lower health care costs and provide more people with coverage.

Gary- As mentioned above, my personal feeling is that “medicine / health care” is 50 years “behind the times”, compared to the efficiency needed to successfully survive in this Global Economy; corporations and organizations like Wal-Mart, INTEL, and many of our Pharmaceutical Companies have used data mining / predictive analytics to survive successfully. Wal-Mart especially: Wal-Mart has its own set of data miners, and they were writing their own procedures in the early 1990’s…before most of us ever heard of data mining; that is why Wal-Mart can go into China today, and open a store in any location, and know almost to 99% accuracy 1) how many checkout stands are needed, 2) what products to stock, 3) where in the store to stock them, and 4) what their profit margin will be. They have done this through very accurate “Predictive Analytics” modeling.

Other “ingrained” USA corporations have NOT grabbed onto this “most accurate” technology [e.g. predictive analytics modeling], and are reaping the “rewards” of impending bankruptcy and disappearance today. Examples in the news, of course, are our 3 big automakers in Detroit. If they had engaged effective data mining / modeling in the late 1990’s they could have avoided their current problems. I see the same for many of our oldest and largest USA Insurance Companies…they are “middle management fat”, and I’ve seen their ratings go down over the past 10 years from an A rating to even a C rating [for the company in which I have my auto insurance – you might ask me why I stay? …. An agent who is a friend, BUT it is frustrating, and this company’s “mode of operation” is completely “customer un-friendly”.], while new insurance companies have “grabbed” onto modern technology, and are rising stars.

So my influence on the younger generation is to have my students do research and DATA ANALYSIS correctly.

Ajay- Describe your book “HANDBOOK OF STATISTICAL ANALYSIS & DATA MINING APPLICATIONS”. Who would be the target audience of this, and can corporate data miners gain from it as well?

Gary- There are several target audiences: The main audience we were writing for, after our Publisher looked at what “niches” had been un-met in data mining literature, was the professional in smaller and middle sized businesses and organizations that needed to learn about “data mining / predictive analytics” “fast”…e.g. maybe situations where the company did not have a data analysis group using predictive analytics, but the CEOs and Professionals in the company knew they needed to learn and start using predictive analytics to “stay alive”. This seemed like potentially a very large audience. The book is oriented so that one does NOT have to start at chapter 1, and read sequentially, but instead can START WITH A TUTORIAL. Working through a tutorial, I’ve found in my 40 years of being in education, is the fastest way for a person to learn something new. And this has been confirmed…I’ve had newcomers to data mining, who have already gotten the HANDBOOK, write me and say: “I’ve gone through a bunch of tutorials, and finding that I am really learning ‘how to do this’……..I’ve read other books on ‘theory’, but just didn’t get the ‘hang of it’ from those”. My data mining consultants at StatSoft, who travel and work in “real world” situations every day, and who wrote maybe 1/3 of the tutorials in the HANDBOOK, tell me: “A person can go through the TUTORIALS in the HANDBOOK, and know 70% of what we who are doing predictive analytics consulting every day know!!!”

But there are other audiences: Corporate data miners can find it very useful also, as a “way of thinking as a data miner” can be gained from reading the book, as was expressed by one of the Amazon.com 5-STAR reviews: “What I like about this book is that it embeds those methods in a broader context, that of the philosophy and structure of data mining, especially as the methods are used in the corporate world. To me, it was really helpful in thinking like a data miner, especially as it involves the mix of science and art.”

But we’ve had others who have told us they will use it as an extra textbook in their Business Intelligence and Data Mining courses, because of the “richness” of the tutorials. Here’s a comment on the Amazon reviews from the Head of a Business School who has maybe over 100 graduate students doing data mining:

“5.0 out of 5 stars. At last, a useable data mining book”

This is one of the few, of many, data mining books that delivers what it promises. It promises many detailed examples and cases. The companion DVD has detailed cases and also has a real 90 day trial copy of Statistica. I have taught data mining for over 10 years and I know it is very difficult to find comprehensive cases that can be used for classroom examples and for students to actually mine data. The price of the book is also very reasonable expecially when you compare the quantity and quality of the material to the typical intro stat book that usually costs twice as much as this data mining book.

The book also addresses new areas of data mining that are under development. Anyone that really wants to understand what data mining is about will find this book infinitively useful.”

So, I think the HANDBOOK will see use in many college classrooms.

Ajay- A question I never get the answer to is which data mining tool is good for what, and not so good for what. Could you help me out with this one? What, in your opinion, among the data mining and statistical tools used by you in your 40 years in this profession would you recommend for some uses, and what would you not recommend for other uses (e.g. SAS, SPSS, KXEN, StatSoft, R, etc.)?

Gary- This is a question I can’t answer well; but my book co-author, Robert Nisbet, Ph.D., can. He has used most of these software packages, and in fact has written 2 reviews over the past 6 years in which most of these have been discussed. I like “cutting edge endeavors”, that has been the modus operandi of my ‘career’, so when I took this “semi-retirement position” as a data mining consultant at StatSoft, I was introduced to DATA MINING, as we started developing STATISTICA Data Miner shortly after I arrived. So most of my experience is with STATISTICA Data Miner, which of course has always been rated NO 1 in all the reviews on data miner software done by Dr. Nisbet – I believe this is primarily due to the fact that STATISTICA was written for the PC from the beginning, and thus does not have any legacy “main frame computer” coding in its history; secondly, StatSoft has been able to move rapidly to make changes as business and government data analysis needs change; and thirdly and most importantly, STATISTICA products have very “open architecture”, “flexibility”, and “customization”, with everything “built together / workable together” as one package. And of course the graphical output is second to none – that is how STATISTICA originally got its reputation. So I find no need of any other software, as if I need a new algorithm, I can program it to work with the “off the shelf” STATISTICA Data Miner algorithms, and thus get anything I need with the full graphical and other outputs seamlessly available.

Ask Bob Nisbet to answer this question, as he has the background to do so.

Ajay- What are the most interesting trends to watch out for in 2009-2010 in data mining, in your opinion?

Gary- Things move so rapidly in this 21st century world, that this is difficult to say. Let me answer this with “hindsight”:

In late October, 2008 I wrote the first draft of Chapter 21 for the HANDBOOK. This was the “future directions of data mining”. You can look in that chapter yourself to find the 4 main areas I decided to focus on. One was on “social networking”, and one of the new examples used was TWITTER. At that time, less than one year ago, no one knew if TWITTER was going to amount to much or not – big question? Well, on Jan 14 when the US-AIRWAYS A320 Airbus made an emergency landing in the Hudson River, I got an automatic EMAIL message from CNN [that I subscribe to] telling me that a “plane was down in the Hudson, watch it live”…I clicked on the live video: The voice from the Helicopter overhead was saying: “We see a plane, half sunk into the water, but no people? What has happened to the people? Are they all dead?………” Well, as it turned out, the CNN Helicopters had spent nearly one hour searching the river for the plane, as had other news agencies. BUT THE “ENTIRE” WORLD ALREADY KNEW!!! … Why? A person on a ferry that was crossing the river close to the crash landing used his iPhone, snapped a photo, uploaded it to TwitPic and sent a TWITTER message, and this was re-tweeted around the world. The world knew in “seconds to minutes”, while the traditional NEWS MEDIA was 1 hour late on the scene; ALL the PEOPLE had been rescued and were on-shore in a warm building within 45 minutes of the landing. THE TRADITIONAL NEWS MEDIA ARRIVED 15 MINUTES AFTER EVERYTHING HAD HAPPENED!!!! ………AT THIS POINT we ALL KNEW that TWITTER was a new phenomenon…and it started growing, with 10,000 people an hour joining at one point last spring of this year, and who knows what the rate is today. TWITTER has become a most important part not only of “social networking” among friends, but for BUSINESS – companies even sending out ‘Parts Availability’ lists to their dealers, etc.

TWITTER affected Chapter 21…I immediately re-wrote Chapter 21, including this first photo of the Hudson Plane crash-landing with all the people standing on the wings. BUT, that’s not the end of this story: By the time the book was about to go to press, TWITTER had decided that “ownership” of uploaded photos resided with the photographer, and the person who took this original US-AIRBUS – PEOPLE ON THE WINGS photo wanted $600 for us to publish it in the HANDBOOK. So, I re-wrote again [the chapter was already “set” in page proofs…so we had to make the changes directly at the printer]…this time finding another photo uploaded to social media, but in this case the person had “checked” the box to put the photo in the public domain.

So TWITTER is one that I predicted would become important, but I’d thought it would be months AFTER the HANDBOOK was released in May, not last January!!!

Other things we presented in Chapter 21 about the “future of data mining” involved “photo / image recognition”, among others. The “Image Recognition”, and more importantly “movement recognition / analysis” for things like Physical Therapy and other medical areas, may be slower to evolve and fully develop, but they are immensely important. The ability to analyze such “Three-dimensional movement data” is already available in rudimentary form in our version 9 of STATISTICA [just released in June], and anyone could implement it fully with MACROS, but it probably will be some time before it is fully feasible from a business standpoint to develop it with fully automatic “point and click” functionality to make it readily accessible for anyone’s use.

Ajay- What would your advice be to a young statistician just starting his research career?

Gary- Make sure you delve in / grab in FULLY to the subject areas…you need to know BOTH the “domain” of the data you are working with, and “correct” methods of data analysis, especially when using the traditional p-value statistics. Today’s culture is too much about “superficiality”…good data analysis requires “depth” of understanding. One needs to FOCUS…good FOCUS can’t be done with elaborate “multi-tasking”. Granted, today’s youth [the “Technology-Inherited”] probably have their brains “wired differently” than the “Technology-Immigrants” like myself [e.g. the older generations], but nevertheless, I see ERRORS all over the place in today’s world, from “typos” in magazines and newspapers, to web page paragraphs, links that don’t work, etc etc…and I conclude that this is all due to NON-FOCUSED / MULTI-TASKING people. You can’t drive a car / bus / train and TEXT MESSAGE at the same time…the scientific tests that have been conducted show that it takes 20 times as long for a TEXT MESSAGING driver to stop, than a driver fully focused on the road, when given a “danger” warning. [Now, maybe this scientific experiment used ALL TECHNOLOGY-IMMIGRANTS as drivers?? If so, the scientific design was “flawed”…they should have used BOTH Technology-Immigrants and Technology-Inheritants as participants in the study. Then we’d have 2 independent, or predictor, variables: Age and TEXT MESSAGING…..]

Short Bio-

Professor, 30 years of medical research in genetics, DNA, proteins, and the neuropsychology of Schizophrenia and Alzheimer’s Disease; now in a semi-retired position as DATA MINING CONSULTANT – SENIOR STATISTICIAN.

Interview John Moore CTO, Swimfish

Here is an interview with John F Moore, VP Engineering and Chief Technology Officer of Swimfish, a provider of business solutions and CRM. A well known figure in Technology and CRM circles, John talks of Social CRM, Technology Offshoring, Community Initiatives and his own career.

Too many CRM systems are not usable. They are built by engineers who think of the system as a large database, and the systems often look like a database, making them difficult for the sales, support, and marketing people to use.

-John F Moore


Ajay – Describe your career journey from college to CTO. What changes in mindset did you undergo along the journey? What advice would you give to young students to take up science careers?

John- First, I wanted to take time to thank you for the interview offer. I graduated from Boston University in 1988 with a degree in Electrical Engineering. At the time of my graduation I found myself to be very interested in the advances taking place on the personal computing front by companies like Lotus with their 1-2-3 product. I knew that I wanted to be involved with these efforts and landed my first job in the software space as a Software Quality Engineer working on 1-2-3 for DOS.

I spent the first few years of my career working at Lotus as a developer, a quality engineer, and manager, on products such as Lotus 1-2-3 and Lotus Notes. Throughout those early career years I learned a lot and focused on taking as many classes as possible.

From Lotus I sought out the start-up environment, and by early 2000 I had joined a startup named Brainshark (http://www.brainshark.com). Brainshark was, and is, focused on delivering an asynchronous communication platform on the web and was one of the early providers of SAAS. In my seven years at Brainshark I learned a lot about delivering an Enterprise class SAAS solution on top of the Microsoft technology stack. The requirements to pass security audits for Fortune 500 companies, and the need to match the performance of in-house solutions, resulted in all of us learning a great deal. These were very fun times.

I now work as the VP of Engineering and CTO at Swimfish, a services and software provider of business solutions. We focus on the financial marketplace, where the founder has a very deep background, but we work within other verticals as well. Our products are focused on the CRM, document management, and mobile product space and are built on the Microsoft technology stack. Our customers leverage both our SAAS and on-premise solutions, which requires us to build our products to be more flexible than is generally required for a SAAS-only solution.

The exciting thing for me is the sheer amount of opportunities I see available for science/engineering students graduating in the near future. To be prepared for these opportunities, however, it will be important to not just be technically savvy.

Engineering students should also be looking at:

* Business classes. If you want to build cool products they must deliver business value.

* Writing and speaking classes. You must be able to articulate your ideas or no one will be willing to invest in them.

I would also encourage people to take chances, get in over your head as often as possible. You may fail, you may succeed. Either way you will gain experiences that make it all worthwhile.

Ajay- How do you think social media can help with CRM. What are the basic do’s and don’ts for social media CRM in your opinion?

John- You touch upon a subject that I am very passionate about. When I think of Social CRM I think about a system of processes and products that enable businesses to actively engage with customers in a manner that delivers maximum value to all. Customers should be able to find answers to their questions with minimal friction or effort; companies should find the right customers for their products.

Social CRM should deliver on some of these fronts:

* Analyze the web of relationships that exists to define optimal pathways. These pathways will define relationships that businesses can leverage for finding their customers. These pathways will enable customers to quickly find answers to their questions. For example, I needed an answer to a question about SharePoint and project management. I asked the question on Twitter and within 3 minutes had answers from two different people. Not only did I get the answer I needed but I made two new friends who I still talk to today.

* Monitor conversations to gauge brand awareness, identify customers having problems or asking questions. This monitoring should not be stalking; however, it should be used to provide quick responses to customers to benefit the greater community.

* Usability. Too many CRM systems are not usable. They are built by engineers who think of the system as a large database, and the systems often look like a database, making them difficult for the sales, support, and marketing people to use.

Finally, when I think of social media I think of these properties:

* Social is about relationship building.

* You should always add more value to the community than you take in return.

* Be transparent and honest. People can tell when you’re not.

Ajay- You are involved in some noble causes – like using blog space for out-of-work techies and separately for Alzheimer’s disease. How important do you think it is for people, especially younger people, to be dedicated to community causes?

John- My mother-in-law was diagnosed with Alzheimer’s disease at the age of 57. My wife and I moved into their two-family house to help her through the final years of her life. It is a horrible disease and one that it is easy to be passionate about if you have seen it in action.

My motivation on the job front is very similar. I have seen too many people suffer through these poor economic times and I simply want to do what I can to help people get back to work.

It probably sounds corny, but I firmly believe that we must all do what we can for each other. Business is competitive, but it does not mean that we cannot, or should not, help each other out. I think it’s important for everyone to have causes they believe in. You have to find your passions in life and follow them. Be a whole person and help change the world for the better.

Ajay- Describe your daily challenges as head of Engineering at Swimfish, Inc. How important is it for the tech team to be integrated with the business and understand it as well?

John- The engineering team at Swimfish works very closely with the business teams. It is important for the team to understand the challenges our customers are encountering and to build products that help the customer succeed. I am not satisfied with the lack of success that many companies encounter when deploying a CRM solution.

We go as deep as possible to understand the business, the processes currently in use, the disparate systems being utilized, and then the underlying technologies currently in use. Only then do we focus on the solutions and deliver the right solution for that company.

On the product front it is the same. We work closely with customers on the features we are planning to add, trying to ensure that the solutions meet their needs as well as the needs of the other customers in the market that we are hoping to serve.

I do expect my engineers to be great at their core job, that goes without question. However, if they cannot understand the business needs they will not work for me very long. My weeks at Swimfish always provide me with interesting challenges and opportunities.

My typical day involves:

* Checking in with our support team to understand if there are any major issues being encountered by any of our customers.

* Challenging the sales team to hit their targets. I love sales as without them I cannot deliver products.

* Checking in with my developers and test teams to determine how each of our projects is doing. We have a daily standup as well, but I try and personally check-in with as many people as possible.

* Most days I spend some time developing, mostly in C#. My current focus area is on our next release of our Milestone Tracking Matrix where I have made major revisions to our user interface.

I also spend time interacting on various social platforms, such as Twitter, as it is critical for me to understand the challenges that people are encountering in their businesses, to keep up with the rapid pace of technology, and just to check-in with friends. Keep it real.

Ajay- What are your views on offshoring work, especially science jobs, which ultimately made science careers less attractive in the US, when at the same time outsourcing companies (in India) generally pay only one-third of billing fees as salaries? Do you think concepts like oDesk can help change the paradigm of tech outsourcing?

John- I have mixed opinions on off-shoring. You should not offshore because of perceived cost savings only. On net you will generally break even, you will not save as much as you might originally think.

I am, however, close to starting a relationship with a good development provider in Costa Rica. The reason for this relationship is not cost based, it is knowledge based. This company has a lot of experience with the primary CRM system that we sell to customers and I have not been successful in finding this experience locally. I will save a lot of money in upfront training on this skill-set; they have done a lot of work in this area already (and have great references). There is real value to our business, and theirs.

Note that Swimfish is already working with a geographically dispersed team as part of the engineering team is in California and part is in Massachusetts. This arrangement has already helped us to better prepare for an offshore relationship and I know we will be successful when we begin.

Ajay- What does John Moore do to have fun when he is not in front of his computer or with a cause?

John- As the father of two teenage daughters I spend a lot of time going to soccer, basketball, and softball games. I also enjoy spending time running, having completed a couple of marathons, and relaxing with a good book. My next challenge will be skydiving, as my 17-year-old daughter and I are going skydiving when she turns 18.

Brief Bio:

For the last decade I have worked as a senior engineering manager for SAAS applications built upon the Microsoft technology stack. I have established the processes, and hired the teams that delivered hundreds of updates ranging from weekly patches to longer running full feature releases. My background as a hands-on developer combined with my strong QA background has enabled me to deliver high quality software on-time.

You can learn more about me, and my opinions, by reading my blog at http://johnfmoore.wordpress.com/ or joining me on Twitter at http://twitter.com/JohnFMoore

Interview Peter J Thomas Award-Winning BI Expert

Here is an in-depth interview with Peter J Thomas, one of Europe’s top Business Intelligence experts and influential thought leaders. Peter talks about BI tools, data quality, science careers, cultural transformation in BI, and the key focus areas.

I am a firm believer that the true benefits of BI are only realised when it leads to cultural transformation. -Peter James Thomas

 

Ajay- Describe your early career, from college to the present.

Peter- I was an all-rounder academically, but at the time that I was taking public exams in the 1980s, if you wanted to pursue a certain subject at University, you had to do related courses between the ages of 16 and 18. Because of this, I dropped things that I enjoyed, such as English, and ended up studying Mathematics, Further Mathematics, Chemistry and Physics. This was not because I disliked non-scientific subjects, but because I was marginally fonder of the scientific ones. In a way it is nice that my current blogging allows me to use language more.

The culmination of these studies was attending Imperial College in London to study for a BSc in Mathematics. Within the curriculum, I was more drawn to Pure Mathematics and Group Theory in particular, and so went on to take an MSc in these areas. This was an intercollegiate course and I took a unit at each of King’s College and Queen Mary College, but everything else was still based at Imperial. I was invited to stay on to do a PhD. It was even suggested that I might be able to do this in two years, given my MSc work, but I decided that a career in academia was not for me and so started looking at other options.

As sometimes happens, a series of coincidences and a slice of luck meant that I joined a technology start-up, then called Cedardata, late in 1988; my first role was as a Trainee Analyst / Programmer. Cedardata was one of the first organisations to offer an Accounting system based on a relational database platform; something that was then rather novel, at least in the commercial arena. The RDBMS in question was Oracle version 5, running on VAX VMS – later DEC Ultrix and a wide variety of other UNIX flavours. Our input screens were written in SQL*Forms 2 – later Oracle Forms – and more complex processing logic and reports were in Pro*C; this was before PL/SQL. Obviously this environment meant that I had to become very conversant with SQL*Plus and C itself.

When I joined Cedardata, they had 10 employees, 3 customers and annual revenue of just £50,000 ($80,000). By the time I left the company eight years later, it had grown dramatically to a staff of 250, over 300 clients in a wide range of industries and sales in excess of £12 million ($20 million). It had also successfully floated on the main London Stock Exchange. When a company grows that quickly the same thing tends to happen to its employees.

Cedardata was probably the ideal environment for me at the time; an organisation that grew rapidly, offering new opportunities and challenges to its employees; that was fiercely meritocratic; and where narrow, but deep, technical expertise was encouraged to be rounded out by developing more general business acumen, a customer-focused attitude and people-management skills. I don’t think that I would have learnt as much, or progressed anything like as quickly in any other type of organisation.

It was also at Cedardata that I had my first experience of the class of applications that later became known as Business Intelligence tools. This was using BusinessObjects 3.0 to write reports, cross-tabs and graphs for a prospective client, the UK Foreign and Commonwealth Office (State Department). The approach must have worked as we beat Oracle Financials in a play-off to secure the multi-million pound account.

During my time at Cedardata, I rose to become an executive and filled a number of roles including Head of Development and also Assistant to the MD / Head of Product Strategy. Spending my formative years in an organisation where IT was the business and where the customer was King had a profound impact on me and has influenced my subsequent approach to IT / Business alignment.

Ajay- How would you convince young people to take maths and science more? What advice would you give to policy makers to promote more maths and science students?

Peter- While I have used little of my Mathematics directly in my commercial career, the approach to problem-solving that it inculcated in me has been invaluable. On arriving at University, it was something of a shock to be presented with Mathematical problems where you couldn’t simply look up the method of solution in a textbook and apply it to guarantee success. Even in my first year I had to grapple with challenges where you had no real clue where to start. Instead what worked, at least most of the time, was immersing yourself in the general literature, breaking down the problem into more manageable chunks, trying different techniques – sometimes quite recherché ones – to make progress, occasionally having an insight that provides a short-cut, but more often succeeding through dogged determination. All of that sounds awfully like the approach that has worked for me in a business context.

Having said that, I was not terribly business savvy as a student. I didn't take Mathematics because I thought that it would lead to a career, I took it because I was fascinated by the subject. As I mentioned earlier, I enjoyed learning about a wide range of things, but Science seemed to relate to the most fundamental issues. Mathematics was both the framework that underpinned all of the Sciences and also offered its own world where astonishing and beautiful results could be found, independent of any applicability; although it has to be said that there are few branches of Mathematics that have not been applied somewhere or other.

I think you either have this appreciation of Science and Mathematics or you don’t and that this happens early on.

Certainly my interest was supported by my parents and a variety of teachers, but a lot of it arose from simply reading about Cosmology, or Vulcanism, or Palaeontology. I watched a YouTube video of Stephen Jay Gould recently, in which he said that when he was a child in the 1950s all children were "in" to Dinosaurs, but that he actually got to make a career out of it. Maybe all children aren't "in" to dinosaurs in the same way today; perhaps the mystery and sense of excitement has gone.

In the UK at least, there appear to be fewer and fewer people taking Science and Mathematics. I am not sure what is behind this trend. I read pieces that suggest that Science and Maths are viewed as being "hard" subjects, and that people opt for "easier" alternatives. I think creative writing is one of the hardest things to do, so I'm not sure where this perspective comes from.

Perhaps some things that don't help are the twin images of the Scientist as a white-coated boffin and the Mathematician as a chalk-covered recluse, neither of whom has much of a grasp on the world beyond their narrow discipline. While of course there is a modicum of truth in these stereotypes, they are far from wholly accurate in my experience.

Perhaps Science has fallen off the pedestal that it was placed on in the 1950s and 1960s. Interest in Science had been spurred by a range of inventions that had improved people's lives and often made the inventors a lot of money. Science was seen as the way to a better tomorrow, a view reinforced by such iconic developments as the discovery of the structure of DNA, our ever-deepening insight into sub-atomic physics and the unravelling of many mysteries of the Universe. These advances in pure science were supported by feats of scientific / engineering achievement such as the Apollo space programme. The military importance of Science was also put into sharp relief by the Manhattan Project; something that also maybe sowed the seeds for later disenchantment and even fear of the area.

The inevitable fallibility of some Scientists and some scientific projects burst the bubble. High-profile problems included the Thalidomide tragedy and the outcry, however ill-informed, about genetically modified organisms. Also the poster child of the scientific / engineering community was laid low by the Challenger disaster. On top of this, living with the scientifically-created threat of mutually-assured destruction probably began to change the degree of positivity with which people viewed Science and Scientists. People arrived at the realisation that Science cannot address every problem: how much effort has gone into finding a cure for cancer, for example?

In addition, in today’s highly technological world, the actual nuts and bolts of how things work are often both hidden and mysterious. While people could relatively easily understand how a steam engine works, how many have any idea about how their iPod functions? Technology has become invisible and almost unimportant, until it stops working.

I am a little wary of Governments fixing issues such as these, which are the result of major generational and cultural trends. Often state action can have unintended and perverse results. Society as a whole goes through cycles and maybe at some future point Science and Mathematics will again be viewed as interesting areas to study; I certainly hope so. Perhaps the current concerns about climate change will inspire a generation of young people to think more about technological ways to address this and interest them in pertinent Sciences such as Meteorology and Climatology.

Ajay- How would you rate the various tools within the BI industry, as in a SWOT analysis (briefly and individually)?

Peter- I am going to offer a Politician’s reply to this. The really important question in BI is not which tool is best, but how to make BI projects successful. While many an unsuccessful BI manager may blame the tool or its vendor, this is not where the real issues lie.

I firmly believe that successful BI rests on four mutually reinforcing pillars:

  • understand the questions the business needs to answer,
  • understand the data available,
  • transform the data to meet the business needs and
  • embed the use of BI in the organisation’s culture.

If you get these things right then you can be successful with almost any of the excellent BI tools available in the marketplace. If you get any one of them wrong, then using the paragon of BI tools is not going to offer you salvation.

I think about BI tools in the same way as I do the car market. Not so many years ago there were major differences between manufacturers.

The Japanese offered ultimate reliability, but maybe didn’t often engage the spirit.

The Germans prided themselves on engineering excellence, slanted either in the direction of performance or luxury, but were not quite as dependable as the Japanese.

The Italians offered out-and-out romance and theatre, with mechanical integrity an afterthought.

The French seemed to think that bizarrely shaped cars with wheels as thin as dinner plates were the way forward, but at least they were distinctive.

The Swedes majored on a mixture of safety and aerospace cachet, but sometimes struggled to shift their image of being boring.

The Americans were still in the middle of their love affair with the large and the rugged, at the expense of convenience and value-for-money.

Stereotypically, my fellow-countrymen majored on agricultural charm, or wooden-panelled nostalgia, but struggled with the demands of electronics.

Nowadays, the quality and reliability of cars are much closer to each other. Most manufacturers have products with similar features, performance and economy ratings. If we take financial issues to one side, differences are more likely to relate to design, or to how people perceive a brand. Today the quality of a Ford is not far behind that of a Toyota. The styling of a Honda can be as dramatic as an Alfa Romeo. Lexus and Audi are playing in areas previously the preserve of BMW and Mercedes, and so on.

To me this is also where the market for BI tools is at present. It is relatively mature and the differences between product sets are less than before.

Of course this doesn’t mean that the BI field will not be shaken up by some new technology or approach (in-memory BI or SaaS come to mind). This would be the equivalent of the impact that the first hybrid cars had on the auto market.

However, from the point of view of implementations, most BI tools will do at least an adequate job and picking one should not be your primary concern in a BI project.

Ajay- SAS Institute Chief Marketing Officer Jim Davis (interviewed on this blog) argues for the superiority of business analytics over business intelligence, which he regards as an over-hyped term. What numbers, statistics and graphs would you quote, rather than semantics, to help redirect those perceptions?

I myself use SAS, SPSS and R, and find that they enable what James Taylor calls Decision Management much better than the simple ETL, reporting and graph-aggregation tools found in many BI suites.

Peter- I have expended quite a lot of energy and hundreds of words on this subject. If people are interested in my views, which are rather different to those of Jim Davis, then I’d suggest that they read them in a series of articles starting with Business Analytics vs Business Intelligence [URL http://peterthomas.wordpress.com/2009/03/28/business-analytics-vs-business-intelligence/ ].

I will however offer some further thoughts and to do this I’ll go back to my car industry analogy. In a world where cars are becoming more and more comparable in terms of their reliability, features, safety and economy, things like styling, brand management and marketing become more and more important.

As the true differences between BI vendors narrow, expect more noise to be made by marketing departments about how different their products are.

I have no problem in acknowledging SAS as a leader in Business Analytics, too many people I respect use their tools for me to think otherwise. However, I think a better marketing strategy for them would be to stick to the many positives of their own products. If they insist on continuing to trash competitors, then it would make sense for them to do this in a way that couldn’t be debunked by a high school student after ten seconds’ reflection.

Ajay- In your opinion, what is the average RoI that a small, medium or large enterprise gets by investing in a business intelligence platform? What advice would you give to such firms (separately) to help them make up their minds?

Peter- The question is pretty much analogous to "What are the benefits of opening an office in China?" The answer is going to depend on what the company does; what their overall strategy is and how a China operation might complement this; whether their products and services are suitable for the Chinese market; how their costs, quality and features compare to local competitors; and whether they have cracked markets closer to home already.

To put things even more prosaically, “How long is a piece of string?”

Taking to one side the size and complexity of an organisation, BI projects come in all shapes and sizes.

Personally I have led Enterprise-wide, all-pervasive BI projects which have had a profound impact on the company. I have also seen well-managed and successful BI projects targeted on a very narrow and specific area.

The former obviously cost more than the latter, but the benefits are commensurately greater. In fact I would argue that the wider a BI project is spread, the greater its payback. Maybe lessons can be learnt and confidence built in an initial implementation to a small group, but to me the real benefit of BI is realised when it touches everything that a company does.

This is not based on a self-interested boosting of BI. To me, if what we want to do is take better business decisions, then the greater the number of such decisions that are impacted, the better this is for the organisation.

Also there are some substantial up-front investments required for BI. These would include: building the BI team; establishing the warehouse and a physical architecture on which to deliver your application. If these can be leveraged more widely, then costs come down.
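As an illustrative aside (these figures are invented for the purpose, not Peter's), a minimal Python sketch of how such fixed up-front costs amortise as a BI deployment widens:

    # Hypothetical figures, purely for illustration.
    fixed_costs = 1_000_000   # BI team, warehouse, physical architecture
    cost_per_user = 500       # licences, training and support per user

    for users in (50, 500, 5000):
        total = fixed_costs + cost_per_user * users
        print(f"{users:>5} users: total ${total:>12,} / per user ${total / users:>10,.0f}")

    # The per-user cost falls sharply as the same up-front investment is
    # leveraged more widely, which is the point about spreading BI broadly.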

The same point can be made about the intellectual property that a successful BI team develops. This is one reason why I am a fan of the concept of BI Competency Centres [URL http://peterthomas.wordpress.com/2009/05/11/business-intelligence-competency-centres/ ].

I have been lucky enough to contribute to an organisation turning round from losing hundreds of millions of dollars to recording profits of twice that magnitude. When business managers cite BI as a major factor behind such a transformation, then this is clearly a technology that can be used to dramatic effect.

Nevertheless both estimating the potential impact of BI and measuring its actual effectiveness are non-trivial activities. A number of different approaches can be taken, some of which I cover in my article Measuring the benefits of Business Intelligence [URL http://peterthomas.wordpress.com/2009/02/26/measuring-the-benefits-of-business-intelligence/ ]. As ever, there is no single recipe for success.

Ajay- Which BI tool / code are you most comfortable with, and what are its salient points?

Peter- Although I have been successful with elements of the IBM-Cognos toolset and think that this has many strong points, not least being relatively user-friendly, I think I'll go back to my earlier comments about this area being much less important than many others for the success of a BI project.

Ajay- How do you think cloud computing will change BI? What percentage of BI budgets goes to data quality, and what is the eventual impact of data quality on results?

Peter- I think that the jury is still out on cloud computing and BI. By this I do not mean that cloud computing will not have an impact, but rather that it remains unclear what this impact will actually be.

Given the maturity of the market, my suspicion is that the BI equivalent of a Google is not going to emerge from nowhere. There are many excellent BI start-ups in this space and I have been briefed by quite a few of them.

However, I think the future of cloud computing in BI is likely to be determined by how the likes of IBM-Cognos, SAP-BusinessObjects and Oracle-Hyperion embrace the area.

Having said this, one of the interesting things in computing is how easy it is to misjudge the future and perhaps there is a potential titan of cloud BI currently gestating in the garage so beloved of IT mythology.

On data quality, I have never explicitly split out this component of a BI effort. Rather data quality has been an integral part of what we have done. Again I have taken a four-pillared approach:

  • improve how the data is entered;
  • make sure your interfaces aren’t the problem;
  • check how the data has been entered / interfaced;
  • and don’t suppress bad data in your BI.

The first pillar consists of improved validation in front-end systems – something that can be facilitated by the BI team providing master data to them – and also a focus on staff training, stressing the importance to the organisation of accurately recording certain data fields.

The second pillar is more to do with the general IT Architecture and how this relates to the Information Architecture. Again master data has a role to play, but so does ensuring that the IT culture is one in which different teams collaborate well and are concerned about what happens to data when it leaves "their" systems.

The third pillar is the familiar world of after-the-fact data quality reports and auditing, something that is necessary, but not sufficient, for success in data quality.

Finally there is what I think can be one of the most important pillars: ensuring that the BI system takes a warts-and-all approach to data. This means that bad data is highlighted, rather than being suppressed. In turn this creates pressure for the problems to be addressed where they arise and creates a virtuous circle.
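To make that last pillar concrete, here is a minimal sketch in Python / pandas, with entirely hypothetical data and column names: invalid values are bucketed and surfaced in the report, rather than quietly filtered out.

    import pandas as pd

    # Hypothetical policy records; "region" is a field the business cares about.
    policies = pd.DataFrame({
        "policy_id": [1, 2, 3, 4, 5],
        "region": ["EMEA", None, "APAC", "EMEA", "??"],
        "premium": [1200.0, 950.0, 400.0, 2100.0, 800.0],
    })

    VALID_REGIONS = {"EMEA", "APAC", "AMER"}

    # Suppressing bad data would simply filter it out:
    #     policies[policies["region"].isin(VALID_REGIONS)]
    # A warts-and-all report buckets it instead, so it stays visible.
    policies["region_clean"] = policies["region"].where(
        policies["region"].isin(VALID_REGIONS), other="UNKNOWN / INVALID"
    )

    print(policies.groupby("region_clean")["premium"].sum())
    # The UNKNOWN / INVALID line creates pressure to fix data at the source.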

For those who might be interested in this area, I expand on it more in Using BI to drive improvements in data quality [URL http://peterthomas.wordpress.com/2009/02/11/using-bi-to-drive-improvements-in-data-quality/ ].

Ajay- You are well known in England's rock climbing and bouldering community. A fun question: what is the similarity between a BI implementation / project and climbing a big boulder?

Peter- I would have to offer two minor clarifications.

First, it is probably my partner who is better known in climbing circles, via her blog [URL http://77jenn.blogspot.com/ ] and the articles and reviews that she has written for the climbing press; though I guess I can take credit for most of the photos and videos.

Second, particularly given that a lot of our climbing takes place in Wales, I should acknowledge the broader UK climbing community and also mention Scotland, our most mountainous region.

Despite what many inhabitants of Sheffield might think to the contrary, there is life beyond Stanage Edge [URL http://en.wikipedia.org/wiki/Stanage ].

I have written about the determination and perseverance that are required to get to the top of a boulder, or indeed to the top of any type of climb [URL http://peterthomas.wordpress.com/2009/03/31/perseverance/ ].

I think those same qualities are necessary for any lengthy, complex project. I am a firm believer that the true benefits of BI are only realised when it leads to cultural transformation. Certainly the discipline of change management has many parallels with rock climbing. You need a positive attitude and a strong belief in your ultimate success, despite the inevitable setbacks. If one approach doesn’t yield fruit then you need to either fine-tune or try something radically different.

I suppose a final similarity is the feeling that you get having completed a climb, particularly if it is at the limit of your ability and has taken a long time to achieve. This is one of both elation and deep satisfaction, but is quickly displaced by a desire to find the next challenge.

This is something that I have certainly experienced in business life and I think the feelings will be familiar to many readers.

Biography-

 

Peter Thomas has led all-pervasive Business Intelligence and Cultural Transformation projects serving the needs of 500+ users in multiple business units and service departments across 13 European and 5 Latin American countries. He has also developed Business Intelligence strategies for operations spanning four continents. His BI work has won two industry awards: "Best Enterprise BI Implementation" from Cognos in 2006 and "Best use of IT in Insurance" from Financial Sector Technology in 2005. Peter speaks about success factors in both Business Intelligence and the associated Change Management at seminars across both Europe and North America, and writes about these areas and many other aspects of business, technology and change on his blog [URL http://peterthomas.wordpress.com ].

Interview Jill Dyche Baseline Consulting

Here is an interview with Jill Dyche, co-founder of Baseline Consulting and one of the best Business Intelligence consultants and analysts. Her writing is read by a huge portion of the industry and has influenced many paradigms. She is also the author of e-Data, The CRM Handbook, and Customer Data Integration: Reaching a Single Version of the Truth.

BI tools are not recommended when they're the first topic in a BI discussion. - Jill Dyche, Baseline Consulting

Ajay- What approximate Return on Investment would you give to various vendors within Business Intelligence?

Jill- You don’t kid around do you, Ajay? In general the answer has everything to do with the problem BI is solving for a company. For instance, we’re working on deploying operational BI at a retailer right now. This new program is giving people in the stores more power to make decisions about promotions and in-store events. The projected ROI is $300,000 per store per year—and the retailer has over 1000 stores. In another example, we’re working with an HMO client on a master data management project that helps it reconcile patient data across hospitals, clinics, pharmacies, and home health care. The ROI could be life-saving. So, as they say in the Visa commercials: Priceless.
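(A quick back-of-envelope on the retail example: $300,000 per store per year across 1,000+ stores projects to over $300 million a year chain-wide.)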

Ajay- What impact do you think third-party cloud storage and processing will have on Business Intelligence consulting?

Jill- There's a lot of buzz about cloud storage for BI; most of it is coming from the VC community at this point, not from our clients. The trouble with that is that BI systems really need control over their storage. There are companies out there (check out a product called RainStor) that do BI storage in the cloud very well, and are optimized for it. But most "cloud" environments geared to BI are really just hosted offerings that provide clients with infrastructure and processing resources that they don't have in-house. Where the cloud really has benefits is when it provides significant processing power to companies that can't build it easily themselves.

Ajay- What top writing tips would you give to young, struggling business bloggers, especially in this recession?

Jill- I'd advise bloggers to write like they talk, a standard admonishment from many a professor of Business Writing. So much of today's business writing, especially in blogs, is stilted, overly formal, and pedantic. I don't care if your grammar is accurate; if your writing sounds like the Monroe Doctrine, no one will read it. (Just try to give me one quote from the Monroe Doctrine. See what I mean?) Don't use the word "leverage" when you can use the word "use." Be genuine and conversational. And avoid clichés like the plague.

Ajay- How would you convince young people, especially women, to take up science careers? Describe your own career journey.

Jill- As much as we need those role models in science, high-tech, and math careers, I’d tell them to only embrace it if they really love it. My career path to high-tech was unconventional and unintentional. I started as a technical writer specializing in relational databases just as they were getting hot. One thing I know for sure is if you want to learn about something interesting, be willing to roll up your sleeves and work with it. My technical writing about databases, and then data warehouses, led to some pretty interesting client work.

Sure I’ve coded SQL in my career, and optimized some pretty hairy WHERE clauses. But the bigger issue is applying that work to business problems. Actually I’m grateful that I wasn’t a very good programmer. I’d still be waiting for that infinite loop to finish running.

Ajay- What are the areas within an enterprise where implementation of BI leads to the most gains? And when are BI tools not recommended?

Jill- The best opportunities for BI are for supporting business growth. And that typically means BI used by sales and marketing. Who’s the next customer and what will they buy? It’s answers to questions like these that can set a company apart competitively and contribute to both the top and bottom lines.

Not to be too heretical, but to answer your second question: BI tools are not recommended when they’re the first topic in a BI discussion. We’ve had several “Don’t go into the light” conversations with clients lately where they are prematurely looking at BI tools rather than examining their overall BI readiness. Companies need to be honest about their development processes, existing skill sets, and their data and platform infrastructures before they start phoning up data visualization vendors. Unfortunately, many people engage BI software vendors way before they’re ready.

Ajay- You and your partner Evan wrote what was really the first book on Master Data Management. But you’d been in the BI and data warehousing world before that. Why MDM?

Jill- We just kept watching what our clients couldn’t pull off with their data warehouses. We saw the effort they were going through to enforce business rules through ETL, and what they were trying to do to match records across different source systems. We also saw the amount of manual effort that went into things like handling survivor records, which leads to a series of conversations about data ownership.
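(An editorial aside: here is a minimal sketch of the kind of survivor-record rule being described, in Python with hypothetical field names. Among records matched as the same customer, the most recently updated one wins, with any gaps filled from the others.)

    from datetime import date

    # Hypothetical duplicates for one customer, matched across source systems.
    matched = [
        {"source": "CRM",     "name": "John Smith", "email": "js@example.com",  "updated": date(2009, 3, 1)},
        {"source": "Billing", "name": "J. Smith",   "email": None,              "updated": date(2009, 6, 12)},
        {"source": "Web",     "name": "John Smith", "email": "old@example.com", "updated": date(2008, 11, 5)},
    ]

    # Survivorship rule: the most recently updated record wins outright...
    survivor = max(matched, key=lambda r: r["updated"]).copy()

    # ...then any missing fields are filled from the next freshest records.
    for record in sorted(matched, key=lambda r: r["updated"], reverse=True):
        for field, value in record.items():
            if survivor.get(field) is None and value is not None:
                survivor[field] = value

    print(survivor)  # One "golden record" for downstream systems.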

Our book (Customer Data Integration: Reaching a Single Version of the Truth, Wiley) has as much to do with data management and data governance as it does with CDI and MDM. As Evan recently said in his presentation at the TDWI MDM Insight event, “You can’t master your data until you manage your data.” We really believe that, and our clients are starting to put it into practice too.

Ajay- Why did you and Evan choose to focus on customer master data (CDI) rather than a more general book on MDM?

Jill- There were two reasons. The first one was because other master data domains like product and location have their own unique sets of definitions and rules. Even though these domains also need MDM, they’re different and the details around implementing them and choosing vendor products to enable them are different. The second reason was that the vast majority of our clients started their MDM programs with customer data. One of Baseline’s longest legacies is enabling the proverbial “360-degree view” of customers. It’s what we knew.

Ajay- What’s surprised you most about your CDI/MDM clients?

Jill- The extent to which they use CDI and MDM as the context for bringing IT and the business closer together. You’d think BI would be ideal for that, and it is. But it’s interesting how MDM lets companies strip back a lot of the tool discussions and just focus on the raw conversations about definitions and rules for business data. Business people get why data is so important, and IT can help guide them in conversations about streamlining data quality and management. Companies like Dell have used MDM for nothing less than business alignment.

Ajay- Any plans to visit India and China to give lectures?

Jill- I just turned down a trip to China this fall because I had a schedule conflict, which I'm really bummed about. As far as India is concerned, nothing yet, but if you're looking for houseguests let me know. (Ajay- Sure, I have a big brand-new house just ready. And if I visit the USA, may I be a house guest too?)

About Jill Dyche-

Jill blogs at http://www.jilldyche.com/, where she takes the perpetual challenge of business-IT alignment head-on in her trenchant, irreverent style.

Jill Dyché is a partner and co-founder of Baseline Consulting. Her role at Baseline is a combination of best-practice expert, industry gadfly, key client advisor, and all-around thought leader. She is responsible for key client strategies and market analysis in the areas of data governance, business intelligence, master data management, and customer relationship management. Jill counsels boards of directors on the strategic importance of their information investments.

Author

Jill is the author of three books on the business value of IT. Jill's first book, e-Data (Addison Wesley, 2000), has been published in eight languages. She is a contributor to Impossible Data Warehouse Situations: Solutions from the Experts (Addison Wesley, 2002), and her book, The CRM Handbook (Addison Wesley, 2002), is the bestseller on the topic.

Jill’s work has been featured in major publications such as Computerworld, Information Week, CIO Magazine, the Wall Street Journal, the Chicago Tribune and Newsweek.com. Jill’s latest book, Customer Data Integration (John Wiley and Sons, 2006) was co-authored with Baseline partner Evan Levy, and shows the business breakthroughs achieved with integrated customer data.

Industry Expert

Jill is a featured speaker at industry conferences, university programs, and vendor events. She serves as a judge for several IT best practice awards. She is a member of the Society of Information Management and Women in Technology, a faculty member of TDWI, and serves as a co-chair for the MDM Insight conference. Jill is a columnist for DM Review, and a blogger for BeyeNETWORK and Baseline Consulting.