Another aspect of tool usefulness is how much does it help with the entire data mining process from data preparation and cleaning, modeling, evaluation, visualization and presentation (excluding deployment).
New KDnuggets Poll is asking: What part of your analytics / data mining work in the past 12 months was done in R?
A Summary report from Rexer Analytics Annual Survey
HIGHLIGHTS from the 4th Annual Data Miner Survey (2010):
• FIELDS & GOALS: Data miners work in a diverse set of fields. CRM / Marketing has been the #1 field in each of the past four years. Fittingly, “improving the understanding of customers”, “retaining customers” and other CRM goals are also the goals identified by the most data miners surveyed.
• ALGORITHMS: Decision trees, regression, and cluster analysis continue to form a triad of core algorithms for most data miners. However, a wide variety of algorithms are being used. This year, for the first time, the survey asked about Ensemble Models, and 22% of data miners report using them.
A third of data miners currently use text mining and another third plan to in the future.
• MODELS: About one-third of data miners typically build final models with 10 or fewer variables, while about 28% generally construct models with more than 45 variables.
• TOOLS: After a steady rise across the past few years, the open source data mining software R overtook other tools to become the tool used by more data miners (43%) than any other. STATISTICA, which has also been climbing in the rankings, is selected as the primary data mining tool by the most data miners (18%). Data miners report using an average of 4.6 software tools overall. STATISTICA, IBM SPSS Modeler, and R received the strongest satisfaction ratings in both 2010 and 2009.
• TECHNOLOGY: Data Mining most often occurs on a desktop or laptop computer, and frequently the data is stored locally. Model scoring typically happens using the same software used to develop models. STATISTICA users are more likely than other tool users to deploy models using PMML.
• CHALLENGES: As in previous years, dirty data, explaining data mining to others, and difficult access to data are the top challenges data miners face. This year data miners also shared best practices for overcoming these challenges. The best practices are available online.
• FUTURE: Data miners are optimistic about continued growth in the number of projects they will be conducting, and growth in data mining adoption is the number one “future trend” identified. There is room to improve: only 13% of data miners rate their company’s analytic capabilities as “excellent” and only 8% rate their data quality as “very strong”.
Please contact us if you have any questions about the attached report or this annual research program. The 5th Annual Data Miner Survey will be launching next month. We will email you an invitation to participate.
Predictive Analytics World is pleased to announce on-demand access to the videos of PAW Washington DC, October 2010, including over 30 sessions and keynotes that you may view at your convenience. Access this leading predictive analytics content online now:
Select individual conference sessions, or recognize savings by registering for access to one or two full days of sessions. These on-demand videos deliver PAW DC right to your desk, covering hot topics and advanced methods such as:
PAW DC videos feature over 25 speakers with case studies from leading enterprises such as: CIBC, CEB, Forrester, Macy’s, MetLife, Microsoft, Miles Kimball, Monster.com, Oracle, Paychex, SunTrust, Target, UPMC, Xerox, Yahoo!, YMCA, and more.
Keynote: Five Ways Predictive Analytics Cuts Enterprise Risk
Eric Siegel,Ph.D., Program Chair, Predictive Analytics World
All business is an exercise in risk management. All organizations would benefit from measuring, tracking and computing risk as a core process, much like insurance companies do.
Predictive analytics does the trick, one customer at a time. This technology is a data-driven means to compute the risk each customer will defect, not respond to an expensive mailer, consume a retention discount even if she were not going to leave in the first place, not be targeted for a telephone solicitation that would have landed a sale, commit fraud, or become a “loss customer” such as a bad debtor or an insurance policy-holder with high claims.
In this keynote session, Dr. Eric Siegel reveals:
– Five ways predictive analytics evolves your enterprise to reduce risk
– Hidden sources of risk across operational functions
– What every business should learn from insurance companies
– How advancements have reversed the very meaning of fraud
– Why “man + machine” teams are greater than the sum of their parts for enterprise decision support
Platinum Sponsor Presentation: Analytics – The Beauty of Diversity
Anne H. Milley,Senior Director of Analytic Strategy, Worldwide Product Marketing, SAS
Analytics contributes to, and draws from, multiple disciplines. The unifying theme of “making the world a better place” is bred from diversity. For instance, the same methods used in econometrics might be used in market research, psychometrics and other disciplines. In a similar way, diverse paradigms are needed to best solve problems, reveal opportunities and make better decisions. This is why we evolve capabilities to formulate and solve a wide range of problems through multiple integrated languages and interfaces. Extending that, we have provided integration with other languages so that users can draw on the disciplines and paradigms needed to best practice their craft.
Gold Sponsor Presentation: Predictive Analytics Accelerate Insight for Financial Services
Finbarr Deely,Director of Business Development,ParAccel
Financial services organizations face immense hurdles in maintaining profitability and building competitive advantage. Financial services organizations must perform “what-if” scenario analysis, identify risks, and detect fraud patterns. The advanced analytic complexity required often makes such analysis slow and painful, if not impossible. This presentation outlines the analytic challenges facing these organizations and provides a clear path to providing the accelerated insight needed to perform in today’s complex business environment to reduce risk, stop fraud and increase profits. * The value of predictive analytics in Accelerating Insight * Financial Services Analytic Case Studies * Brief Overview of ParAccel Analytic Database
TOPIC: SURVEY ANALYSIS Case Study: YMCA Turning Member Satisfaction Surveys into an Actionable Narrative
Dean Abbott,President, Abbott Analytics
Employees are a key constituency at the Y and previous analysis has shown that their attitudes have a direct bearing on Member Satisfaction. This session will describe a successful approach for the analysis of YMCA employee surveys. Decision trees are built and examined in depth to identify key questions in describing key employee satisfaction metrics, including several interesting groupings of employee attitudes. Our approach will be contrasted with other factor analysis and regression-based approaches to survey analysis that we used initially. The predictive models described are currently in use and resulted in both greater understanding of employee attitudes, and a revised “short-form” survey with fewer key questions identified by the decision trees as the most important predictors.
TOPIC: INDUSTRY TRENDS 2010 Data Minter Survey Results: Highlights
Karl Rexer,Ph.D., Rexer Analytics
Do you want to know the views, actions, and opinions of the data mining community? Each year, Rexer Analytics conducts a global survey of data miners to find out. This year at PAW we unveil the results of our 4th Annual Data Miner Survey. This session will present the research highlights, such as:
Multiple Case Studies: U.S. DoD, U.S. DHS, SSA Text Mining: Lessons Learned
John F. Elder,Chief Scientist, Elder Research, Inc.
Text Mining is the “Wild West” of data mining and predictive analytics – the potential for gain is huge, the capability claims are often tall tales, and the “land rush” for leadership is very much a race.
In solving unstructured (text) analysis challenges, we found that principles from inductive modeling – learning relationships from labeled cases – has great power to enhance text mining. Dr. Elder highlights key technical breakthroughs discovered while working on projects for leading government agencies, including: Text Mining is the “Wild West” of data mining and predictive analytics – the potential for gain is huge, the capability claims are often tall tales, and the “land rush” for leadership is very much a race.
– Prioritizing searches for the Dept. of Homeland Security
– Quick decisions for Social Security Admin. disability
– Document discovery for the Dept. of Defense
– Disease discovery for the Dept. of Homeland Security
Keynote: How Target Gets the Most out of Its Guest Data to Improve Marketing ROI
Andrew Pole,Senior Manager, Media and Database Marketing, Target
In this session, you’ll learn how Target leverages its own internal guest data to optimize its direct marketing – with the ultimate goal of enhancing our guests’ shopping experience and driving in-store and online performance. You will hear about what guest data is available at Target, how and where we collect it, and how it is used to improve the performance and relevance of direct marketing vehicles. Furthermore, we will discuss Target’s development and usage of guest segmentation, response modeling, and optimization as means to suppress poor performers from mailings, determine relevant product categories and services for online targeted content, and optimally assign receipt marketing offers to our guests when offer quantities are limited.
Platinum Sponsor Presentation: Driving Analytics Into Decision Making
Jason Verlen,Director, SPSS Product Strategy & Management, IBM Software Group
Organizations looking to dramatically improve their business outcomes are turning to decision management, a convergence of technology and business processes that is used to streamline and predict the outcome of daily decision-making. IBM SPSS Decision Management technology provides the critical link between analytical insight and recommended actions. In this session you’ll learn how Decision Management software integrates analytics with business rules and business applications for front-line systems such as call center applications, insurance claim processing, and websites. See how you can improve every customer interaction, minimize operational risk, reduce fraud and optimize results.
TOPIC: DATA INFRASTRUCTURE AND INTEGRATION Case Study: Macy’s The world is not flat (even though modeling software has to think it is)
Paul Coleman,Director of Marketing Statistics, Macy’s Inc.
Software for statistical modeling generally use flat files, where each record represents a unique case with all its variables. In contrast most large databases are relational, where data are distributed among various normalized tables for efficient storage. Variable creation and model scoring engines are necessary to bridge data mining and storage needs. Development datasets taken from a sampled history require snapshot management. Scoring datasets are taken from the present timeframe and the entire available universe. Organizations, with significant data, must decide when to store or calculate necessary data and understand the consequences for their modeling program.
TOPIC: CUSTOMER VALUE Case Study: SunTrust When One Model Will Not Solve the Problem – Using Multiple Models to Create One Solution
Dudley Gwaltney,Group Vice President, Analytical Modeling, SunTrust Bank
In 2007, SunTrust Bank developed a series of models to identify clients likely to have large changes in deposit balances. The models include three basic binary and two linear regression models.
Based on the models, 15% of SunTrust clients were targeted as those most likely to have large balance changes. These clients accounted for 65% of the absolute balance change and 60% of the large balance change clients. The targeted clients are grouped into a portfolio and assigned to individual SunTrust Retail Branch. Since 2008, the portfolio generated a 2.6% increase in balances over control.
Using the SunTrust example, this presentation will focus on:
TOPIC: RESPONSE & CROSS-SELL Case Study: Paychex Staying One Step Ahead of the Competition – Development of a Predictive 401(k) Marketing and Sales Campaign
Jason Fox,Information Systems and Portfolio Manager,Paychex
In-depth case study of Paychex, Inc. utilizing predictive modeling to turn the tides on competitive pressures within their own client base. Paychex, a leading provider of payroll and human resource solutions, will guide you through the development of a Predictive 401(k) Marketing and Sales model. Through the use of sophisticated data mining techniques and regression analysis the model derives the probability a client will add retirement services products with Paychex or with a competitor. Session will include roadblocks that could have ended development and ROI analysis. Speaker: Frank Fiorille, Director of Enterprise Risk Management, Paychex Speaker: Jason Fox, Risk Management Analyst, Paychex
TOPIC: SEGMENTATION Practitioner: Canadian Imperial Bank of Commerce Segmentation Do’s and Don’ts
Daymond Ling,Senior Director, Modelling & Analytics,Canadian Imperial Bank of Commerce
The concept of Segmentation is well accepted in business and has withstood the test of time. Even with the advent of new artificial intelligence and machine learning methods, this old war horse still has its place and is alive and well. Like all analytical methods, when used correctly it can lead to enhanced market positioning and competitive advantage, while improper application can have severe negative consequences.
This session will explore what are the elements of success, and what are the worse practices that lead to failure. The relationship between segmentation and predictive modeling will also be discussed to clarify when it is appropriate to use one versus the other, and how to use them together synergistically.
TOPIC: SOCIAL DATA
Thought Leadership Social Network Analysis: Killer Application for Cloud Analytics
James Kobielus,Senior Analyst, Forrester Research
Social networks such as Twitter and Facebook are a potential goldmine of insights on what is truly going through customers´minds. Every company wants to know whether, how, how often, and by whom they´re being mentioned across the billowing new cloud of social media. Just as important, every company wants to influence those discussions in their favor, target new business, and harvest maximum revenue potential. In this session, Forrester analyst James Kobielus identifies fruitful applications of social network analysis in customer service, sales, marketing, and brand management. He presents a roadmap for enterprises to leverage their inline analytics initiatives and leverage high-performance data warehousing (DW) clouds and appliances in order to analyze shifting patterns of customer sentiment, influence, and propensity. Leveraging Forrester’s ongoing research in advanced analytics and customer relationship management, Kobielus will discuss industry trends, commercial modeling tools, and emerging best practices in social network analysis, which represents a game-changing new discipline in predictive analytics.
Director of Database Marketing, Life Line Screening
While Life Line is successfully executing a US CRM roadmap, they are also beginning this same evolution abroad. They are beginning in the UK where Merkle procured data and built a response model that is pulling responses over 30% higher than competitors. This presentation will give an overview of the US CRM roadmap, and then focus on the beginning of their strategy abroad, focusing on the data procurement they could not get anywhere else but through Merkle and the successful modeling and analytics for the UK. Speaker: Ozgur Dogan, VP, Quantitative Solutions Group, Merkle Inc Speaker: Trish Mathe, Director of Database Marketing, Life Line Screening
Marketers use surveys to create enterprise wide applicable strategic insights to: (1) develop segmentation schemes, (2) summarize consumer behaviors and attitudes for the whole US population, and (3) use multiple surveys to draw unified views about their target audience. However, these insights are not directly addressable and scalable to the whole consumer universe which is very important when applying the power of survey intelligence to the one to one consumer marketing problems marketers routinely face. Acxiom partnered with Forrester Research, creating addressable and scalable applications of Forrester’s Technographics Survey and applied it successfully to a number of industries and applications.
TOPIC: HEALTHCARE Case Study: UPMC Health Plan A Predictive Model for Hospital Readmissions
Scott Zasadil,Senior Scientist, UPMC Health Plan
Hospital readmissions are a significant component of our nation’s healthcare costs. Predicting who is likely to be readmitted is a challenging problem. Using a set of 123,951 hospital discharges spanning nearly three years, we developed a model that predicts an individual’s 30-day readmission should they incur a hospital admission. The model uses an ensemble of boosted decision trees and prior medical claims and captures 64% of all 30-day readmits with a true positive rate of over 27%. Moreover, many of the ‘false’ positives are simply delayed true positives. 53% of the predicted 30-day readmissions are readmitted within 180 days.