Using Rapid Miner and R for Sports Analytics #rstats

Rapid Miner is one of the oldest open source analytics packages, dating from long before open source or even analytics was considered a fashionable buzzword. The Rapid Miner software has been a pioneer in many areas (like establishing a marketplace for Rapid Miner Extensions), and the Rapid Miner-R extension was one of the most promising enablers of using R in an enterprise setting.
The following interview was conducted with a manager of analytics for a sports organization. The sports organization considers analytics a strategic differentiator, hence the name is confidential. No part of the interview has been edited or manipulated.

Ajay- Why did you choose Rapid Miner and R? What were the other software alternatives you considered and discarded?

Analyst- We considered most of the other major players in statistics/data mining or enterprise BI.  However, we found that the value proposition for an open source solution was too compelling to justify the premium pricing that the commercial solutions would have required.  The widespread adoption of R and the variety of packages and algorithms available for it, made it an easy choice.  We liked RapidMiner as a way to design structured, repeatable processes, and the ability to optimize learner parameters in a systematic way.  It also handled large data sets better than R on 32-bit Windows did.  The GUI, particularly when 5.0 was released, made it more usable than R for analysts who weren’t experienced programmers.

Ajay- What analytics do you think Rapid Miner and R are best suited for?

 Analyst- We use RM+R mainly for sports analysis so far, rather than for more traditional business applications.  It has been quite suitable for that, and I can easily see how it would be used for other types of applications.

 Ajay- Any experiences as an enterprise customer? How was the installation process? How good is the enterprise level support?

Analyst- Rapid-I has been one of the most responsive tech companies I’ve dealt with, either in my current role or with previous employers.  They are small enough to be able to respond quickly to requests, and in more than one case, have fixed a problem, or added a small feature we needed within a matter of days.  In other cases, we have contracted with them to add larger pieces of specific functionality we needed at reasonable consulting rates.  Those features are added to the mainline product, and become fully supported through regular channels.  The longer consulting projects have typically had a turnaround of just a few weeks.

Ajay- What challenges, if any, did you face in executing a pure open source analytics bundle?

Analyst- As Rapid-I is a smaller company based in Europe, the availability of training and consulting in the USA isn’t as extensive as for the major enterprise software players, and the time zone differences sometimes slow down the communications cycle.  There were times where we were the first customer to attempt a specific integration point in our technical environment, and with no prior experiences to fall back on, we had to work with Rapid-I to figure out how to do it.  Compared to what traditional software vendors provide, both R and RM tend to have sparse, terse, occasionally incomplete documentation.  The situation is getting better, but still lags behind what the traditional enterprise software vendors provide.

Ajay- What are the things you can do in R, and what are the things you prefer to do in Rapid Miner (comparison for technical synergies)?

Analyst- Our experience has been that RM is superior to R at writing and maintaining structured processes, better at handling larger amounts of data, and more flexible at fine-tuning model parameters automatically.  The biggest limitation we’ve had with RM compared to R is that R has a larger library of user-contributed packages for additional data mining algorithms.  Sometimes we opted to use R because RM hadn’t yet implemented a specific algorithm.  The introduction of the R extension has allowed us to combine the strengths of both tools in a very logical and productive way.

In particular, extending RapidMiner with R helped address RM’s weakness in the breadth of algorithms, because it brings the entire R ecosystem into RM (similar to how Rapid-I implemented much of the Weka library early on in RM’s development).  Further, because the R user community releases packages that implement new techniques faster than the enterprise vendors can, this helps turn a potential weakness into a potential strength.  However, R packages tend to be of varying quality, and are more prone to go stale due to lack of support/bug fixes.  This depends heavily on the package’s maintainer and its prevalence of use in the R community.  So when RapidMiner has a learner with a native implementation, it’s usually better to use it than the R equivalent.

RapidMiner launches extensions marketplace

For some time now, I had been hoping for a place where new package or algorithm developers get at least a fraction of the money that iPad or iPhone application developers get. Rapid Miner has taken the lead in establishing a marketplace for extensions. Are there going to be paid extensions as well? I hope so!

This probably makes it the first “app” marketplace in open source and the second app marketplace in analytics after salesforce.com.

It is hard work to think of new algorithms, and some of them can be really useful.

Can we hope for an #rstats marketplace where people downloading, say, ggplot3.0 at least get a prompt to donate 99 cents per download to Hadley Wickham’s Amazon wishlist? http://www.amazon.com/gp/registry/1Y65N3VFA613B

Do you think it is okay to pay 99 cents per iTunes song, but not pay a cent for open source software?

I don’t know, but I am just a capitalist born in a country that was socialist for the first 13 years of my life. Congratulations once again to Rapid Miner for innovating and leading the way.

http://rapid-i.com/component/option,com_myblog/show,Rapid-I-Marketplace-Launched.html/Itemid,172

RapidMinerMarketplaceExtensions 30 May 2011
Rapid-I Marketplace Launched by Simon Fischer

Over the years, many of you have been developing new RapidMiner Extensions dedicated to a broad set of topics. Whereas these extensions are easy to install in RapidMiner – just download and place them in the plugins folder – the hard part is to find them in the vastness that is the Internet. Extensions made by ourselves at Rapid-I, on the other hand,  are distributed by the update server making them searchable and installable directly inside RapidMiner.

We thought that this was a bit unfair, so we decided to open up the update server to the public, and not only this, we even gave it a new look and name. The Rapid-I Marketplace is available in beta mode at http://rapidupdate.de:8180/ . You can use the Web interface to browse, comment, and rate the extensions, and you can use the update functionality in RapidMiner by going to the preferences and entering http://rapidupdate.de:8180/UpdateServer/ as the update server URL. (Once the beta test is complete, we will change the port back to 80 so we won’t have any firewall problems.)

As an Extension developer, just register with the Marketplace and drop me an email (fischer at rapid-i dot com) so I can give you permissions to upload your own extension. Upload is simple provided you use the standard RapidMiner Extension build process and will boost visibility of your extension.

Looking forward to seeing many new extensions there soon!

Disclaimer- Decisionstats is a partner of Rapid Miner. I have liked the software for a long, long time, and recently agreed to partner with them just like I did with KXEN some years back, and with Predictive Analytics Conference, and Aster Data until last year.

I still think Rapid Miner is very, very good software, and a globally created software after SAP.

Here is the actual marketplace

http://rapidupdate.de:8180/UpdateServer/faces/index.xhtml

Welcome to the Rapid-I Marketplace Public Beta Test

The Rapid-I Marketplace will soon replace the RapidMiner update server. Using this marketplace, you can share your RapidMiner extensions and make them available for download by the community of RapidMiner users. Currently, we are beta testing this server. If you want to use this server in RapidMiner, you must go to the preferences and enter http://rapidupdate.de:8180/UpdateServer for the update url. After the beta test, we will change the port back to 80, which is currently occupied by the old update server. You can test the marketplace as a user (downloading extensions) and as an Extension developer. If you want to publish your extension here, please let us know via the contact form.

Hot Downloads
The Image Processing Extension provides operators for handling image data. You can extract attributes describing colour and texture in the image, and you can apply several transformations to image data, which allows you to perform segmentation and detection of suspicious areas. The extension provides many image transformation and extraction operators, ranging from Wavelet Decomposition and Hough Circle to Block Difference of Inverse Probabilities.

RapidMiner is unquestionably the world-leading open-source system for data mining. It is available as a stand-alone application for data analysis and as a data mining engine for the integration into own products. Thousands of applications of RapidMiner in more than 40 countries give their users a competitive edge.

  • Data Integration, Analytical ETL, Data Analysis, and Reporting in one single suite
  • Powerful but intuitive graphical user interface for the design of analysis processes
  • Repositories for process, data and meta data handling
  • Only solution with meta data transformation: forget trial and error and inspect results already during design time
  • Only solution which supports on-the-fly error recognition and quick fixes
  • Complete and flexible: Hundreds of data loading, data transformation, data modeling, and data visualization methods
All modeling methods and attribute evaluation methods from the Weka machine learning library are available within RapidMiner. After installing this extension you will get access to about 100 additional modelling schemes including additional decision trees, rule learners and regression estimators. This extension combines two of the most widely used open source data mining solutions. By installing it, you can extend RapidMiner to everything that is possible with Weka while keeping the full analysis, preprocessing, and visualization power of RapidMiner.

Finally, the two most widely used data analysis solutions – RapidMiner and R – are connected. Arbitrary R models and scripts can now be directly integrated into the RapidMiner analysis processes. The new R perspective offers the known R console together with the great plotting facilities of R. All variables and R scripts can be organized in the RapidMiner Repository. Directly included online help and multi-line editing make the creation of R scripts much more comfortable.

Interview Ajay Ohri Decisionstats.com with DMR

From-

http://www.dataminingblog.com/data-mining-research-interview-ajay-ohri/

Here is the winner of the Data Mining Research People Award 2010: Ajay Ohri! Thanks to Ajay for giving some time to answer Data Mining Research questions. And all the best to his blog, Decision Stat!

Data Mining Research (DMR): Could you please introduce yourself to the readers of Data Mining Research?

Ajay Ohri (AO): I am a business consultant and writer based out of Delhi- India. I have been working in and around the field of business analytics since 2004, and have worked with some very good and big companies primarily in financial analytics and outsourced analytics. Since 2007, I have been writing my blog at http://decisionstats.com which now has almost 10,000 views monthly.

All in all, I write about data, and my hobby is also writing (poetry). Both my hobby and my profession stem from my education (a masters in business, and a bachelors in mechanical engineering).

My research interests in data mining are interfaces (simpler interfaces to enable better data mining), education (making data mining less complex and accessible to more people and students), and time series and regression (specifically ARIMAX).
In business my research interests are software marketing strategies (open source, Software as a service, advertising supported versus traditional licensing) and creation of technology and entrepreneurial hubs (like Palo Alto and Research Triangle, or Bangalore India).

DMR: I know you have worked with both SAS and R. Could you give your opinion about these two data mining tools?

AO: As per my understanding, SAS stands for SAS language, SAS Institute and SAS software platform. The terms are interchangeably used by people in industry and academia- but there have been some branding issues on this.
I have not worked much with SAS Enterprise Miner, probably because I could not afford it as a business consultant, and the organizations I worked with did not have a budget for Enterprise Miner.
I have worked alone and in teams with Base SAS, SAS Stat, SAS Access, and SAS ETS- and JMP. Also I worked with SAS BI but as a user to extract information.
You could say my use of the SAS platform was mostly in predictive analytics and reporting, but I have a couple of projects under my belt for knowledge discovery and data mining, and pattern analysis. Again, some of my SAS experience is a bit dated, from almost a year ago.

I really like specific parts of SAS platform – as in the interface design of JMP (which is better than Enterprise Guide or Base SAS ) -and Proc Sort in Base SAS- I guess sequential processing of data makes SAS way faster- though with computing evolving from Desktops/Servers to even cheaper time shared cloud computers- I am not sure how long Base SAS and SAS Stat can hold this unique selling proposition.

I dislike the clutter in SAS Stat output; it confuses me with too much information, and I dislike the shoddy graphics in the rendering output of the graphical engine of SAS. It’s shoddy coding work in SAS/Graph, and if JMP can give better graphics, why is legacy source code preventing the SAS platform from doing a better job of it?

I sometimes think the best part of SAS is actually the code written by Goodnight and Sall in the 1970s; the latest procs don’t impress me much.

SAS as a company is something I admire especially for its way of treating employees globally- but it is strange to see the rest of tech industry not following it. Also I don’t like over aggression and the SAS versus Rest of the Analytics /Data Mining World mentality that I sometimes pick up when I deal with industry thought leaders.

My wishlist is SAS Enterprise Miner, JMP, and Base SAS in a completely new web interface priced at per-hour rates, but I guess I am a bit sentimental here- most data miners I know from the early 2000s did start with SAS as their first bread-earning software. Also, I think SAS needs to be better priced in Business Intelligence- it seems quite cheap in BI compared to Cognos/IBM but expensive in analytical licensing.

If you are a new stats or business student, chances are you may know much more R than SAS today. The shift in education at least has been very rapid, and I guess R is also more of a platform than an analytics or data mining software.

I like a lot of things in R- from graphics, to better data mining packages, to the modular design of the software, but above all I like the can-do, kick-ass spirit of the R community. Lots of young people are collaborating with lots of young to old professors, and the energy is infectious. Everybody is a CEO in R’s world. The latest data mining algorithms will probably start in R, published in journals.

Which is better for data mining, SAS or R? It depends on your data and your deadline. The golden rule of management and business is: it depends.

Also I have worked with a lot of KXEN, SQL, SPSS.

DMR: Can you tell us more about Decision Stats? You have a traffic of 120,000 for 2010. How did you reach such a success?

AO: I don’t think 120,000 is a success. It’s not a failure. It just happened- the more I wrote, the more people read. In 2007-2008 I used to obsess over traffic. I tried SEO, comments, back linking, and I did some black hat experimental stuff. Some of it worked- some didn’t.

In the end, I started asking questions and interviewing people. To my surprise, senior management is almost always more candid, frank and honest about their views, while middle managers, public relations, and marketing folks can be defensive.

Social Media helped a bit- Twitter, LinkedIn, Facebook really helped my network of friends, who I suppose acted as informal ambassadors to spread the word.
Again, I was constrained more by necessity than by choice: by my middle-class finances (I also had a baby son in 2007, and my current laptop still has some broken keys :) ), by my inability to afford traveling to conferences, and by my location, since Delhi isn’t really a tech hub.

The more questions I asked around the internet, the more people responded, and I wrote it all down.

I guess I just was lucky to meet a lot of nice people on the internet who took time to mentor and educate me.

I tried building other websites but didn’t succeed, so I guess I really don’t know. I am not a smart coder, not very clever at writing, but I do try to be honest.

Basic economics says pricing is proportional to demand and inversely proportional to supply. Honest and candid opinions have infinite demand and an uncertain supply.

DMR: There is a rumor about an R book you plan to publish in 2011 :-) Can you confirm the rumor and tell us more?

AO: I just signed a contract with Springer for “R for Business Analytics”. R is great software, and there are lots of books for statistically trained people, but I felt like writing a book for the MBAs and existing analytics users- on how to easily transition to R for Analytics.

Like any language there are tricks and tweaks in R, and with a focus on code editors, IDE, GUI, web interfaces, R’s famous learning curve can be bent a bit.

Making analytics beautiful, and simpler to use is always a passion for me. With 3000 packages, R can be used for a lot more things and a lot more simply than is commonly understood.
The target audience however is business analysts- or people working in corporate environments.

Brief Bio-
Ajay Ohri has been working in the field of analytics since 2004, when it was still a nascent, emerging industry in India. He has worked with the top two Indian outsourcers listed on NYSE, and with Citigroup on cross-sell analytics, where he helped sell an extra 50,000 credit cards. He was one of the very first independent data mining consultants in India working on analytics products and domestic Indian market analytics. He regularly writes on analytics topics on his web site www.decisionstats.com and is currently working on open source analytical tools like R besides analytical software like SPSS and SAS.

An Introduction to Data Mining-online book

I was reading David Smith’s blog http://blog.revolutionanalytics.com/

where he mentioned this interview of Norman Nie, at TDWI

http://tdwi.org/Articles/2010/11/17/R-101.aspx?Page=2

where I saw this link (it’s great if you want to study Data Mining, btw)

http://www.kdnuggets.com/education/usa-canada.html

and I c/liked the U Toronto link

http://chem-eng.utoronto.ca/~datamining/

Best of All- I really liked this online book created by Professor S. Sayad

It’s succinct and beautiful, and it describes all of the Data Mining you want to read in one Map (actually 4 images painstakingly assembled with perfection).

The best thing is that in the original map even the sub-items are clickable for specifics. Pie Chart and Stacked Column Chart, for example, are not grouped in one simple drop-down like Charts, but are organized by the kind of variables that lead to these charts. To see that, you would need to go to the site itself. Compare

vs

http://chem-eng.utoronto.ca/~datamining/dmc/categorical_numerical.htm

Again- there is no mention of the data visualization software used to create the images but I think I can take a hint from the Software Page which says software used are-

Software

See it on your own-online book (c)Professor S. Sayad

Really good DIY tutorial

http://chem-eng.utoronto.ca/~datamining/dmc/data_mining_map.htm

PAWCON -This week in London

Watch out for the Twitter hashtag news on PAWCON and the exciting agenda lined up. If you’re in the City, you may want to just drop in.

http://www.predictiveanalyticsworld.com/london/2010/agenda.php#day1-7

Disclaimer- PAWCON has been a blog partner with Decisionstats (since the first PAWCON). It is vendor neutral and features open source as well as proprietary software, as well as case studies from academia and industry for a balanced view.

 

A little birdie told me some exciting product enhancements may be in the works, including a not yet announced R plugin 😉 and the latest SAS product using embedded analytics, plus Dr Elder’s full day data mining workshop.

Citation-

http://www.predictiveanalyticsworld.com/london/2010/agenda.php#day1-7

Monday November 15, 2010
All conference sessions take place in Edward 5-7

8:00am-9:00am

Registration, Coffee and Danish
Room: Albert Suites


9:00am-9:50am

Keynote
Five Ways Predictive Analytics Cuts Enterprise Risk

All business is an exercise in risk management. All organizations would benefit from measuring, tracking and computing risk as a core process, much like insurance companies do.

Predictive analytics does the trick, one customer at a time. This technology is a data-driven means to compute the risk each customer will defect, not respond to an expensive mailer, consume a retention discount even if she were not going to leave in the first place, not be targeted for a telephone solicitation that would have landed a sale, commit fraud, or become a “loss customer” such as a bad debtor or an insurance policy-holder with high claims.

In this keynote session, Dr. Eric Siegel will reveal:

  • Five ways predictive analytics evolves your enterprise to reduce risk
  • Hidden sources of risk across operational functions
  • What every business should learn from insurance companies
  • How advancements have reversed the very meaning of fraud
  • Why “man + machine” teams are greater than the sum of their parts for enterprise decision support

 

Speaker: Eric Siegel, Ph.D., Program Chair, Predictive Analytics World


IBM 9:50am-10:10am

Platinum Sponsor Presentation
The Analytical Revolution

The algorithms at the heart of predictive analytics have been around for years – in some cases for decades. But now, as we see predictive analytics move to the mainstream and become a competitive necessity for organisations in all industries, the most crucial challenges are to ensure that results can be delivered to where they can make a direct impact on outcomes and business performance, and that the application of analytics can be scaled to the most demanding enterprise requirements.

This session will look at the obstacles to successfully applying analysis at the enterprise level, and how today’s approaches and technologies can enable the true “industrialisation” of predictive analytics.

Speaker: Colin Shearer, WW Industry Solutions Leader, IBM UK Ltd


Deloitte 10:10am-10:20am

Gold Sponsor Presentation
How Predictive Analytics is Driving Business Value

Organisations are increasingly relying on analytics to make key business decisions. Today, technology advances and the increasing need to realise competitive advantage in the market place are driving predictive analytics from the domain of marketers and tactical one-off exercises to the point where analytics are being embedded within core business processes.

During this session, Richard will share some of the focus areas where Deloitte is driving business transformation through predictive analytics, including Workforce, Brand Equity and Reputational Risk, Customer Insight and Network Analytics.

Speaker: Richard Fayers, Senior Manager, Deloitte Analytical Insight


10:20am-10:45am

Break / Exhibits
Room: Albert Suites


10:45am-11:35am
Healthcare
Case Study: Life Line Screening
Taking CRM Global Through Predictive Analytics

While Life Line is successfully executing a US CRM roadmap, they are also beginning this same evolution abroad. They are beginning in the UK where Merkle procured data and built a response model that is pulling responses over 30% higher than competitors. This presentation will give an overview of the US CRM roadmap, and then focus on the beginning of their strategy abroad, focusing on the data procurement they could not get anywhere else but through Merkle and the successful modeling and analytics for the UK.

Speaker: Ozgur Dogan, VP, Quantitative Solutions Group, Merkle Inc.

Speaker: Trish Mathe, Life Line Screening


11:35am-12:25pm
Open Source Analytics; Healthcare
Case Study: A large health care organization
The Rise of Open Source Analytics: Lowering Costs While Improving Patient Care

RapidMiner and R were number 1 and 2 in this year’s annual KDNuggets data mining tool usage poll, followed by Knime in 4th place and Weka in 6th place. So what’s going on here? Are these open source tools really that good, or is their popularity strongly correlated with lower acquisition costs alone? This session answers these questions based on a real world case for a large health care organization and explains the risks & benefits of using open source technology. The final part of the session explains how these tools stack up against their traditional, proprietary counterparts.

Speaker: Jos van Dongen, Associate & Principal, DeltIQ Group


12:25pm-1:25pm

Lunch / Exhibits
Room: Albert Suites


1:25pm-2:15pm
Keynote
Thought Leader:
Case Study: Yahoo! and other large on-line e-businesses
Search Marketing and Predictive Analytics: SEM, SEO and On-line Marketing Case Studies

Search Engine Marketing is a $15B industry in the U.S. growing to double that number over the next 3 years. Worldwide the SEM market was over $50B in 2010. Not only is this a fast growing area of marketing, but it is one that has significant implications for brand and direct marketing and is undergoing rapid change with emerging channels such as mobile and social. What is unique about this area of marketing is a singularly heavy dependence on analytics:

 

  • Large numbers of variables and options
  • Real-time auctions/bids and a need to adjust strategies in real-time
  • Difficult optimization problems on allocating spend across a huge number of keywords
  • Fast-changing competitive terrain and heavy competition on the obvious channels
  • Complicated interactions between various channels and a large choice of search keyword expansion possibilities
  • Profitability and ROI analysis that are complex and often challenging

 

The size of the industry, its growing importance in marketing, its upcoming role in Mobile Advertising, and its uniquely heavy reliance on analytics make it particularly interesting as an area for predictive analytics applications. In this session, not only will you hear about some of the latest strategies and techniques to optimize search, you will also hear case studies that illustrate the important role of analytics from industry practitioners.

Speaker: Usama Fayyad, Ph.D., CEO, Open Insights


SAS 2:15pm-2:35pm

Platinum Sponsor Presentation
Creating a Model Factory Using in-Database Analytics

With the ever-increasing number of analytical models required to make fact-based decisions, as well as increasing audit compliance regulations, it is more important than ever that these models can be created, monitored, retuned and deployed as quickly and automatically as possible. This paper, using a case study from a major financial organisation, will show how organisations can build a model factory efficiently using the latest SAS technology that utilizes the power of in-database processing.

Speaker: John Spooner, Analytics Specialist, SAS (UK)


2:35pm-2:45pm

Session Break
Room: Albert Suites


2:45pm-3:35pm

Retail
Case Study: SABMiller
Predictive Analytics & Global Marketing Strategy

Over the last few years SABMiller plc, the second largest brewing company in the world operating in 70 countries, has been systematically segmenting its markets in different countries globally in order to optimize their portfolio strategy & align it to their long term country specific growth strategy. This presentation talks about the overall methodology followed and the challenges that had to be overcome both from a technical as well as from a change management stand point in order to successfully implement a standard analytics approach to diverse markets and diverse business positions in a highly global setting.

The session explains how country specific growth strategies were converted to objective variables and consumption occasion segments were created that differentiated the market effectively by their growth potential. In addition to this the presentation will also provide a discussion on issues like:

  • The dilemmas of static vs. dynamic solutions and standardization vs. adaptable solutions
  • Challenges in acceptability, local capability development, overcoming implementation inertia, cost effectiveness, etc
  • The role that business partners at SAB and analytics service partners at AbsolutData together play in providing impactful and actionable solutions

 

Speaker: Anne Stephens, SABMiller plc

Speaker: Titir Pal, AbsolutData


3:35pm-4:25pm

Retail
Case Study: Overtoom Belgium
Increasing Marketing Relevance Through Personalized Targeting

 

For many years, Overtoom Belgium – a leading B2B retailer and division of the French Manutan group – has focused on an extensive use of CRM. In this presentation, we demonstrate how Overtoom has integrated Predictive Analytics to optimize customer relationships. In this process, they employ analytics to develop answers to the key question: “which product should we offer to which customer via which channel”. We show how Overtoom gained a 10% revenue increase by replacing the existing segmentation scheme with accurate predictive response models. Additionally, we illustrate how Overtoom succeeds in delivering more relevant communications by offering personalized promotional content to every single customer, and how these personalized offers positively impact Overtoom’s conversion rates.

Speaker: Dr. Geert Verstraeten, Python Predictions


4:25pm-4:50pm

Break / Exhibits
Room: Albert Suites


4:50pm-5:40pm
Uplift Modelling:
Case Study: Lloyds TSB General Insurance & US Bank
Uplift Modelling: You Should Not Only Measure But Model Incremental Response

Most marketing analysts understand that measuring the impact of a marketing campaign requires a valid control group so that uplift (incremental response) can be reported. However, it is much less widely understood that the targeting models used almost everywhere do not attempt to optimize that incremental measure. That requires an uplift model.

This session will explain why a switch to uplift modelling is needed, illustrate what can and does go wrong when they are not used and the hugely positive impact they can have when used effectively. It will also discuss a range of approaches to building and assessing uplift models, from simple basic adjustments to existing modelling processes through to full-blown uplift modelling.

The talk will use Lloyds TSB General Insurance & US Bank as a case study and also illustrate real-world results from other companies and sectors.
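
To make the distinction in this abstract concrete, here is a minimal sketch of measuring uplift (treated response rate minus control response rate) per customer segment. It is only an illustration of the idea, not the method presented in the talk: the function name, the segment names and the data are all made up for the example.

```python
from collections import defaultdict


def uplift_by_segment(records):
    """Return {segment: treated response rate - control response rate}.

    `records` is an iterable of (segment, treated, responded) tuples.
    This layout is a hypothetical one chosen for the illustration.
    """
    counts = defaultdict(lambda: {"t_n": 0, "t_r": 0, "c_n": 0, "c_r": 0})
    for segment, treated, responded in records:
        c = counts[segment]
        if treated:
            c["t_n"] += 1
            c["t_r"] += int(responded)
        else:
            c["c_n"] += 1
            c["c_r"] += int(responded)

    uplift = {}
    for segment, c in counts.items():
        if c["t_n"] == 0 or c["c_n"] == 0:
            continue  # uplift is undefined without both a treated and a control group
        uplift[segment] = c["t_r"] / c["t_n"] - c["c_r"] / c["c_n"]
    return uplift


# Made-up data: "loyal" customers respond often whether or not they are
# contacted; "wavering" customers respond less often, but contact helps.
records = (
    [("loyal", True, True)] * 80 + [("loyal", True, False)] * 20
    + [("loyal", False, True)] * 80 + [("loyal", False, False)] * 20
    + [("wavering", True, True)] * 30 + [("wavering", True, False)] * 70
    + [("wavering", False, True)] * 10 + [("wavering", False, False)] * 90
)

uplift = uplift_by_segment(records)
best_segment = max(uplift, key=uplift.get)
```

In this toy data a conventional response model would favour the "loyal" segment (80% of treated customers respond), yet its incremental response is zero because the control group responds at the same rate. The "wavering" segment has the lower response rate but the higher uplift, which is the quantity an uplift model tries to target.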

 

Speaker: Nicholas Radcliffe, Founder and Director, Stochastic Solutions


5:40pm-6:30pm

Consumer services
Case Study: Canadian Automobile Association and other B2C examples
The Diminishing Marginal Returns of Variable Creation in Predictive Analytics Solutions

 

Variable Creation is the key to success in any predictive analytics exercise. Many different approaches are adopted during this process, yet there are diminishing marginal returns as the number of variables increases. Our organization conducted a case study on four existing clients to explore this so-called diminishing impact of variable creation on predictive analytics solutions. Existing predictive analytics solutions were built using our traditional variable creation process. Yet, presuming that we could exponentially increase the number of variables, we wanted to determine whether this added significant benefit to the existing solution.

Speaker: Richard Boire, BoireFillerGroup


6:30pm-7:30pm

Reception / Exhibits
Room: Albert Suites


Tuesday November 16, 2010
All conference sessions take place in Edward 5-7

8:00am-9:00am

Registration, Coffee and Danish
Room: Albert Suites


9:00am-9:55am
Keynote
Multiple Case Studies: Anheuser-Busch, Disney, HP, HSBC, Pfizer, and others
The High ROI of Data Mining for Innovative Organizations

Data mining and advanced analytics can enhance your bottom line in three basic ways, by 1) streamlining a process, 2) eliminating the bad, or 3) highlighting the good. In rare situations, a fourth way – creating something new – is possible. But modern organizations are so effective at their core tasks that data mining usually results in an iterative, rather than transformative, improvement. Still, the impact can be dramatic.

Dr. Elder will share the story (problem, solution, and effect) of nine projects conducted over the last decade for some of America’s most innovative agencies and corporations:

Streamline:

  • Cross-selling for HSBC
  • Image recognition for Anheuser-Busch
  • Biometric identification for Lumidigm (for Disney)
  • Optimal decisioning for Peregrine Systems (now part of Hewlett-Packard)
  • Quick decisions for the Social Security Administration

Eliminate Bad:

  • Tax fraud detection for the IRS
  • Warranty fraud detection for Hewlett-Packard

Highlight Good:

  • Sector trading for WestWind Foundation
  • Drug efficacy discovery for Pharmacia & Upjohn (now Pfizer)

Moderator: Eric Siegel, Program Chair, Predictive Analytics World

Speaker: John Elder, Ph.D., Elder Research, Inc.

Also see Dr. Elder’s full-day workshop

 


9:55am-10:30am

Break / Exhibits
Room: Albert Suites


10:30am-11:20am
Telecommunications
Case Study: Leading Telecommunications Operator
Predictive Analytics and Efficient Fact-based Marketing

This presentation describes the major topics and issues that arise when you introduce predictive analytics, and how to build a fact-based marketing environment. The introduced tools and methodologies proved to be highly efficient in improving the overall direct marketing activity and customer contact operations for the companies involved. Generally, the introduced approaches have great potential for organizations with large customer bases, such as Mobile Operators, Internet Giants, Media Companies, or Retail Chains.

Main Introduced Solutions:

  • Automated Serial Production of Predictive Models for Campaign Targeting
  • Automated Campaign Measurements and Tracking Solutions
  • Precise Product Added Value Evaluation

Speaker: Tamer Keshi, Ph.D., Long-term contractor, T-Mobile

Speaker: Beata Kovacs, International Head of CRM Solutions, Deutsche Telekom


11:20am-11:25am

Session Changeover


11:25am-12:15pm
Thought Leader
Nine Laws of Data Mining

Data mining is the predictive core of predictive analytics, a business process that finds useful patterns in data through the use of business knowledge. The industry standard CRISP-DM methodology describes the process, but does not explain why the process takes the form that it does. I present nine “laws of data mining”, useful maxims for data miners, with explanations that reveal the reasons behind the surface properties of the data mining process. The nine laws have implications for predictive analytics applications: how and why it works so well, which ambitions could succeed, and which must fail.

 

Speaker: Tom Khabaza, khabaza.com

 


12:15pm-1:30pm

Lunch / Exhibits
Room: Albert Suites


1:30pm-2:25pm
Expert Panel: Kaboom! Predictive Analytics Hits the Mainstream

Predictive analytics has taken off, across industry sectors and across applications in marketing, fraud detection, credit scoring and beyond. Where exactly are we in the process of crossing the chasm toward pervasive deployment, and how can we ensure progress keeps up the pace and stays on target?

This expert panel will address:

  • How much of predictive analytics’ potential has been fully realized?
  • Where are the outstanding opportunities with greatest potential?
  • What are the greatest challenges faced by the industry in achieving wide scale adoption?
  • How are these challenges best overcome?

 

Panelist: John Elder, Ph.D., Elder Research, Inc.

Panelist: Colin Shearer, WW Industry Solutions Leader, IBM UK Ltd

Panelist: Udo Sglavo, Global Analytic Solutions Manager, SAS

Panel moderator: Eric Siegel, Ph.D., Program Chair, Predictive Analytics World


2:25pm-2:30pm

Session Changeover


2:30pm-3:20pm
Crowdsourcing Data Mining
Case Study: University of Melbourne, Chessmetrics
Prediction Competitions: Far More Than Just a Bit of Fun

Data modelling competitions allow companies and researchers to post a problem and have it scrutinised by the world’s best data scientists. There are an infinite number of techniques that can be applied to any modelling task, but it is impossible to know at the outset which will be most effective. By exposing the problem to a wide audience, competitions are a cost-effective way to reach the frontier of what is possible from a given dataset. The power of competitions is neatly illustrated by the results of a recent bioinformatics competition hosted by Kaggle. It required participants to pick markers in HIV’s genetic sequence that coincide with changes in the severity of infection. Within a week and a half, the best entry had already outdone the best methods in the scientific literature. This presentation will cover how competitions typically work, some case studies, and the types of business modelling challenges that the Kaggle platform can address.

Speaker: Anthony Goldbloom, Kaggle Pty Ltd


3:20pm-3:50pm

Break / Exhibits
Room: Albert Suites


3:50pm-4:40pm
Human Resources; e-Commerce
Case Study: Naukri.com, Jeevansathi.com
Increasing Marketing ROI and Efficiency of Candidate-Search with Predictive Analytics

Info Edge, India’s largest and most profitable online firm, has a bouquet of internet properties and has been Google’s biggest customer in India. Our team used predictive modeling to double our profits across multiple fronts. For Naukri.com, India’s number 1 job portal, predictive models target jobseekers most relevant to the recruiter. Analytical insights provided a deeper understanding of recruiter behaviour and informed a redesign of this product’s recruiter search functionality. This session will describe how we did it, and also reveal how Jeevansathi.com, India’s 2nd-largest matrimony portal, targets the acquisition of consumers in the market for marriage.

 

Speaker: Suvomoy Sarkar, Chief Analytics Officer, HT Media & Info Edge India (parent company of the two companies above)

 


4:40pm-5:00pm
Closing Remarks

Speaker: Eric Siegel, Ph.D., Program Chair, Predictive Analytics World


Wednesday November 17, 2010

Full-day Workshop
The Best and the Worst of Predictive Analytics:
Predictive Modeling Methods and Common Data Mining Mistakes

  • Workshop starts at 9:00am
  • First AM Break from 10:00 – 10:15
  • Second AM Break from 11:15 – 11:30
  • Lunch from 12:30 – 1:15pm
  • First PM Break: 2:00 – 2:15
  • Second PM Break: 3:15 – 3:30
  • Workshop ends at 4:30pm

Speaker: John Elder, Ph.D., CEO and Founder, Elder Research, Inc.

 

Interview: James Dixon, Pentaho

Here is an interview with James Dixon, the founder of Pentaho and self-confessed Chief Geek and CTO. Pentaho has been growing very rapidly, and it makes open source Business Intelligence solutions, basically the biggest chunk of the enterprise software market currently.

Ajay-  How would you describe Pentaho as a BI product for someone who is completely used to traditional BI vendors (read: non open source)? Do the Oracle lawsuits over Java bother you from a business perspective?

James-

Pentaho has a full suite of BI software:

* ETL: Pentaho Data Integration

* Reporting: Pentaho Reporting for desktop and web-based reporting

* OLAP: Mondrian ROLAP engine, and Analyzer or JPivot for web-based OLAP client

* Dashboards: CDF and Dashboard Designer

* Predictive Analytics: Weka

* Server: Pentaho BI Server, handles web-access, security, scheduling, sharing, report bursting etc

We have all of the standard BI functionality.

The Oracle/Java issue does not bother me much. There are a lot of software companies dependent on Java. If Oracle abandons Java, a lot of resources will suddenly focus on OpenJDK. It would be good for OpenJDK and might be the best thing for Java in the long term.

Ajay-  What parts of Pentaho’s technology do you personally like the best as having an advantage over other similar proprietary packages?

Describe the latest Pentaho for Hadoop offering and Hadoop/Hive’s advantage over, say, MapReduce and SQL.

James- The coolest thing is that everything is pluggable:

* ETL: New data transformation steps can be added. New orchestration controls (job entries) can be added. New perspectives can be added to the design UI. New data sources and destinations can be added.

* Reporting: New content types and report objects can be added. New data sources can be added.

* BI Server: Every factory, engine, and layer can be extended or swapped out via configuration. BI components can be added. New visualizations can be added.

This means it is very easy for Pentaho, partners, customers, and community members to extend our software to do new things.

In addition, every engine and component can be fully embedded into a desktop or web-based application. I made a YouTube video about our philosophy: http://www.youtube.com/watch?v=uMyR-In5nKE

Our Hadoop offerings allow ETL developers to work in a familiar graphical design environment, instead of having to code MapReduce jobs in Java or Python.

90% of the Hadoop use cases we hear about are transformation/reporting/analysis of structured/semi-structured data, so an ETL tool is perfect for these situations.

Using Pentaho Data Integration reduces implementation and maintenance costs significantly. The fact that our ETL engine is Java and is embeddable means that we can deploy the engine to the Hadoop data nodes and transform the data within the nodes.
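
For readers who have not written a MapReduce job by hand, here is a minimal sketch, in plain Python with no Hadoop involved, of the map, shuffle and reduce steps behind a simple "total per key" transformation. It only illustrates the style of code that a graphical ETL tool is meant to replace. The record format ("region,amount"), the function names and the sample data are hypothetical, and none of this is Pentaho or Hadoop API code.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: parse one 'region,amount' record into a (key, value) pair."""
    region, amount = line.strip().split(",")
    yield region, float(amount)


def reduce_phase(key, values):
    """Reduce step: collapse all values seen for one key into a total."""
    return key, sum(values)


def run_job(lines):
    """Run map, then group values by key (the 'shuffle'), then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_phase(line):
            grouped[key].append(value)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Made-up sample records in the hypothetical "region,amount" format
totals = run_job(["north,10.0", "south,5.5", "north,2.5"])
```

Even for this small aggregation, the parsing, grouping and summing all have to be spelled out in code, which is the kind of work the graphical design environment described above is intended to take over.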

Ajay-  Do you think the combination of recession, outsourcing, cost cutting, and unemployment is a suitable environment for companies to cut technology costs by going outside their usual vendor lists and trying open source for a change, or for test projects?

James- Absolutely. Pentaho grew (downloads, installations, revenue) throughout the recession. We are on target to do 250% of what we did last year, while the established vendors are flat in terms of new license revenue.

Ajay-  How would you compare the user interface of reports using Pentaho versus other reporting software? Please feel free to be as specific as you like.

James- We have all of the everyday, standard reporting features covered.

Over the years the old tools, like Crystal Reports, have become bloated and complicated.

We don’t aim to have 100% of their features, because we’d end up just as complicated.

The 80:20 rule applies here. 80% of the time people only use 20% of their features.

We aim for 80% feature parity, which should cover 95-99% of typical use cases.

Ajay-  Could you describe the Pentaho integration with R as well as your relationship with Weka? Jaspersoft already has a partnership with Revolution Analytics for RevoDeployR (R on a web server).

Any R plans for Pentaho as well?

James- The feature sets of R and Weka overlap to a small extent: both of them include basic statistical functions. Weka is focused on predictive models and machine learning, whereas R is focused on a full suite of statistical models. The creator and main Weka developer is a Pentaho employee. We have integrated R into our ETL tool. (makes me happy 🙂 )

(probably not a good time to ask if SAS integration is done as well for a big chunk of legacy base SAS/ WPS users)

About-

As “Chief Geek” (CTO) at Pentaho, James Dixon is responsible for Pentaho’s architecture and technology roadmap. James has over 15 years of professional experience in software architecture, development and systems consulting. Prior to Pentaho, James held key technical roles at AppSource Corporation (acquired by Arbor Software which later merged into Hyperion Solutions) and Keyola (acquired by Lawson Software). Earlier in his career, James was a technology consultant working with large and small firms to deliver the benefits of innovative technology in real-world environments.