Heritage offers 3 million chump change for Monkeys

My perspective is life is not fair, and if someone offers me 1 mill a year so they make 1 bill a year, I would still take it, especially if it leads to better human beings and better humanity on this planet. Health care isnt toothpaste.

Unless there are even more fine print changes involved- there exist several players in the pharma sector who do build and deploy models internally for denying claims or prospecting medical doctors with freebies, but they might just get caught with the new open data movement

————————————————————————————————–

A note from KDNuggets-

Heritage Health Prizereleased a second set of data on May 4. They also recently modified their ruleswhich now demand complete exclusivity and seem to disallow use of other tools (emphasis mine – Gregory PS)

21. LICENSE
By registering for the Competition, each Entrant (a) grants to Sponsor and its designees a worldwide, exclusive (except with respect to Entrant) , sub-licensable (through multiple tiers), transferable, fully paid-up, royalty-free, perpetual, irrevocable right to use, not use, reproduce, distribute (through multiple tiers), create derivative works of, publicly perform, publicly display, digitally perform, make, have made, sell, offer for sale and import the entry and the algorithm used to produce the entry, as well as any other algorithm, data or other information whatsoever developed or produced at any time using the data provided to Entrant in this Competition (collectively, the “Licensed Materials”), in any media now known or hereafter developed, for any purpose whatsoever, commercial or otherwise, without further approval by or payment to Entrant (the “License”) and
(b) represents that he/she/it has the unrestricted right to grant the License. 
Entrant understands and agrees that the License is exclusive except with respect to Entrant: Entrant may use the Licensed Materials solely for his/her/its own patient management and other internal business purposes but may not grant or otherwise transfer to any third party any rights to or interests in the Licensed Materials whatsoever.

This has lead to a call to boycott the competition by Tristan, who also notes that academics cannot publish their results without prior written approval of the Sponsor.

Anthony Goldbloom, CEO of Kaggle, emailed the HHP participants on May 4

HPN have asked me to pass on the following message: “The Heritage Provider Network is sponsoring the Heritage Health Prize to spur innovation and creative thinking in healthcare. HPN, however, is a medical group and must retain an exclusive license to the algorithms created using its data so as to ensure that the algorithms are used responsibly, and are only used to provide better health care to patients and not for improper purposes.
Put simply, while the competition hopes to spur innovation, this is not a competition regarding movie ratings or chess results. We hope that the clarifications we have made to the Rules and the FAQ adequately address your concerns and look forward to your participation in the competition.”

What do you think? Will the exclusive license prevent you from participating?

Heritage prize= 3mill now open

I am still angry with THE netflix for 1 mill I lost out. No sweat! this time the money is 3 times as much, it is legit, and yes baby you can change the world, make it a better place and get rich.! see details below-http://www.heritagehealthprize.com/c/hhp/Data

HERITAGE HEALTH PRIZE DATA FILES

You must accept this competition’s rules before you’ll be able to download data files.

IMPORTANT NOTE: The information provided below is intended only to provide general guidance to participants in the Heritage Health Prize Competition and is subject to the Competition Official Rules. Any capitalized term not defined below is defined in the Competition Official Rules. Please consult the Competition Official Rules for complete details.

Heritage Provider Network is providing Competition Entrants with deidentified member data collected during a forty-eight month period that is allocated among three data sets (the “Data Sets”). Competition Entrants will use the Data Sets to develop and test their algorithms for accurately predicting the number of days that the members will spend in a hospital (inpatient or emergency room visit) during the 12-month period following the Data Set cut-off date.

HHP_release2.zip contains the latest files, so you can ignore HHP_release1.zip. SampleEntry.CSV shows you how an entry should look.

Data Sets will be released to Entrants after registration on the Website according to the following schedule:

April 4, 2011 Claims Table – Y1 and DaysInHospital Table – Y2

May 4, 2011

All other Data Sets except Labs Table and Rx Table

From https://www.kaggle.com/

The $3 million Heritage Health Prize opens to entries

It’s been one month since the launch of the Heritage Health Prize. The prize has attracted some great publicity, receiving coverage from the Wall Street JournalThe EconomistSlate andForbes.

By now, people have had a good chance to poke around the first portion of the data. Now the fun starts! HPN have released two more years’-worth of data, set the accuracy threshold and are opening up the competition to entries. The data are available from the Heritage Health Prize page. Good luck to all participants!

The Deloitte/FIDE Chess Ratings Competition results

The Deloitte/FIDE Chess Ratings Competition attracted one of the strongest fields ever seen in a Kaggle Competition. The competition attracted 189 teams, ranging from chess ratings  experts to Netflix Prize winners. As Jeff Sonas wrote on the Kaggle blog last week, the  competition has far exceeded his expectations. A big congratulations the provisional winner, Tim Salimans, an econometrician at Erasmus University in Rotterdam. We look forward to reading about the approaches used by top performers on the Kaggle blog. We also look forward to the results of the FIDE prize, which could see the introduction of a new chess ratings system.

ICDAR 2011 Competition Results

The ICDAR 2011 competition also finished recently. The competiiton required participants to develop an algorithm that correctly matched handwriting samples. The winners were Lewis Griffin and Andrew Newell from the University College London who achieved Kaggle’s first ever perfect score by managing to match every sample correctly! Andrew and Lewis have posted a description of their winning method on the Kaggle blog.

Revolution R Enterprise

Since R is the most popular language used by Kaggle members, the Revolution Analytics team is making Revolution R Enterprise (the pre-eminent commercial version of R) available free of charge to Kaggle members. Revolution R Enterprise has several advantages over standard R, including the ability to seemlessly handle larger datasets. To get your free copy, visit http://info.revolutionanalytics.com/Kaggle.html.
Kaggle-in-Class

As many of you know, Kaggle offers a free platform, Kaggle-in-Class, for instructors who want to host competitions for their students. For those interested in hearing more about the use of Kaggle-in-Class as a teaching tool, Susan Holmes and Nelson Ray from Stanford University share their experience in a webinar organized by the Consortium for the Advancement of Undergraduate Statistics Education.

High Performance Analytics

Marry Big Data Analytics to High Performance Computing, and you get the buzzword of this season- High Performance Analytics.

It basically consists of Parallelized code to run in parallel on custom hardware, in -database analytics for speed, and cloud computing /high performance computing environments. On an operational level, it consists of software (as in analytics) partnering with software (as in databases, Map reduce, Hadoop) plus some hardware (HP or IBM mostly). It is considered a high margin , highly profitable, business with small number of deals compared to say desktop licenses.

As per HPC Wire- which is a great tool/newsletter to keep updated on HPC , SAS Institute has been busy on this front partnering with EMC Greenplum and TeraData (who also acquired  SAS Partner AsterData to gain a much needed foot in the MR/SQL space) Continue reading “High Performance Analytics”

Trying out Google Prediction API from R

Ubuntu Login
Image via Wikipedia

So I saw the news at NY R Meetup and decided to have a go at Prediction API Package (which first started off as a blog post at

http://onertipaday.blogspot.com/2010/11/r-wrapper-for-google-prediction-api.html

1)My OS was Ubuntu 10.10 Netbook

Ubuntu has a slight glitch plus workaround for installing the RCurl package on which the Google Prediction API is dependent- you need to first install this Ubuntu package for RCurl to install libcurl4-gnutls-dev

Once you install that using Synaptic,

Simply start R

2) Install Packages rjson and Rcurl using install.packages and choosing CRAN

Since GooglePredictionAPI is not yet on CRAN

,

3) Download that package from

https://code.google.com/p/google-prediction-api-r-client/downloads/detail?name=googlepredictionapi_0.1.tar.gz&can=2&q=

You need to copy this downloaded package to your “first library ” folder

When you start R, simply run

.libPaths()[1]

and thats the folder you copy the GooglePredictionAPI package  you downloaded.

5) Now the following line works

  1. Under R prompt,
  2. > install.packages("googlepredictionapi_0.1.tar.gz", repos=NULL, type="source")

6) Uploading data to Google Storage using the GUI (rather than gs util)

Just go to https://sandbox.google.com/storage/

and thats the Google Storage manager

Notes on Training Data-

Use a csv file

The first column is the score column (like 1,0 or prediction score)

There are no headers- so delete headers from data file and move the dependent variable to the first column  (Note I used data from the kaggle contest for R package recommendation at

http://kaggle.com/R?viewtype=data )

6) The good stuff:

Once you type in the basic syntax, the first time it will ask for your Google Credentials (email and password)

It then starts showing you time elapsed for training.

Now you can disconnect and go off (actually I got disconnected by accident before coming back in a say 5 minutes so this is the part where I think this is what happened is why it happened, dont blame me, test it for yourself) –

and when you come back (hopefully before token expires)  you can see status of your request (see below)

> library(rjson)
> library(RCurl)
Loading required package: bitops
> library(googlepredictionapi)
> my.model <- PredictionApiTrain(data="gs://numtraindata/training_data")
The request for training has sent, now trying to check if training is completed
Training on numtraindata/training_data: time:2.09 seconds
Training on numtraindata/training_data: time:7.00 seconds

7)

Note I changed the format from the URL where my data is located- simply go to your Google Storage Manager and right click on the file name for link address  ( https://sandbox.google.com/storage/numtraindata/training_data.csv)

to gs://numtraindata/training_data  (that kind of helps in any syntax error)

8) From the kind of high level instructions at  https://code.google.com/p/google-prediction-api-r-client/, you could also try this on a local file

Usage

## Load googlepredictionapi and dependent libraries
library(rjson)
library(RCurl)
library(googlepredictionapi)

## Make a training call to the Prediction API against data in the Google Storage.
## Replace MYBUCKET and MYDATA with your data.
my.model <- PredictionApiTrain(data="gs://MYBUCKET/MYDATA")

## Alternatively, make a training call against training data stored locally as a CSV file.
## Replace MYPATH and MYFILE with your data.
my.model <- PredictionApiTrain(data="MYPATH/MYFILE.csv")

At the time of writing my data was still getting trained, so I will keep you posted on what happens.

PAWCON -This week in London

Watch out for the twitter hash news on PAWCON and the exciting agenda lined up. If your in the City- you may want to just drop in

http://www.predictiveanalyticsworld.com/london/2010/agenda.php#day1-7

Disclaimer- PAWCON has been a blog partner with Decisionstats (since the first PAWCON ). It is vendor neutral and features open source as well proprietary software, as well case studies from academia and Industry for a balanced view.

 

Little birdie told me some exciting product enhancements may be in the works including a not yet announced R plugin 😉 and the latest SAS product using embedded analytics and Dr Elder’s full day data mining workshop.

Citation-

http://www.predictiveanalyticsworld.com/london/2010/agenda.php#day1-7

Monday November 15, 2010
All conference sessions take place in Edward 5-7

8:00am-9:00am

Registration, Coffee and Danish
Room: Albert Suites


9:00am-9:50am

Keynote
Five Ways Predictive Analytics Cuts Enterprise Risk

All business is an exercise in risk management. All organizations would benefit from measuring, tracking and computing risk as a core process, much like insurance companies do.

Predictive analytics does the trick, one customer at a time. This technology is a data-driven means to compute the risk each customer will defect, not respond to an expensive mailer, consume a retention discount even if she were not going to leave in the first place, not be targeted for a telephone solicitation that would have landed a sale, commit fraud, or become a “loss customer” such as a bad debtor or an insurance policy-holder with high claims.

In this keynote session, Dr. Eric Siegel will reveal:

  • Five ways predictive analytics evolves your enterprise to reduce risk
  • Hidden sources of risk across operational functions
  • What every business should learn from insurance companies
  • How advancements have reversed the very meaning of fraud
  • Why “man + machine” teams are greater than the sum of their parts for
  • enterprise decision support

 

Speaker: Eric Siegel, Ph.D., Program Chair, Predictive Analytics World

Top of this page ] [ Agenda overview ]


IBM9:50am-10:10am

Platinum Sponsor Presentation
The Analytical Revolution

The algorithms at the heart of predictive analytics have been around for years – in some cases for decades. But now, as we see predictive analytics move to the mainstream and become a competitive necessity for organisations in all industries, the most crucial challenges are to ensure that results can be delivered to where they can make a direct impact on outcomes and business performance, and that the application of analytics can be scaled to the most demanding enterprise requirements.

This session will look at the obstacles to successfully applying analysis at the enterprise level, and how today’s approaches and technologies can enable the true “industrialisation” of predictive analytics.

Speaker: Colin Shearer, WW Industry Solutions Leader, IBM UK Ltd

Top of this page ] [ Agenda overview ]


Deloitte10:10am-10:20am

Gold Sponsor Presentation
How Predictive Analytics is Driving Business Value

Organisations are increasingly relying on analytics to make key business decisions. Today, technology advances and the increasing need to realise competitive advantage in the market place are driving predictive analytics from the domain of marketers and tactical one-off exercises to the point where analytics are being embedded within core business processes.

During this session, Richard will share some of the focus areas where Deloitte is driving business transformation through predictive analytics, including Workforce, Brand Equity and Reputational Risk, Customer Insight and Network Analytics.

Speaker: Richard Fayers, Senior Manager, Deloitte Analytical Insight

Top of this page ] [ Agenda overview ]


10:20am-10:45am

Break / Exhibits
Room: Albert Suites


10:45am-11:35am
Healthcare
Case Study: Life Line Screening
Taking CRM Global Through Predictive Analytics

While Life Line is successfully executing a US CRM roadmap, they are also beginning this same evolution abroad. They are beginning in the UK where Merkle procured data and built a response model that is pulling responses over 30% higher than competitors. This presentation will give an overview of the US CRM roadmap, and then focus on the beginning of their strategy abroad, focusing on the data procurement they could not get anywhere else but through Merkle and the successful modeling and analytics for the UK.

Speaker: Ozgur Dogan, VP, Quantitative Solutions Group, Merkle Inc.

Speaker: Trish Mathe, Life Line Screening

Top of this page ] [ Agenda overview ]


11:35am-12:25pm
Open Source Analytics; Healthcare
Case Study: A large health care organization
The Rise of Open Source Analytics: Lowering Costs While Improving Patient Care

Rapidminer and R were the number 1 and 2 in this years annual KDNuggets data mining tool usage poll, followed by Knime on place 4 and Weka on place 6. So what’s going on here? Are these open source tools really that good or is their popularity strongly correlated with lower acquisition costs alone? This session answers these questions based on a real world case for a large health care organization and explains the risks & benefits of using open source technology. The final part of the session explains how these tools stack up against their traditional, proprietary counterparts.

Speaker: Jos van Dongen, Associate & Principal, DeltIQ Group

Top of this page ] [ Agenda overview ]


12:25pm-1:25pm

Lunch / Exhibits
Room: Albert Suites


1:25pm-2:15pm
Keynote
Thought Leader:
Case Study: Yahoo! and other large on-line e-businesses
Search Marketing and Predictive Analytics: SEM, SEO and On-line Marketing Case Studies

Search Engine Marketing is a $15B industry in the U.S. growing to double that number over the next 3 years. Worldwide the SEM market was over $50B in 2010. Not only is this a fast growing area of marketing, but it is one that has significant implications for brand and direct marketing and is undergoing rapid change with emerging channels such as mobile and social. What is unique about this area of marketing is a singularly heavy dependence on analytics:

 

  • Large numbers of variables and options
  • Real-time auctions/bids and a need to adjust strategies in real-time
  • Difficult optimization problems on allocating spend across a huge number of keywords
  • Fast-changing competitive terrain and heavy competition on the obvious channels
  • Complicated interactions between various channels and a large choice of search keyword expansion possibilities
  • Profitability and ROI analysis that are complex and often challenging

 

The size of the industry, its growing importance in marketing, its upcoming role in Mobile Advertising, and its uniquely heavy reliance on analytics makes it particularly interesting as an area for predictive analytics applications. In this session, not only will hear about some of the latest strategies and techniques to optimize search, you will hear case studies that illustrate the important role of analytics from industry practitioners.

Speaker: Usama Fayyad, , Ph.D., CEO, Open Insights

Top of this page ] [ Agenda overview ]


SAS2:15pm-2:35pm

Platinum Sponsor Presentation
Creating a Model Factory Using in-Database Analytics

With the ever-increasing number of analytical models required to make fact-based decisions, as well as increasing audit compliance regulations, it is more important than ever that these models can be created, monitored, retuned and deployed as quickly and automatically as possible. This paper, using a case study from a major financial organisation, will show how organisations can build a model factory efficiently using the latest SAS technology that utilizes the power of in-database processing.

Speaker: John Spooner, Analytics Specialist, SAS (UK)

Top of this page ] [ Agenda overview ]


2:35pm-2:45pm

Session Break
Room: Albert Suites


2:45pm-3:35pm

Retail
Case Study: SABMiller
Predictive Analytics & Global Marketing Strategy

Over the last few years SABMiller plc, the second largest brewing company in the world operating in 70 countries, has been systematically segmenting its markets in different countries globally in order optimize their portfolio strategy & align it to their long term country specific growth strategy. This presentation talks about the overall methodology followed and the challenges that had to be overcome both from a technical as well as from a change management stand point in order to successfully implement a standard analytics approach to diverse markets and diverse business positions in a highly global setting.

The session explains how country specific growth strategies were converted to objective variables and consumption occasion segments were created that differentiated the market effectively by their growth potential. In addition to this the presentation will also provide a discussion on issues like:

  • The dilemmas of static vs. dynamic solutions and standardization vs. adaptable solutions
  • Challenges in acceptability, local capability development, overcoming implementation inertia, cost effectiveness, etc
  • The role that business partners at SAB and analytics service partners at AbsolutData together play in providing impactful and actionable solutions

 

Speaker: Anne Stephens, SABMiller plc

Speaker: Titir Pal, AbsolutData

Top of this page ] [ Agenda overview ]


3:35pm-4:25pm

Retail
Case Study: Overtoom Belgium
Increasing Marketing Relevance Through Personalized Targeting

 

Since many years, Overtoom Belgium – a leading B2B retailer and division of the French Manutan group – focuses on an extensive use of CRM. In this presentation, we demonstrate how Overtoom has integrated Predictive Analytics to optimize customer relationships. In this process, they employ analytics to develop answers to the key question: “which product should we offer to which customer via which channel”. We show how Overtoom gained a 10% revenue increase by replacing the existing segmentation scheme with accurate predictive response models. Additionally, we illustrate how Overtoom succeeds to deliver more relevant communications by offering personalized promotional content to every single customer, and how these personalized offers positively impact Overtoom’s conversion rates.

Speaker: Dr. Geert Verstraeten, Python Predictions

Top of this page ] [ Agenda overview ]


4:25pm-4:50pm

Break / Exhibits
Room: Albert Suites


4:50pm-5:40pm
Uplift Modelling:
Case Study: Lloyds TSB General Insurance & US Bank
Uplift Modelling: You Should Not Only Measure But Model Incremental Response

Most marketing analysts understand that measuring the impact of a marketing campaign requires a valid control group so that uplift (incremental response) can be reported. However, it is much less widely understood that the targeting models used almost everywhere do not attempt to optimize that incremental measure. That requires an uplift model.

This session will explain why a switch to uplift modelling is needed, illustrate what can and does go wrong when they are not used and the hugely positive impact they can have when used effectively. It will also discuss a range of approaches to building and assessing uplift models, from simple basic adjustments to existing modelling processes through to full-blown uplift modelling.

The talk will use Lloyds TSB General Insurance & US Bank as a case study and also illustrate real-world results from other companies and sectors.

 

Speaker: Nicholas Radcliffe, Founder and Director, Stochastic Solutions

Top of this page ] [ Agenda overview ]


5:40pm-6:30pm

Consumer services
Case Study: Canadian Automobile Association and other B2C examples
The Diminishing Marginal Returns of Variable Creation in Predictive Analytics Solutions

 

Variable Creation is the key to success in any predictive analytics exercise. Many different approaches are adopted during this process, yet there are diminishing marginal returns as the number of variables increase. Our organization conducted a case study on four existing clients to explore this so-called diminishing impact of variable creation on predictive analytics solutions. Existing predictive analytics solutions were built using our traditional variable creation process. Yet, presuming that we could exponentially increase the number of variables, we wanted to determine if this added significant benefit to the existing solution.

Speaker: Richard Boire, BoireFillerGroup

Top of this page ] [ Agenda overview ]


6:30pm-7:30pm

Reception / Exhibits
Room: Albert Suites


Tuesday November 16, 2010
All conference sessions take place in Edward 5-7

8:00am-9:00am

Registration, Coffee and Danish
Room: Albert Suites


9:00am-9:55am
Keynote
Multiple Case Studies: Anheuser-Busch, Disney, HP, HSBC, Pfizer, and others
The High ROI of Data Mining for Innovative Organizations

Data mining and advanced analytics can enhance your bottom line in three basic ways, by 1) streamlining a process, 2) eliminating the bad, or 3) highlighting the good. In rare situations, a fourth way – creating something new – is possible. But modern organizations are so effective at their core tasks that data mining usually results in an iterative, rather than transformative, improvement. Still, the impact can be dramatic.

Dr. Elder will share the story (problem, solution, and effect) of nine projects conducted over the last decade for some of America’s most innovative agencies and corporations:

    Streamline:

  • Cross-selling for HSBC
  • Image recognition for Anheuser-Busch
  • Biometric identification for Lumidigm (for Disney)
  • Optimal decisioning for Peregrine Systems (now part of Hewlett-Packard)
  • Quick decisions for the Social Security Administration
    Eliminate Bad:

  • Tax fraud detection for the IRS
  • Warranty Fraud detection for Hewlett-Packard
    Highlight Good:

  • Sector trading for WestWind Foundation
  • Drug efficacy discovery for Pharmacia & UpJohn (now Pfizer)

Moderator: Eric Siegel, Program Chair, Predictive Analytics World

Speaker: John Elder, Ph.D., Elder Research, Inc.

Also see Dr. Elder’s full-day workshop

 

Top of this page ] [ Agenda overview ]


9:55am-10:30am

Break / Exhibits
Room: Albert Suites


10:30am-11:20am
Telecommunications
Case Study: Leading Telecommunications Operator
Predictive Analytics and Efficient Fact-based Marketing

The presentation describes what are the major topics and issues when you introduce predictive analytics and how to build a Fact-Based marketing environment. The introduced tools and methodologies proved to be highly efficient in terms of improving the overall direct marketing activity and customer contact operations for the involved companies. Generally, the introduced approaches have great potential for organizations with large customer bases like Mobile Operators, Internet Giants, Media Companies, or Retail Chains.

Main Introduced Solutions:-Automated Serial Production of Predictive Models for Campaign Targeting-Automated Campaign Measurements and Tracking Solutions-Precise Product Added Value Evaluation.

Speaker: Tamer Keshi, Ph.D., Long-term contractor, T-Mobile

Speaker: Beata Kovacs, International Head of CRM Solutions, Deutsche Telekom

Top of this page ] [ Agenda overview ]


11:20am-11:25am

Session Changeover


11:25am-12:15pm
Thought Leader
Nine Laws of Data Mining

Data mining is the predictive core of predictive analytics, a business process that finds useful patterns in data through the use of business knowledge. The industry standard CRISP-DM methodology describes the process, but does not explain why the process takes the form that it does. I present nine “laws of data mining”, useful maxims for data miners, with explanations that reveal the reasons behind the surface properties of the data mining process. The nine laws have implications for predictive analytics applications: how and why it works so well, which ambitions could succeed, and which must fail.

 

Speaker: Tom Khabaza, khabaza.com

 

Top of this page ] [ Agenda overview ]


12:15pm-1:30pm

Lunch / Exhibits
Room: Albert Suites


1:30pm-2:25pm
Expert Panel: Kaboom! Predictive Analytics Hits the Mainstream

Predictive analytics has taken off, across industry sectors and across applications in marketing, fraud detection, credit scoring and beyond. Where exactly are we in the process of crossing the chasm toward pervasive deployment, and how can we ensure progress keeps up the pace and stays on target?

This expert panel will address:

  • How much of predictive analytics’ potential has been fully realized?
  • Where are the outstanding opportunities with greatest potential?
  • What are the greatest challenges faced by the industry in achieving wide scale adoption?
  • How are these challenges best overcome?

 

Panelist: John Elder, Ph.D., Elder Research, Inc.

Panelist: Colin Shearer, WW Industry Solutions Leader, IBM UK Ltd

Panelist: Udo Sglavo, Global Analytic Solutions Manager, SAS

Panel moderator: Eric Siegel, Ph.D., Program Chair, Predictive Analytics World


2:25pm-2:30pm

Session Changeover


2:30pm-3:20pm
Crowdsourcing Data Mining
Case Study: University of Melbourne, Chessmetrics
Prediction Competitions: Far More Than Just a Bit of Fun

Data modelling competitions allow companies and researchers to post a problem and have it scrutinised by the world’s best data scientists. There are an infinite number of techniques that can be applied to any modelling task but it is impossible to know at the outset which will be most effective. By exposing the problem to a wide audience, competitions are a cost effective way to reach the frontier of what is possible from a given dataset. The power of competitions is neatly illustrated by the results of a recent bioinformatics competition hosted by Kaggle. It required participants to pick markers in HIV’s genetic sequence that coincide with changes in the severity of infection. Within a week and a half, the best entry had already outdone the best methods in the scientific literature. This presentation will cover how competitions typically work, some case studies and the types of business modelling challenges that the Kaggle platform can address.

Speaker: Anthony Goldbloom, Kaggle Pty Ltd

Top of this page ] [ Agenda overview ]


3:20pm-3:50pm

Breaks /Exhibits
Room: Albert Suites


3:50pm-4:40pm
Human Resources; e-Commerce
Case Study: Naukri.com, Jeevansathi.com
Increasing Marketing ROI and Efficiency of Candidate-Search with Predictive Analytics

InfoEdge, India’s largest and most profitable online firm with a bouquet of internet properties has been Google’s biggest customer in India. Our team used predictive modeling to double our profits across multiple fronts. For Naukri.com, India’s number 1 job portal, predictive models target jobseekers most relevant to the recruiter. Analytical insights provided a deeper understanding of recruiter behaviour and informed a redesign of this product’s recruiter search functionality. This session will describe how we did it, and also reveal how Jeevansathi.com, India’s 2nd-largest matrimony portal, targets the acquisition of consumers in the market for marriage.

 

Speaker: Suvomoy Sarkar, Chief Analytics Officer, HT Media & Info Edge India (parent company of the two companies above)

 

Top of this page ] [ Agenda overview ]


4:40pm-5:00pm
Closing Remarks

Speaker: Eric Siegel, Ph.D., Program Chair, Predictive Analytics World

Top of this page ] [ Agenda overview ]


Wednesday November 17, 2010

Full-day Workshop
The Best and the Worst of Predictive Analytics:
Predictive Modeling Methods and Common Data Mining Mistakes

Click here for the detailed workshop description

  • Workshop starts at 9:00am
  • First AM Break from 10:00 – 10:15
  • Second AM Break from 11:15 – 11:30
  • Lunch from 12:30 – 1:15pm
  • First PM Break: 2:00 – 2:15
  • Second PM Break: 3:15 – 3:30
  • Workshop ends at 4:30pm

Speaker: John Elder, Ph.D., CEO and Founder, Elder Research, Inc.

 

Data Visualization using Tableau

Image representing Tableau Software as depicte...
Image via CrunchBase

Here is a great piece of software for data visualization– the public version is free.

And you can use it for Desktop Analytics as well as BI /server versions at very low cost.

About Tableau Software

http://www.tableausoftware.com/press_release/tableau-massive-growth-hiring-q3-2010

Tableau was named by Software Magazine as the fastest growing software company in the $10 million to $30 million range in the world, and the second fastest growing software company worldwide overall. The ranking stems from the publication’s 28th annual Software 500 ranking of the world’s largest software service providers.

“We’re growing fast because the market is starving for easy-to-use products that deliver rapid-fire business intelligence to everyone. Our customers want ways to unlock their databases and produce engaging reports and dashboards,” said Christian Chabot CEO and co-founder of Tableau.

http://www.tableausoftware.com/about/who-we-are

History in the Making

Put together an Academy-Award winning professor from the nation’s most prestigious university, a savvy business leader with a passion for data, and a brilliant computer scientist. Add in one of the most challenging problems in software – making databases and spreadsheets understandable to ordinary people. You have just recreated the fundamental ingredients for Tableau.

The catalyst? A Department of Defense (DOD) project aimed at increasing people’s ability to analyze information and brought to famed Stanford professor, Pat Hanrahan. A founding member of Pixar and later its chief architect for RenderMan, Pat invented the technology that changed the world of animated film. If you know Buzz and Woody of “Toy Story”, you have Pat to thank.

Under Pat’s leadership, a team of Stanford Ph.D.s got together just down the hall from the Google folks. Pat and Chris Stolte, the brilliant computer scientist, realized that data visualization could produce large gains in people’s ability to understand information. Rather than analyzing data in text form and then creating visualizations of those findings, Pat and Chris invented a technology called VizQL™ by which visualization is part of the journey and not just the destination. Fast analytics and visualization for everyone was born.

While satisfying the DOD project, Pat and Chris met Christian Chabot, a former data analyst who turned into Jello when he saw what had been invented. The three formed a company and spun out of Stanford like so many before them (Yahoo, Google, VMWare, SUN). With Christian on board as CEO, Tableau rapidly hit one success after another: its first customer (now Tableau’s VP, Operations, Tom Walker), an OEM deal with Hyperion (now Oracle), funding from New Enterprise Associates, a PC Magazine award for “Product of the Year” just one year after launch, and now over 50,000 people in 50+ countries benefiting from the breakthrough.

also see http://www.tableausoftware.com/about/leadership

http://www.tableausoftware.com/about/board

—————————————————————————-

and now  a demo I ran on the Kaggle contest data (it is a csv dataset with 95000 rows)

I found Tableau works extremely good at pivoting data and visualizing it -almost like Excel on  Steroids. Download the free version here ( I dont know about an academic program (see links below) but software is not expensive at all)

http://buy.tableausoftware.com/

Desktop Personal Edition

The Personal Edition is a visual analysis and reporting solution for data stored in Excel, MS Access or Text Files. Available via download.

Product Information

$999*

Desktop Professional Edition

The Professional Edition is a visual analysis and reporting solution for data stored in MS SQL Server, MS Analysis Services, Oracle, IBM DB2, Netezza, Hyperion Essbase, Teradata, Vertica, MySQL, PostgreSQL, Firebird, Excel, MS Access or Text Files. Available via download.

Product Information

$1800*

Tableau Server

Tableau Server enables users of Tableau Desktop Professional to publish workbooks and visualizations to a server where users with web browsers can access and interact with the results. Available via download.

Product Information

Contact Us

* Price is per Named User and includes one year of maintenance (upgrades and support). Products are made available as a download immediately after purchase. You may revisit the download site at any time during your current maintenance period to access the latest releases.