PAW Videos

A message from Predictive Analytics World about newly available videos. Many of them are free, so check them out.

Predictive Analytics World March 2011 in San Francisco

Access PAW DC Session Videos Now

Predictive Analytics World is pleased to announce on-demand access to the videos of PAW Washington DC, October 2010, including over 30 sessions and keynotes that you may view at your convenience. Access this leading predictive analytics content online now:

View the PAW DC session videos online

Register by January 18th and receive $150 off the full 2-day conference program videos (enter code PAW150 at checkout)

Trial videos – view the following for no charge:

Select individual conference sessions, or save by registering for access to one or two full days of sessions. These on-demand videos deliver PAW DC right to your desk, covering hot topics and advanced methods such as:

Social data 

Text mining

Search marketing

Risk management

Survey analysis

Consumer privacy

Sales force optimization

Response & cross-sell

Recommender systems

Featuring experts such as:
Usama Fayyad, Ph.D.
CEO, Open Insights
Former Chief Data Officer, Yahoo!

Andrew Pole
Sr Mgr, Media/DB Mktng
Target
View Keynote for Free

John F. Elder, Ph.D.
CEO and Founder
Elder Research

Bruno Aziza
Director, Worldwide Strategy Lead, BI
Microsoft

Eric Siegel, Ph.D.
Conference Chair
Predictive Analytics World

PAW DC videos feature over 25 speakers with case studies from leading enterprises such as: CIBC, CEB, Forrester, Macy’s, MetLife, Microsoft, Miles Kimball, Monster.com, Oracle, Paychex, SunTrust, Target, UPMC, Xerox, Yahoo!, YMCA, and more.

How video access works:

View slides on the left; see and hear the speaker in the right window.

Sign up by January 18 for immediate video access and $150 discount


San Francisco
March 14-15, 2011
Washington DC
October, 2011
London
November, 2011

 

Session Gallery: Day 1 of 2

Viewing (17) Sessions of (31)

 

Keynote: Five Ways Predictive Analytics Cuts Enterprise Risk  

Eric Siegel, Ph.D., Program Chair, Predictive Analytics World

All business is an exercise in risk management. All organizations would benefit from measuring, tracking and computing risk as a core process, much like insurance companies do.

Predictive analytics does the trick, one customer at a time. This technology is a data-driven means to compute the risk each customer will defect, not respond to an expensive mailer, consume a retention discount even if she were not going to leave in the first place, not be targeted for a telephone solicitation that would have landed a sale, commit fraud, or become a “loss customer” such as a bad debtor or an insurance policy-holder with high claims.

In this keynote session, Dr. Eric Siegel reveals:

– Five ways predictive analytics evolves your enterprise to reduce risk

– Hidden sources of risk across operational functions

– What every business should learn from insurance companies

– How advancements have reversed the very meaning of fraud

– Why “man + machine” teams are greater than the sum of their parts for enterprise decision support

Length – 00:45:57

Price: $195

 

 

Platinum Sponsor Presentation: Analytics – The Beauty of Diversity 

Anne H. Milley, Senior Director of Analytic Strategy, Worldwide Product Marketing, SAS

Analytics contributes to, and draws from, multiple disciplines. The unifying theme of “making the world a better place” is bred from diversity. For instance, the same methods used in econometrics might be used in market research, psychometrics and other disciplines. In a similar way, diverse paradigms are needed to best solve problems, reveal opportunities and make better decisions. This is why we evolve capabilities to formulate and solve a wide range of problems through multiple integrated languages and interfaces. Extending that, we have provided integration with other languages so that users can draw on the disciplines and paradigms needed to best practice their craft.

Length – 20:11

Free viewing enabled – no charge

 

Gold Sponsor Presentation: Predictive Analytics Accelerate Insight for Financial Services 

Finbarr Deely, Director of Business Development, ParAccel

Financial services organizations face immense hurdles in maintaining profitability and building competitive advantage. Financial services organizations must perform “what-if” scenario analysis, identify risks, and detect fraud patterns. The advanced analytic complexity required often makes such analysis slow and painful, if not impossible. This presentation outlines the analytic challenges facing these organizations and provides a clear path to providing the accelerated insight needed to perform in today’s complex business environment to reduce risk, stop fraud and increase profits.

– The value of predictive analytics in Accelerating Insight

– Financial Services Analytic Case Studies

– Brief Overview of ParAccel Analytic Database

Length – 09:06

Free viewing enabled – no charge

 

TOPIC: BUSINESS VALUE
Case Study: Monster.com
Creating Global Competitive Power with Predictive Analytics 

Jean Paul Isson, Vice President, Global BI & Predictive Analytics, Monster Worldwide

Using predictive analytics to gain a deeper understanding of customer behaviours, increase marketing ROI and drive growth:

– Creating global competitive power with business intelligence: Making the right decisions – at the right time

– Avoiding common change management challenges in sales, marketing, customer service, and products

– Developing a BI vision – and implementing it: successful business intelligence implementation models

– Using predictive analytics as a business driver to stay on top of the competition

– Following the Monster Worldwide global BI evolution: How Monster used BI to go from good to great

Length – 51:17

Price: $195

 

 

TOPIC: SURVEY ANALYSIS
Case Study: YMCA
Turning Member Satisfaction Surveys into an Actionable Narrative 

Dean Abbott, President, Abbott Analytics

Employees are a key constituency at the Y and previous analysis has shown that their attitudes have a direct bearing on Member Satisfaction. This session will describe a successful approach for the analysis of YMCA employee surveys. Decision trees are built and examined in depth to identify key questions in describing key employee satisfaction metrics, including several interesting groupings of employee attitudes. Our approach will be contrasted with other factor analysis and regression-based approaches to survey analysis that we used initially. The predictive models described are currently in use and resulted in both greater understanding of employee attitudes, and a revised “short-form” survey with fewer key questions identified by the decision trees as the most important predictors.

Length – 50:19

Price: $195

 

 

TOPIC: INDUSTRY TRENDS
2010 Data Miner Survey Results: Highlights
 

Karl Rexer, Ph.D., Rexer Analytics

Do you want to know the views, actions, and opinions of the data mining community? Each year, Rexer Analytics conducts a global survey of data miners to find out. This year at PAW we unveil the results of our 4th Annual Data Miner Survey. This session will present the research highlights, such as:

– Analytic goals & key challenges

– Impact of the economy

– Regional differences

– Text mining trends

Length – 15:20

Price: $195

 

 

Multiple Case Studies: U.S. DoD, U.S. DHS, SSA
Text Mining: Lessons Learned 

John F. Elder, Chief Scientist, Elder Research, Inc.

Text Mining is the “Wild West” of data mining and predictive analytics – the potential for gain is huge, the capability claims are often tall tales, and the “land rush” for leadership is very much a race.

In solving unstructured (text) analysis challenges, we found that principles from inductive modeling – learning relationships from labeled cases – have great power to enhance text mining. Dr. Elder highlights key technical breakthroughs discovered while working on projects for leading government agencies, including:

– Prioritizing searches for the Dept. of Homeland Security

– Quick decisions for Social Security Admin. disability

– Document discovery for the Dept. of Defense

– Disease discovery for the Dept. of Homeland Security

– Risk profiling for the Dept. of Defense

Length – 48:58

Price: $195

 

 

Keynote: How Target Gets the Most out of Its Guest Data to Improve Marketing ROI 

Andrew Pole, Senior Manager, Media and Database Marketing, Target

In this session, you’ll learn how Target leverages its own internal guest data to optimize its direct marketing – with the ultimate goal of enhancing our guests’ shopping experience and driving in-store and online performance. You will hear about what guest data is available at Target, how and where we collect it, and how it is used to improve the performance and relevance of direct marketing vehicles. Furthermore, we will discuss Target’s development and usage of guest segmentation, response modeling, and optimization as means to suppress poor performers from mailings, determine relevant product categories and services for online targeted content, and optimally assign receipt marketing offers to our guests when offer quantities are limited.

Length – 47:49

Free viewing enabled – no charge

 

Platinum Sponsor Presentation: Driving Analytics Into Decision Making  

Jason Verlen, Director, SPSS Product Strategy & Management, IBM Software Group

Organizations looking to dramatically improve their business outcomes are turning to decision management, a convergence of technology and business processes that is used to streamline and predict the outcome of daily decision-making. IBM SPSS Decision Management technology provides the critical link between analytical insight and recommended actions. In this session you’ll learn how Decision Management software integrates analytics with business rules and business applications for front-line systems such as call center applications, insurance claim processing, and websites. See how you can improve every customer interaction, minimize operational risk, reduce fraud and optimize results.

Length – 17:29

Free viewing enabled – no charge

 

TOPIC: DATA INFRASTRUCTURE AND INTEGRATION
Case Study: Macy’s
The world is not flat (even though modeling software has to think it is) 

Paul Coleman, Director of Marketing Statistics, Macy’s Inc.

Software for statistical modeling generally uses flat files, where each record represents a unique case with all its variables. In contrast, most large databases are relational, where data are distributed among various normalized tables for efficient storage. Variable creation and model scoring engines are necessary to bridge data mining and storage needs. Development datasets taken from a sampled history require snapshot management. Scoring datasets are taken from the present timeframe and the entire available universe. Organizations with significant data must decide when to store or calculate necessary data and understand the consequences for their modeling program.
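The flat-file versus relational contrast can be sketched in a few lines of base R. This is only an illustration under assumptions: the `customers` and `transactions` tables, their columns, and their values are hypothetical and do not come from the Macy's session.

```r
# Hypothetical normalized tables (names and values are illustrative only).
customers <- data.frame(
  customer_id = c(1, 2, 3),
  region      = c("East", "West", "East"),
  stringsAsFactors = FALSE
)
transactions <- data.frame(
  customer_id = c(1, 1, 2, 2, 2),
  amount      = c(20, 35, 10, 15, 5)
)

# Variable creation: collapse the many-rows-per-customer table into
# one row per customer with derived variables.
spend <- aggregate(amount ~ customer_id, data = transactions, FUN = sum)
names(spend)[2] <- "total_spend"
visits <- aggregate(amount ~ customer_id, data = transactions, FUN = length)
names(visits)[2] <- "n_transactions"

# Flatten: left-join the derived variables onto the customer table so that
# each record is a unique case with all its variables.
flat <- merge(customers, spend, by = "customer_id", all.x = TRUE)
flat <- merge(flat, visits, by = "customer_id", all.x = TRUE)

# Customers with no transactions get 0 rather than NA.
flat$total_spend[is.na(flat$total_spend)] <- 0
flat$n_transactions[is.na(flat$n_transactions)] <- 0

print(flat)
```

Whether derived variables like `total_spend` are stored or recalculated at scoring time is exactly the trade-off the abstract mentions; the sketch recalculates them on the fly.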

Length – 34:54

Price: $195

 

 

TOPIC: CUSTOMER VALUE
Case Study: SunTrust
When One Model Will Not Solve the Problem – Using Multiple Models to Create One Solution 

Dudley Gwaltney, Group Vice President, Analytical Modeling, SunTrust Bank

In 2007, SunTrust Bank developed a series of models to identify clients likely to have large changes in deposit balances. The models include three basic binary and two linear regression models.

Based on the models, 15% of SunTrust clients were targeted as those most likely to have large balance changes. These clients accounted for 65% of the absolute balance change and 60% of the large balance change clients. The targeted clients are grouped into a portfolio and assigned to individual SunTrust retail branches. Since 2008, the portfolio has generated a 2.6% increase in balances over control.

Using the SunTrust example, this presentation will focus on:

– Identifying situations requiring multiple models

– Determining what types of models are needed

– Combining the individual component models into one output
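One way to read the targeting figures quoted above is as lift: the share of the outcome captured divided by the share of clients targeted. The following is a small worked sketch using only the percentages from the abstract; the variable names are mine, and lift is my framing rather than something the session is known to have used.

```r
# Figures quoted in the abstract.
share_targeted       <- 0.15  # fraction of clients targeted
share_clients_caught <- 0.60  # fraction of large-balance-change clients in the target group
share_change_caught  <- 0.65  # fraction of absolute balance change in the target group

# Lift: how much more concentrated the outcome is in the targeted group
# than it would be under random selection (where lift = 1).
lift_clients <- share_clients_caught / share_targeted
lift_change  <- share_change_caught / share_targeted

print(round(c(clients = lift_clients, balance_change = lift_change), 2))
```

In other words, the targeted 15% holds roughly four times as many large-change clients as a random 15% would.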

Length – 48:22

Price: $195

 

 

TOPIC: RESPONSE & CROSS-SELL
Case Study: Paychex
Staying One Step Ahead of the Competition – Development of a Predictive 401(k) Marketing and Sales Campaign 

Jason Fox, Information Systems and Portfolio Manager, Paychex

An in-depth case study of how Paychex, Inc. used predictive modeling to turn the tide on competitive pressures within its own client base. Paychex, a leading provider of payroll and human resource solutions, will guide you through the development of a Predictive 401(k) Marketing and Sales model. Through the use of sophisticated data mining techniques and regression analysis, the model derives the probability that a client will add retirement services products with Paychex or with a competitor. The session will cover roadblocks that could have ended development, as well as ROI analysis.

Speaker: Frank Fiorille, Director of Enterprise Risk Management, Paychex
Speaker: Jason Fox, Risk Management Analyst, Paychex

Length – 26:29

Price: $195

 

 

TOPIC: SEGMENTATION
Practitioner: Canadian Imperial Bank of Commerce
Segmentation Do’s and Don’ts 

Daymond Ling, Senior Director, Modelling & Analytics, Canadian Imperial Bank of Commerce

The concept of Segmentation is well accepted in business and has withstood the test of time. Even with the advent of new artificial intelligence and machine learning methods, this old war horse still has its place and is alive and well. Like all analytical methods, when used correctly it can lead to enhanced market positioning and competitive advantage, while improper application can have severe negative consequences.

This session will explore the elements of success and the worst practices that lead to failure. The relationship between segmentation and predictive modeling will also be discussed to clarify when it is appropriate to use one versus the other, and how to use them together synergistically.

Length – 45:57

Price: $195

 

 

TOPIC: SOCIAL DATA
Thought Leadership
Social Network Analysis: Killer Application for Cloud Analytics
 

James Kobielus, Senior Analyst, Forrester Research

Social networks such as Twitter and Facebook are a potential goldmine of insights on what is truly going through customers’ minds. Every company wants to know whether, how, how often, and by whom they’re being mentioned across the billowing new cloud of social media. Just as important, every company wants to influence those discussions in their favor, target new business, and harvest maximum revenue potential. In this session, Forrester analyst James Kobielus identifies fruitful applications of social network analysis in customer service, sales, marketing, and brand management. He presents a roadmap for enterprises to leverage their inline analytics initiatives and use high-performance data warehousing (DW) clouds and appliances to analyze shifting patterns of customer sentiment, influence, and propensity. Drawing on Forrester’s ongoing research in advanced analytics and customer relationship management, Kobielus will discuss industry trends, commercial modeling tools, and emerging best practices in social network analysis, which represents a game-changing new discipline in predictive analytics.

Length – 48:16

Price: $195

 

 

TOPIC: HEALTHCARE – INTERNATIONAL TARGETING
Case Study: Life Line Screening
Taking CRM Global Through Predictive Analytics 

Ozgur Dogan,
VP, Quantitative Solutions Group, Merkle Inc

Trish Mathe,
Director of Database Marketing, Life Line Screening

While Life Line is successfully executing a US CRM roadmap, it is also beginning this same evolution abroad. It is starting in the UK, where Merkle procured data and built a response model that is pulling responses over 30% higher than competitors. This presentation will give an overview of the US CRM roadmap and then focus on the beginning of the strategy abroad: the data procurement they could not get anywhere else but through Merkle, and the successful modeling and analytics for the UK.

Length – 40:12

Price: $195

 

 

TOPIC: SURVEY ANALYSIS
Case Study: Forrester
Making Survey Insights Addressable and Scalable – The Case Study of Forrester’s Technographics Benchmark Survey 

Nethra Sambamoorthi, Team Leader, Consumer Dynamics & Analytics, Global Consulting, Acxiom Corporation

Marketers use surveys to create enterprise-wide strategic insights to: (1) develop segmentation schemes, (2) summarize consumer behaviors and attitudes for the whole US population, and (3) use multiple surveys to draw unified views about their target audience. However, these insights are not directly addressable and scalable to the whole consumer universe, which is very important when applying the power of survey intelligence to the one-to-one consumer marketing problems marketers routinely face. Acxiom partnered with Forrester Research to create addressable and scalable applications of Forrester’s Technographics Survey and applied them successfully to a number of industries and applications.

Length – 39:23

Price: $195

 

 

TOPIC: HEALTHCARE
Case Study: UPMC Health Plan
A Predictive Model for Hospital Readmissions 

Scott Zasadil, Senior Scientist, UPMC Health Plan

Hospital readmissions are a significant component of our nation’s healthcare costs. Predicting who is likely to be readmitted is a challenging problem. Using a set of 123,951 hospital discharges spanning nearly three years, we developed a model that predicts an individual’s 30-day readmission should they incur a hospital admission. The model uses an ensemble of boosted decision trees and prior medical claims and captures 64% of all 30-day readmits with a true positive rate of over 27%. Moreover, many of the ‘false’ positives are simply delayed true positives. 53% of the predicted 30-day readmissions are readmitted within 180 days.
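The two headline numbers are easier to keep apart with a tiny confusion-matrix sketch. I am reading "captures 64% of all 30-day readmits" as recall and the "over 27%" figure as the share of flagged discharges that really are readmitted (precision); that reading is my assumption, and the counts below are hypothetical numbers chosen only to reproduce those two percentages, not UPMC data.

```r
# Hypothetical cohort: 1000 discharges, 100 of them readmitted within 30 days.
actual    <- c(rep(TRUE, 100), rep(FALSE, 900))
# Hypothetical model output: flags 64 of the 100 readmits and 173 others.
predicted <- c(rep(TRUE, 64), rep(FALSE, 36), rep(TRUE, 173), rep(FALSE, 727))

tp <- sum(predicted & actual)    # flagged and readmitted
fp <- sum(predicted & !actual)   # flagged but not readmitted within 30 days
fn <- sum(!predicted & actual)   # readmitted but missed

recall    <- tp / (tp + fn)  # share of all readmits the model captures
precision <- tp / (tp + fp)  # share of flagged discharges that are readmits

print(round(c(recall = recall, precision = precision), 3))
```

Under this reading, the abstract's remark about delayed true positives says that some of the `fp` cases are readmitted later than 30 days, so precision measured against a 180-day window would be higher.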

Length – 54:18

Price: $195

Multi State Models


A special issue of the Journal of Statistical Software has come out devoted to Multi State Models and Competing Risks. It is a must-read for anyone with an interest in Pharma Analytics or Survival Analysis, even if you don’t know much R.

Here is an extract from “mstate: An R Package for the Analysis of Competing Risks and Multi-State Models”

Multi-state models are a very useful tool to answer a wide range of questions in survival analysis that cannot, or only in a more complicated way, be answered by classical models. They are suitable for both biomedical and other applications in which time-to-event variables are analyzed. However, they are still not frequently applied. So far, an important reason for this has been the lack of available software. To overcome this problem, we have developed the mstate package in R for the analysis of multi-state models. The package covers all steps of the analysis of multi-state models, from model building and data preparation to estimation and graphical representation of the results. It can be applied to non- and semi-parametric (Cox) models. The package is also suitable for competing risks models, as they are a special category of multi-state models.
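To make the competing-risks case concrete, here is a minimal base R sketch of the nonparametric cumulative incidence estimate for two competing causes. It does not use the mstate API; the function name `cum_incidence`, the 0/1/2 cause coding, and the toy data are all my own assumptions for illustration.

```r
# Nonparametric cumulative incidence for two competing causes.
# cause: 0 = censored, 1 or 2 = type of event (hypothetical coding).
cum_incidence <- function(time, cause) {
  event_times <- sort(unique(time[cause != 0]))
  surv <- 1   # probability of being event-free just before the current time
  cif1 <- 0   # cumulative incidence of cause 1
  cif2 <- 0   # cumulative incidence of cause 2
  rows <- vector("list", length(event_times))
  for (i in seq_along(event_times)) {
    tj <- event_times[i]
    at_risk <- sum(time >= tj)
    d1 <- sum(time == tj & cause == 1)
    d2 <- sum(time == tj & cause == 2)
    # Each cause's increment is P(event-free so far) * cause-specific hazard.
    cif1 <- cif1 + surv * d1 / at_risk
    cif2 <- cif2 + surv * d2 / at_risk
    # Update the overall event-free probability using events of any cause.
    surv <- surv * (1 - (d1 + d2) / at_risk)
    rows[[i]] <- data.frame(time = tj, event_free = surv,
                            cif1 = cif1, cif2 = cif2)
  }
  do.call(rbind, rows)
}

# Toy data, illustrative only.
time  <- c(2, 3, 3, 5, 6, 8, 9, 12)
cause <- c(1, 2, 0, 1, 0, 2, 1, 0)
res <- cum_incidence(time, cause)
print(res)
```

At every event time the three columns `event_free`, `cif1` and `cif2` sum to one, which is the multi-state view of the problem: everyone is in exactly one of the states "event-free", "failed from cause 1" or "failed from cause 2". The packages in the special issue add covariates, variance estimates and more general state structures.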

 

—————————–

 

Issues for JSS Special Volume 38: Competing Risks and Multi-State Models

Special Issue about Competing Risks and Multi-State Models

Hein Putter
Vol. 38, Issue 1, Jan 2011
Submitted 2011-01-03, Accepted 2011-01-03

Analyzing Competing Risk Data Using the R timereg Package

Thomas H. Scheike, Mei-Jie Zhang
Vol. 38, Issue 2, Jan 2011
Submitted 2009-05-25, Accepted 2010-06-22

p3state.msm: Analyzing Survival Data from an Illness-Death Model

Luís Filipe Meira Machado, Javier Roca-Pardiñas
Vol. 38, Issue 3, Jan 2011
Submitted 2009-06-30, Accepted 2010-03-02

Empirical Transition Matrix of Multi-State Models: The etm Package

Arthur Allignol, Martin Schumacher, Jan Beyersmann
Vol. 38, Issue 4, Jan 2011
Submitted 2009-01-08, Accepted 2010-03-11

Lexis: An R Class for Epidemiological Studies with Long-Term Follow-Up

Martyn Plummer, Bendix Carstensen
Vol. 38, Issue 5, Jan 2011
Submitted 2010-02-09, Accepted 2010-09-16

Using Lexis Objects for Multi-State Models in R

Bendix Carstensen, Martyn Plummer
Vol. 38, Issue 6, Jan 2011
Submitted 2010-02-09, Accepted 2010-09-16

mstate: An R Package for the Analysis of Competing Risks and Multi-State Models

Liesbeth C. de Wreede, Marta Fiocco, Hein Putter
Vol. 38, Issue 7, Jan 2011
Submitted 2010-01-17, Accepted 2010-08-20

Multi-State Models for Panel Data: The msm Package for R

Christopher Jackson
Vol. 38, Issue 8, Jan 2011
Submitted 2009-07-21, Accepted 2010-08-18


 

2011 Forecast-ying


I recently asked some friends from my Twitter lists for their take on 2011. At least three of them responded with an answer, one said they were still working on it, and one claimed a recent office event.

Anyways- I take note of the view of forecasting from

http://www.uiah.fi/projekti/metodi/190.htm

The most primitive method of forecasting is guessing. The result may be rated acceptable if the person making the guess is an expert in the matter.

Ajay- People will forecast at the end of 2010 and in 2011. Many of them will get their forecasts wrong, some very wrong, but by Dec 2011 most of them will be writing forecasts for 2012. Almost no one will get called out by irate users and readers (hey, you got 4 out of 7 wrong in last year’s forecast!). It just won’t happen. People thrive on hope. So does marketing. In 2011, and before.

and some forecasts from Tom Davenport’s The International Institute for Analytics (IIA) at

http://iianalytics.com/2010/12/2011-predictions-for-the-analytics-industry/

Regulatory and privacy constraints will continue to hamper growth of marketing analytics.

(I wonder how privacy and analytics can coexist in peace forever. One view is that model building can use anonymized data. Suppose your IP address was anonymized using a standard secret Coca-Cola formula: then whatever model does get built would not be of concern to you individually, as your privacy is protected by the anonymization formula.)
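A rough sketch of what that "secret formula" could look like in base R: replace each IP address with a random token through a lookup table that only the data owner keeps. Everything here is hypothetical (the `visits` data, the column names, the token scheme), and strictly speaking this is pseudonymization, a weaker guarantee than true anonymization.

```r
set.seed(42)  # fixed only so that the sketch is repeatable

# Hypothetical web log with raw IP addresses.
visits <- data.frame(
  ip    = c("10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3"),
  pages = c(3, 7, 2, 5),
  stringsAsFactors = FALSE
)

# The "secret formula": a random mapping from IP to token,
# stored separately and never handed to the modelers.
unique_ips <- unique(visits$ip)
key <- data.frame(
  ip         = unique_ips,
  user_token = sample(seq_along(unique_ips)),
  stringsAsFactors = FALSE
)

# Modelers receive tokens in place of IP addresses.
anonymized <- visits
anonymized$user_token <- key$user_token[match(visits$ip, key$ip)]
anonymized$ip <- NULL

# Per-user variables can still be built for modeling.
per_user <- aggregate(pages ~ user_token, data = anonymized, FUN = sum)
print(per_user)
```

The model can still learn per-user behaviour, but nothing in `anonymized` points back to an individual without the `key` table.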

Anyway- back to the question I asked-

What are the top 5 events in your industry (events as in things that occurred, not conferences), and what are the top 3 trends in 2011?

I define my industry as being online technology writing- research (with a heavy skew on stat computing)

My top 5 events for 2010 were-

1) Consolidation- The Big 5 software providers in BI and Analytics bought more, sued more, and consolidated more. The valuations rose, and rose, leading to even more smaller players entering. Thus consolidation proved an oxymoron, as the total number of influential AND disruptive players grew.

 

2) Cloudy Computing- Computing shifted from the desktop, but to the mobile and more to the tablet than to the cloud. iPad front end with Amazon EC2 backend- yup, it happened.

3) Open Source grew louder- Yes, it got more clients. And more revenue. Did it get more market share? That depends on whether you define market share by revenues or by users.

Both Open Source and Closed Source had a good year- the pie grew faster and bigger, so no one minded as long as their slices grew bigger.

4) We didn’t see that coming –

Technology continued to surprise with events (that’s what we love! the surprises)

Revolution Analytics broke through R’s Big Data Barrier, Tableau Software created a big Buzz, and Wikileaks and Chinese FireWalls gave technology an entirely new dimension (though not a universally popular one).

People fought wars on emails and servers and social media. Unfortunately, the ones fighting real wars in 2009 continued to fight them in 2010 too.

5) Money-

SAP, SAS, IBM, Oracle, Google, and Microsoft made more money than ever before. Only Facebook got a movie made about it. Venture capitalists pumped money into promising startups, almost as if in a hurry to park money before tax cuts expired in some countries.

 

2011 Top Three Forecasts

1) Surprises- Expect to get surprised at least 10% of the time in business events. As the internet grows, the communication cycle shortens and the hype cycle amplifies buzz-

more unstructured data is created (esp. for marketing analytics), leading to enhanced volatility.

2) Growth- Yes, we predict technology will grow faster than the automobile industry. Game changers may happen in the form of Chrome OS (really, it’s Linux, guys) and customer adaptability to new USER INTERFACES. Design will matter much more in technology on your phone, on your desktop and on your internet. Packaging sells.

False Top Trend 3) I will write a book on business analytics in 2011. Yes, it is true, and I am working with a publisher. No, it is not really going to be a top 3 event for anyone except me, the publisher, and the lucky guys who read it.

3) Creating technology and technically enabling creativity will converge at an accelerated rate. Use of widgets, GUIs, snippets, and IDEs will ensure creative left brains can code more easily, and right brains can design faster and better thanks to a global supply chain of techie and artsy professionals.

 

 

How to Analyze Wikileaks Data – R SPARQL


Drew Conway, one of the very, very few Project R voices I used to respect until recently, declared on his blog http://www.drewconway.com/zia/

Why I Will Not Analyze The New WikiLeaks Data

and followed it up with how HE analyzed the post announcing the non-analysis.

“If you have not visited the site in a week or so you will have missed my previous post on analyzing WikiLeaks data, which from the traffic and 35 Comments and 255 Reactions was at least somewhat controversial. Given this rare spotlight I thought it would be fun to use the infochimps API to map out the geo-location of everyone that visited the blog post over the last few days. Unfortunately, after nearly two years with the same web hosting service, only today did I realize that I was not capturing daily log files for my domain”

Anyways – non-American users of the R Project can analyze the Wikileaks data using the R SPARQL package. I would advise American friends not to use this approach or attempt to analyze any of the data, because technically the data is still classified and its possession is illegal (which is the reason Federal employees and organizations receiving federal funds have been advised not to use this or any WikiLeaks dataset).

https://code.google.com/p/r-sparql/

Overview

R is a programming language designed for statistics.

R Sparql allows you to run SPARQL queries inside R and store the results as an R data frame.

The main objective is to allow the integration of Ontologies with Statistics.

It requires Java and rJava installed.

Example (in R console):

> library(sparql)
> data <- query("<SPARQL query>", "RDF file or remote SPARQL Endpoint")

and the data is at a remote SPARQL endpoint: http://www.ckan.net/package/cablegate

SPARQL is an easy language to pick up, but dammit I am not supposed to blog on my vacations.

http://code.google.com/p/r-sparql/wiki/GettingStarted

Getting Started

1. Installation

1.1 Make sure Java is installed and is the default JVM:

$ sudo apt-get install sun-java6-bin sun-java6-jre sun-java6-jdk
$ sudo update-java-alternatives -s java-6-sun

1.2 Configure R to use the correct version of Java

$ sudo R CMD javareconf

1.3 Install the rJava library

$ R
> install.packages("rJava")
> q()

1.4 Download and install the sparql library

Download: http://code.google.com/p/r-sparql/downloads/list

$ R CMD INSTALL sparql-0.1-X.tar.gz

2. Executing a SPARQL query

2.1 Start R

# Load the library
library(sparql)
# Run the query
result <- query("SELECT ... ", "http://...")
# Print the result
print(result)

3. Examples

3.1 The Query can be a string or a local file:

query("SELECT ?date ?number ?season WHERE {  ... }", "local-file.rdf")
query("my-query.rq", "local-file.rdf")

The package will detect if my-query.rq exists and will load it from the file.

3.3 The uri can be a file or an url (for remote queries):

query("SELECT ... ","local-file.db")
query("SELECT ... ","http://dbpedia.org/sparql")

3.4 Get some examples here: http://code.google.com/p/r-sparql/downloads/list

SPARQL Tutorial-

http://openjena.org/ARQ/Tutorial/index.html

Also read-

http://webr3.org/blog/linked-data/virtuoso-6-sparqlgeo-and-linked-data/

and from the favorite blog of Project R- Also known as NY Times

http://bits.blogs.nytimes.com/2010/11/15/sorting-through-the-government-data-explosion/?twt=nytimesbits

In May 2009, the Obama administration started putting raw government data on the Web. It started with 47 data sets. Today, there are more than 270,000 government data sets, spanning every imaginable category from public health to foreign aid.

Interview Jamie Nunnelly NISS

An interview with Jamie Nunnelly, Communications Director of National Institute of Statistical Sciences

Ajay– What does NISS do? And what does SAMSI do?

Jamie– The National Institute of Statistical Sciences (NISS) was established in 1990 by the national statistics societies and the Research Triangle universities and organizations, with the mission to identify, catalyze and foster high-impact, cross-disciplinary and cross-sector research involving the statistical sciences.

NISS is dedicated to strengthening and serving the national statistics community, most notably by catalyzing community members’ participation in applied research driven by challenges facing government and industry. NISS also provides career development opportunities for statisticians and scientists, especially those in the formative stages of their careers.

The Institute identifies emerging issues to which members of the statistics community can make key contributions, and then catalyzes the right combinations of researchers from multiple disciplines and sectors to tackle each problem. More than 300 researchers from over 100 institutions have worked on our projects.

The Statistical and Applied Mathematical Sciences Institute (SAMSI) is a partnership of Duke University, North Carolina State University, The University of North Carolina at Chapel Hill, and NISS, in collaboration with the William Kenan Jr. Institute for Engineering, Technology and Science, and is part of the Mathematical Sciences Institutes of the NSF.

SAMSI focuses on one or two programs of research interest in statistics and/or applied mathematics. Visitors from around the world take part in the programs, and they come from a variety of disciplines in addition to mathematics and statistics.

Many come to SAMSI to attend workshops and to participate in working groups throughout the academic year. Many of the working groups communicate via WebEx so people can be involved with the research remotely. SAMSI also has a robust education and outreach program to help undergraduate and graduate students learn about cutting-edge research in applied mathematics and statistics.

Ajay– What successes have you had in 2010, and what do you need to succeed in 2011? What's planned for 2011?

Jamie– NISS has had a very successful collaboration with the National Agricultural Statistics Service (NASS) over the past two years that was just renewed for the next two years. NISS and NASS had three teams, each consisting of a faculty researcher in statistics, a NASS researcher, a NISS mentor, a postdoctoral fellow and a graduate student, working on statistical modeling and other areas of research for NASS.

NISS is also working on a syndromic surveillance project with Clemson University, Duke University, The University of Georgia, and The University of South Carolina. The group is currently working with some hospitals to test a model they have been developing to help predict disease outbreaks.

SAMSI had a very successful year, with two programs ending this past summer: the Stochastic Dynamics program and the Space-time Analysis for Environmental Mapping, Epidemiology and Climate Change program. Several papers were written and published, and many presentations have been made at conferences around the world on the work that was conducted at SAMSI last year.

Next year’s program, uncertainty quantification, is so big that the institute has decided to devote all its time and energy to it. The opening workshop, in addition to the main methodological theme, will be broken down into three areas of interest under this broad umbrella of research: climate change, engineering and renewable energy, and geosciences.

Ajay– Describe your career in science and communication.

Jamie– I have been in communications since 1985, working for large Fortune 500 companies such as General Motors and Tropicana Products. I moved to the Research Triangle region of North Carolina after graduate school and got into economic development and science communications first working for the Research Triangle Regional Partnership in 1994.

From 1996 to 2005 I was the communications director for the Research Triangle Park, working for the Research Triangle Foundation of NC. I published a quarterly magazine called The Park Guide for a while, then came to work for NISS and SAMSI in 2008.

I really enjoy working with the mathematicians and statisticians. I always joke that I am the least educated person working here and that is not far from the truth! I am honored to help get the message out about all of the important research that is conducted here each day that is helping to improve the lives of so many people out there.

Ajay– Research Triangle or Silicon Valley– Which is better for tech people and why? Your opinion

Jamie– Both Silicon Valley and the Research Triangle are great regions for tech people to locate, but of course, I have to be biased and choose the Research Triangle!

Really any place in the world that you find many universities working together with businesses and government, you have an area that will grow and thrive, because the collaborations help all of us generate new ideas, many of which blossom into new businesses, or new endeavors of research.

The quality of life in places such as the Research Triangle is great because you have people from around the world moving to a place, each bringing his/her culture, food, and uniqueness to this place, and enriching everyone else as a result.

The Research Triangle has two advantages over Silicon Valley. The first is a greater diversity of industries: when the telecommunications industry went bust back in 2001-02, the region took a hit, but the biotechnology industry was still growing, so unemployment rose, but not to the extent that other areas might have experienced.

The latest recession has hit us all very hard, so even this strategy has not made us immune to having high unemployment, but the Research Triangle region has been pegged by experts to be one of the first regions to emerge out of the Great Recession.

The other advantage I think we have is that our cost of living is still much more reasonable than Silicon Valley. It’s still possible to get a nice sized home, some land and not break the bank!

Ajay– How do you manage an active online social media presence, your job and your family? How important is balance in professional life, and when should young professionals realize this?

Jamie– Balance is everything, isn’t it? When I leave the office, I turn off my iPhone and disconnect from Twitter/Facebook etc.

I know that is not recommended by some folks, but I am a one-person communications department and I love my family and friends and feel it’s important to devote time to them as well as to my career.

I think it is very important for young people to establish this early in their careers because if they don’t, they will fall victim to working way too many hours and really, who loves you at the end of the day?

Your company may appreciate all you do for them, but if you leave, or you get sick and cannot work for them, you will be replaced.

Lee Iacocca, former CEO of Chrysler, said, “No matter what you’ve done for yourself or for humanity, if you can’t look back on having given love and attention to your own family, what have you really accomplished?” I think that is what is really most important in life.

About-

Jamie Nunnelly has been in communications for 25 years. She is currently on the board of directors for Chatham County Economic Development Corporation and Leadership Triangle & is a member of the International Association of Business Communicators and the Public Relations Society of America. She earned a bachelor’s degree in interpersonal and public communications at Bowling Green State University and a master’s degree in mass communications at the University of South Florida.

You can contact Jamie at http://niss.org/content/jamie-nunnelly or on twitter at

Summer School on Uncertainty Quantification

SAMSI/Sandia Summer School on Uncertainty Quantification – June 20-24, 2011

http://www.samsi.info/workshop/samsisandia-summer-school-uncertainty-quantification

The utilization of computer models for complex real-world processes requires addressing Uncertainty Quantification (UQ). Corresponding issues range from inaccuracies in the models to uncertainty in the parameters or intrinsic stochastic features.

This summer school will expose students in the mathematical and statistical sciences to common challenges in developing, evaluating and using complex computer models of processes. It is essential that the next generation of researchers be trained on these fundamental issues, which are too often absent from traditional curricula.

Participants will receive not only an overview of the fast-developing field of UQ but also specific skills related to data assimilation, sensitivity analysis and the statistical analysis of rare events.

Theoretical concepts and methods will be illustrated with concrete examples and applications from both nuclear engineering and climate modeling.

The main lecturers are:
Dan Cacuci (N.C. State University): data assimilation and applications to nuclear engineering

Dan Cooley (Colorado State University): statistical analysis of rare events
This short course will introduce the current statistical practice for analyzing extreme events. Statistical practice relies on fitting distributions suggested by asymptotic theory to a subset of data considered to be extreme. Both block maximum and threshold exceedance approaches will be presented for both the univariate and multivariate cases.

Doug Nychka (NCAR): data assimilation and applications in climate modeling
Climate prediction and modeling do not incorporate geophysical data in the same sequential manner as weather forecasting, and comparison to data is typically based on accumulated statistics, such as averages. This arises because a climate model matches the state of the Earth’s atmosphere and ocean “on the average” and so one would not expect the detailed weather fluctuations to be similar between a model and the real system. An emerging area for climate model validation and improvement is the use of data assimilation to scrutinize the physical processes in a model using observations on shorter time scales. The idea is to find a match between the state of the climate model and observed data that is particular to the observed weather. In this way one can check whether short-time-scale physical processes such as cloud formation or dynamics of the atmosphere are consistent with what is observed.

Dongbin Xiu (Purdue University): sensitivity analysis and polynomial chaos for differential equations
This lecture will focus on numerical algorithms for stochastic simulations, with an emphasis on the methods based on generalized polynomial chaos methodology. Both the mathematical framework and the technical details will be examined, along with performance comparisons and implementation issues for practical complex systems.
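
To make the two approaches named in the rare-events course description a little more concrete, here is a minimal Python sketch of how the "extreme" subset of a data series is picked out under each one. The toy rainfall numbers and the helper names `block_maxima` and `threshold_exceedances` are assumptions made up for illustration; they are not from the course. In practice a distribution suggested by asymptotic theory would then be fitted to each subset (commonly the generalized extreme value distribution for block maxima and the generalized Pareto distribution for threshold excesses), which this sketch leaves out.

```python
# Illustrative sketch only: the data and helper names are hypothetical.

def block_maxima(values, block_size):
    """Return the maximum of each consecutive full block of `block_size` values."""
    n_blocks = len(values) // block_size
    return [max(values[i * block_size:(i + 1) * block_size])
            for i in range(n_blocks)]


def threshold_exceedances(values, threshold):
    """Return the amounts by which observations exceed `threshold`."""
    return [v - threshold for v in values if v > threshold]


# Toy series, e.g. twelve daily rainfall readings (made-up numbers).
daily_rain = [3, 0, 12, 7, 1, 0, 25, 4, 9, 2, 0, 31]

maxima = block_maxima(daily_rain, block_size=4)            # one value per block
excesses = threshold_exceedances(daily_rain, threshold=10)  # all large values

print(maxima)    # [12, 25, 31]
print(excesses)  # [2, 15, 21]
```

The block approach keeps exactly one observation per block, while the threshold approach keeps every observation above the cut-off, so the choice of block size or threshold decides how much data counts as extreme.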

The main lectures will be supplemented by discussion sessions and by presentations from UQ practitioners from both the Sandia and Los Alamos National Laboratories.

http://www.samsi.info/workshop/samsisandia-summer-school-uncertainty-quantification

PAWCON: This week in London

Watch out for the Twitter hashtag news on PAWCON and the exciting agenda lined up. If you’re in the City, you may want to just drop in.

http://www.predictiveanalyticsworld.com/london/2010/agenda.php#day1-7

Disclaimer- PAWCON has been a blog partner with Decisionstats (since the first PAWCON). It is vendor neutral and features open source as well as proprietary software, as well as case studies from academia and industry, for a balanced view.

 

A little birdie told me some exciting product enhancements may be in the works, including a not-yet-announced R plugin 😉 and the latest SAS product using embedded analytics, along with Dr Elder’s full-day data mining workshop.

Citation-

http://www.predictiveanalyticsworld.com/london/2010/agenda.php#day1-7

Monday November 15, 2010
All conference sessions take place in Edward 5-7

8:00am-9:00am

Registration, Coffee and Danish
Room: Albert Suites


9:00am-9:50am

Keynote
Five Ways Predictive Analytics Cuts Enterprise Risk

All business is an exercise in risk management. All organizations would benefit from measuring, tracking and computing risk as a core process, much like insurance companies do.

Predictive analytics does the trick, one customer at a time. This technology is a data-driven means to compute the risk each customer will defect, not respond to an expensive mailer, consume a retention discount even if she were not going to leave in the first place, not be targeted for a telephone solicitation that would have landed a sale, commit fraud, or become a “loss customer” such as a bad debtor or an insurance policy-holder with high claims.

In this keynote session, Dr. Eric Siegel will reveal:

  • Five ways predictive analytics evolves your enterprise to reduce risk
  • Hidden sources of risk across operational functions
  • What every business should learn from insurance companies
  • How advancements have reversed the very meaning of fraud
  • Why “man + machine” teams are greater than the sum of their parts for enterprise decision support

 

Speaker: Eric Siegel, Ph.D., Program Chair, Predictive Analytics World


9:50am-10:10am

Platinum Sponsor Presentation: IBM
The Analytical Revolution

The algorithms at the heart of predictive analytics have been around for years – in some cases for decades. But now, as we see predictive analytics move to the mainstream and become a competitive necessity for organisations in all industries, the most crucial challenges are to ensure that results can be delivered to where they can make a direct impact on outcomes and business performance, and that the application of analytics can be scaled to the most demanding enterprise requirements.

This session will look at the obstacles to successfully applying analysis at the enterprise level, and how today’s approaches and technologies can enable the true “industrialisation” of predictive analytics.

Speaker: Colin Shearer, WW Industry Solutions Leader, IBM UK Ltd


10:10am-10:20am

Gold Sponsor Presentation: Deloitte
How Predictive Analytics is Driving Business Value

Organisations are increasingly relying on analytics to make key business decisions. Today, technology advances and the increasing need to realise competitive advantage in the market place are driving predictive analytics from the domain of marketers and tactical one-off exercises to the point where analytics are being embedded within core business processes.

During this session, Richard will share some of the focus areas where Deloitte is driving business transformation through predictive analytics, including Workforce, Brand Equity and Reputational Risk, Customer Insight and Network Analytics.

Speaker: Richard Fayers, Senior Manager, Deloitte Analytical Insight


10:20am-10:45am

Break / Exhibits
Room: Albert Suites


10:45am-11:35am
Healthcare
Case Study: Life Line Screening
Taking CRM Global Through Predictive Analytics

While Life Line is successfully executing a US CRM roadmap, they are also beginning this same evolution abroad. They are beginning in the UK where Merkle procured data and built a response model that is pulling responses over 30% higher than competitors. This presentation will give an overview of the US CRM roadmap, and then focus on the beginning of their strategy abroad, focusing on the data procurement they could not get anywhere else but through Merkle and the successful modeling and analytics for the UK.

Speaker: Ozgur Dogan, VP, Quantitative Solutions Group, Merkle Inc.

Speaker: Trish Mathe, Life Line Screening


11:35am-12:25pm
Open Source Analytics; Healthcare
Case Study: A large health care organization
The Rise of Open Source Analytics: Lowering Costs While Improving Patient Care

RapidMiner and R ranked first and second in this year’s annual KDnuggets data mining tool usage poll, followed by KNIME in fourth place and Weka in sixth. So what’s going on here? Are these open source tools really that good, or is their popularity strongly correlated with lower acquisition costs alone? This session answers these questions based on a real-world case for a large health care organization and explains the risks and benefits of using open source technology. The final part of the session explains how these tools stack up against their traditional, proprietary counterparts.

Speaker: Jos van Dongen, Associate & Principal, DeltIQ Group


12:25pm-1:25pm

Lunch / Exhibits
Room: Albert Suites


1:25pm-2:15pm
Keynote
Thought Leader:
Case Study: Yahoo! and other large on-line e-businesses
Search Marketing and Predictive Analytics: SEM, SEO and On-line Marketing Case Studies

Search Engine Marketing is a $15B industry in the U.S. growing to double that number over the next 3 years. Worldwide the SEM market was over $50B in 2010. Not only is this a fast growing area of marketing, but it is one that has significant implications for brand and direct marketing and is undergoing rapid change with emerging channels such as mobile and social. What is unique about this area of marketing is a singularly heavy dependence on analytics:

 

  • Large numbers of variables and options
  • Real-time auctions/bids and a need to adjust strategies in real-time
  • Difficult optimization problems on allocating spend across a huge number of keywords
  • Fast-changing competitive terrain and heavy competition on the obvious channels
  • Complicated interactions between various channels and a large choice of search keyword expansion possibilities
  • Profitability and ROI analysis that are complex and often challenging

 

The size of the industry, its growing importance in marketing, its upcoming role in Mobile Advertising, and its uniquely heavy reliance on analytics make it particularly interesting as an area for predictive analytics applications. In this session, not only will you hear about some of the latest strategies and techniques to optimize search, you will also hear case studies from industry practitioners that illustrate the important role of analytics.

Speaker: Usama Fayyad, Ph.D., CEO, Open Insights


2:15pm-2:35pm

Platinum Sponsor Presentation: SAS
Creating a Model Factory Using in-Database Analytics

With the ever-increasing number of analytical models required to make fact-based decisions, as well as increasing audit compliance regulations, it is more important than ever that these models can be created, monitored, retuned and deployed as quickly and automatically as possible. This paper, using a case study from a major financial organisation, will show how organisations can build a model factory efficiently using the latest SAS technology that utilizes the power of in-database processing.

Speaker: John Spooner, Analytics Specialist, SAS (UK)


2:35pm-2:45pm

Session Break
Room: Albert Suites


2:45pm-3:35pm

Retail
Case Study: SABMiller
Predictive Analytics & Global Marketing Strategy

Over the last few years SABMiller plc, the second largest brewing company in the world, operating in 70 countries, has been systematically segmenting its markets in different countries in order to optimize its portfolio strategy and align it to its long-term country-specific growth strategy. This presentation covers the overall methodology followed and the challenges that had to be overcome, from both a technical and a change management standpoint, in order to successfully implement a standard analytics approach to diverse markets and diverse business positions in a highly global setting.

The session explains how country specific growth strategies were converted to objective variables and consumption occasion segments were created that differentiated the market effectively by their growth potential. In addition to this the presentation will also provide a discussion on issues like:

  • The dilemmas of static vs. dynamic solutions and standardization vs. adaptable solutions
  • Challenges in acceptability, local capability development, overcoming implementation inertia, cost effectiveness, etc
  • The role that business partners at SAB and analytics service partners at AbsolutData together play in providing impactful and actionable solutions

 

Speaker: Anne Stephens, SABMiller plc

Speaker: Titir Pal, AbsolutData


3:35pm-4:25pm

Retail
Case Study: Overtoom Belgium
Increasing Marketing Relevance Through Personalized Targeting

 

For many years, Overtoom Belgium – a leading B2B retailer and division of the French Manutan group – has focused on extensive use of CRM. In this presentation, we demonstrate how Overtoom has integrated Predictive Analytics to optimize customer relationships. In this process, they employ analytics to develop answers to the key question: “which product should we offer to which customer via which channel”. We show how Overtoom gained a 10% revenue increase by replacing the existing segmentation scheme with accurate predictive response models. Additionally, we illustrate how Overtoom succeeds in delivering more relevant communications by offering personalized promotional content to every single customer, and how these personalized offers positively impact Overtoom’s conversion rates.

Speaker: Dr. Geert Verstraeten, Python Predictions


4:25pm-4:50pm

Break / Exhibits
Room: Albert Suites


4:50pm-5:40pm
Uplift Modelling:
Case Study: Lloyds TSB General Insurance & US Bank
Uplift Modelling: You Should Not Only Measure But Model Incremental Response

Most marketing analysts understand that measuring the impact of a marketing campaign requires a valid control group so that uplift (incremental response) can be reported. However, it is much less widely understood that the targeting models used almost everywhere do not attempt to optimize that incremental measure. That requires an uplift model.

This session will explain why a switch to uplift modelling is needed, illustrate what can and does go wrong when uplift models are not used, and show the hugely positive impact they can have when used effectively. It will also discuss a range of approaches to building and assessing uplift models, from simple adjustments to existing modelling processes through to full-blown uplift modelling.

The talk will use Lloyds TSB General Insurance & US Bank as a case study and also illustrate real-world results from other companies and sectors.

 

Speaker: Nicholas Radcliffe, Founder and Director, Stochastic Solutions
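
As a rough illustration of the difference between targeting on response and targeting on uplift described in this abstract, here is a minimal Python sketch. It is not the speaker's method: the segment names, the toy campaign records and the helper `uplift_by_segment` are all made up for illustration, and a simple per-segment difference in response rates stands in for a real uplift model.

```python
# Illustrative sketch only: segments, records and helper name are hypothetical.
from collections import defaultdict


def uplift_by_segment(records):
    """Estimate uplift per segment as treated response rate minus control
    response rate. `records` holds (segment, treated, responded) tuples."""
    counts = defaultdict(lambda: {"t_n": 0, "t_r": 0, "c_n": 0, "c_r": 0})
    for segment, treated, responded in records:
        prefix = "t" if treated else "c"
        counts[segment][prefix + "_n"] += 1
        counts[segment][prefix + "_r"] += int(responded)

    uplift = {}
    for segment, c in counts.items():
        if c["t_n"] == 0 or c["c_n"] == 0:
            continue  # uplift needs both a treated and a control group
        uplift[segment] = c["t_r"] / c["t_n"] - c["c_r"] / c["c_n"]
    return uplift


# Toy campaign data: (segment, received the mailer?, bought?)
records = (
    [("loyal", True, True)] * 3 + [("loyal", True, False)] * 1
    + [("loyal", False, True)] * 3 + [("loyal", False, False)] * 1
    + [("lapsed", True, True)] * 2 + [("lapsed", True, False)] * 2
    + [("lapsed", False, False)] * 4
)

uplift = uplift_by_segment(records)
print(uplift)  # {'loyal': 0.0, 'lapsed': 0.5}
```

In this toy data the "loyal" segment has the higher response rate among mailed customers (75% against 50%), so a conventional response model would target it first, yet its customers buy just as often without the mailer. All of the incremental response comes from the "lapsed" segment.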


5:40pm-6:30pm

Consumer services
Case Study: Canadian Automobile Association and other B2C examples
The Diminishing Marginal Returns of Variable Creation in Predictive Analytics Solutions

 

Variable Creation is the key to success in any predictive analytics exercise. Many different approaches are adopted during this process, yet there are diminishing marginal returns as the number of variables increases. Our organization conducted a case study on four existing clients to explore this so-called diminishing impact of variable creation on predictive analytics solutions. Existing predictive analytics solutions were built using our traditional variable creation process. Yet, presuming that we could exponentially increase the number of variables, we wanted to determine whether this added significant benefit to the existing solution.

Speaker: Richard Boire, BoireFillerGroup


6:30pm-7:30pm

Reception / Exhibits
Room: Albert Suites


Tuesday November 16, 2010
All conference sessions take place in Edward 5-7

8:00am-9:00am

Registration, Coffee and Danish
Room: Albert Suites


9:00am-9:55am
Keynote
Multiple Case Studies: Anheuser-Busch, Disney, HP, HSBC, Pfizer, and others
The High ROI of Data Mining for Innovative Organizations

Data mining and advanced analytics can enhance your bottom line in three basic ways, by 1) streamlining a process, 2) eliminating the bad, or 3) highlighting the good. In rare situations, a fourth way – creating something new – is possible. But modern organizations are so effective at their core tasks that data mining usually results in an iterative, rather than transformative, improvement. Still, the impact can be dramatic.

Dr. Elder will share the story (problem, solution, and effect) of nine projects conducted over the last decade for some of America’s most innovative agencies and corporations:

    Streamline:

  • Cross-selling for HSBC
  • Image recognition for Anheuser-Busch
  • Biometric identification for Lumidigm (for Disney)
  • Optimal decisioning for Peregrine Systems (now part of Hewlett-Packard)
  • Quick decisions for the Social Security Administration
    Eliminate Bad:

  • Tax fraud detection for the IRS
  • Warranty Fraud detection for Hewlett-Packard
    Highlight Good:

  • Sector trading for WestWind Foundation
  • Drug efficacy discovery for Pharmacia & UpJohn (now Pfizer)

Moderator: Eric Siegel, Program Chair, Predictive Analytics World

Speaker: John Elder, Ph.D., Elder Research, Inc.

Also see Dr. Elder’s full-day workshop

 


9:55am-10:30am

Break / Exhibits
Room: Albert Suites


10:30am-11:20am
Telecommunications
Case Study: Leading Telecommunications Operator
Predictive Analytics and Efficient Fact-based Marketing

The presentation describes the major topics and issues that arise when you introduce predictive analytics, and how to build a fact-based marketing environment. The tools and methodologies introduced proved to be highly efficient in improving the overall direct marketing activity and customer contact operations of the companies involved. Generally, these approaches have great potential for organizations with large customer bases, such as mobile operators, internet giants, media companies, or retail chains.

Main solutions introduced:

  • Automated serial production of predictive models for campaign targeting
  • Automated campaign measurement and tracking solutions
  • Precise product added value evaluation

Speaker: Tamer Keshi, Ph.D., Long-term contractor, T-Mobile

Speaker: Beata Kovacs, International Head of CRM Solutions, Deutsche Telekom


11:20am-11:25am

Session Changeover


11:25am-12:15pm
Thought Leader
Nine Laws of Data Mining

Data mining is the predictive core of predictive analytics, a business process that finds useful patterns in data through the use of business knowledge. The industry standard CRISP-DM methodology describes the process, but does not explain why the process takes the form that it does. I present nine “laws of data mining”, useful maxims for data miners, with explanations that reveal the reasons behind the surface properties of the data mining process. The nine laws have implications for predictive analytics applications: how and why it works so well, which ambitions could succeed, and which must fail.

 

Speaker: Tom Khabaza, khabaza.com

 


12:15pm-1:30pm

Lunch / Exhibits
Room: Albert Suites


1:30pm-2:25pm
Expert Panel: Kaboom! Predictive Analytics Hits the Mainstream

Predictive analytics has taken off, across industry sectors and across applications in marketing, fraud detection, credit scoring and beyond. Where exactly are we in the process of crossing the chasm toward pervasive deployment, and how can we ensure progress keeps up the pace and stays on target?

This expert panel will address:

  • How much of predictive analytics’ potential has been fully realized?
  • Where are the outstanding opportunities with greatest potential?
  • What are the greatest challenges faced by the industry in achieving wide scale adoption?
  • How are these challenges best overcome?

 

Panelist: John Elder, Ph.D., Elder Research, Inc.

Panelist: Colin Shearer, WW Industry Solutions Leader, IBM UK Ltd

Panelist: Udo Sglavo, Global Analytic Solutions Manager, SAS

Panel moderator: Eric Siegel, Ph.D., Program Chair, Predictive Analytics World


2:25pm-2:30pm

Session Changeover


2:30pm-3:20pm
Crowdsourcing Data Mining
Case Study: University of Melbourne, Chessmetrics
Prediction Competitions: Far More Than Just a Bit of Fun

Data modelling competitions allow companies and researchers to post a problem and have it scrutinised by the world’s best data scientists. There are an infinite number of techniques that can be applied to any modelling task but it is impossible to know at the outset which will be most effective. By exposing the problem to a wide audience, competitions are a cost effective way to reach the frontier of what is possible from a given dataset. The power of competitions is neatly illustrated by the results of a recent bioinformatics competition hosted by Kaggle. It required participants to pick markers in HIV’s genetic sequence that coincide with changes in the severity of infection. Within a week and a half, the best entry had already outdone the best methods in the scientific literature. This presentation will cover how competitions typically work, some case studies and the types of business modelling challenges that the Kaggle platform can address.

Speaker: Anthony Goldbloom, Kaggle Pty Ltd


3:20pm-3:50pm

Break / Exhibits
Room: Albert Suites


3:50pm-4:40pm
Human Resources; e-Commerce
Case Study: Naukri.com, Jeevansathi.com
Increasing Marketing ROI and Efficiency of Candidate-Search with Predictive Analytics

InfoEdge, India’s largest and most profitable online firm with a bouquet of internet properties has been Google’s biggest customer in India. Our team used predictive modeling to double our profits across multiple fronts. For Naukri.com, India’s number 1 job portal, predictive models target jobseekers most relevant to the recruiter. Analytical insights provided a deeper understanding of recruiter behaviour and informed a redesign of this product’s recruiter search functionality. This session will describe how we did it, and also reveal how Jeevansathi.com, India’s 2nd-largest matrimony portal, targets the acquisition of consumers in the market for marriage.

 

Speaker: Suvomoy Sarkar, Chief Analytics Officer, HT Media & Info Edge India (parent company of the two companies above)

 


4:40pm-5:00pm
Closing Remarks

Speaker: Eric Siegel, Ph.D., Program Chair, Predictive Analytics World


Wednesday November 17, 2010

Full-day Workshop
The Best and the Worst of Predictive Analytics:
Predictive Modeling Methods and Common Data Mining Mistakes

Click here for the detailed workshop description

  • Workshop starts at 9:00am
  • First AM Break from 10:00 – 10:15
  • Second AM Break from 11:15 – 11:30
  • Lunch from 12:30 – 1:15pm
  • First PM Break: 2:00 – 2:15
  • Second PM Break: 3:15 – 3:30
  • Workshop ends at 4:30pm

Speaker: John Elder, Ph.D., CEO and Founder, Elder Research, Inc.