An interesting announcemnet from Revolution Analytics promises to convert your legacy code in SAS language not only cheaper but faster. It’ s a very very interesting challenge and I wonder how SAS users ,corporates, customers as well as the Institute itself reacts
Are you paying for expensive software licenses and hardware to run time-consuming statistical analyses on big data sets?
If you’re doing linear regressions, logistic regressions, predictions, or multivariate crosstabulations* there’s something you should know: Revolution Analytics can get the same results for a substantially lower cost and faster than SAS®.
For a limited time only, Revolution Analytics invites you take the SAS to R Challenge. Let us prove that we can deliver on our promise of replicating your results in R, faster and cheaper than SAS.
Here’s how it works:
Fill out the short form below, and one of our conversion experts will contact you to discuss the SAS code you want to convert. If we think Revolution R Enterprise can get the same results faster than SAS, we’ll convert your code to R free of charge. Our goal is to demonstrate that Revolution R Enterprise will produce the same results in less time. There’s no obligation, but if you choose to convert, we guarantee that your license cost for Revolution R Enterprise will be less than half what you’re currently paying for the equivalent SAS software.**
It’s that simple.
We’ll show you that you don’t need expensive hardware and software to do high quality statistical analysis of big data. And we’ll show that you don’t need to tie up your computing resources with long running operations. With Revolution R Enterprise, you can run analyses on commodity hardware using Linux or Windows, scale to terabyte-class data problems and do it at processing speeds you would never have thought possible.
Sign up now, and we will be in touch shortly.
—————————-
SAS is a registered trademark of the SAS Institute, Cary, NC, in the US and other countries.
*Additional statistical algorithms are being rapidly added to Revolution R Enterprise. Custom development services are also available.
**Revolution Analytics retains the right to determine eligibility for this offer. Offer available until March 31, 2011.
Non Oracle Open Office completes important milestone- from the press release
The Document Foundation launches LibreOffice 3.3
The first stable release of the free office suite is available for download
The Internet, January 25, 2011 – The Document Foundation launches LibreOffice 3.3, the first stable release of the free office suite developed by the community. In less than four months, the number of developers hacking LibreOffice has grown from less than twenty in late September 2010, to well over one hundred today. This has allowed us to release ahead of the aggressive schedule set by the project.
Not only does it ship a number of new and original features, LibreOffice 3.3 is also a significant achievement for a number of reasons:
– the developer community has been able to build their own and independent process, and get up and running in a very short time (with respect to the size of the code base and the project’s strong ambitions);
– thanks to the high number of new contributors having been attracted into the project, the source code is quickly undergoing a major clean-up to provide a better foundation for future development of LibreOffice;
– the Windows installer, which is going to impact the largest and most diverse user base, has been integrated into a single build containing all language versions, thus reducing the size for download sites from 75 to 11GB, making it easier for us to deploy new versions more rapidly and lowering the carbon footprint of the entire infrastructure.
Caolán McNamara from RedHat, one of the developer community leaders, comments, “We are excited: this is our very first stable release, and therefore we are eager to get user feedback, which will be integrated as soon as possible into the code, with the first enhancements being released in February. Starting from March, we will be moving to a real time-based, predictable, transparent and public release schedule, in accordance with Engineering Steering Committee’s goals and users’ requests”. The LibreOffice development roadmap is available at http://wiki.documentfoundation.org/ReleasePlan
LibreOffice 3.3 brings several unique new features. The 10 most-popular among community members are, in no particular order:
the ability to import and work with SVG files;
an easy way to format title pages and their numbering in Writer;
a more-helpful Navigator Tool for Writer;
improved ergonomics in Calc for sheet and cell management;
and Microsoft Works and Lotus Word Pro document import filters.
In addition, many great extensions are now bundled, providing
LibreOffice 3.3 also provides all the new features of OpenOffice.org 3.3, such as new custom properties handling; embedding of standard PDF fonts in PDF documents; new Liberation Narrow font; increased document protection in Writer and Calc; auto decimal digits for “General” format in Calc; 1 million rows in a spreadsheet; new options for CSV import in Calc; insert drawing objects in Charts; hierarchical axis labels for Charts; improved slide layout handling in Impress; a new easier-to-use print interface; more options for changing case; and colored sheet tabs in Calc. Several of these new features were contributed by members of the LibreOffice team prior to the formation of The Document Foundation.
LibreOffice hackers will be meeting at FOSDEM in Brussels on February 5 and 6, and will be presenting their work during a one-day workshop on February 6, with speeches and hacking sessions coordinated by several members of the project.
The home of LibreOffice is at http://www.libreoffice.org where the download page has been redesigned by the community to be more user-friendly.
*** About The Document Foundation
The Document Foundation has the mission of facilitating the evolution of the OOo Community into a new, open, independent, and meritocratic organization within the next few months. An independent Foundation is a better reflection of the values of our contributors, users and supporters, and will enable a more effective, efficient and transparent community. TDF will protect past investments by building on the achievements of the first decade, will encourage wide participation within the community, and will co-ordinate activity across the community.
This promotional offer enables you to try a limited amount of the Windows Azure platform at no charge. The subscription includes a base level of monthly compute hours, storage, data transfers, a SQL Azure database, Access Control transactions and Service Bus connections at no charge. Please note that any usage over this introductory base level will be charged at standard rates.
Included each month at no charge:
Windows Azure
25 hours of a small compute instance
500 MB of storage
10,000 storage transactions
SQL Azure
1GB Web Edition database (available for first 3 months only)
Windows Azure platform AppFabric
100,000 Access Control transactions
2 Service Bus connections
Data Transfers (per region)
500 MB in
500 MB out
Any monthly usage in excess of the above amounts will be charged at the standard rates. This introductory special will end on March 31, 2011 and all usage will then be charged at the standard rates.
As part of AWS’s Free Usage Tier, new AWS customers can get started with Amazon EC2 for free. Upon sign-up, new AWScustomers receive the following EC2 services each month for one year:
750 hours of EC2 running Linux/Unix Micro instance usage
750 hours of Elastic Load Balancing plus 15 GB data processing
10 GB of Amazon Elastic Block Storage (EBS) plus 1 million IOs, 1 GB snapshot storage, 10,000 snapshot Get Requests and 1,000 snapshot Put Requests
15 GB of bandwidth in and 15 GB of bandwidth out aggregated across all AWS services
Paid Instances-
Standard On-Demand Instances
Linux/UNIX Usage
Windows Usage
Small (Default)
$0.085 per hour
$0.12 per hour
Large
$0.34 per hour
$0.48 per hour
Extra Large
$0.68 per hour
$0.96 per hour
Micro On-Demand Instances
Micro
$0.02 per hour
$0.03 per hour
High-Memory On-Demand Instances
Extra Large
$0.50 per hour
$0.62 per hour
Double Extra Large
$1.00 per hour
$1.24 per hour
Quadruple Extra Large
$2.00 per hour
$2.48 per hour
High-CPU On-Demand Instances
Medium
$0.17 per hour
$0.29 per hour
Extra Large
$0.68 per hour
$1.16 per hour
Cluster Compute Instances
Quadruple Extra Large
$1.60 per hour
N/A*
Cluster GPU Instances
Quadruple Extra Large
$2.10 per hour
N/A*
* Windows is not currently available for Cluster Compute or Cluster GPU Instances.
NOTE- Amazon Instance definitions differ slightly from Azure definitions
Instances of this family are well suited for most applications.
Small Instance – default*
1.7 GB memory
1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit)
160 GB instance storage
32-bit platform
I/O Performance: Moderate
API name: m1.small
Large Instance
7.5 GB memory
4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
850 GB instance storage
64-bit platform
I/O Performance: High
API name: m1.large
Extra Large Instance
15 GB memory
8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
1,690 GB instance storage
64-bit platform
I/O Performance: High
API name: m1.xlarge
Micro Instances
Instances of this family provide a small amount of consistent CPU resources and allow you to burst CPU capacity when additional cycles are available. They are well suited for lower throughput applications and web sites that consume significant compute cycles periodically.
Micro Instance
613 MB memory
Up to 2 EC2 Compute Units (for short periodic bursts)
EBS storage only
32-bit or 64-bit platform
I/O Performance: Low
API name: t1.micro
High-Memory Instances
Instances of this family offer large memory sizes for high throughput applications, including database and memory caching applications.
High-Memory Extra Large Instance
17.1 GB of memory
6.5 EC2 Compute Units (2 virtual cores with 3.25 EC2 Compute Units each)
420 GB of instance storage
64-bit platform
I/O Performance: Moderate
API name: m2.xlarge
High-Memory Double Extra Large Instance
34.2 GB of memory
13 EC2 Compute Units (4 virtual cores with 3.25 EC2 Compute Units each)
850 GB of instance storage
64-bit platform
I/O Performance: High
API name: m2.2xlarge
High-Memory Quadruple Extra Large Instance
68.4 GB of memory
26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: m2.4xlarge
High-CPU Instances
Instances of this family have proportionally more CPU resources than memory (RAM) and are well suited for compute-intensive applications.
High-CPU Medium Instance
1.7 GB of memory
5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units each)
350 GB of instance storage
32-bit platform
I/O Performance: Moderate
API name: c1.medium
High-CPU Extra Large Instance
7 GB of memory
20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: c1.xlarge
Cluster Compute Instances
Instances of this family provide proportionally high CPU resources with increased network performance and are well suited for High Performance Compute (HPC) applications and other demanding network-bound applications. Learn more about use of this instance type for HPC applications.
Cluster Compute Quadruple Extra Large Instance
23 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
1690 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cc1.4xlarge
Cluster GPU Instances
Instances of this family provide general-purpose graphics processing units (GPUs) with proportionally high CPU and increased network performance for applications benefitting from highly parallelized processing, including HPC, rendering and media processing applications. While Cluster Compute Instances provide the ability to create clusters of instances connected by a low latency, high throughput network, Cluster GPU Instances provide an additional option for applications that can benefit from the efficiency gains of the parallel computing power of GPUs over what can be achieved with traditional processors. Learn moreabout use of this instance type for HPC applications.
Cluster GPU Quadruple Extra Large Instance
22 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
2 x NVIDIA Tesla “Fermi” M2050 GPUs
1690 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cg1.4xlarge
versus-
Windows Azure compute instances come in five unique sizes to enable complex applications and workloads.
Compute Instance Size
CPU
Memory
Instance Storage
I/O Performance
Extra Small
1 GHz
768 MB
20 GB*
Low
Small
1.6 GHz
1.75 GB
225 GB
Moderate
Medium
2 x 1.6 GHz
3.5 GB
490 GB
High
Large
4 x 1.6 GHz
7 GB
1,000 GB
High
Extra large
8 x 1.6 GHz
14 GB
2,040 GB
High
*There is a limitation on the Virtual Hard Drive (VHD) size if you are deploying a Virtual Machine role on an extra small instance. The VHD can only be up to 15 GB.
A workshop on using R for Predictive Modeling, by the Director, Non Clinical Stats, Pfizer. Interesting Bay Area Event- part of next edition of Predictive Analytics World
Sunday, March 13, 2011 in San Francisco
R for Predictive Modeling:
A Hands-On Introduction
Intended Audience: Practitioners who wish to learn how to execute on predictive analytics by way of the R language; anyone who wants “to turn ideas into software, quickly and faithfully.”
Knowledge Level: Either hands-on experience with predictive modeling (without R) or hands-on familiarity with any programming language (other than R) is sufficient background and preparation to participate in this workshop.
Workshop Description
This one-day session provides a hands-on introduction to R, the well-known open-source platform for data analysis. Real examples are employed in order to methodically expose attendees to best practices driving R and its rich set of predictive modeling packages, providing hands-on experience and know-how. R is compared to other data analysis platforms, and common pitfalls in using R are addressed.
The instructor, a leading R developer and the creator of CARET, a core R package that streamlines the process for creating predictive models, will guide attendees on hands-on execution with R, covering:
A working knowledge of the R system
The strengths and limitations of the R language
Preparing data with R, including splitting, resampling and variable creation
Developing predictive models with R, including decision trees, support vector machines and ensemble methods
Visualization: Exploratory Data Analysis (EDA), and tools that persuade
Evaluating predictive models, including viewing lift curves, variable importance and avoiding overfitting
Hardware: Bring Your Own Laptop
Each workshop participant is required to bring their own laptop running Windows or OS X. The software used during this training program, R, is free and readily available for download.
Attendees receive an electronic copy of the course materials and related R code at the conclusion of the workshop.
Max Kuhn, Director, Nonclinical Statistics, Pfizer
Max Kuhn is a Director of Nonclinical Statistics at Pfizer Global R&D in Connecticut. He has been apply models in the pharmaceutical industries for over 15 years.
He is a leading R developer and the author of several R packages including the CARET package that provides a simple and consistent interface to over 100 predictive models available in R.
Mr. Kuhn has taught courses on modeling within Pfizer and externally, including a class for the India Ministry of Information Technology.
Here is an interview with Prof Luis Torgo, author of the recent best seller “Data Mining with R-learning with case studies”.
Ajay- Describe your career in science. How do you think can more young people be made interested in science.
Luis- My interest in science only started after I’ve finished my degree. I’ve entered a research lab at the University of Porto and started working on Machine Learning, around 1990. Since then I’ve been involved generally in data analysis topics both from a research perspective as well as from a more applied point of view through interactions with industry partners on several projects. I’ve spent most of my career at the Faculty of Economics of the University of Porto, but since 2008 I’m at the department of Computer Science of the Faculty of Sciences of the same university. At the same time I’ve been a researcher at LIAAD / Inesc Porto LA (www.liaad.up.pt).
I like a lot what I do and like science and the “scientific way of thinking”, but I cannot say that I’ve always thought of this area as my “place”. Most of all I like solving challenging problems through data analysis. If that translates into some scientific outcome than I’m more satisfied but that is not my main goal, though I’m kind of “forced” to think about that because of the constraints of an academic career.
That does not mean I’m not passionate about science, I just think there are many more ways of “doing science” than what is reflected in the usual “scientific indicators” that most institutions seem to be more and more obsessed about.
Regards interesting young people in science that is a hard question that I’m not sure I’m qualified to answer. I do tend to think that young people are more sensible to concrete examples of problems they think are interesting and that science helps in solving, as a way of finding a motivation for facing the hard work they will encounter in a scientific career. I do believe in case studies as a nice way to learn and motivate, and thus my book 😉
Ajay- Describe your new book “Data Mining with R, learning with case studies” Why did you choose a case study based approach? who is the target audience? What is your favorite case study from the book
Luis- This book is about learning how to use R for data mining. The book follows a “learn by doing it” approach to data mining instead of the more common theoretical description of the available techniques in this discipline. This is accomplished by presenting a series of illustrative case studies for which all necessary steps, code and data are provided to the reader. Moreover, the book has an associated web page (www.liaad.up.pt/~ltorgo/DataMiningWithR) where all code inside the book is given so that easy copy-paste is possible for the more lazy readers.
The language used in the book is very informal without many theoretical details on the used data mining techniques. For obtaining these theoretical insights there are already many good data mining books some of which are referred in “further readings” sections given throughout the book. The decision of following this writing style had to do with the intended target audience of the book.
In effect, the objective was to write a monograph that could be used as a supplemental book for practical classes on data mining that exist in several courses, but at the same time that could be attractive to professionals working on data mining in non-academic environments, and thus the choice of this more practically oriented approach.
Regards my favorite case study that is a hard question for an author… still I would probably choose the “Predicting Stock Market Returns” case study (Chapter 3). Not only because I like this challenging problem, but mainly because the case study addresses all aspects of knowledge discovery in a real world scenario and not only the construction of predictive models. It tackles data collection, data pre-processing, model construction, transforming predictions into actions using different trading policies, using business-related performance metrics, implementing a trading simulator for “real-world” evaluation, and laying out grounds for constructing an online trading system.
Obviously, for all these steps there are far too many options to be possible to describe/evaluate all of them in a chapter, still I do believe that for the reader it is important to see the overall picture, and read about the relevant questions on this problem and some possible paths that can be followed at these different steps.
In other words: do not expect to become rich with the solution I describe in the chapter !
Ajay- Apart from R, what other data mining software do you use or have used in the past. How would you compare their advantages and disadvantages with R
Luis- I’ve played around with Clementine, Weka, RapidMiner and Knime, but really only playing with teaching goals, and no serious use/evaluation in the context of data mining projects. For the latter I mainly use R or software developed by myself (either in R or other languages). In this context, I do not think it is fair to compare R with these or other tools as I lack serious experience with them. I can however, tell you about what I see as the main pros and cons of R. The main reason for using R is really not only the power of the tool that does not stop surprising me in terms of what already exists and keeps appearing as contributions of an ever growing community, but mainly the ability of rapidly transforming ideas into prototypes. Regards some of its drawbacks I would probably mention the lack of efficiency when compared to other alternatives and the problem of data set sizes being limited by main memory.
I know that there are several efforts around for solving this latter issue not only from the community (e.g. http://cran.at.r-project.org/web/views/HighPerformanceComputing.html), but also from the industry (e.g. Revolution Analytics), but I would prefer that at this stage this would be a standard feature of the language so the the “normal” user need not worry about it. But then this is a community effort and if I’m not happy with the current status instead of complaining I should do something about it!
Ajay- Describe your writing habit- How do you set about writing the book- did you write a fixed amount daily or do you write in bursts etc
Luis- Unfortunately, I write in bursts whenever I find some time for it. This is much more tiring and time consuming as I need to read back material far too often, but I cannot afford dedicating too much consecutive time to a single task. Actually, I frequently tease my PhD students when they “complain” about the lack of time for doing what they have to, that they should learn to appreciate the luxury of having a single task to complete because it will probably be the last time in their professional life!
Ajay- What do you do to relax or unwind when not working?
Luis- For me, the best way to relax from work is by playing sports. When I’m involved in some game I reset my mind and forget about all other things and this is very relaxing for me. A part from sports I enjoy a lot spending time with my family and friends. A good and long dinner with friends over a good bottle of wine can do miracles when I’m too stressed with work! Finally,I do love traveling around with my family.
Short Bio: Luis Torgo has a degree in Systems and Informatics Engineering and a PhD in Computer Science. He is an Associate Professor of the Department of Computer Science of the Faculty of Sciences of the University of Porto. He is also a researcher of the Laboratory of Artificial Intelligence and Data Analysis (LIAAD) belonging to INESC Porto LA. Luis Torgo has been an active researcher in Machine Learning and Data Mining for more than 20 years. He has lead several academic and industrial Data Mining research projects. Luis Torgo accompanies the R project almost since its beginning, using it on his research activities. He teaches R at different levels and has given several courses in different countries.
For reading “Data Mining with R” – you can visit this site, also to avail of a 20% discount the publishers have generously given (message below)-
For more information and to place an order, visit us at http://www.crcpress.com. Order online and apply 20% Off discount code 907HM at checkout. CRC is pleased to offer free standard shipping on all online orders!
Price: $79.95 Cat. #: K10510 ISBN: 9781439810187 ISBN 10: 1439810184 Publication Date: November 09, 2010 Number of Pages: 305 Availability: In Stock Binding(s): Hardback
A message from Predictive Analytics World on newly available videos. It has many free videos as well so you can check them out.
Access PAW DC Session Videos Now
Predictive Analytics World is pleased to announce on-demand access to the videos of PAW Washington DC, October 2010, including over 30 sessions and keynotes that you may view at your convenience. Access this leading predictive analytics content online now:
Select individual conference sessions, or recognize savings by registering for access to one or two full days of sessions. These on-demand videos deliver PAW DC right to your desk, covering hot topics and advanced methods such as:
PAW DC videos feature over 25 speakers with case studies from leading enterprises such as: CIBC, CEB, Forrester, Macy’s, MetLife, Microsoft, Miles Kimball, Monster.com, Oracle, Paychex, SunTrust, Target, UPMC, Xerox, Yahoo!, YMCA, and more.
Keynote: Five Ways Predictive Analytics Cuts Enterprise Risk
Eric Siegel,Ph.D., Program Chair, Predictive Analytics World
All business is an exercise in risk management. All organizations would benefit from measuring, tracking and computing risk as a core process, much like insurance companies do.
Predictive analytics does the trick, one customer at a time. This technology is a data-driven means to compute the risk each customer will defect, not respond to an expensive mailer, consume a retention discount even if she were not going to leave in the first place, not be targeted for a telephone solicitation that would have landed a sale, commit fraud, or become a “loss customer” such as a bad debtor or an insurance policy-holder with high claims.
In this keynote session, Dr. Eric Siegel reveals:
– Five ways predictive analytics evolves your enterprise to reduce risk
– Hidden sources of risk across operational functions
– What every business should learn from insurance companies
– How advancements have reversed the very meaning of fraud
– Why “man + machine” teams are greater than the sum of their parts for enterprise decision support
Platinum Sponsor Presentation: Analytics – The Beauty of Diversity
Anne H. Milley,Senior Director of Analytic Strategy, Worldwide Product Marketing, SAS
Analytics contributes to, and draws from, multiple disciplines. The unifying theme of “making the world a better place” is bred from diversity. For instance, the same methods used in econometrics might be used in market research, psychometrics and other disciplines. In a similar way, diverse paradigms are needed to best solve problems, reveal opportunities and make better decisions. This is why we evolve capabilities to formulate and solve a wide range of problems through multiple integrated languages and interfaces. Extending that, we have provided integration with other languages so that users can draw on the disciplines and paradigms needed to best practice their craft.
Gold Sponsor Presentation: Predictive Analytics Accelerate Insight for Financial Services
Finbarr Deely,Director of Business Development,ParAccel
Financial services organizations face immense hurdles in maintaining profitability and building competitive advantage. Financial services organizations must perform “what-if” scenario analysis, identify risks, and detect fraud patterns. The advanced analytic complexity required often makes such analysis slow and painful, if not impossible. This presentation outlines the analytic challenges facing these organizations and provides a clear path to providing the accelerated insight needed to perform in today’s complex business environment to reduce risk, stop fraud and increase profits. * The value of predictive analytics in Accelerating Insight * Financial Services Analytic Case Studies * Brief Overview of ParAccel Analytic Database
TOPIC: SURVEY ANALYSIS Case Study: YMCA Turning Member Satisfaction Surveys into an Actionable Narrative
Dean Abbott,President, Abbott Analytics
Employees are a key constituency at the Y and previous analysis has shown that their attitudes have a direct bearing on Member Satisfaction. This session will describe a successful approach for the analysis of YMCA employee surveys. Decision trees are built and examined in depth to identify key questions in describing key employee satisfaction metrics, including several interesting groupings of employee attitudes. Our approach will be contrasted with other factor analysis and regression-based approaches to survey analysis that we used initially. The predictive models described are currently in use and resulted in both greater understanding of employee attitudes, and a revised “short-form” survey with fewer key questions identified by the decision trees as the most important predictors.
TOPIC: INDUSTRY TRENDS 2010 Data Minter Survey Results: Highlights
Karl Rexer,Ph.D., Rexer Analytics
Do you want to know the views, actions, and opinions of the data mining community? Each year, Rexer Analytics conducts a global survey of data miners to find out. This year at PAW we unveil the results of our 4th Annual Data Miner Survey. This session will present the research highlights, such as:
Multiple Case Studies: U.S. DoD, U.S. DHS, SSA Text Mining: Lessons Learned
John F. Elder,Chief Scientist, Elder Research, Inc.
Text Mining is the “Wild West” of data mining and predictive analytics – the potential for gain is huge, the capability claims are often tall tales, and the “land rush” for leadership is very much a race.
In solving unstructured (text) analysis challenges, we found that principles from inductive modeling – learning relationships from labeled cases – has great power to enhance text mining. Dr. Elder highlights key technical breakthroughs discovered while working on projects for leading government agencies, including: Text Mining is the “Wild West” of data mining and predictive analytics – the potential for gain is huge, the capability claims are often tall tales, and the “land rush” for leadership is very much a race.
– Prioritizing searches for the Dept. of Homeland Security
– Quick decisions for Social Security Admin. disability
– Document discovery for the Dept. of Defense
– Disease discovery for the Dept. of Homeland Security
Keynote: How Target Gets the Most out of Its Guest Data to Improve Marketing ROI
Andrew Pole,Senior Manager, Media and Database Marketing, Target
In this session, you’ll learn how Target leverages its own internal guest data to optimize its direct marketing – with the ultimate goal of enhancing our guests’ shopping experience and driving in-store and online performance. You will hear about what guest data is available at Target, how and where we collect it, and how it is used to improve the performance and relevance of direct marketing vehicles. Furthermore, we will discuss Target’s development and usage of guest segmentation, response modeling, and optimization as means to suppress poor performers from mailings, determine relevant product categories and services for online targeted content, and optimally assign receipt marketing offers to our guests when offer quantities are limited.
Platinum Sponsor Presentation: Driving Analytics Into Decision Making
Jason Verlen,Director, SPSS Product Strategy & Management, IBM Software Group
Organizations looking to dramatically improve their business outcomes are turning to decision management, a convergence of technology and business processes that is used to streamline and predict the outcome of daily decision-making. IBM SPSS Decision Management technology provides the critical link between analytical insight and recommended actions. In this session you’ll learn how Decision Management software integrates analytics with business rules and business applications for front-line systems such as call center applications, insurance claim processing, and websites. See how you can improve every customer interaction, minimize operational risk, reduce fraud and optimize results.
TOPIC: DATA INFRASTRUCTURE AND INTEGRATION Case Study: Macy’s The world is not flat (even though modeling software has to think it is)
Paul Coleman,Director of Marketing Statistics, Macy’s Inc.
Software for statistical modeling generally use flat files, where each record represents a unique case with all its variables. In contrast most large databases are relational, where data are distributed among various normalized tables for efficient storage. Variable creation and model scoring engines are necessary to bridge data mining and storage needs. Development datasets taken from a sampled history require snapshot management. Scoring datasets are taken from the present timeframe and the entire available universe. Organizations, with significant data, must decide when to store or calculate necessary data and understand the consequences for their modeling program.
TOPIC: CUSTOMER VALUE Case Study: SunTrust When One Model Will Not Solve the Problem – Using Multiple Models to Create One Solution
Dudley Gwaltney,Group Vice President, Analytical Modeling, SunTrust Bank
In 2007, SunTrust Bank developed a series of models to identify clients likely to have large changes in deposit balances. The models include three basic binary and two linear regression models.
Based on the models, 15% of SunTrust clients were targeted as those most likely to have large balance changes. These clients accounted for 65% of the absolute balance change and 60% of the large balance change clients. The targeted clients are grouped into a portfolio and assigned to individual SunTrust Retail Branch. Since 2008, the portfolio generated a 2.6% increase in balances over control.
Using the SunTrust example, this presentation will focus on:
TOPIC: RESPONSE & CROSS-SELL Case Study: Paychex Staying One Step Ahead of the Competition – Development of a Predictive 401(k) Marketing and Sales Campaign
Jason Fox,Information Systems and Portfolio Manager,Paychex
In-depth case study of Paychex, Inc. utilizing predictive modeling to turn the tides on competitive pressures within their own client base. Paychex, a leading provider of payroll and human resource solutions, will guide you through the development of a Predictive 401(k) Marketing and Sales model. Through the use of sophisticated data mining techniques and regression analysis the model derives the probability a client will add retirement services products with Paychex or with a competitor. Session will include roadblocks that could have ended development and ROI analysis. Speaker: Frank Fiorille, Director of Enterprise Risk Management, Paychex Speaker: Jason Fox, Risk Management Analyst, Paychex
TOPIC: SEGMENTATION Practitioner: Canadian Imperial Bank of Commerce Segmentation Do’s and Don’ts
Daymond Ling,Senior Director, Modelling & Analytics,Canadian Imperial Bank of Commerce
The concept of Segmentation is well accepted in business and has withstood the test of time. Even with the advent of new artificial intelligence and machine learning methods, this old war horse still has its place and is alive and well. Like all analytical methods, when used correctly it can lead to enhanced market positioning and competitive advantage, while improper application can have severe negative consequences.
This session will explore what are the elements of success, and what are the worse practices that lead to failure. The relationship between segmentation and predictive modeling will also be discussed to clarify when it is appropriate to use one versus the other, and how to use them together synergistically.
TOPIC: SOCIAL DATA
Thought Leadership Social Network Analysis: Killer Application for Cloud Analytics
James Kobielus,Senior Analyst, Forrester Research
Social networks such as Twitter and Facebook are a potential goldmine of insights on what is truly going through customers´minds. Every company wants to know whether, how, how often, and by whom they´re being mentioned across the billowing new cloud of social media. Just as important, every company wants to influence those discussions in their favor, target new business, and harvest maximum revenue potential. In this session, Forrester analyst James Kobielus identifies fruitful applications of social network analysis in customer service, sales, marketing, and brand management. He presents a roadmap for enterprises to leverage their inline analytics initiatives and leverage high-performance data warehousing (DW) clouds and appliances in order to analyze shifting patterns of customer sentiment, influence, and propensity. Leveraging Forrester’s ongoing research in advanced analytics and customer relationship management, Kobielus will discuss industry trends, commercial modeling tools, and emerging best practices in social network analysis, which represents a game-changing new discipline in predictive analytics.
Trish Mathe,
Director of Database Marketing, Life Line Screening
While Life Line is successfully executing a US CRM roadmap, they are also beginning this same evolution abroad. They are beginning in the UK where Merkle procured data and built a response model that is pulling responses over 30% higher than competitors. This presentation will give an overview of the US CRM roadmap, and then focus on the beginning of their strategy abroad, focusing on the data procurement they could not get anywhere else but through Merkle and the successful modeling and analytics for the UK. Speaker: Ozgur Dogan, VP, Quantitative Solutions Group, Merkle Inc Speaker: Trish Mathe, Director of Database Marketing, Life Line Screening
TOPIC: SURVEY ANALYSIS Case Study: Forrester Making Survey Insights Addressable and Scalable – The Case Study of Forrester’s Technographics Benchmark Survey
Marketers use surveys to create enterprise wide applicable strategic insights to: (1) develop segmentation schemes, (2) summarize consumer behaviors and attitudes for the whole US population, and (3) use multiple surveys to draw unified views about their target audience. However, these insights are not directly addressable and scalable to the whole consumer universe which is very important when applying the power of survey intelligence to the one to one consumer marketing problems marketers routinely face. Acxiom partnered with Forrester Research, creating addressable and scalable applications of Forrester’s Technographics Survey and applied it successfully to a number of industries and applications.
TOPIC: HEALTHCARE Case Study: UPMC Health Plan A Predictive Model for Hospital Readmissions
Scott Zasadil,Senior Scientist, UPMC Health Plan
Hospital readmissions are a significant component of our nation’s healthcare costs. Predicting who is likely to be readmitted is a challenging problem. Using a set of 123,951 hospital discharges spanning nearly three years, we developed a model that predicts an individual’s 30-day readmission should they incur a hospital admission. The model uses an ensemble of boosted decision trees and prior medical claims and captures 64% of all 30-day readmits with a true positive rate of over 27%. Moreover, many of the ‘false’ positives are simply delayed true positives. 53% of the predicted 30-day readmissions are readmitted within 180 days.
I recently found an interesting example of a website that both makes a lot of money and yet is much more efficient than any free or non profit. It is called ECOSIA
If you see a website that wants to balance administrative costs plus have a transparent way to make the world better- this is a great example.
World’s largest tropical forest reserve (38,867 square kilometers, or about the size of Switzerland)
Home to about 14% of all amphibian species and roughly 54% of all bird species in the Amazon – not to mention large populations of at least eight threatened species, including the jaguar
Includes part of the Guiana Shield containing 25% of world’s remaining tropical rainforests – 80 to 90% of which are still pristine
Holds the last major unpolluted water reserves in the Neotropics, containing approximately 20% of all of the Earth’s water
One of the last tropical regions on Earth vastly unaltered by humans
Significant contributor to climatic regulation via heat absorption and carbon storage
Click per milli (or CPM) gives you a very low low conversion compared to contacting ad sponsor directly.
But its a great data experiment-
as you can monitor which companies are likely to be advertised on your site (assume google knows more about their algols than you will)
which formats -banner or text or flash have what kind of conversion rates
what are the expected pay off rates from various keywords or companies (like business intelligence software, predictive analytics software and statistical computing software are similar but have different expected returns (if you remember your eco class)
NOW- Based on above data, you know whats your minimum baseline to expect from a private advertiser than a public, crowd sourced search engine one (like Google or Bing)
Lets say if you have 100000 views monthly. and assume one out of 1000 page views will lead to a click. Say the advertiser will pay you 1 $ for every 1 click (=1000 impressions)
Then your expected revenue is $100.But if your clicks are priced at 2.5$ for every click , and your click through rate is now 3 out of 1000 impressions- (both very moderate increases that can done by basic placement optimization of ad type, graphics etc)-your new revenue is 750$.
Be a good Samaritan- you decide to share some of this with your audience -like 4 Amazon books per month ( or I free Amazon book per week)- That gives you a cost of 200$, and leaves you with some 550$.
Wait! it doesnt end there- Adam Smith‘s invisible hand moves on .
You say hmm let me put 100 $ for an annual paper writing contest of $1000, donate $200 to one laptop per child ( or to Amazon rain forests or to Haiti etc etc etc), pay $100 to your upgraded server hosting, and put 350$ in online advertising. say $200 for search engines and $150 for Facebook.
Woah!
Month 1 would should see more people visiting you for the first time. If you have a good return rate (returning visitors as a %, and low bounce rate (visits less than 5 secs)- your traffic should see atleast a 20% jump in new arrivals and 5-10 % in long term arrivals. Ignoring bounces- within three months you will have one of the following
1) An interesting case study on statistics on online and social media advertising, tangible motivations for increasing community response , and some good data for study
2) hopefully better cost management of your server expenses
3)very hopefully a positive cash flow
you could even set a percentage and share the monthly (or annually is better actions) to your readers and advertisers.
go ahead- change the world!
the key paradigms here are sharing your traffic and revenue openly to everyone
donating to a suitable cause
helping increase awareness of the suitable cause
basing fixed percentages rather than absolute numbers to ensure your site and cause are sustained for years.