2011 Analytics Recap

Events in the field of data that impacted us in 2011

1) Oracle unveiled plans for R Enterprise. This is one of the strongest statements of its focus on in-database analytics. Oracle also unveiled plans for a public cloud.

2) SAS Institute released version 9.3 of SAS, one of the major analytics software packages in industry use.

3) IBM acquired many companies in analytics and high tech. Again. However, the expected benefits from the Cognos-SPSS integration have yet to produce a spectacular change in market share.

2011 Selected acquisitions

Emptoris Inc. December 2011

Cúram Software Ltd. December 2011

DemandTec December 2011

Platform Computing October 2011

Q1 Labs October 2011

Algorithmics September 2011

i2 August 2011

Tririga March 2011

4) SAP promised a lot with SAP HANA; again, there were no major oohs and aahs in terms of market share shifts within analytics.

http://www.sap.com/india/news-reader/index.epx?articleID=17619

5) Amazon continued to lower prices of cloud computing and offer more options.

http://aws.amazon.com/about-aws/whats-new/2011/12/21/amazon-elastic-mapreduce-announces-support-for-cc2-8xlarge-instances/

6) Google continues to dilly-dally with its analytics and cloud-based APIs. I do not expect all the APIs in the Google APIs suite to survive and be viable in the enterprise software space. This includes Google Cloud Storage, Cloud SQL and the Prediction API (https://code.google.com/apis/console/b/0/). Some of the location-based and translation-based APIs may have interesting spin-offs that could be very commercially lucrative.

7) Microsoft did... hmm, I forgot. Except for its investment in Revolution Analytics (round 1, many seasons ago), very little excitement has come from Microsoft's plans in data mining. The plugins for cloud-based data mining from Excel remain promising, while Azure remains a stealth-mode starter.

8) Revolution Analytics promised us a GUI and didn't deliver (not yet 🙂). But it did reveal much better enterprise software: Revolution R 5.0 is one of the strongest enterprise products in the R/statistical computing space, and R's memory-handling problem is now more an issue of perception than of substance, thanks to newer advances in how R is used.

9) More conferences, more books and more news on analytics startups in 2011. Big Data analytics remained a strong buzzword. Expect more from this space, including creative uses of Hadoop-based infrastructure.

10) Data privacy issues continue to hamper effective analytics usage, as does the lack of rational and balanced regulation in some of the most advanced economies. We expect more regulation and better guidelines in 2012.

A Brief Overview of Open vs Closed in Computing

1984: IBM (Big Brother) vs Apple (computing opened to individuals)

1988: Apple (closed hardware and software) vs Microsoft (software licensed to all)

1998: Microsoft (source code closed but licensed to all) vs Linux (open source code)

2008: Apple (closed hardware and software) vs Google (Android/Linux, free and open source)

2010: Google (web open to search) vs Facebook (closed to search)

2018 (?): Google (code open for all non-revenue-generating software, but search engine algorithm closed) vs TBD

Amazon CC2 – The Big Cloud is finally here

Finally, a powerful enough cloud computing instance from Amazon EC2: CC2, priced at $3 per hour for Windows instances and $2.40 per hour for Linux.

It would be interesting to see how SAS, IBM SPSS or R can leverage these instances.

Storage – On the storage front, the CC2 instance type is packed with 60.5 GB of RAM and 3.37 TB of instance storage.

Processing – The CC2 instance type includes 2 Intel Xeon processors, each with 8 hardware cores. We’ve enabled Hyper-Threading, allowing each core to process a pair of instruction streams in parallel. Net-net, there are 32 hardware execution threads and you can expect 88 EC2 Compute Units (ECUs) from this 64-bit instance type.

On a somewhat smaller scale, you can launch your own array of 290 CC2 instances and create a Top500 supercomputer (63.7 teraFLOPS) at a cost of less than $1000 per hour.
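The figures quoted above can be cross-checked with a little arithmetic. The sketch below uses only numbers from the announcement (2 processors with 8 cores each, Hyper-Threading, 290 instances, 63.7 teraFLOPS) and the $2.40 per hour Linux price; it assumes every instance is billed at that flat starting price.

```python
# Figures quoted from the AWS CC2 announcement (late 2011).
SOCKETS = 2                   # Intel Xeon processors per instance
CORES_PER_SOCKET = 8          # hardware cores per processor
THREADS_PER_CORE = 2          # Hyper-Threading: two instruction streams per core
LINUX_PRICE_PER_HOUR = 2.40   # USD, "starting from" price for Linux
CLUSTER_SIZE = 290            # instances in the Top500 example
CLUSTER_TFLOPS = 63.7         # teraFLOPS reported for that cluster


def hardware_threads() -> int:
    """Hardware execution threads in one CC2 instance."""
    return SOCKETS * CORES_PER_SOCKET * THREADS_PER_CORE


def cluster_cost_per_hour(instances: int, price: float = LINUX_PRICE_PER_HOUR) -> float:
    """Hourly cost of `instances` CC2 instances at a flat hourly price (assumption)."""
    return instances * price


if __name__ == "__main__":
    print("threads per instance:", hardware_threads())   # 32
    print(f"290-instance cluster: ${cluster_cost_per_hour(CLUSTER_SIZE):.2f}/hour")
    print(f"teraFLOPS per instance: {CLUSTER_TFLOPS / CLUSTER_SIZE:.3f}")
```

At the Linux price the 290-instance cluster comes to about $696 per hour, consistent with the "less than $1000 per hour" claim.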

http://aws.typepad.com/aws/2011/11/next-generation-cluster-computing-on-amazon-ec2-the-cc2-instance-type.html

and

http://aws.amazon.com/hpc-applications/

Cluster Compute Eight Extra Large specifications:
88 EC2 Compute Units (Eight-core 2 x Intel Xeon)
60.5 GB of memory
3370 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cc2.8xlarge
Price: Starting from $2.40 per hour

But some caveats

  • The instances are available in a single Availability Zone in the US East (Northern Virginia) Region. We plan to add capacity in other EC2 Regions throughout 2012.
  • You can run 2 CC2 instances by default.
  • You cannot currently launch instances of this type within a Virtual Private Cloud (VPC).
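Given these caveats, a 290-instance cluster like the Top500 example would not launch with default settings. The sketch below checks a planned launch against the restrictions as stated in the 2011 announcement (US East only, 2 instances by default, no VPC). `LaunchPlan` and `check_plan` are hypothetical names, not part of any AWS SDK, and the limits reflect the announcement rather than current AWS policy. `us-east-1` is the API name of the US East (Northern Virginia) region.

```python
from dataclasses import dataclass
from typing import List

# Restrictions as stated in the 2011 CC2 announcement (hypothetical encoding).
ALLOWED_REGION = "us-east-1"   # US East (Northern Virginia)
DEFAULT_INSTANCE_LIMIT = 2     # CC2 instances allowed without a limit increase


@dataclass
class LaunchPlan:
    """Hypothetical description of a planned CC2 launch."""
    region: str
    instance_count: int
    in_vpc: bool = False


def check_plan(plan: LaunchPlan, instance_limit: int = DEFAULT_INSTANCE_LIMIT) -> List[str]:
    """Return a list of problems; an empty list means the plan fits the stated caveats."""
    problems = []
    if plan.region != ALLOWED_REGION:
        problems.append(f"CC2 is only offered in {ALLOWED_REGION}, not {plan.region}")
    if plan.instance_count > instance_limit:
        problems.append(
            f"{plan.instance_count} instances exceeds the limit of {instance_limit}; "
            "request a limit increase first"
        )
    if plan.in_vpc:
        problems.append("CC2 instances cannot currently be launched inside a VPC")
    return problems


if __name__ == "__main__":
    for issue in check_plan(LaunchPlan(region="us-east-1", instance_count=290)):
        print(issue)
```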

Amazing IBM Tech Trends 2011 report

I was reading the amazing Tech Trends 2011 report by IBM at https://www.ibm.com/developerworks/mydeveloperworks/files/app/person/060001TJG2/file/110ccd08-25d9-4932-9bcc-c583868c9f31

What really amazed me were the distortions introduced in the data visualization, even in the lengths of the bars.

See the graph below; my notes are in black font and refer to the length of the weird green bar (?). I think this is one of the worst graphs I have seen this year.
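One standard way to quantify this kind of distortion is Edward Tufte's Lie Factor: the size of an effect as shown in the graphic divided by the size of the effect in the data, where the size of an effect is the relative change between two values. A Lie Factor near 1 means the bar lengths are honest; values well above or below 1 mean they exaggerate or understate the change. The numbers in the sketch below are made up for illustration, since the IBM chart's underlying values are not reproduced here.

```python
from typing import Tuple


def relative_change(first: float, second: float) -> float:
    """Size of an effect as Tufte defines it: |second - first| / first."""
    if first == 0:
        raise ValueError("first value must be non-zero")
    return abs(second - first) / first


def lie_factor(data: Tuple[float, float], drawn: Tuple[float, float]) -> float:
    """Effect shown in the graphic divided by the effect in the data.

    `data` holds two data values; `drawn` holds the corresponding bar lengths
    in any consistent unit (pixels, millimetres, ...).
    """
    data_effect = relative_change(*data)
    if data_effect == 0:
        raise ValueError("data values are equal; the Lie Factor is undefined")
    return relative_change(*drawn) / data_effect


if __name__ == "__main__":
    # Hypothetical: values 40 and 50 (a 25% rise) drawn as bars 20 mm and 40 mm long (100%).
    print(lie_factor((40, 50), (20, 40)))  # 4.0
```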

Text Analytics World in New York

There is a 15% discount if you register for Text Analytics World next month.

Use Discount Code AJAYNY11

October 19-20, 2011 at The Hilton New York

http://www.textanalyticsworld.com/newyork/2011

Text Analytics World Topics & Case Studies - Oct 19-20 in NYC

Text Analytics World NYC (tawgo.com) is the business-focused event for text analytics professionals, managers and commercial practitioners. This conference delivers case studies, expertise and resources to leverage unstructured data for business impact.

Text Analytics World NYC is packed with top predictive analytics experts, practitioners, authors and business thought leaders, including keynote addresses from Thomas Davenport, author of Competing on Analytics: The New Science of Winning; David Gondek from IBM Research on their Jeopardy-winning Watson and DeepQA; and PAW Program Chair Eric Siegel, plus special sessions from industry heavyweights Usama Fayyad and John Elder.
CASE STUDIES:

TAW New York City will feature over 25 sessions with case studies from leading enterprises in automotive, educational, e-commerce, financial services, government, high technology, insurance, retail, social media, and telecom, such as Accident Fund, Amdocs, Bundle.com, Citibank, Florida State College, Google, Intuit, MetLife, Mitchell1, PayPal, Snap-on, Socialmediatoday, Topsy and a Fortune 500 global technology company, plus special examples from U.S. government agencies DoD, DHS, and SSA.

HOT TOPICS:

TAW New York City's agenda covers hot topics and advanced methods such as churn risk detection, customer service and call centers, decision support, document discovery, document filtering, financial indicators from social media, fraud detection, government applications, insurance applications, knowledge discovery, open question-answering, parallelized text analysis, risk profiling, sentiment analysis, social media applications, survey analysis, topic discovery, voice of the customer, and other innovative applications that benefit organizations in new and creative ways.

WORKSHOPS: TAW also features a full-day, hands-on text analytics workshop, plus several other pre- and post-conference workshops in analytics that complement the core conference program. For more info: www.tawgo.com/newyork/2011/analytics-workshops
For more information: tawgo.com
Download the conference preview:
Conference Preview for TAW New York, October 19-20 2011
View the agenda at-a-glance: textanalyticsworld.com/newyork/2011/agenda

Register by September 2nd for Early Bird Rates (save up to $200): textanalyticsworld.com/newyork/2011/registration

If you'd like our informative event updates, sign up at: http://www.textanalyticsworld.com/subscription.php

To sign up for the TAW group on LinkedIn: www.linkedin.com/e/gis/3869759

For inquiries, e-mail regsupport@risingmedia.com or call (717) 798-3495.

OTHER ANALYTICS EVENTS:

  • Predictive Analytics World for Government: Sept 12-13 in DC – www.pawgov.com
  • Predictive Analytics World New York City: Oct 16-21 – www.pawcon.com/nyc
  • Text Analytics World New York City: Oct 19-20 – www.tawgo.com/nyc
  • Predictive Analytics World London: Nov 30-Dec 1 – www.pawcon.com/london
  • Predictive Analytics World San Francisco: March 4-10, 2012 – www.pawcon.com/sanfrancisco
  • Predictive Analytics World Videos: Available on-demand – www.pawcon.com/video

The event also has two sessions on R:

Sunday, October 16, 2011


Half-day Workshop
Room: Madison

R Bootcamp

  • Workshop starts at 1:00pm
  • Afternoon Coffee Break at 2:30pm – 3:00pm
  • End of the Workshop: 5:00pm

Instructor: Max Kuhn, Director, Nonclinical Statistics, Pfizer

Monday, October 17, 2011


Full-day Workshop
Room: Madison

R for Predictive Modeling: A Hands-On Introduction

  • Workshop starts at 9:00am
  • Morning Coffee Break at 10:30am – 11:00am
  • Lunch provided at 12:30 – 1:15pm
  • Afternoon Coffee Break at 2:30pm – 3:00pm
  • End of the Workshop: 4:30pm

Instructor: Max Kuhn, Director, Nonclinical Statistics, Pfizer

Early Bird Discount: Conferences

Message from PAW and TAW conferences

The PAW and TAW New York City Early Bird discounts end this Friday.

———————–

– NEXT WEEK: PAW for Government, Sept 12-13, in Washington DC. An amazing line-up of keynotes including Congressman Darrell Issa. Coverage of predictive analytics deployment by over a dozen government agencies. See www.pawgov.com

– Predictive Analytics World NYC – Oct 16-21 – Early Bird Pricing ends this Friday, Sept 9 – register now to save $400 over the full price. Three tracks, over 40 sessions, keynotes from Davenport and from IBM Research on their Jeopardy-winning Watson, plus much more. See www.pawcon.com/nyc

– Text Analytics World NYC (Oct 16-21) also ends Early Bird Pricing this Friday, Sept 9 – register now to save $400 over the full price. Over 25 sessions with case studies from Accident Fund, Amdocs, Bundle.com, Citibank, Google, Intuit, MetLife, PayPal, and much more. See www.tawgo.com/nyc

– PAW London: Nov 30 – Dec 1. Case studies from BBC, GSK, HP, ING, Lloyds TSB, Paychex, US Bank, Yahoo!, and more. See www.pawcon.com/london

– PAW and TAW San Francisco: Mar 4-10 2012 – Save-the-date and call-for-speakers. See www.pawcon.com/submit.php and www.tawgo.com/call-for-speakers

* For informative event updates: www.pawcon.com/signup-us.php

Interview Eberhard Miethke and Dr. Mamdouh Refaat, Angoss Software

Here is an interview with Eberhard Miethke and Dr. Mamdouh Refaat of Angoss Software. Angoss is a global leader in delivering business intelligence software and predictive analytics solutions that help businesses capitalize on their data by uncovering new opportunities to increase sales and profitability and to reduce risk.

Ajay- Describe your personal journey in software. How can we guide young students to pursue more useful software development than just gaming applications?

Mamdouh- I started using computers a long time ago, when they were programmed using punched cards! First in Fortran, then C, later C++, and then the rest. Computers and software were viewed as technical/engineering tools, and that's why we can still see the heavy technical orientation of command languages such as Unix shells and even the Windows command shell. However, with the introduction of database systems and Microsoft Office apps, it was clear that business would be the primary user and field of application for software. My personal journey in software started with scientific applications, then business and database systems, and finally statistical software, which you can think of as a return to the more scientific orientation. With businesses' wide acceptance of statistical methods in fields such as marketing and risk management, it is a fast-growing field that is in need of a lot of innovation.

Ajay – Angoss makes multiple data mining and analytics products. Could you please introduce us to your product portfolio and the specific data analytics needs it serves?

a- Attached please find our main product flyers for KnowledgeSTUDIO and KnowledgeSEEKER. We have a third product called StrategyBuilder, which is an add-on to the decision tree modules. This is also described in the flyer.

(See Angoss Knowledge Studio Product Guide April 2011 and http://www.scribd.com/doc/63176430/Angoss-Knowledge-Seeker-Product-Guide-April2011 )

Ajay- The trend in analytics is toward big data and cloud computing, with Hadoop enabling the processing of massive data sets on scalable infrastructure. What are your plans for cloud computing, as well as tablet-based and mobile computing?

a- This is an area where the plan is still being figured out in all organizations. The current explosion of data collected from mobile phones, text messages and social websites will need radically new applications that can utilize the data from these sources. Current applications are based on the relational database paradigm designed from the 1970s through the 1990s.

But data sources are generating data in volumes and formats that challenge this paradigm and will need a set of new tools, and possibly new programming languages, to fit these needs. Cloud computing and tablet/mobile computing (tablet and mobile being the same thing in my opinion, just different device sizes) are also two technologies that have not yet been explored in analytics.

The approach taken so far by most companies, including Angoss, is to rely on new XML-based standards to represent the data structures of particular models. In this case, it is the PMML (Predictive Model Markup Language) standard, which allows interoperability between analytics applications. Standardizing the representation of models is viewed as the first step toward implementing these models on emerging platforms, be that the cloud, mobile or social networking websites.
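To make the PMML idea concrete, the sketch below embeds a tiny PMML decision tree as XML and scores a record with it using only Python's standard library. It is a deliberately minimal, hypothetical example: the field names (`balance`, `default_risk`) and the split value are invented, and the evaluator handles only `True` and numeric `SimplePredicate` nodes, a small subset of what real PMML consumers support.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# PMML 4.1 namespace; every element in the document below lives in it.
NS = {"p": "http://www.dmg.org/PMML-4_1"}

# Hypothetical tree: accounts with balance > 1000 are scored "high" risk.
PMML_TREE = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Hypothetical example tree"/>
  <DataDictionary numberOfFields="2">
    <DataField name="balance" optype="continuous" dataType="double"/>
    <DataField name="default_risk" optype="categorical" dataType="string"/>
  </DataDictionary>
  <TreeModel functionName="classification">
    <MiningSchema>
      <MiningField name="balance"/>
      <MiningField name="default_risk" usageType="predicted"/>
    </MiningSchema>
    <Node score="low">
      <True/>
      <Node score="high">
        <SimplePredicate field="balance" operator="greaterThan" value="1000"/>
      </Node>
      <Node score="low">
        <True/>
      </Node>
    </Node>
  </TreeModel>
</PMML>"""

# Numeric comparison operators handled by this sketch (a subset of PMML's).
OPERATORS = {
    "lessThan": lambda a, b: a < b,
    "lessOrEqual": lambda a, b: a <= b,
    "greaterThan": lambda a, b: a > b,
    "greaterOrEqual": lambda a, b: a >= b,
    "equal": lambda a, b: a == b,
}


def predicate_holds(node: ET.Element, record: Dict[str, float]) -> bool:
    """Evaluate a node's predicate; only <True/> and numeric <SimplePredicate> are handled."""
    if node.find("p:True", NS) is not None:
        return True
    predicate = node.find("p:SimplePredicate", NS)
    if predicate is None:
        raise ValueError("predicate type not supported by this minimal sketch")
    compare = OPERATORS[predicate.get("operator")]
    return compare(float(record[predicate.get("field")]), float(predicate.get("value")))


def score(pmml_text: str, record: Dict[str, float]) -> str:
    """Walk down from the root, following the first child whose predicate is true."""
    node = ET.fromstring(pmml_text).find("p:TreeModel/p:Node", NS)
    while True:
        for child in node.findall("p:Node", NS):
            if predicate_holds(child, record):
                node = child
                break
        else:
            # Leaf reached, or no child matched: this sketch returns the current node's score.
            return node.get("score")


if __name__ == "__main__":
    print(score(PMML_TREE, {"balance": 1500.0}))  # high
    print(score(PMML_TREE, {"balance": 200.0}))   # low
```

Because the model travels as plain XML, the same document could in principle be evaluated by any PMML-aware engine, whether on a server, in the cloud or on a mobile device.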

The second challenge cited above is the rapidly increasing size of the data to be analyzed. Angoss identified this challenge early on and currently offers in-database analytics drivers for several database engines: Netezza, Teradata and SQL Server.

These drivers allow our analytics products to translate their routines into efficient SQL-based scripts that run in the database engine to exploit its performance as well as the powerful hardware on which it runs. Thus, instead of copying the data to a staging format for analytics, these drivers allow the data to be analyzed “in-place” within the database without moving it.
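The difference between copying data out and analyzing it in place can be illustrated with a small example. The sketch below uses Python's built-in `sqlite3` module as a stand-in for an engine such as Netezza, Teradata or SQL Server; the `accounts` table and its columns are hypothetical. Instead of fetching every row and summarizing in the client, it sends one aggregate query, so only the summary leaves the database.

```python
import sqlite3
from typing import Dict, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, segment TEXT, balance REAL)")
    conn.executemany(
        "INSERT INTO accounts (segment, balance) VALUES (?, ?)",
        [("retail", 120.0), ("retail", 80.0), ("business", 1500.0), ("business", 500.0)],
    )
    return conn


def segment_summary_in_database(conn: sqlite3.Connection) -> Dict[str, Tuple[int, float]]:
    """Let the engine aggregate; only one row per segment is returned to the client."""
    rows = conn.execute(
        "SELECT segment, COUNT(*), AVG(balance) FROM accounts "
        "GROUP BY segment ORDER BY segment"
    ).fetchall()
    return {segment: (count, avg) for segment, count, avg in rows}


if __name__ == "__main__":
    print(segment_summary_in_database(build_demo_db()))
```

On a production engine the same pattern moves the heavy lifting to the database hardware, which is the performance argument made above.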

This offers performance, security and integrity. Performance improves because the analysis runs on well-tuned database engines running on powerful hardware.

Extra security is achieved by not copying the data to other platforms, which could be less secure. Finally, the integrity of the results is vastly improved by making sure that results are always obtained from the up-to-date data residing in the database, rather than from an older copy that could be obsolete by the time the analysis is concluded.

Ajay- What are the principal competing products to your offerings, and what makes your products special or differentiated in value compared to them (for each customer segment)?

a- There are two major players in today's market that we usually encounter as competitors: SAS and IBM.

SAS offers a data mining workbench in the form of SAS Enterprise Miner, which is closely tied to SAS data mining methodology known as SEMMA.

On the other hand, IBM has recently acquired SPSS, which offered the Clementine data mining software. IBM has now rebranded Clementine as IBM SPSS Modeler.

In comparison to these products, our KnowledgeSTUDIO and KnowledgeSEEKER offer three main advantages: ease of use; affordability; and ease of integration into existing BI environments.

Angoss products were designed to look and feel like popular Microsoft Office applications. This makes the software very quick to learn: typically, an intermediate-level analyst needs only 2-3 days of training to become proficient in the use of the software with all its advanced features.

Another important feature of Angoss software products is their integration with Base SAS and SQL-based database engines. All predictive models generated by Angoss can be automatically translated to SAS and SQL scripts, which allows the generation of scoring code for these common platforms. While the software interface simplifies all tasks so that business users can take advantage of the value added by predictive models, the software includes advanced options that allow experienced statisticians to fine-tune their models by adjusting all model parameters as needed.
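Translating a model into scoring code usually means turning each decision-tree leaf into a condition. The sketch below shows the general idea for SQL: a hypothetical, ordered list of leaf rules is rendered as a `CASE` expression and executed with `sqlite3`. It is not Angoss's generator; the rule format, column names and scores are assumptions, and the conditions are trusted strings pasted into SQL, so this pattern must not be fed untrusted input.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical leaves of a decision tree: (SQL condition, score), checked in order.
RULES: List[Tuple[str, float]] = [
    ("balance > 1000 AND months_late >= 2", 0.85),
    ("balance > 1000", 0.40),
    ("months_late >= 2", 0.30),
]
DEFAULT_SCORE = 0.05  # score for records that match no leaf rule


def rules_to_sql_case(rules: List[Tuple[str, float]], default: float, alias: str = "score") -> str:
    """Render ordered leaf rules as a SQL CASE expression (first matching WHEN wins)."""
    whens = "\n".join(f"  WHEN {condition} THEN {value}" for condition, value in rules)
    return f"CASE\n{whens}\n  ELSE {default}\nEND AS {alias}"


def score_table(conn: sqlite3.Connection) -> List[Tuple[int, float]]:
    """Score every row of the hypothetical `accounts` table inside the database."""
    sql = f"SELECT id, {rules_to_sql_case(RULES, DEFAULT_SCORE)} FROM accounts ORDER BY id"
    return conn.execute(sql).fetchall()


def demo_connection() -> sqlite3.Connection:
    """In-memory table with one account per branch of the rules."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER, balance REAL, months_late INTEGER)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?, ?)",
        [(1, 2500.0, 3), (2, 2500.0, 0), (3, 50.0, 4), (4, 50.0, 0)],
    )
    return conn


if __name__ == "__main__":
    print(rules_to_sql_case(RULES, DEFAULT_SCORE))
    print(score_table(demo_connection()))
```

The generated `CASE` text is ordinary SQL, so the same expression could be pasted into a view or a batch scoring job on the production database.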

In addition, Angoss offers a unique product called StrategyBuilder, which allows the analyst to add key performance indicators (KPIs) to predictive models. KPIs such as profitability, market share and loyalty usually need to be calculated in conjunction with any sales and marketing campaign. StrategyBuilder was therefore designed to integrate such KPIs with the results of a predictive model in order to determine the appropriate treatment for each customer segment. These results are all integrated into a deployment strategy that can also be translated into executable code in SQL or SAS.
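Combining a model score with a KPI can be sketched in a few lines. Below, each hypothetical customer segment carries a predicted response rate (the model output) and an assumed margin per response, and expected profit per contact serves as the KPI that decides the treatment. The segment names, numbers, contact cost and decision rule are all illustrative assumptions, not StrategyBuilder's actual logic.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    name: str
    response_rate: float   # predicted probability of responding (model output)
    margin: float          # assumed profit per response, in currency units


CONTACT_COST = 2.0  # assumed cost of one marketing contact


def expected_profit(segment: Segment, contact_cost: float = CONTACT_COST) -> float:
    """KPI: expected profit per contacted customer in the segment."""
    return segment.response_rate * segment.margin - contact_cost


def assign_treatments(segments: List[Segment]) -> Dict[str, str]:
    """Contact a segment only when its expected profit per contact is positive."""
    return {
        s.name: ("contact" if expected_profit(s) > 0 else "do not contact")
        for s in segments
    }


if __name__ == "__main__":
    segments = [
        Segment("loyal high-value", 0.10, 60.0),
        Segment("occasional", 0.03, 40.0),
    ]
    print(assign_treatments(segments))
```

Swapping in a different KPI (market share, loyalty) only changes `expected_profit`; the segment-by-segment decision structure stays the same.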

These competitive features are behind Angoss's success in serving over 4000 users from over 500 clients worldwide.

Ajay- Describe a major case study where using Angoss software helped save a significant amount of revenue or costs through innovative data mining.

a- Rogers Telecommunications Inc. is one of the largest Canadian telecommunications providers, serving over 8.5 million customers with revenue of 11.1 billion Canadian dollars (2009). In 2008, Rogers engaged Angoss to help with the problem of ballooning accounts receivable, for a period of 18 months.

The problem was approached by improving the efficiency of the call centre serving the collections process with a set of predictive models. The first set of models was designed to find accounts likely to default ahead of time, in order to take preventative measures. A second set of models was designed to optimize call centre resources by focusing on delinquent accounts likely to pay back most of the outstanding balance. Accounts identified as not likely to pay back quickly were good candidates for “Early-out” treatment, by forwarding them directly to collection agencies. Angoss hosted Rogers’ data and provided, at regular intervals, the lists of accounts for each treatment to be deployed by the call centre dialler. As a result, Rogers estimated a 10% improvement in the sums collected.
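The workflow described above routes each account to a treatment based on two model scores. A minimal sketch follows; the score names, thresholds and treatment labels (other than “Early-out”) are hypothetical, and in practice the cut-offs would be tuned to call-centre capacity rather than fixed constants.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Account:
    account_id: str
    delinquent: bool
    p_default: float    # model 1: probability a current account will default
    p_pay_back: float   # model 2: probability a delinquent account repays most of its balance


DEFAULT_RISK_THRESHOLD = 0.5  # hypothetical cut-off for preventative action
PAY_BACK_THRESHOLD = 0.3      # hypothetical cut-off for call-centre effort


def route(account: Account) -> str:
    """Assign one treatment per account, mirroring the treatments in the case study."""
    if not account.delinquent:
        return "preventative" if account.p_default >= DEFAULT_RISK_THRESHOLD else "no action"
    if account.p_pay_back >= PAY_BACK_THRESHOLD:
        return "call centre"
    return "early-out (collection agency)"


def dialler_lists(accounts: List[Account]) -> Dict[str, List[str]]:
    """Group account IDs by treatment, e.g. to hand the call-centre list to the dialler."""
    lists: Dict[str, List[str]] = {}
    for acct in accounts:
        lists.setdefault(route(acct), []).append(acct.account_id)
    return lists


if __name__ == "__main__":
    accounts = [
        Account("A-1", delinquent=False, p_default=0.7, p_pay_back=0.0),
        Account("A-2", delinquent=True, p_default=0.0, p_pay_back=0.6),
        Account("A-3", delinquent=True, p_default=0.0, p_pay_back=0.1),
    ]
    print(dialler_lists(accounts))
```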

Biography-

Mamdouh has been active in consulting, research and training in various areas of information technology and software development for the last 20 years. He has worked on numerous projects with major organizations in North America and Europe in the areas of data mining, business analytics, business analysis and engineering analysis. He has held several consulting positions for solution providers, including Predict AG in Basel, Switzerland, and ANGOSS Corp. Mamdouh is the Director of Professional Services for the EMEA region of ANGOSS Software. Mamdouh received his PhD in engineering from the University of Toronto and his MBA from the University of Leeds, UK.

Mamdouh is the author of:

"Credit Risk Scorecards: Development and Implementation Using SAS"
"Data Preparation for Data Mining Using SAS" (The Morgan Kaufmann Series in Data Management Systems)

and co-author of

"Data Mining: Know It All" (Morgan Kaufmann)



Eberhard Miethke works as a senior sales executive for Angoss.

About Angoss-

Angoss is a global leader in delivering business intelligence software and predictive analytics to businesses looking to improve performance across sales, marketing and risk. With a suite of desktop, client-server and in-database software products and Software-as-a-Service solutions, Angoss delivers powerful approaches to turn information into actionable business decisions and competitive advantage.

Angoss software products and solutions are user-friendly and agile, making predictive analytics accessible and easy to use.