Analytics 2011 Conference

From http://www.sas.com/events/analytics/us/

The Analytics 2011 Conference Series combines the power of SAS’s M2010 Data Mining Conference and F2010 Business Forecasting Conference into one conference covering the latest trends and techniques in the field of analytics. Analytics 2011 Conference Series brings the brightest minds in the field of analytics together with hundreds of analytics practitioners. Join us as these leading conferences change names and locations. At Analytics 2011, you’ll learn through a series of case studies, technical presentations and hands-on training. If you are in the field of analytics, this is one conference you can’t afford to miss.

Conference Details

October 24-25, 2011
Grande Lakes Resort
Orlando, FL

Analytics 2011 topic areas include:

Scoring SAS and SPSS Models in the cloud


An announcement from Zementis and Predixion Software about using cloud computing to score models via PMML. Note that R has a PMML package as well, which is used by Rattle, the data mining GUI, for exporting models.

Source- http://www.marketwatch.com/story/predixion-software-introduces-new-product-to-run-sas-and-spss-predictive-models-in-the-cloud-2010-10-19?reflink=MW_news_stmp

——————————————————————————————————–

ALISO VIEJO, Calif., Oct 19, 2010 (BUSINESS WIRE) — Predixion Software today introduced Predixion PMML Connexion(TM), an interface that provides Predixion Insight(TM), the company’s low-cost, self-service in the cloud predictive analytics solution, direct and seamless access to SAS, SPSS (IBM) and other predictive models for use by Predixion Insight customers. Predixion PMML Connexion enables companies to leverage their significant investments in legacy predictive analytics solutions at a fraction of the cost of conventional licensing and maintenance fees.

The announcement was made at the Predictive Analytics World conference in Washington, D.C. where Predixion also announced a strategic partnership with Zementis, Inc., a market leader in PMML-based solutions. Zementis is exhibiting in Booth #P2.

The Predictive Model Markup Language (PMML) standard allows for true interoperability, offering a mature standard for moving predictive models seamlessly between platforms. Predixion has fully integrated this PMML functionality into Predixion Insight, meaning Predixion Insight users can now effortlessly import PMML-based predictive models, enabling information workers to score the models in the cloud from anywhere and publish reports using Microsoft Excel(R) and SharePoint(R). In addition, models can also be written back into SAS, SPSS and other platforms for a truly collaborative, interoperable solution.

“Predixion’s investment in this PMML interface makes perfect business sense as the lion’s share of the models in existence today are created by the SAS and SPSS platforms, creating compelling opportunity to leverage existing investments in predictive and statistical models on a low-cost cloud predictive analytics platform that can be fed with enterprise, line of business and cloud-based data,” said Mike Ferguson, CEO of Intelligent Business Strategies, a leading analyst and consulting firm specializing in the areas of business intelligence and enterprise business integration. “In this economy, Predixion’s low-cost, self-service predictive analytics solutions might be welcome relief to IT organizations chartered with quickly adding additional applications while at the same time cutting costs and staffing.”

“We are pleased to be partnering with Zementis, truly a PMML market leader and innovator,” said Predixion CEO Simon Arkell. “To allow any SAS or SPSS customer to immediately score any of their predictive models in the cloud from within Predixion Insight, compare those models to those created by Predixion Insight, and share the results within Excel and Sharepoint is an exciting step forward for the industry. SAS and SPSS customers are fed up with the high prices they must pay for their business users just to access reports generated by highly skilled PhDs who are burdened by performing routine tasks and thus have become a massive bottleneck. That frustration is now a thing of the past because any information worker can now unlock the power of predictive analytics without relying on experts — for a fraction of the cost and from anywhere they can connect to the cloud,” Arkell said.

Dr. Michael Zeller, Zementis CEO, added, “Our mission is to significantly shorten the time-to-market for predictive models in any industry. We are excited to be contributing to Predixion’s self-service, cloud-based predictive analytics solution set.”

About Predixion Software

Predixion Software develops and markets collaborative predictive analytics solutions in the public and private cloud. Predixion enables self-service predictive analytics, allowing customers to use and analyze large amounts of data to make actionable decisions, all within the familiar environment of Excel and PowerPivot. Predixion customers are achieving immediate results across a multitude of industries including: retail, finance, healthcare, marketing, telecommunications and insurance/risk management.

Predixion Software is headquartered in Aliso Viejo, California with development offices in Redmond, Washington. The company has venture capital backing from established investors including DFJ Frontier, Miramar Venture Partners and Palomar Ventures. For more information please contact us at 949-330-6540, or visit us at www.predixionsoftware.com.

About Zementis

Zementis, Inc. is a leading software company focused on the operational deployment and integration of predictive analytics and data mining solutions. Its ADAPA(R) decision engine successfully bridges the gap between science and engineering. ADAPA(R) was designed from the ground up to benefit from open standards and to significantly shorten the time-to-market for predictive models in any industry. For more information, please visit www.zementis.com.
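
To make the PMML idea in the announcement above a little more concrete, here is a minimal sketch in Python of what "moving a model between platforms" boils down to: the model is just an XML document, so any engine that can read it can score with it. The XML fragment is hand-written and heavily trimmed, the field names (age, visits) and coefficients are made up, and this is only an illustration, not how Predixion Insight or ADAPA work internally.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed PMML-style fragment. A real PMML file also
# carries a Header, a DataDictionary and a MiningSchema.
PMML_TEXT = """
<PMML version="4.0">
  <RegressionModel functionName="regression" normalizationMethod="logit">
    <RegressionTable intercept="-1.5">
      <NumericPredictor name="age" coefficient="0.02"/>
      <NumericPredictor name="visits" coefficient="0.30"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""


def load_model(pmml_text):
    """Read the intercept and per-field coefficients from the XML."""
    root = ET.fromstring(pmml_text)
    table = root.find("./RegressionModel/RegressionTable")
    intercept = float(table.get("intercept"))
    coefs = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("NumericPredictor")
    }
    return intercept, coefs


def score(model, record):
    """Apply the linear predictor, then the logit normalization."""
    intercept, coefs = model
    linear = intercept + sum(c * record[name] for name, c in coefs.items())
    return 1.0 / (1.0 + math.exp(-linear))


model = load_model(PMML_TEXT)
print(score(model, {"age": 50, "visits": 5}))
```

The scoring side never needs to know whether the coefficients came from SAS, SPSS or R, which is the whole point of a shared model format.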

 

KXEN Update

Update from a very good data mining software company, KXEN –

  1. Longtime Chairman and founder Roger Haddad is retiring but will remain a Board Member. See his interview with Decisionstats here https://decisionstats.wordpress.com/2009/01/05/interview-roger-haddad-founder-of-kxen-automated-modeling-software/ (note: images were hidden due to the migration from .com to .wordpress.com)
  2. The new members of the leadership team are as follows:
John Ball
Chief Executive Officer

John Ball brings 20 years of experience in enterprise software, deep expertise in business intelligence and CRM applications, and a proven track record of success driving rapid growth at highly innovative companies.

Prior to joining KXEN, Mr. Ball served in several executive roles at salesforce.com, the leading provider of SaaS applications. Most recently, John served as VP & General Manager, Analytics and Reporting Products, where he spearheaded salesforce.com’s foray into CRM analytics and business intelligence. John also served as VP & General Manager, Service and Support Applications at salesforce.com, where he successfully grew the business to become the second largest and fastest growing product line at salesforce.com. Before salesforce.com, Ball was founder and CEO of Netonomy, the leading provider of customer self-service solutions for the telecommunications industry. Ball also held a number of executive roles at Business Objects, including General Manager, Web Products, where he delivered to market the first 3 versions of WebIntelligence. Ball has a master’s degree in electrical engineering from Georgia Tech and a master’s degree in electric

I hope John at least helps build a KXEN Force.com application; there are only 2 data mining apps on App Exchange. Also on the wish list: more social media presence, a Web SaaS/Amazon API for KXEN, greater presence at American and Asian conferences, and a solution for SMEs (which cannot afford the premium pricing of the flagship solution). An alliance with bigger BI vendors like Oracle, SAP or IBM for selling the great social network analysis would help too.

Bill Russell as Non Executive Chairman-

Bill Russell as Non-executive Chairman of the Board, effective July 16, 2010. Russell has 30 years of operational experience in enterprise software, with a special focus on business intelligence, analytics, and databases. Russell held a number of senior-level positions in his more than 20 years at Hewlett-Packard, including Vice President and General Manager of the multi-billion dollar Enterprise Systems Group. He has served as Non-executive Chairman of the Board for Sylantro Systems Corporation, webMethods Inc., and Network Physics, Inc. and has served as a board director for Cognos Inc. In addition to KXEN, Russell currently serves on the boards of Saba, PROS Holdings Inc., Global 360, ParAccel Inc., and B.T. Mancini Company.

Xavier Haffreingue as senior vice president, worldwide professional services and solutions.
He has almost 20 years of international enterprise software experience gained in the CRM, BI, Web and database sectors. Haffreingue joins KXEN from software provider Axway where he was VP global support operations. Prior to Axway, he held various leadership roles in the software industry, including VP self service solutions at Comverse Technologies and VP professional services and support at Netonomy, where he successfully delivered multi-million dollar projects across Europe, Asia-Pacific and Africa. Before that he was with Business Objects and Sybase, where he ran support and services in southern Europe managing over 2,500 customers in more than 20 countries.

David Guercio as senior vice president, Americas field operations. Guercio brings to the role more than 25 years’ experience of building and managing high-achieving sales teams in the data mining, business intelligence and CRM markets. Guercio comes to KXEN from product lifecycle management vendor Centric Software, where he was EVP sales and client services. Prior to Centric, he was SVP worldwide sales and client services at Inxight Software, where he was also Chairman and CEO of the company’s Federal Systems Group, a subsidiary of Inxight that saw success in the US Federal Government intelligence market. The success in sales growth and penetration into the federal government led to the acquisition of Inxight by Business Objects in 2007, where Guercio then led the Inxight sales organization until Business Objects was acquired by SAP. Guercio was also a key member of the management team and a co-founder at Neovista, an early pioneer in data mining and predictive analytics. Additionally, he held the positions of director of sales and VP of professional services at Metaphor Computer Systems, one of the first data extraction solutions companies, which was acquired by IBM. During his career, Guercio also held executive positions at Resonate and SiGen.

3) Venture Capital funding to fund expansion-

It has closed $8 million in series D funding to further accelerate its growth and international expansion. The round was led by NextStage and included participation from existing investors XAnge Capital, Sofinnova Ventures, Saints Capital and Motorola Ventures.

This was done after John Ball had joined as CEO.

4) Continued kudos from analysts and customers for its technical excellence.

KXEN was named a leader in predictive analytics and data mining by Forrester Research (1) and was rated highest for commercial deployments of social network analytics by Frost & Sullivan (2)

It also became an alliance partner of Accenture, which is a prominent SAS partner as well.

In Database Optimization-

In KXEN V5.1, a new data manipulation module (ADM) is provided in conjunction with scoring to optimize database workloads and provide full in-database model deployment. Some leading data mining vendors are only now beginning to offer this kind of functionality, and then with only one or two selected databases, giving KXEN a more than five-year head start. Some other vendors offer only generic SQL generation, not optimized for each database, and do not provide the wealth of possible outputs for their scoring equations. For example, real operational applications need not only scores but also decision probabilities, error bars and individual input contributions (used to derive the reasons for a decision), all of which are available in KXEN in-database scoring modules.

Since 2005, KXEN has leveraged databases as the data manipulation engine for analytical dataset generation. In 2008, the ADM (Analytical Data Management) module delivered a major enhancement by providing a very easy to use data manipulation environment with unmatched productivity and efficiency. ADM works as a generator of optimized database-specific SQL code and comes with an integrated layer for the management of meta-data for analytics.

KXEN Modeling Factory- (similar to SAS’s recent product Rapid Predictive Modeler http://www.sas.com/resources/product-brief/rapid-predictive-modeler-brief.pdf and http://jtonedm.com/2010/09/02/first-look-rapid-predictive-modeler/)

KXEN Modeling Factory (KMF) has been designed to automate the development and maintenance of predictive analytics-intensive systems, especially systems that include large numbers of models, vast amounts of data or require frequent model refreshes. Information about each project and model is monitored and disseminated to ensure complete management and oversight and to facilitate continual improvement in business performance.

Main Functions

Schedule: creation of the Analytic Data Set (ADS), setup of how and when to score, setup of when and how to perform model retraining and refreshes …

Report: Monitor model execution over time, track changes in model quality over time, see how useful one variable is by considering its multiple instances in models …

Notification: Rather than having to wade through pages of event logs, KMF Department allows users to manage by exception through notifications.

Other products from KXEN have been covered here before https://decisionstats.wordpress.com/tag/kxen/ , including Structural Risk Minimization- https://decisionstats.wordpress.com/2009/04/27/kxen-automated-regression-modeling/

That’s all for the KXEN update. All the best to the new management team, and a splendid job done by Roger Haddad in creating what is France’s and Europe’s best-known data mining company.

Note- Source – http://www.kxen.com


SAS Scoring Accelerators

One of the most interesting SAS product launches of 2009. I am currently reading up on SAS Enterprise Miner and I am quite impressed. In fact we have a 1500-processor HPC cluster, besides access to Kraken, the third-largest HPC system in the world. It is interesting to see possible application uses for that. Of course, I am currently fiddling with R-based parallelized clustering on them.

SAS® Scoring Accelerator

Citation-

http://www.sas.com/technologies/analytics/datamining/scoring_acceleration/index.html

Quickly and accurately process and score analytic models built in SAS® Enterprise Miner™

What is SAS® Scoring Accelerator?
SAS Scoring Accelerator translates and registers SAS Enterprise Miner models into database-specific functions to be deployed and then executed for scoring purposes directly within the database. SAS Scoring Accelerator is a separate product that works in conjunction with  SAS Enterprise Miner.

Why is SAS® Scoring Accelerator important?
SAS Scoring Accelerator automates the movement of the model scoring processes inside the database. Faster deployment of analytic models means more timely results, enabling business users to make important business decisions. Better-performing models help ensure the accuracy of the analytic results you’re using to make critical business decisions.

For whom is SAS® Scoring Accelerator?
SAS Scoring Accelerator is specifically for organizations that use SAS Enterprise Miner. It is designed for chief scoring officers and IT to score analytic models directly inside the database.

Key Benefits:

 

  • Achieve higher model-scoring performance and faster time to results.
  • Reduce data movement and latency.
  • Improve accuracy and effectiveness of analytic models.
  • Reduce labor costs and errors by eliminating model score code rewrite and model revalidation efforts.
  • Better manage, provision and govern data.

 

Key Features:

Export Utility:

  • Functions as a plug-in to SAS Enterprise Miner that exports the model scoring logic including metadata about the required input and output variables.

Publishing Client:

  • Automatically translates and publishes the model into C source code for creating the scoring function inside the database.
  • Generates a script of database commands for registering the scoring user-defined function (UDF) inside the database. Scoring UDFs are available to use in any SQL expression wherever database-specific built-in functions are typically used.
  • Supports a robust class of SAS Enterprise Miner predictive and descriptive models including the preliminary transformation layer.

SAS Scoring Accelerator interfaces with the following relational databases:

  • SAS® Scoring Accelerator for Teradata
  • SAS® Scoring Accelerator for Netezza
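
The idea of a scoring UDF that can be called from any SQL expression can be sketched in a few lines of Python. This is only an analogy under loud assumptions: SQLite stands in for Teradata or Netezza, a Python function stands in for the published C scoring function, and the table, column names and coefficients are all made up.

```python
import math
import sqlite3

# Hypothetical logistic model coefficients, stand-ins for an exported model.
INTERCEPT = -2.0
COEF_INCOME = 0.00004
COEF_VISITS = 0.5


def score_lead(income, visits):
    """Scoring logic that will run inside the database engine."""
    linear = INTERCEPT + COEF_INCOME * income + COEF_VISITS * visits
    return 1.0 / (1.0 + math.exp(-linear))


def build_demo_db():
    """Create an in-memory table and register the scoring function."""
    conn = sqlite3.connect(":memory:")
    # Register the function so SQL can call it like a built-in.
    conn.create_function("score_lead", 2, score_lead)
    conn.execute("CREATE TABLE leads (id INTEGER, income REAL, visits INTEGER)")
    conn.executemany(
        "INSERT INTO leads VALUES (?, ?, ?)",
        [(1, 30000, 1), (2, 80000, 4), (3, 50000, 0)],
    )
    return conn


def scored_leads(conn):
    """Score every row where the data lives, without exporting it first."""
    sql = (
        "SELECT id, score_lead(income, visits) AS score "
        "FROM leads ORDER BY score DESC"
    )
    return conn.execute(sql).fetchall()


for lead_id, lead_score in scored_leads(build_demo_db()):
    print(lead_id, round(lead_score, 3))
```

The data never leaves the database, which is where the "reduce data movement and latency" benefit listed above comes from.
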

KNIME and Zementis shake hands

Two very good and very customer-centric (and open source) companies shook hands on a strategic partnership today.

Knime (www.knime.org) and Zementis (www.zementis.com).

Decision Stats has been covering these companies. Both products are amazingly good, sync very well thanks to their support of the PMML standard, and lower costs considerably for the consumer (http://www.decisionstats.com/2009/02/knime/ and http://www.decisionstats.com/2009/02/interview-michael-zeller-ceozementis/).

Knime has both a free personal license and a commercial license, and it supports R very well thanks to PMML (an initiative of www.dmg.org).

See http://www.knime.org/blog/export-and-convert-r-models-pmml-within-knime

The following example R script learns a decision tree based on the Iris-Data and exports this as PMML and as an R model which is understood by the R Predictor node:

# load the library for learning a tree model
library(rpart)
# load the pmml export library
library(pmml)
# use the class column as the predicted column to build the decision tree
# (R is assumed to be the data frame KNIME passes in at the node's in-port)
dt <- rpart(class ~ ., data = R)
# export to PMML
r_pmml <- pmml(dt)
# write the PMML model to an export file
write(toString(r_pmml), file = "C:/R.pmml")
# provide the native R model at the out-port
R <- dt

 

Zementis brings the total cost of ownership, and the total pain of creating scored models, down to something close to US$1 per hour thanks to its proprietary ADAPA engine.

As mentioned before, Zementis is at the forefront of using cloud computing (Amazon EC2) for open source analytics. Recently I came in contact with Michael Zeller over a business problem, and Mike, being the gentleman he is, not only helped me out but also agreed to an extensive and exclusive interview.


Ajay- What are the traditional rivals to the scoring solutions you offer? How does ADAPA compare to each of them? Case study: assume I have 50,000 leads daily on a car-buying website. How would ADAPA help me in scoring the model (created, say, by KXEN, R, SAS or SPSS)? What would my approximate cost advantages be if I intend to mail, say, the top 5 deciles every day?

Michael- Some of the traditional scoring solutions used today are based on SAS, in-database scoring like Oracle, MS SQL Server, or very often even custom code.  ADAPA is able to import the models from all tools that support the PMML standard, so any of the above tools, open source or commercial, could serve as an excellent development environment.

The key differentiators for ADAPA are simple and focus on cost-effective deployment:

1) Open Standards – PMML & SOA:

Freedom to select best-of-breed development tools without being locked into a specific vendor;  integrate easily with other systems.

2) SaaS-based Cloud Computing:

Delivers a quantum leap in cost-effectiveness without compromising on scalability.

In your example, I assume that you’d be able to score your 50,000 leads in one hour using one ADAPA engine on Amazon.  Therefore, you could choose to either spend US$100,000 or more on hardware, software, maintenance, IT services, etc., write a project proposal, get it approved by management, and be ready to score your model in 6-12 months

OR, you could use ADAPA at something around US$1-$2 per day for the scenario above and get started today!  To get my point across here, I am of course simplifying the scenario a little bit, but in essence these are your choices.

Sounds too good to be true?  We often get this response, so please feel free to contact us today [http://www.zementis.com/contact.htm] and we will be happy to show you how easy it can be to deploy predictive models with ADAPA!
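
The "mail the top 5 deciles" part of the car-lead scenario above can be sketched in Python as a simple rank-and-cut. The scores below are made up, and the function name is my own; a real campaign would take the scores from whatever engine scored the leads.

```python
def top_deciles(scores, keep=5):
    """Return the indices of records in the top `keep` deciles by score."""
    # Rank record positions from highest score to lowest.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Each decile is one tenth of the records.
    cutoff = len(scores) * keep // 10
    return ranked[:cutoff]


# Ten hypothetical lead scores; the top 5 deciles are the best five leads.
demo_scores = [0.10, 0.90, 0.50, 0.70, 0.30, 0.80, 0.20, 0.60, 0.40, 0.95]
print(top_deciles(demo_scores))

# With 50,000 daily leads, the top 5 deciles are 25,000 leads to mail.
daily_scores = [i / 50000 for i in range(50000)]
print(len(top_deciles(daily_scores)))
```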

 

Ajay- The ADAPA solution seems to save money on both hardware and software costs. Comment please. Also, have you done any benchmarking tests of a traditional scoring configuration versus ADAPA?

Michael-Absolutely, the ADAPA Predictive Analytics Edition [http://www.zementis.com/predictive_analytics_edition.htm] on Amazon’s cloud computing infrastructure (Amazon EC2) eliminates the upfront investment in hardware and software.  It is a true Software as a Service (SaaS) offering on Amazon EC2 [http://www.zementis.com/howtobuy.htm] whereby users only pay for the actual machine time starting at less than US$1 per machine hour.  The ADAPA SaaS model is extremely dynamic, e.g., a user is able to select an instance type most appropriate for the job at hand (small, large, x-large) or launch one or even 100 instances within minutes.

In addition to the above savings in hardware/software, ADAPA also cuts the time-to-market for new models (priceless!) which adds to business agility, something truly critical for the current economic climate.

Regarding a benchmark comparison, it really depends on what is most important to the business.  Business agility, time-to-market, open standards for integration, or pure scoring performance?  ADAPA addresses all of the above.  At its core, it is a highly scalable scoring engine which is able to process thousands of transactions per second.  To tackle even the largest problems, it is easy to scale ADAPA via more CPUs, clustering, or parallel execution on multiple independent instances. 

Need to score lots of data once a month which would take 100 hours on one computer?  Simply launch 10 instances and complete the job in 10 hours over night.  No extra software licenses, no extra hardware to buy — that’s capacity truly on-demand, whenever needed, and cost-effective.
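
The arithmetic in that last example can be written out as a small Python sketch. It assumes the job splits perfectly across instances, that every started instance-hour is billed in full, and an illustrative rate of US$1 per machine hour (the interview only says "starting at less than US$1"); none of this is actual Amazon or Zementis pricing logic.

```python
import math


def plan_batch(total_machine_hours, instances, price_per_hour):
    """Return (wall-clock hours, total cost) for a perfectly parallel job."""
    wall_clock = total_machine_hours / instances
    # Assumption: every started hour on every instance is billed in full.
    billed_hours = math.ceil(wall_clock) * instances
    return wall_clock, billed_hours * price_per_hour


# 100 machine hours of scoring on 1 instance versus 10 instances.
print(plan_batch(100, 1, 1.0))
print(plan_batch(100, 10, 1.0))
```

Under these assumptions the bill is the same either way; only the waiting time changes, which is what "capacity truly on-demand" means in practice.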

Ajay- What has been your vision for Zementis? What exciting products are we going to see from it next?

Michael – Our vision at Zementis [http://www.zementis.com] has been to make it easier for users to leverage analytics.  The primary focus of our products is on the deployment side, i.e., how to integrate predictive models into the business process and leverage them in real-time.  The complexity of deployment and the cost associated with it has been the main hurdle for a more widespread adoption of predictive analytics. 

Adhering to open standards like the Predictive Model Markup Language (PMML) [http://www.dmg.org/] and SOA-based integration, our ADAPA engine [http://www.zementis.com/products.htm] paves the way for new use cases of predictive analytics — wherever a painless, fast production deployment of models is critical or where the cost of real-time scoring has been prohibitive to date.

We will continue to contribute to the R/PMML export package [http://www.zementis.com/pmml_exporters.htm] and extend our free PMML converter [http://www.zementis.com/pmml_converters.htm] to support the adoption of the standard.  We believe that the analytics industry will benefit from open standards and we are just beginning to grasp what data-driven decision technology can do for us.  Without giving away much of our roadmap, please stay tuned for more exciting products that will make it easier for businesses to leverage the power of predictive analytics!

Ajay- Any India- or Asia-specific plans for Zementis?

Michael-Zementis already serves customers in the Asia/Pacific region from its office in Hong Kong.  We expect rapid growth for predictive analytics in the region and we think our cost-effective SaaS solution on Amazon EC2 will be of great service to this market.  I could see various analytics outsourcing and consulting firms benefit from using ADAPA as their primary delivery mechanism to provide clients with predictive  models that are ready to be executed on-demand.

Ajay- What do you believe will be the biggest challenges for analytics in 2009? What are the biggest opportunities?

Michael-The biggest challenge for analytics will most likely be the reduction in technology spending in a deep, global recession.  At the same time, companies must take advantage of analytics to cut cost, optimize processes, and to become more competitive.  Therefore, the biggest opportunity for analytics will be in the SaaS field, enabling clients to employ analytics without upfront capital expenditures.

Ajay – What made you choose a career in science? Describe your journey so far. What would your advice be to young science graduates in these recessionary times?

Michael- As a physicist, my research focused on neural networks and intelligent systems.  Predictive analytics is a great way for me to stay close to science while applying such complex algorithms to solve real business problems.  Even in a recession, there is always a need for good people with the desire to excel in their profession.  Starting your career, I’d say the best way is to remain broad in expertise rather than being too specialized on one particular industry or proficient in a single analytics tool.  A good foundation of math and computer science, combined with curiosity in how to apply analytics to specific business problems will provide opportunities, even in the current economic climate.

About Zementis

Zementis, Inc. is a software company focused on predictive analytics and advanced Enterprise Decision Management technology. We combine science and software to create superior business and industrial solutions for our clients. Our scientific expertise includes statistical algorithms, machine learning, neural networks, and intelligent systems and our scientists have a proven record in producing effective predictive models to extract hidden patterns from a variety of data types. It is complemented by our product offering ADAPA, a decision engine framework for real-time execution of predictive models and rules. For more information please visit www.zementis.com

Ajay- If you have a lot of data (GBs and GBs) and an existing model (in SAS, SPSS or R) which you have converted to PMML, and it is time to choose between spending more money to upgrade your hardware and renewing your software licenses, then instead take a look at ADAPA from www.zementis.com and score models for as low as US$1 per hour. Check it out (test and control!!).

Do you have any additional queries for Michael? Use the comments page to ask.

How to do Logistic Regression

Logistic regression is a widely used technique in database marketing for creating scoring models, and in risk classification. It helps develop propensity-to-buy and propensity-to-default scores (and even propensity-to-fraud scores).

This is more of a practical approach to building the model than a theory-based approach. (I was never good at the theory 😉)
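
In that practical spirit, here is a minimal Python sketch of what a logistic regression scoring model does: fit weights, then turn a record into a propensity between 0 and 1. The data is made up (one input, say number of past purchases, and a 0/1 "bought" flag), and the fit uses plain gradient descent for readability, which is an assumption of this sketch and not the fitting routine SAS, SPSS or R use.

```python
import math


def sigmoid(z):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def propensity(weights, x):
    """Score one record; weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights by batch gradient descent on the log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = propensity(weights, x) - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        # Step against the average gradient.
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


# Made-up training data: past purchases -> bought (1) or not (0).
purchases = [[0], [1], [2], [3], [4], [5], [6], [7]]
bought = [0, 0, 0, 1, 0, 1, 1, 1]

model_weights = fit_logistic(purchases, bought)
print(propensity(model_weights, [1]), propensity(model_weights, [6]))
```

The output of `propensity` is the kind of score you would then rank and cut into deciles for a campaign.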

If you need to do Logistic Regression using SPSS, a very good tutorial is available here

http://www2.chass.ncsu.edu/garson/PA765/logistic.htm

(Note -Copyright 1998, 2008 by G. David Garson.
Last update 5/21/08.)

For SAS a very good tutorial is here –

SAS Annotated Output
Ordered Logistic Regression. UCLA: Academic Technology Services, Statistical Consulting Group.

from http://www.ats.ucla.edu/stat/sas/output/sas_ologit_output.htm (accessed July 23, 2007).

For R the documentation (note: still searching for R’s logistic regression) is here
http://lib.stat.cmu.edu/S/Harrell/help/Design/html/lrm.html

lrm(formula, data, subset, na.action=na.delete, method="lrm.fit", model=FALSE, x=FALSE, y=FALSE, linear.predictors=TRUE, se.fit=FALSE, penalty=0, penalty.matrix, tol=1e-7, strata.penalty=0, var.penalty=c('simple','sandwich'), weights, normwt, ...)

For linear models in R –
http://datamining.togaware.com/survivor/Linear_Model0.html

An extremely good option, if you want to work with R and do not have time to learn it, is to use the GUI Rattle and look at this book:

http://datamining.togaware.com/survivor/Contents.html