Zementis News

From a Zementis newsletter: interesting advances on the R-in-the-cloud front. Thanks to Ron Ramos for sending this. I hope Zementis and someone like Google or Biocep team up so that all I need to make a model is some data and a browser. 🙂

The R Journal – A Refereed Journal for the R Project Launches

As a sign of the open source R project for statistical computing gaining momentum, the R newsletter has been transformed into The R Journal, a refereed journal for articles covering topics that are of interest to users or developers of R.  As a supporter of the R PMML Package (see blog and video tutorial), we are honored that our article “PMML: An Open Standard for Sharing Models” which emphasizes the importance of the Predictive Model Markup Language (PMML) standard is part of the inaugural issue.  If you already develop your models in R, export them via PMML, then deploy and scale your models in ADAPA on the Amazon EC2 cloud. Read the full story.

Integrating Predictive Analytics via Web Services

Predictive analytics will deliver more value and become more pervasive across the enterprise, once we manage to seamlessly integrate predictive models into any business process.  In order to execute predictive models on-demand, in real-time or in batch mode, the integration via web services presents a simple and effective way to leverage scoring results within different applications.  For most scenarios, the best way to incorporate predictive models into the business process is as a decision service.  Query the model(s) daily, hourly, or in real-time, but if at all possible try to design a loosely coupled system following a Service Oriented Architecture (SOA).

Using web services, for example, one can quickly improve existing systems and processes by adding predictive decision models.  Following the idea of a loosely coupled architecture, it is even possible to use integration tools like Jitterbit or Microsoft SQL Server Integration Services (SSIS) to embed predictive models that are deployed in ADAPA on the Amazon Elastic Compute Cloud without the need to write any code.  Of course, there is also the option to use custom Java code or MS SQL Server SSIS Scripting, for which we provide a sample client application.  Read the full story.

About ADAPA®:

A fast, real-time deployment environment for predictive analytics models: a standalone scoring engine that reads XML-based PMML descriptions of models and scores streams of data. Developed by Zementis, it is available as a fully hosted Software-as-a-Service (SaaS) solution on the Amazon Elastic Compute Cloud.  It’s easy to use and remarkably inexpensive, starting at only $0.99 per instance hour.

PMML 4.0

There are some nice changes in PMML version 4.0. PMML is the XML standard for representing data mining models or, quoting the DMG itself:

PMML uses XML to represent mining models. The structure of the models is described by an XML Schema. One or more mining models can be contained in a PMML document. A PMML document is an XML document with a root element of type PMML. The general structure of a PMML document is:

  <?xml version="1.0"?>
  <PMML version="4.0"
    xmlns="http://www.dmg.org/PMML-4_0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >

    <Header copyright="Example.com"/>
    <DataDictionary> ... </DataDictionary>

    ... a model ...

  </PMML>
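To make the general structure concrete, here is a minimal sketch in Python that parses a PMML 4.0 skeleton and reads back the version, the header copyright, and the data dictionary fields. The field names (`sepal_length`, `class`) and the helper `summarize_pmml` are made up for illustration, and the document contains no model element; it shows only how the namespace and root element fit together, not how a scoring engine processes PMML.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the xmlns attribute of the PMML 4.0 root element.
PMML_NS = "http://www.dmg.org/PMML-4_0"

# Illustrative skeleton only: the field names are hypothetical and no model is included.
PMML_DOC = """<?xml version="1.0"?>
<PMML version="4.0"
  xmlns="http://www.dmg.org/PMML-4_0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Header copyright="Example.com"/>
  <DataDictionary numberOfFields="2">
    <DataField name="sepal_length" optype="continuous" dataType="double"/>
    <DataField name="class" optype="categorical" dataType="string"/>
  </DataDictionary>
</PMML>
"""


def summarize_pmml(text):
    """Return the version, copyright and field names found in a PMML string."""
    root = ET.fromstring(text)
    ns = {"p": PMML_NS}
    # The root tag is namespace-qualified, e.g. "{http://www.dmg.org/PMML-4_0}PMML".
    if root.tag != "{%s}PMML" % PMML_NS:
        raise ValueError("root element is not a PMML 4.0 document")
    header = root.find("p:Header", ns)
    fields = root.findall("p:DataDictionary/p:DataField", ns)
    return {
        "version": root.get("version"),
        "copyright": header.get("copyright") if header is not None else None,
        "fields": [f.get("name") for f in fields],
    }


if __name__ == "__main__":
    print(summarize_pmml(PMML_DOC))
```

Because every element lives in the PMML namespace, lookups need the namespace prefix mapping; searching for a bare `DataDictionary` would find nothing.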

So what is new in version 4.0? Here are some powerful modeling changes. For anyone with any XML knowledge, PMML is the way to go.

PMML 4.0 – Changes from PMML 3.2

Associations

  • Itemset and AssociationRule elements are no longer enclosed within a “Choice” element
  • Added different scoring procedures: recommendation, exclusiveRecommendation and ruleAssociation with explanation and example
  • Changed version to “4.0” from “3.2” in the example(s)

BuiltinFunctions

Added the following functions:
  • isMissing
  • isNotMissing
  • equal
  • notEqual
  • lessThan
  • lessOrEqual
  • greaterThan
  • greaterOrEqual
  • isIn
  • isNotIn
  • and
  • or
  • not
  • if
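As a rough illustration of what these functions do, the Python sketch below maps each name to a simple callable. It is a simplification under stated assumptions, not the specification's definition: a missing value is modeled as `None`, `if` without an else branch is assumed to return a missing value, all arguments are evaluated eagerly, and the detailed missing-value handling defined by PMML is not reproduced. The helper name `apply_builtin` is hypothetical.

```python
import operator

MISSING = None  # assumption: a missing value is modeled as None

# Simplified stand-ins for the PMML 4.0 built-in functions listed above.
BUILTINS = {
    "isMissing": lambda x: x is MISSING,
    "isNotMissing": lambda x: x is not MISSING,
    "equal": operator.eq,
    "notEqual": operator.ne,
    "lessThan": operator.lt,
    "lessOrEqual": operator.le,
    "greaterThan": operator.gt,
    "greaterOrEqual": operator.ge,
    "isIn": lambda x, *values: x in values,
    "isNotIn": lambda x, *values: x not in values,
    "and": lambda *args: all(args),
    "or": lambda *args: any(args),
    "not": lambda x: not x,
    # assumption: without an else branch the result is a missing value
    "if": lambda cond, then, otherwise=MISSING: then if cond else otherwise,
}


def apply_builtin(name, *args):
    """Look up a built-in by its PMML name and apply it to the arguments."""
    return BUILTINS[name](*args)


if __name__ == "__main__":
    age = 17
    label = apply_builtin("if", apply_builtin("lessThan", age, 18), "minor", "adult")
    print(label)
```

In a PMML document these functions appear inside transformation expressions rather than as direct calls; the dictionary above only mirrors their names.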


ClusteringModel

  • Changed version to “4.0” from “3.2” in the example(s)
  • Added reference to ModelExplanation element in the model XSD

Conformance

  • Changed all version references from “3.2” to “4.0”

DataDictionary

  • No changes

Functions

  • No changes

GeneralRegression

  • Changed to allow for Cox survival models and model ensembles
    • Add new model type: CoxRegression.
    • Allow empty regression model when model type is CoxRegression, so that baseline-only model could be represented.
    • Add new optional model attributes: endTimeVariable, startTimeVariable, subjectIDVariable, statusVariable, baselineStrataVariable, modelDF.
    • Add optional Matrix in Predictor to specify a contrast matrix, optional attribute referencePoint in Parameter.
    • Add new elements: BaseCumHazardTables, EventValues, BaselineStratum, BaselineCell.
    • Add examples of scoring for Cox Regression and contrast matrices.
    • Add new type of distribution: tweedie.
    • Add new attribute in model: targetReferenceCategory, so that the model can be used in MiningModel.
    • Changed version to “4.0” from “3.2” in the example(s)
    • Added reference to ModelExplanation element in the model XSD

GeneralStructure

Header

  • No changes

Interoperability

  • Changed: “As a result, a new approach for interoperability was required and is being introduced in PMML version 3.2.” to “As a result, a new approach for interoperability was introduced in PMML version 3.2.”

MiningSchema

  • Added frequencyWeight and analysisWeight as new options for usageType. They will not affect scoring, but will make model information more complete.

ModelComposition

  • No longer used; replaced by MultipleModels

ModelExplanation

  • New addition to PMML 4.0 that contains information to explain the models, model fit statistics, and visualization information.

ModelVerification

  • No changes

MultipleModels

  • Replaces ModelComposition. Important additions are segmentation and ensembles.
  • Added reference to ModelExplanation element in the model XSD

NaïveBayes

  • Changed version to “4.0” from “3.2” in the example(s)
  • Added reference to ModelExplanation element in the model XSD

NeuralNetwork

  • Changed version to “4.0” from “3.2” in the example(s)
  • Added reference to ModelExplanation element in the model XSD

Output

  • Extended output type to include Association rule models. The changes add a number of new attributes: “ruleFeature”, “algorithm”, “rank”, “rankBasis”, “rankOrder” and “isMultiValued”. A new enumeration type “ruleValue” is added to the RESULT-FEATURE

Regression

  • Changed version to “4.0” from “3.2” in the example(s)
  • Added reference to ModelExplanation element in the model XSD

RuleSet

  • Changed version to “4.0” from “3.2” in the example(s)
  • Added reference to ModelExplanation element in the model XSD

Sequence

  • Changed version to “4.0” from “3.2” in the example(s)

Statistics

  • accommodate weighted counts by replacing INT-ARRAY with NUM-ARRAY in DiscrStats and ContStats
  • change xs:nonNegativeInteger to xs:double in several places
  • add new boolean attribute ‘weighted’ to UnivariateStats and PartitionFieldStats elements
  • add new attribute cardinality in Counts
  • Also some very long lines in this document are now wrapped.

SupportVectorMachine

  • Added optional attribute threshold
  • Added optional attribute classificationMethod
  • Attribute alternateTargetCategory removed from SupportVectorMachineModel element and moved to SupportVectorMachine element
  • Changed the example slightly
  • Changed version to “4.0” from “3.2” in the example(s)
  • Added reference to ModelExplanation element in the model XSD

Targets

  • No changes

Taxonomy

  • Changed: “A TableLocator may contain any description which helps an application to locate a certain table. PMML 3.2 does not yet define the content. PMML users have to use their own extensions. The same applies to InlineTable.” to “A TableLocator may contain any description which helps an application to locate a certain table. PMML standard does not yet define the content. PMML users have to use their own extensions. The same applies to InlineTable.”

Text

  • Changed version to “4.0” from “3.2” in the example(s)
  • Added reference to ModelExplanation element in the model XSD

TimeSeriesModel

  • New addition to PMML 4.0 to support Time series models

Transformations

  • No changes

TreeModel

  • Changed version to “4.0” from “3.2” in the example(s)
  • Added reference to ModelExplanation element in the model XSD

Sources

http://www.dmg.org/v4-0/GeneralStructure.html

http://www.dmg.org/v4-0/Changes.html

And here are some companies already using PMML:

http://www.dmg.org/products.html

I found the tool at http://www.dmg.org/coverage/ much more interesting though (see screenshot).

Screenshot-Mozilla Firefox

Zementis, which we have covered in earlier interviews, has played a stellar role in bringing together this common standard for data mining. Note that the KXEN model is also highlighted there.

The best PMML converter tutorial is here:

http://www.zementis.com/videos/PMML_Converter_iGoogle_gadget_2_demo.htm

Decisionstats Interviews

Here is a list of interviews that I have published. These are specific to analytics and data mining and include only the most recent interviews. If I have missed any notable recent interview related to analytics and data mining, kindly let me know. Hat tip to Karl Rexer for this suggestion.

Date    Name of Interviewee    Designation and Organization

09-Jun    Karl Rexer                          President, Rexer Analytics
05-Jun    Jim Davis                          CMO, SAS Institute
04-Jun    Paul van Eikeren                 President and CEO, Blue Reference
29-May    David Smith                      Director of Community, REvolution Computing
17-May    Dominic Pouzin                 CEO, Data Applied
11-May    Bruno Delahaye                 VP, KXEN
04-May    Ron Ramos                        Director, Zementis
30-Apr    Olivier Jouve                      VP, SPSS Inc
21-Apr    Fabian Dill                         Co-Founder, Knime.com
18-Apr    Alicia McGreevey                 Head Marketing, Visual Numerics
27-Mar    Francoise Soulie Fogelman    VP, KXEN
17-Mar    Jon Peck                            Principal Software Engineer, SPSS Inc
06-Mar    Anne Milley                        Director of product marketing, SAS Institute
04-Mar    Anne Milley                        Director of product marketing, SAS Institute
03-Feb    Phil Rack                            Creator, Bridge to R, and CEO, Minequest
03-Feb    Michael Zeller                     CEO, Zementis
31-Jan    Richard Schultz                   CEO, Revolution Computing
21-Jan    Bob Muenchen                    Author, R for SAS and SPSS Users
13-Jan    Dr Graham Williams           Creator, Rattle GUI for R
05-Jan    Roger Haddad                    CEO, KXEN
26-Sep    June Dershewitz                  VP, Semphonic
04-Sep    Vincent Granville                 Head, Analyticbridge

The URLs to specific interviews are also in this sheet.

http://spreadsheets.google.com/pub?key=rWTqcMe9mqwHeFv1e4GS_yg&single=true&gid=0&range=a1%3Ae24&output=html

Interview Ron Ramos, Zementis

Here is an interview with Ron Ramos, Director, Zementis. Ron wants to move predictions from desktops and servers to the remote cloud using the Zementis ADAPA scoring solution. I have tested the ADAPA solution myself and made some suggestions on tutorials. Zementis is a terrific company with a great product, ADAPA, and a big early-mover advantage (see http://www.decisionstats.com/?s=zementis for the Zementis 5-minute video and an earlier interview, a few months back, with Michael Zeller, a friend and CEO of Zementis).

Ajay- Describe your career journey. How would you motivate your children or young people to follow careers in science, or at least to pay more attention to science subjects? What advice would you give to young tech entrepreneurs in this recession, the ones chasing dreams on iMobile Applications, cloud computing etc.?

Ron- Science and a curious mind go together. I remember when I first met a friend of mine who is a professor of cognitive sciences at the University of California. To me, he represents the quest for scientific knowledge. Not only has he been studying visual space perception, visual control of locomotion, and spatial cognition, but he is also interested in every single aspect of the world around him. I believe that if we are genuinely interested and curious to know how and why things are the way they are, we are a step closer to appreciating, and being willing to participate in, the collective quest for scientific knowledge.

Our current economic troubles are not affecting a single industry. The problem is widespread. So, tech entrepreneurs should not view this recession as targeted at technology. It is new technology in clean, renewable fuels which will most probably define what is to come. I am also old enough to know that everything is cyclical and so, this recession will lead us to great progress. iMobile Applications and Cloud Computing are here to stay since these are technologies that just make sense. Cloud Computing benefits from the pay-as-you-go model, which because of its affordability is bound to allow for the widespread use and availability of computing where we have not seen it before.

The most interesting and satisfying effect one can have is transformation – do that which changes people’s lives, and your own at the same time.  I like the concept of doing well and doing good at the same time.  My emphasis has always been marketing and sales in every business in which I have been involved.  ADAPA provides for delivering on the promise of predictive analytics – decisioning in real-time.

Ajay-  How do you think Cloud Computing will change the model deployment market by 2011? SAS Institute is also building a 70 million dollar facility for private clouds. Do you think private clouds with tied-in applications would work?

Ron- Model deployment in the cloud is already a reality. By 2011, we project that most models will be deployed in the cloud (private or not). With time though, private clouds will most probably need to embrace the use of open standards such as PMML. I believe open standards such as PMML, which allow for true interoperability, will become widespread in the data mining community, be used in any kind of computing environment, and be moved from cloud to cloud.

Ajay- I am curious: who is Zementis's competition in cloud-deployed models? Where is ADAPA deployment NOT suitable for scoring models, and at what break-off point in data size do people realize that the cloud is better than a server? Do you think internal IT support teams fear that cloud vendors would take their power away?

Ron- Zementis is the first and only company to provide a scoring engine on the cloud. Other data mining companies have announced their intention to move to cloud computing environments. The size of the data you need to score is not something that should be taken into account for determining if scoring should be done in the cloud or not. In ADAPA, models can be uploaded and managed through an intuitive web console and all virtual machines can be launched or terminated with the click of a mouse. Since ADAPA instances run from $0.99/hour, it can appeal to small and large scoring jobs. For small jobs, the cost is minimal and deployment of models is fast. For large jobs, the cloud offers scalability. Many ADAPA instances can be set to run at the same time.

 

Cloud computing is changing the way models are deployed, but all organizations still need to manage their data and so IT can concentrate on that. Scoring on the cloud makes the job of IT easier.
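The pay-as-you-go arithmetic behind the $0.99/hour figure quoted above can be sketched in a few lines of Python. This is a back-of-the-envelope illustration only: the function name `scoring_cost` is hypothetical, and rounding partial hours up to whole instance hours is an assumption about billing, not something stated in the interview.

```python
import math
from decimal import Decimal

# Starting rate quoted in the interview, in US dollars per instance hour.
RATE_PER_INSTANCE_HOUR = Decimal("0.99")


def scoring_cost(instances, hours_per_instance):
    """Estimate the cost of running several instances for a given time.

    Assumption: partial hours are billed as full instance hours.
    """
    billed_hours = math.ceil(hours_per_instance)
    return RATE_PER_INSTANCE_HOUR * instances * billed_hours


if __name__ == "__main__":
    # One instance for a small two-hour job versus ten instances in parallel.
    print(scoring_cost(1, 2))
    print(scoring_cost(10, 2.5))
```

Under these assumptions a small job on one instance costs a couple of dollars, and a larger job scales linearly with the number of instances launched.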

Ajay- In which cases is ADAPA deployment not suited? Software such as KXEN's offers model export into many formats like PMML, SQL, C++, SAS etc. Do you think Zementis would benefit if it had a converter-like utility, or a collection of utilities, on its site for PMML conversion, say from SAS code to PMML code? Do you think PMML is here to stay for a long time?

Ron- Yes, PMML is here to stay. Version 4.0 is about to be released. So, this is a very mature standard embraced by all leading data mining vendors. I believe the entire community will benefit from having converters to PMML, since it allows for models to be represented by an open and well documented standard. Also, since different tools already import and export PMML, data miners and modelers are then set free to move their models around. True interoperability!

Ajay – Name some specific customer success stories and costs saved.

Ron – As a team, we spent our early development time working on assignments in the mortgage business.  That’s what gave rise to the concept of ADAPA – enabling smart decisions as an integral part of the overall business strategy.  It became obvious to us that we were in fact totally horizontal with application in any industry that had knowledge to be gained from its data.  If only they could put their artful predictive models to work – easily integrated and deployed, able to be invoked directly from the business’ applications using web services, with returned results downloaded for further processing and visualization.  There is no expensive upfront investment in software licenses and hardware; no long-term extended implementation and time-to-production.  The savings are obvious, the ROI pyrotechnic.

Our current users, both enterprise installations and Amazon EC2 subscribers, report great results, and for a variety of good reasons we tend to respect their anonymity:

Zementis ADAPA Case Study #1:

Financial Institution Embraces Real-time Decisions.

Decision Management:  A leading financial company wanted to implement an enterprise-wide decision system to automate credit decisions across Retail, Wholesale, and Correspondent business channels. A key requirement for the company's Enterprise strategy was to select a solution which could execute and manage rules as well as predictive analytics on demand and in real-time. With minimal prior automation in place, the challenge was to execute guidelines and pricing for a variety of business scenarios. Complex underwriting and intricate pricing matrices combined present obstacles for employees and customers in correctly assessing available choices from a myriad of financial products. Although embracing a new processing paradigm, the goal for integration of the solution with the existing infrastructure also was to ensure minimal impact to already established processes and to not jeopardize origination volume.

Following a comprehensive market review, the financial institution selected the Zementis ADAPA Enterprise Edition because of its key benefits as a highly scalable decision engine based on open standards. The ADAPA framework, they concluded, ensures real-time execution capabilities for rules and predictive analytics across all products and all business channels.

Working directly with senior business and IT management, Zementis efficiently executed on an iterative deployment strategy which enabled the joint project team to roll out a comprehensive Retail solution in less than three months. Accessed in Retail offices across the country, the ADAPA decision engine assists more than 700 loan officers to determine eligibility of a borrower with the system instantly displaying conditions or exceptions to guidelines as well as precise pricing for each scenario. The Wholesale division exposes the ADAPA decision engine to a large network of several thousand independent brokers who explore scenarios and submit their applications online. While rules were authored in Excel format, a favorite of many business users, predictive models were developed in various analytics tools and deployed in ADAPA via the Predictive Model Markup Language (PMML) standard. Extending its value across the entire enterprise, ADAPA emerged as the central decision hub for vital credit, risk, pricing, and other operational decisions.

Zementis ADAPA Case Study #2:

Delivering Predictive Analytics in the Cloud.

A specialized consulting firm with a focus on predictive analytics needed a cost-effective, agile deployment framework to deliver predictive models to their clients.  The firm specializes in outsourcing the development of predictive models for their clients, using various tools like R, SAS, and SPSS. Supporting open standards, the natural choice was to utilize the Predictive Model Markup Language (PMML) to transfer the models from the scientists' development environment to a deployment infrastructure.  One key benefit of PMML is to remain development tool agnostic.  The firm selected the Zementis ADAPA Predictive Analytics Edition on the Amazon Elastic Compute Cloud (Amazon EC2) which provides a scalable, reliable deployment platform based on the PMML standard and Service Oriented Architecture (SOA).

With ADAPA, the firm was able to shorten the time-to-market for new models delivered to clients from months to just a few hours.  In addition, ADAPA enables their clients to benefit from a cost-effective SaaS utility-model, whereby the Zementis ADAPA engine is available on-demand at a fraction of the cost of traditional software licenses, eliminating upfront capital expenditures in both hardware and software. The ADAPA Predictive Analytics Edition has given the firm a highly competitive model delivery process and its clients an unprecedented agility in the deployment and integration of predictive analytics in their business processes.

Zementis ADAPA Case Study #3:

Assessing Risk in Real-Time for On-Line Merchant.

An on-line merchant with millions of customers needed to assess risk for submitted transactions before being sent to a credit-card processor.  Following a comprehensive data analysis phase, several models addressing specific data segments were built in a well-known model development platform.  Once model development is complete, models are exported in the PMML (Predictive Model Markup Language) standard. The deployment solution is the ADAPA Enterprise Edition, using its capabilities for data segmentation, data transformation, and model execution. ADAPA was selected as the optimal choice for deployment, not only because PMML-based models can easily be uploaded and are available for execution in seconds, but also because ADAPA Enterprise Edition offers the seamless integration of rules and predictive analytics within a single Enterprise Decision Management solution.

ADAPA was deployed on-site and configured to handle high-volume, mission-critical transactions.  The firm not only leveraged the real-time capabilities of ADAPA, but also its integrated reporting framework.  It was very important for the merchant to assess model impact on credit card transactions on a daily basis. Given that ADAPA allows for reports to be uploaded and managed via its web administration console, the reporting team was able to design new reports, schedule them for routine execution, and send the results in PDF format for analysis to the business department with the required agility. During the implementation of the roll-out strategy, the ADAPA web console and its ease of use allowed for effective management of rules and models as well as active monitoring of deployed models and impact of decisions on the business operation.

 

For more on Zementis, see www.zementis.com

Using Web 2.0 for Analytics 2.0

Here is a great video tutorial on YouTube by Zementis, creator of ADAPA, the cloud scoring engine for next-gen predictive analytics. You can watch it at the URL below:

http://www.youtube.com/watch?v=8hNqxqrdXLI

 

A few weeks back, I was working with the ADAPA engine on a consulting gig, and Ron Ramos, the head of sales, mentioned that though they have extensive documentation, they were planning a video tutorial on YouTube as well.

Beats a PDF every time, doesn't it!

I wonder why companies continue to spend huge, and I mean huge, amounts on white papers and PDFs when they can have much better customer support using a bit of audio, video and even Twitter support.

Surprisingly, this is true even of companies working at the cutting edge with other technologies, despite the essentially free availability of these tools.

 

I mean, if companies can spend huge amounts on predictive solutions for the big, big datasets, why can't they offer some solutions or apps for the web and social media? (An exception is KXEN, of course, with its new Social Network Analysis module.)

Imagine a future, for example:

  • Hello SAS, my code won't run, blah blah blah
  • SAS Support on Twitter: okay, do this

or

  • Hello SPSS, where can I find some stuff on Python? I got lost on the website
  • SPSS Support on Skype/Twitter: Dude, do this, click this link!

It is much better than endless rounds of email and aggravation. As for the list server method, well, users should try www.twitter.com for user groups.

KNIME and Zementis shake hands

Two very good and very customer-centric (and open-source) companies shook hands on a strategic partnership today.

Knime  www.knime.org and Zementis www.zementis.com .

Decision Stats has been covering both companies. Both products are amazingly good, sync very well thanks to their support of the PMML standard, and lower costs considerably for the consumer (see http://www.decisionstats.com/2009/02/knime/ and http://www.decisionstats.com/2009/02/interview-michael-zeller-ceozementis/).

Knime has both a free personal license and a commercial license, and it supports R very well thanks to PMML (the www.dmg.org initiative).

See http://www.knime.org/blog/export-and-convert-r-models-pmml-within-knime

The following example R script learns a decision tree based on the Iris-Data and exports this as PMML and as an R model which is understood by the R Predictor node:

# load the library for learning a tree model
library(rpart)
# load the pmml export library
library(pmml)
# R is assumed to be the input data frame that the KNIME R node provides
# (here, the Iris data); use the class column as the predicted column
dt <- rpart(class ~ ., data = R)
# export to PMML
r_pmml <- pmml(dt)
# write the PMML model to an export file
write(toString(r_pmml), file = "C:/R.pmml")
# provide the native R model at the out-port
R <- dt

 

Zementis takes the total cost of ownership and the total pain of creating scored models down to something close to $1/hour, thanks to its proprietary ADAPA engine.
