#Rstats gets into Enterprise Cloud Software


Here is an excellent example of how websites should help rather than hinder new customers who want to take a demo of the software, without overwhelming them with sweet-talking marketing guys who don't know the difference between heteroskedasticity, probability, odds and likelihood.

It is made by Zementis (Dr Michael Zeller has been a frequent guest here) together with Revolution Analytics, which is still the best shot in enterprise software for #Rstats.

Now if only Revo could get into the lucrative Department of Energy or Department of Defense business, they could change the world AND earn more revenue than they have been doing. But seriously.

Check out http://deployr.revolutionanalytics.com/zementis/ and play with it. Or better still, mash it up with some data viz and ROC curves, or extend it with some APIs 😉

Augustus- a PMML model producer and consumer. Scoring engine.


I just checked out this new software for making PMML models. It is called Augustus and is created by the Open Data Group (http://opendatagroup.com/), which is headed by Robert Grossman, who was the first proponent of using R on Amazon EC2.

Probably someone like Zementis (http://adapasupport.zementis.com/) can use this to further test, enhance or benchmark on EC2. They did have a joint webinar with Revolution Analytics recently.

https://code.google.com/p/augustus/

Recent News

  • Augustus v 0.4.3.1 has been released
  • Added a guide (pdf) for including Augustus in the Windows System Properties.
  • Updated the install documentation.
  • Augustus 2010.II (Summer) release is available. This is v 0.4.2.0. More information is here.
  • Added performance discussion concerning the optional cyclic garbage collection.

See Recent News for more details and all recent news.

Augustus

Augustus is a PMML 4-compliant scoring engine that works with segmented models. Augustus is designed for use with statistical and data mining models. The new release provides Baseline, Tree and Naive-Bayes producers and consumers.

There is also a version for use with PMML 3 models. It is able to produce and consume models with 10,000s of segments and conforms to a PMML draft RFC for segmented models and ensembles of models. It supports Baseline, Regression, Tree and Naive-Bayes.

Augustus is written in Python and is freely available under the GNU General Public License, version 2.

See the page "Which version is right for me" for more details regarding the different versions.

PMML

Predictive Model Markup Language (PMML) is an XML mark up language to describe statistical and data mining models. PMML describes the inputs to data mining models, the transformations used to prepare data for data mining, and the parameters which define the models themselves. It is used for a wide variety of applications, including applications in finance, e-business, direct marketing, manufacturing, and defense. PMML is often used so that systems which create statistical and data mining models (“PMML Producers”) can easily inter-operate with systems which deploy PMML models for scoring or other operational purposes (“PMML Consumers”).
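To make the producer/consumer idea concrete, here is a minimal Python sketch of a PMML consumer. It is an illustration only: the model, the field names (age, visits, spend) and the coefficients are made up, and it reads just the RegressionTable of a PMML 4.0 RegressionModel using the standard library, not Augustus or ADAPA.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML 4.0 document: field names and coefficients are invented
# for illustration and do not come from any real model.
PMML_DOC = """<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0">
  <Header description="Toy linear regression"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="visits" optype="continuous" dataType="double"/>
    <DataField name="spend" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="visits"/>
      <MiningField name="spend" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="10.0">
      <NumericPredictor name="age" coefficient="0.5"/>
      <NumericPredictor name="visits" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

# PMML elements live in a versioned XML namespace.
NS = {"pmml": "http://www.dmg.org/PMML-4_0"}


def load_regression(pmml_text):
    """Return (intercept, {field: coefficient}) from a PMML regression model."""
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        predictor.get("name"): float(predictor.get("coefficient"))
        for predictor in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, coefficients


def score(record, intercept, coefficients):
    """Score one record: intercept plus the weighted sum of its inputs."""
    return intercept + sum(c * record[name] for name, c in coefficients.items())


intercept, coefficients = load_regression(PMML_DOC)
print(score({"age": 40, "visits": 3}, intercept, coefficients))
```

A real consumer would also honour the DataDictionary types, the MiningSchema and any transformations, which this sketch ignores.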

Change Detection using Augustus

For information regarding using Augustus with Change Detection and Health and Status Monitoring, please see change-detection.

Open Data

Open Data Group provides management consulting services, outsourced analytical services, analytic staffing, and expert witnesses broadly related to data and analytics. It has experience with customer data, supplier data, financial and trading data, and data from internal business processes.

It has staff in Chicago and San Francisco and clients throughout the U.S. Open Data Group began operations in 2002.


Overview

The above example contains plots generated in R of scoring results from Augustus. Each point on the graph represents a use of the scoring engine and a chart is an aggregation of multiple Augustus runs. A Baseline (Change Detection) model was used to score data with multiple segments.

Typical Use

Augustus is typically used to construct models and score data with models. Augustus includes a dedicated application for creating, or producing, predictive models rendered as PMML-compliant files. Scoring is accomplished by consuming PMML-compliant files describing an appropriate model. Augustus provides a dedicated application for scoring data with four classes of models: Baseline (Change Detection) Models, Tree Models, Regression Models and Naive Bayes Models. The typical model development and use cycle with Augustus is as follows:

  1. Identify suitable data with which to construct a new model.
  2. Provide a model schema which prescribes the requirements for the model.
  3. Run the Augustus producer to obtain a new model.
  4. Run the Augustus consumer on new data to effect scoring.
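The four steps above can be sketched in a few lines of Python. This is not the Augustus API: produce_baseline, consume_baseline and the ToyBaselineModel XML element are hypothetical names, the model schema is reduced to a single threshold parameter, and the XML is a toy format, not valid PMML.

```python
import statistics
import xml.etree.ElementTree as ET


def produce_baseline(training_values):
    """Producer: fit a Gaussian baseline and serialize it as XML text."""
    model = ET.Element("ToyBaselineModel")  # hypothetical element, not PMML
    ET.SubElement(
        model,
        "Gaussian",
        mean=str(float(statistics.mean(training_values))),
        stdev=str(float(statistics.stdev(training_values))),
    )
    return ET.tostring(model, encoding="unicode")


def consume_baseline(model_xml, new_values, threshold=3.0):
    """Consumer: z-score each new value and flag large deviations."""
    gaussian = ET.fromstring(model_xml).find("Gaussian")
    mean = float(gaussian.get("mean"))
    stdev = float(gaussian.get("stdev"))
    results = []
    for value in new_values:
        z = (value - mean) / stdev
        results.append({"value": value, "z": z, "alert": abs(z) > threshold})
    return results


# Step 1: identify training data (made-up numbers).
training = [10, 12, 11, 9, 13, 10, 11, 12]
# Steps 2 and 3: the "schema" here is just the alert threshold; run the producer.
model_xml = produce_baseline(training)
# Step 4: run the consumer on new data.
for row in consume_baseline(model_xml, [11, 25], threshold=3.0):
    print(row)
```

The point of the split is that the consumer needs only the serialized model, never the training data, which is what lets producers and consumers be different tools.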

Separate consumer and producer applications are supplied for Baseline (Change Detection) models, Tree models, Regression models and for Naive Bayes models. The producer and consumer applications require configuration with XML-formatted files. The specification of the configuration files and model schema are detailed below. The consumers provide for some configurability of the output but users will often provide additional post-processing to render the output according to their needs. A variety of mechanisms exist for transmitting data but users may need to provide their own preprocessing to accommodate their particular data source.

In addition to the producer and consumer applications, Augustus is conceptually structured and provided with libraries which are relevant to the development and use of Predictive Models. Broadly speaking, these consist of components that address the use of PMML and components that are specific to Augustus.

Post Processing

Augustus can accommodate a post-processing step. While not necessary, it is often useful to

  • Re-normalize the scoring results or perform an additional transformation.
  • Supplement the results with global meta-data such as timestamps.
  • Format the results.
  • Select certain interesting values from the results.
  • Restructure the data for use with other applications.
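As a rough illustration of such a post-processing step, the Python sketch below re-normalizes a list of raw scores, keeps only the interesting ones, stamps them with a timestamp and formats the result as JSON. The function name post_process, its parameters and the output layout are assumptions made for this example, not part of Augustus.

```python
import json
from datetime import datetime, timezone


def post_process(raw_scores, keep_above=0.5, run_time=None):
    """Min-max normalize scores, keep the high ones, add metadata, emit JSON."""
    run_time = run_time or datetime.now(timezone.utc)
    low, high = min(raw_scores), max(raw_scores)
    span = (high - low) or 1.0  # avoid dividing by zero when all scores match
    rows = []
    for index, raw in enumerate(raw_scores):
        normalized = (raw - low) / span  # re-normalize to the range [0, 1]
        if normalized >= keep_above:  # select the interesting values
            rows.append(
                {
                    "index": index,
                    "score": round(normalized, 3),
                    "scored_at": run_time.isoformat(),  # global metadata
                }
            )
    return json.dumps(rows)  # restructure for use by other applications


print(post_process([2.0, 4.0, 6.0]))
```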

PMML Plugin for Greenplum now available


From a press release from Zementis.

 

, the Universal PMML Plug-in for in-database scoring. Available now for the EMC Greenplum Database, a high-performance massively parallel processing (MPP) database, the plug-in leverages the Predictive Model Markup Language (PMML) to execute predictive models directly within EMC Greenplum, for highly optimized in-database scoring.

Universal PMML Plug-in

Developed by the Data Mining Group (DMG), PMML is supported by all major data mining vendors, e.g., IBM SPSS, SAS, Teradata, FICO, STATISTICA, MicroStrategy, TIBCO and Revolution Analytics as well as open source tools like R, KNIME and RapidMiner. With PMML, models built in any of these data mining tools can now instantly be deployed in the EMC Greenplum database. The net result is the ability to leverage the power of standards-based predictive analytics on a massive scale, right where the data resides.

“By partnering with Zementis, a true PMML innovator, we are able to offer a vendor-agnostic solution for moving enterprise-level predictive analytics into the database execution environment,” said Dr. Steven Hillion, Vice President of Analytics at EMC Greenplum. “With Zementis and PMML, the de-facto standard for representing data mining models, we are eliminating the need to recode predictive analytic models in order to deploy them within our database. In turn, this enables an analyst to reduce the time to insight required in most businesses today.”

Want to learn more?
 

To learn more about how the EMC Greenplum Database and the Universal PMML Plug-in work together, feel free to:

  1. Visit the PMML Plug-in product page
  2. Download the white paper

The Universal PMML Plug-in for the EMC Greenplum Database is available now. Contact us today for more information.

Michael Zeller, CEO, Zementis


Zementis partners with R Analytics Vendor- Revo


Just got a PR email from Michael Zeller, CEO, Zementis, announcing that Zementis (ADAPA) and Revolution Analytics have just partnered up.

Is this something substantial, or just time-sharing (http://bi.cbronline.com/news/sas-ceo-says-cep-open-source-and-cloud-bi-have-limited-appeal) or a Barney Partnership (http://www.dbms2.com/2008/05/08/database-blades-are-not-what-they-used-to-be/)?

Summary: that's cloud computing scoring of models on EC2 (Zementis) partnering with the actual modeling software in R (Revolution Analytics RevoDeployR).

See previous interviews with both Dr Zeller at https://decisionstats.com/2009/02/03/interview-michael-zeller-ceozementis/, https://decisionstats.com/2009/05/07/interview-ron-ramos-zementis/ and https://decisionstats.com/2009/10/05/interview-michael-zellerceo-zementis-on-pmml/

and Revolution guys at https://decisionstats.com/2010/08/03/q-a-with-david-smith-revolution-analytics/

and https://decisionstats.com/2009/05/29/interview-david-smith-revolution-computing/

strategic partnership with Revolution Analytics, the leading commercial provider of software and support for the popular open source R statistics language. With this partnership, predictive models developed on Revolution R Enterprise are now accessible for real-time scoring through the ADAPA Decisioning Engine by Zementis. 

ADAPA is an extremely fast and scalable predictive platform. Models deployed in ADAPA are automatically available for execution in real-time and batch-mode as Web Services. ADAPA allows Revolution R Enterprise to leverage the Predictive Model Markup Language (PMML) for better decision management. With PMML, models built in R can be used in a wide variety of real-world scenarios without requiring laborious or expensive proprietary processes to convert them into applications capable of running on an execution system.


“By partnering with Zementis, Revolution Analytics is building an end-to-end solution for moving enterprise-level predictive R models into the execution environment,” said Jeff Erhardt, Revolution Analytics Chief Operating Officer. “With Zementis, we are eliminating the need to take R applications apart and recode, retest and redeploy them in order to obtain desirable results.”

 

Got demo? 

Yes, we do! Revolution Analytics and Zementis have put together a demo which combines the building of models in R with automatic deployment and execution in ADAPA. It uses Revolution Analytics’ RevoDeployR, a new Web Services framework that allows for data analysts working in R to publish R scripts to a server-based installation of Revolution R Enterprise.

Action Items:

  1. Try our INTERACTIVE DEMO
  2. DOWNLOAD the white paper
  3. Try the ADAPA FREE TRIAL

RevoDeployR & ADAPA allow for real-time analysis and predictions from R to be effectively used by existing Excel spreadsheets, BI dashboards and Web-based applications, all in real-time.

Predictive analytics with RevoDeployR from Revolution Analytics and ADAPA from Zementis put model building and real-time scoring into a league of their own. Seriously!

Scoring SAS and SPSS Models in the cloud


An announcement from Zementis and Predixion Software about using cloud computing for scoring models using PMML. Note that R has a PMML package as well, which is used by Rattle, the data mining GUI, for exporting models.

Source- http://www.marketwatch.com/story/predixion-software-introduces-new-product-to-run-sas-and-spss-predictive-models-in-the-cloud-2010-10-19?reflink=MW_news_stmp

——————————————————————————————————–

ALISO VIEJO, Calif., Oct 19, 2010 (BUSINESS WIRE) — Predixion Software today introduced Predixion PMML Connexion(TM), an interface that provides Predixion Insight(TM), the company’s low-cost, self-service in the cloud predictive analytics solution, direct and seamless access to SAS, SPSS (IBM) and other predictive models for use by Predixion Insight customers. Predixion PMML Connexion enables companies to leverage their significant investments in legacy predictive analytics solutions at a fraction of the cost of conventional licensing and maintenance fees.

The announcement was made at the Predictive Analytics World conference in Washington, D.C. where Predixion also announced a strategic partnership with Zementis, Inc., a market leader in PMML-based solutions. Zementis is exhibiting in Booth #P2.

The Predictive Model Markup Language (PMML) standard allows for true interoperability, offering a mature standard for moving predictive models seamlessly between platforms. Predixion has fully integrated this PMML functionality into Predixion Insight, meaning Predixion Insight users can now effortlessly import PMML-based predictive models, enabling information workers to score the models in the cloud from anywhere and publish reports using Microsoft Excel(R) and SharePoint(R). In addition, models can also be written back into SAS, SPSS and other platforms for a truly collaborative, interoperable solution.

“Predixion’s investment in this PMML interface makes perfect business sense as the lion’s share of the models in existence today are created by the SAS and SPSS platforms, creating compelling opportunity to leverage existing investments in predictive and statistical models on a low-cost cloud predictive analytics platform that can be fed with enterprise, line of business and cloud-based data,” said Mike Ferguson, CEO of Intelligent Business Strategies, a leading analyst and consulting firm specializing in the areas of business intelligence and enterprise business integration. “In this economy, Predixion’s low-cost, self-service predictive analytics solutions might be welcome relief to IT organizations chartered with quickly adding additional applications while at the same time cutting costs and staffing.”

“We are pleased to be partnering with Zementis, truly a PMML market leader and innovator,” said Predixion CEO Simon Arkell. “To allow any SAS or SPSS customer to immediately score any of their predictive models in the cloud from within Predixion Insight, compare those models to those created by Predixion Insight, and share the results within Excel and Sharepoint is an exciting step forward for the industry. SAS and SPSS customers are fed up with the high prices they must pay for their business users just to access reports generated by highly skilled PhDs who are burdened by performing routine tasks and thus have become a massive bottleneck. That frustration is now a thing of the past because any information worker can now unlock the power of predictive analytics without relying on experts — for a fraction of the cost and from anywhere they can connect to the cloud,” Arkell said.

Dr. Michael Zeller, Zementis CEO, added, “Our mission is to significantly shorten the time-to-market for predictive models in any industry. We are excited to be contributing to Predixion’s self-service, cloud-based predictive analytics solution set.”

About Predixion Software

Predixion Software develops and markets collaborative predictive analytics solutions in the public and private cloud. Predixion enables self-service predictive analytics, allowing customers to use and analyze large amounts of data to make actionable decisions, all within the familiar environment of Excel and PowerPivot. Predixion customers are achieving immediate results across a multitude of industries including: retail, finance, healthcare, marketing, telecommunications and insurance/risk management.

Predixion Software is headquartered in Aliso Viejo, California with development offices in Redmond, Washington. The company has venture capital backing from established investors including DFJ Frontier, Miramar Venture Partners and Palomar Ventures. For more information please contact us at 949-330-6540, or visit us at www.predixionsoftware.com.

About Zementis

Zementis, Inc. is a leading software company focused on the operational deployment and integration of predictive analytics and data mining solutions. Its ADAPA(R) decision engine successfully bridges the gap between science and engineering. ADAPA(R) was designed from the ground up to benefit from open standards and to significantly shorten the time-to-market for predictive models in any industry. For more information, please visit www.zementis.com.

 

Event: Predictive analytics with R, PMML and ADAPA

From http://www.meetup.com/R-Users/calendar/14405407/

The September meeting is at the Oracle campus. (This is next door to the Oracle towers, so there is plenty of free parking.) The featured talk is from Alex Guazzelli (Vice President – Analytics, Zementis Inc.) who will talk about “Predictive analytics with R, PMML and ADAPA”.

Agenda:
* 6:15 – 7:00 Networking and Pizza (with thanks to Revolution Analytics)
* 7:00 – 8:00 Talk: Predictive analytics with R, PMML and ADAPA
* 8:00 – 8:30 General discussion

Talk overview:

The rule in the past was that whenever a model was built in a particular development environment, it remained in that environment forever, unless it was manually recoded to work somewhere else. This rule has been shattered with the advent of PMML (Predictive Model Markup Language). By providing a uniform standard to represent predictive models, PMML allows for the exchange of predictive solutions between different applications and various vendors.

Once exported as PMML files, models are readily available for deployment into an execution engine for scoring or classification. ADAPA is one example of such an engine. It takes in models expressed in PMML and transforms them into web-services. Models can be executed either remotely by using web-services calls, or via a web console. Users can also use an Excel add-in to score data from inside Excel using models built in R.

R models have been exported into PMML and uploaded in ADAPA for many different purposes. Use cases where clients have used the flexibility of R to develop and the PMML standard combined with ADAPA to deploy range from financial applications (e.g., risk, compliance, fraud) to energy applications for the smart grid. The ability to easily transition solutions developed in R to the operational IT production environment helps eliminate the traditional limitations of R, e.g. performance for high volume or real-time transactional systems and memory constraints associated with large data sets.

Speaker Bio:

Dr. Alex Guazzelli has co-authored the first book on PMML, the Predictive Model Markup Language which is the de facto standard used to represent predictive models. The book, entitled PMML in Action: Unleashing the Power of Open Standards for Data Mining and Predictive Analytics, is available on Amazon.com. As the Vice President of Analytics at Zementis, Inc., Dr. Guazzelli is responsible for developing core technology and analytical solutions under ADAPA, a PMML-based predictive decisioning platform that combines predictive analytics and business rules. ADAPA is the first system of its kind to be offered as a service on the cloud.
Prior to joining Zementis, Dr. Guazzelli was involved in not only building but also deploying predictive solutions for large financial and telecommunication institutions around the globe. In academia, Dr. Guazzelli worked with data mining, neural networks, expert systems and brain theory. His work in brain theory and computational neuroscience has appeared in many peer reviewed publications. At Zementis, Dr. Guazzelli and his team have been involved in a myriad of modeling projects for financial, health-care, gaming, chemical, and manufacturing industries.

Dr. Guazzelli holds a Ph.D. in Computer Science from the University of Southern California and an M.S. and B.S. in Computer Science from the Federal University of Rio Grande do Sul, Brazil.

Not just a Cloud

While browsing the rather content heavy site of Oracle, I came across this interesting white paper on cloud computing.

Platform-as-a-Service Private Cloud with Oracle Fusion Middleware

at http://www.oracle.com/us/technologies/036500.pdf

It basically says that Oracle has the following offerings for PaaS-

  • Application grid
  • Oracle SOA Suite and Oracle Business Process Management Suite
  • Oracle WebCenter Suite
  • Oracle Identity Management

Here is why the traditional software licensing model can be threatened by cloud computing. These are very basic and conservative costs. If you have a software budget you can run the numbers yourself.

Suppose you pay $10,000 for an annual license and, say, an extra $5,000 for the hardware to run it. Assume you are using in-house resources (employees), which cost you another $50,000/year. That is $65,000 per year in total.

The per hour cost of this very basic resource is Total Cost/ Number of hours utilized.

Assume 100% utilization during work hours (which is not possible, but still).

That’s a 40-hour week × 48 working weeks (allowing for holidays) = 1,920 hours per year,

or $65,000 / 1,920 = $33.85 per hour.

That’s the cut-off point for deciding whether to offshore the work to contractors or outsource it.

Assuming, say, a more realistic 80% utilization, the per hour cost is $65,000 / (0.8 × 1,920) = $42.32/hour.

Now assume we can't outsource because of data hygiene or some other reason, so the people costs stay the same either way; exclude them and calculate only the total cost of ownership (software and hardware).

That's $15,000 / (0.8 × 40 × 48 hours).

That’s still an astonishing $9.77 per hour.

Compare this cost with the cost of running a virtual instance of R on Amazon EC2.

E.g. http://biocep-distrib.r-forge.r-project.org/

or using http://www.zementis.com (which is now introducing an Excel add in as well at http://www.zementis.com/Excel-Ai.htm)

The per hour costs are not going to be more than $3.50 per hour. That's much, much better than ANY stats software licensed today on ANY desktop/server configuration.
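If you want to run the numbers yourself, here is a small Python sketch of the arithmetic. The dollar figures are the illustrative ones from this post (including the $3.50 cloud ceiling, which is my rough upper bound, not a quoted price), and cost_per_hour is just a helper name I made up.

```python
# Illustrative annual costs from the post, in dollars.
LICENSE = 10_000
HARDWARE = 5_000
STAFF = 50_000
HOURS_PER_YEAR = 40 * 48  # 40-hour week, 48 working weeks = 1,920 hours
CLOUD_RATE = 3.50  # assumed upper bound for a cloud instance, per hour


def cost_per_hour(annual_cost, utilization=1.0):
    """Annual cost spread over the hours the resource is actually used."""
    return annual_cost / (HOURS_PER_YEAR * utilization)


total = LICENSE + HARDWARE + STAFF
print(f"All-in, 100% utilization:       ${cost_per_hour(total):.2f}/hour")
print(f"All-in, 80% utilization:        ${cost_per_hour(total, 0.8):.2f}/hour")

tco = LICENSE + HARDWARE  # software and hardware only, people excluded
on_premise = cost_per_hour(tco, 0.8)
print(f"Software + hardware only, 80%:  ${on_premise:.2f}/hour")
print(f"On-premise vs cloud ceiling:    {on_premise / CLOUD_RATE:.1f}x")
```

Change the constants to your own license, hardware and staffing budget to see where your break-even sits.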

See the math. That's why cloud is much more than time sharing, Dr G 😉

"First of all, I don’t see anything greatly new and wonderful and different about cloud computing. It was timesharing way back in ’60. It’s not a whole lot different. I certainly have issues asking a bank to send us all their data and we’re going to put it up on a cloud. They’re going to say, ‘What about security? How will I know who else is up there in that cloud?’ I don’t know, it’s just a cloud."

- Dr Jim Goodnight, SAS Institute.