Here is a topic-specific interview with Michael Zeller of Zementis on PMML, the de facto standard for data mining.
Ajay- What is PMML?
Mike- The Predictive Model Markup Language (PMML) is the leading standard for statistical and data mining models and is supported by all leading analytics vendors and organizations. With PMML, it is straightforward to develop a model on one system using one application and deploy the model on another system using another application. PMML reduces complexity and bridges the gap between development and production deployment of predictive analytics.
PMML is governed by the Data Mining Group (DMG), an independent, vendor-led consortium that develops data mining standards.
Ajay- Why can PMML help any business?
Mike- PMML ensures business agility with respect to data mining, predictive analytics, and enterprise decision management. It provides one standard and one deployment process across all applications, projects and business divisions. In this way, business stakeholders, analytic scientists, and IT are finally speaking the same language.
In the current global economic crisis more than ever, a company must become more efficient and optimize business processes to remain competitive. Predictive analytics is widely regarded as the next logical step, implementing more intelligent, real-time decisions across the enterprise.
However, the deployment of decisions based on predictive models and statistical algorithms has been a hurdle for many companies. Typically, it has been a complex, costly process to get such models integrated into operational systems. With the PMML standard, this is no longer the case. PMML simply eliminates the deployment complexity for predictive models.
A standard also provides choices among vendors, allowing us to implement best-of-breed solutions, and creates a common knowledge framework for internal teams – analytics, IT, and business – as well as external vendors and consultants. In general, having a solid standard is a sign of a mature analytics industry, creating more options for users and, most importantly, propelling the total analytics market to the next level.
Ajay- Can PMML help your existing software in analytics and BI?
Mike- PMML has been widely accepted among vendors; almost all major analytics and business intelligence vendors already support the standard. If you have any such software package in-house, you most likely have PMML at your disposal already.
For example, you can develop your models in any of the tools that support PMML, e.g., SPSS, SAS, MicroStrategy, or IBM, and then deploy that model in ADAPA, which is the Zementis decision engine. Or you can even choose from various open source tools, like R and KNIME.
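To make the develop-here, deploy-there idea concrete, here is a minimal sketch in Python. It assumes a hand-written, abbreviated PMML 4.0 regression model (the field names, coefficients, and the `score` helper are hypothetical and not taken from any vendor tool, and the fragment has not been validated against the DMG schema). It only illustrates that the model is plain XML that any consuming application can read; it is not how ADAPA or any other engine is implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated PMML document as a modeling tool might export it.
PMML_DOC = """<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0">
  <Header description="Hypothetical linear model for illustration"/>
  <DataDictionary numberOfFields="2">
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="income"/>
      <MiningField name="score" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="income" coefficient="0.25"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

# Namespace prefix mapping used for element lookups.
NS = {"p": "http://www.dmg.org/PMML-4_0"}


def score(pmml_text, record):
    """Evaluate the first RegressionTable: intercept + sum(coefficient * value).

    Simplification (assumption): only NumericPredictor terms with the
    default exponent of 1 are handled.
    """
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    result = float(table.get("intercept"))
    for predictor in table.findall("p:NumericPredictor", NS):
        result += float(predictor.get("coefficient")) * record[predictor.get("name")]
    return result


print(score(PMML_DOC, {"income": 4.0}))  # 1.5 + 0.25 * 4.0 = 2.5
```

The consuming side needs only the PMML text, not the tool that produced it, which is the point of standardizing the model format.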
Ajay- How do Zementis, ADAPA, and PMML fit together?
Mike- Zementis has been an avid supporter of the PMML standard and is very active in the development of the standard. We contributed to the PMML package for the open source R Project. Furthermore, we created a free PMML Converter tool which helps users to validate and correct PMML files from various vendors and convert legacy PMML files to the latest version of the standard.
Most prominently with ADAPA, Zementis launched the first cloud-computing scoring engine on the Amazon EC2 cloud. ADAPA is a highly scalable deployment, integration and execution platform for PMML-based predictive models. Not only does it give you all the benefits of being fully standards-based, using PMML and web services, but it also leverages the cloud for scalability and cost-effectiveness.
As a Software as a Service (SaaS) application on Amazon EC2, ADAPA provides extreme flexibility, from casual usage which costs only a few dollars a month all the way to high-volume, mission-critical enterprise decision management, which users can seamlessly launch in United States or European data centers.
Ajay- What are some examples where PMML helped companies save money?
Mike- For any consulting company focused on developing predictive analytics models for clients, PMML provides tremendous benefits, both for clients and for the service provider. Standardizing on PMML defines a clear deliverable – a PMML model – which clients can deploy instantly. There are no fixed requirements on which specific tools to choose for development or deployment; it is only important that the model adheres to the PMML standard, which becomes the common interface between the business partners. This eliminates miscommunication and lowers the overall project cost. Another example is where a company has taken advantage of the capability to move models instantly from development to operational deployment. It allows them to quickly update models based on market conditions, say in the area of risk management and fraud detection, or to roll out new marketing campaigns.
Personally, I think the biggest opportunities are still ahead of us as more and more businesses embrace operational predictive analytics. The true value of PMML is to facilitate a real-time decision environment where we leverage predictive models in every business process, at every customer touch point and on-demand to maximize value.
Ajay- Where can I find more information about PMML?
Mike- First there is the Data Mining Group (DMG) web site at http://www.dmg.org
I strongly encourage any company that has a significant interest in predictive analytics to become a member and help drive the development of the standard.
We also created a knowledge base of PMML-related information at http://www.predictive-analytics.info and there is a PMML interest group on LinkedIn at http://www.linkedin.com/groupRegistration?gid=2328634
This group is more geared toward a general discussion forum for business benefits and end-user questions, and it is a great way to get started with PMML.
Last but not least, there is the Zementis web site at http://www.zementis.com
It contains various PMML example files, the PMML Converter tool, as well as links to PMML resource pages on the web.
For more on Michael Zeller and Zementis read his earlier interview at https://decisionstats.wordpress.com/2009/02/03/interview-michael-zeller-ceozementis-2/