Predictive Models Ain’t Easy to Deploy

 

This is a guest blog post by Carole Ann Matignon of Sparkling Logic. You can see more on Sparkling Logic at http://my.sparklinglogic.com/

Decision Management is about combining predictive models and business rules to automate decisions for your business. Insurance underwriting, loan origination or workout, claims processing are all very good use cases for that discipline… But there is a hiccup… It ain’t as easy you would expect…

What’s easy?

If you have a neat model, then most tools would allow you to export it as a PMML model – PMML stands for Predictive Model Markup Language and is a standard XML representation for predictive model formulas. Many model development tools let you export it without much effort. Many BRMS – Business rules Management Systems – let you import it. Tada… The model is ready for deployment.

What’s hard?

The problem that we keep seeing over and over in the industry is the issue around variables.

Those neat predictive models are formulas based on variables that may or may not exist as is in your object model. When the variable is itself a formula based on the object model, like the min, max or sum of Dollar amount spent in Groceries in the past 3 months, and the object model comes with transaction details, such that you can compute it by iterating through those transactions, then the problem is not “that” big. PMML 4 introduced some support for those variables.

The issue that is not easy to fix, and yet quite frequent, is when the model development data model does not resemble the operational one. Your Data Warehouse very likely flattened the object model, and pre-computed some aggregations that make the mapping very hard to restore.

It is clearly not an impossible project as many organizations do that today. It comes with a significant overhead though that forces modelers to involve IT resources to extract the right data for the model to be operationalized. It is a heavy process that is well justified for heavy-duty models that were developed over a period of time, with a significant ROI.

This is a show-stopper though for other initiatives which do not have the same ROI, or would require too frequent model refresh to be viable. Here, I refer to “real” model refresh that involves a model reengineering, not just a re-weighting of the same variables.

For those initiatives where time is of the essence, the challenge will be to bring closer those two worlds, the modelers and the business rules experts, in order to streamline the development AND deployment of analytics beyond the model formula. The great opportunity I see is the potential for a better and coordinated tuning of the cut-off rules in the context of the model refinement. In other words: the opportunity to refine the strategy as a whole. Very ambitious? I don’t think so.

About Carole Ann Matignon

http://my.sparklinglogic.com/index.php/company/management-team

Carole-Ann Matignon Print E-mail

Carole-Ann MatignonCarole-Ann Matignon – Co-Founder, President & Chief Executive Officer

She is a renowned guru in the Decision Management space. She created the vision for Decision Management that is widely adopted now in the industry.  Her claim to fame is managing the strategy and direction of Blaze Advisor, the leading BRMS product, while she also managed all the Decision Management tools at FICO (business rules, predictive analytics and optimization). She has a vision for Decision Management both as a technology and a discipline that can revolutionize the way corporations do business, and will never get tired of painting that vision for her audience.  She speaks often at Industry conferences and has conducted university classes in France and Washington DC.

She started her career building advanced systems using all kinds of technologies — expert systems, rules, optimization, dashboarding and cubes, web search, and beta version of database replication. At Cleversys (acquired by Kurt Salmon & Associates), she also conducted strategic consulting gigs around change management.

While playing with advanced software components, she found a passion for technology and joined ILOG (acquired by IBM). She developed a growing interest in Optimization as well as Business Rules. At ILOG, she coined the term BRMS while brainstorming with her Sales counterpart. She led the Presales organization for Telecom in the Americas up until 2000 when she joined Blaze Software (acquired by Brokat Technologies, HNC Software and finally FICO).

Her 360-degree experience allowed her to gain appreciation for all aspects of a software company, giving her a unique perspective on the business. Her technical background kept her very much in touch with technology as she advanced.

PMML Augustus

Here is a new-old system in open source for

for building and scoring statistical models designed to work with data sets that are too large to fit into memory.

http://code.google.com/p/augustus/

Augustus is an open source software toolkit for building and scoring statistical models. It is written in Python and its
most distinctive features are:
• Ability to be used on sets of big data; these are data sets that exceed either memory capacity or disk capacity, so
that existing solutions like R or SAS cannot be used. Augustus is also perfectly capable of handling problems
that can fit on one computer.
• PMML compliance and the ability to both:
– produce models with PMML-compliant formats (saved with extension .pmml).
– consume models from files with the PMML format.
Augustus has been tested and deployed on serveral operating systems. It is intended for developers who work in the
financial or insurance industry, information technology, or in the science and research communities.
Usage
Augustus produces and consumes Baseline, Cluster, Tree, and Ruleset models. Currently, it uses an event-based
approach to building Tree, Cluster and Ruleset models that is non-standard.

New to PMML ?

Read on http://code.google.com/p/augustus/wiki/PMML

The Predictive Model Markup Language or PMML is a vendor driven XML markup language for specifying statistical and data mining models. In other words, it is an XML language so that

Continue reading

Text Analytics World in New York

There is a 15 % discount if you want to register for Text Analytics World next month-

Use Discount Code AJAYNY11

October 19-20, 2011 at The Hilton New York

http://www.textanalyticsworld.com/newyork/2011

Text Analytics World Topics & Case Studies - Oct 19-20 in NYC

Text Analytics World NYC (tawgo.com) is the business-focused event for text analytics professionals,
managers and commercial practitioners. This conference delivers case studies, expertise and resources
to leverage unstructured data for business impact.
Text Analytics World NYC is packed with the top predictive analytics experts, practitioners, authors and
business thought leaders, including keynote addresses from Thomas Davenport, author of Competing
on Analytics: The New Science of Winning, David Gondek from IBM Research on their Jeopardy-Winning
Watson and DeepQA, and PAW Program Chair Eric Siegel, plus special sessions from industry heavy-
weights Usama Fayyad and John Elder.
CASE STUDIES:

TAW New York City will feature over 25 sessions with case studies from leading enterprises in
automotive, educational, e-commerce, financial services, government, high technology, insurance,
retail, social media, and telecom such as: Accident Fund, Amdocs, Bundle.com, Citibank, Florida State
College, Google, Intuit, MetLife, Mitchell1, PayPal, Snap-on, Socialmediatoday, Topsy, a Fortune 500
global technology company, plus special examples from U.S. government agencies DoD, DHS, and SSA.

HOT TOPICS:

TAW New York City's agenda covers hot topics and advanced methods such as churn risk detection,
customer service and call centers, decision support, document discovery, document filtering, financial
indicators from social media, fraud detection, government applications, insurance applications,
knowledge discovery, open question-answering, parallelized text analysis, risk profiling, sentiment
analysis, social media applications, survey analysis, topic discovery, and voice of the customer and other
innovative applications that benefit organizations in new and creative ways.

WORKSHOPS: TAW also features a full-day, hands-on text analytics workshop, plus several other pre-
and post-conference workshops in analytics that complement the core conference program. For more
info: www.tawgo.com/newyork/2011/analytics-workshops
For more information: tawgo.com
Download the conference preview:
http://www.textanalyticsworld.com/newyork/2011/preview

View the agenda at-a-glance: textanalyticsworld.com/newyork/2011/agenda

Register by September 2nd for Early Bird Rates (save up to $200):
textanalyticsworld.com/newyork/2011/registration
If you'd like our informative event updates, sign up at:
http://www.textanalyticsworld.com/subscription.php
To sign up for TAW group on LinkedIn:
www.linkedin.com/e/gis/3869759

For inquiries e-mail regsupport@risingmedia.com or call (717) 798-3495.

OTHER ANALYTICS EVENTS:
Predictive Analytics World for Government: Sept 12-13 in DC – www.pawgov.com
Predictive Analytics World New York City: Oct 16-21 – www.pawcon.com/nyc

Text Analytics World New York City: Oct 19-20 – www.tawgo.com/nyc
Predictive Analytics World London: Nov 30-Dec 1 – www.pawcon.com/london
Predictive Analytics World San Francisco: March 4-10, 2012 – www.pawcon.com/sanfrancisco
Predictive Analytics World Videos: Available on-demand – www.pawcon.com/video
Also has two sessions on R

Sunday, October 16, 2011


Half-day Workshop
Room: Madison

R Bootcamp
Click here for the detailed workshop description

  • Workshop starts at 1:00pm
  • Afternoon Coffee Break at 2:30pm – 3:00pm
  • End of the Workshop: 5:00pm

Instructor: Max Kuhn, Director, Nonclinical Statistics, Pfizer

Top of this page ] [ Agenda overview ]

Monday, October 17, 2011


Full-day Workshop
Room: Madison

R for Predictive Modeling: A Hands-On Introduction
Click here for the detailed workshop description

  • Workshop starts at 9:00am
  • Morning Coffee Break at 10:30am – 11:00am
  • Lunch provided at 12:30 – 1:15pm
  • Afternoon Coffee Break at 2:30pm – 3:00pm
  • End of the Workshop: 4:30pm

Instructor: Max Kuhn, Director, Nonclinical Statistics, Pfizer