Tag Archives: insurance
Predictive Models Ain’t Easy to Deploy
This is a guest blog post by Carole-Ann Matignon of Sparkling Logic. You can see more on Sparkling Logic at http://my.sparklinglogic.com/
Decision Management is about combining predictive models and business rules to automate decisions for your business. Insurance underwriting, loan origination or workout, and claims processing are all very good use cases for that discipline… But there is a hiccup… It ain’t as easy as you would expect…
What’s easy?
If you have a neat model, most tools allow you to export it as a PMML model. PMML stands for Predictive Model Markup Language and is a standard XML representation for predictive model formulas. Many model development tools let you export it without much effort, and many BRMS (Business Rules Management Systems) let you import it. Tada… The model is ready for deployment.
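To make the "easy" part concrete, here is a minimal sketch of what consuming a PMML file can look like. The PMML fragment is hand-written for illustration (the field names `age`, `grocery_sum` and `risk` and all coefficients are hypothetical, not taken from any real model), and the tiny evaluator below only understands a plain linear regression table; a real BRMS or scoring engine supports far more of the standard.

```python
import xml.etree.ElementTree as ET

# Hand-written, hypothetical PMML 4.0 fragment: a linear regression formula.
PMML_TEXT = """
<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0">
  <Header description="Hypothetical risk score"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="grocery_sum" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel modelName="toy" functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="grocery_sum"/>
      <MiningField name="risk" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="age" coefficient="0.02"/>
      <NumericPredictor name="grocery_sum" coefficient="-0.001"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

NS = {"p": "http://www.dmg.org/PMML-4_0"}


def load_linear_model(pmml_text):
    """Read the intercept and numeric coefficients from a regression PMML."""
    root = ET.fromstring(pmml_text.strip())
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        predictor.get("name"): float(predictor.get("coefficient"))
        for predictor in table.findall("p:NumericPredictor", NS)
    }
    return intercept, coefficients


def score(model, record):
    """Apply the formula; fail loudly if the record lacks a model variable."""
    intercept, coefficients = model
    missing = [name for name in coefficients if name not in record]
    if missing:
        raise KeyError("variables not available in the object model: %s" % missing)
    return intercept + sum(c * record[name] for name, c in coefficients.items())


model = load_linear_model(PMML_TEXT)
print(score(model, {"age": 40.0, "grocery_sum": 200.0}))
```

Note that the formula itself loads without trouble. What the sketch cannot do on its own is produce `grocery_sum`: the record passed to `score` must already contain every variable the model names.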
What’s hard?
The problem that we keep seeing over and over in the industry is the issue around variables.
Those neat predictive models are formulas based on variables that may or may not exist as-is in your object model. When the variable is itself a formula based on the object model, like the min, max or sum of the dollar amount spent on groceries in the past 3 months, and the object model comes with transaction details so that you can compute it by iterating through those transactions, then the problem is not “that” big. PMML 4 introduced some support for those variables.
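As an illustration of that "not that big" case, the sketch below derives the min, max and sum of grocery spend by iterating through transaction details. The `Transaction` class, the `"Groceries"` category label and the 90-day approximation of "3 months" are all assumptions made for the example, not part of any particular object model.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Transaction:
    """Hypothetical transaction detail carried by the operational object model."""
    posted: date
    category: str
    amount: float


def grocery_features(transactions, as_of, window_days=90):
    """Compute min/max/sum of grocery spend in the window ending at as_of.

    "Past 3 months" is approximated as 90 days (an assumption).
    """
    start = as_of - timedelta(days=window_days)
    amounts = [
        t.amount
        for t in transactions
        if t.category == "Groceries" and start < t.posted <= as_of
    ]
    if not amounts:
        # No grocery activity in the window: report zeros rather than failing.
        return {"grocery_min": 0.0, "grocery_max": 0.0, "grocery_sum": 0.0}
    return {
        "grocery_min": min(amounts),
        "grocery_max": max(amounts),
        "grocery_sum": sum(amounts),
    }


history = [
    Transaction(date(2011, 9, 1), "Groceries", 50.0),
    Transaction(date(2011, 8, 15), "Groceries", 20.0),
    Transaction(date(2011, 5, 1), "Groceries", 100.0),  # outside the window
    Transaction(date(2011, 9, 10), "Fuel", 40.0),       # wrong category
]
print(grocery_features(history, as_of=date(2011, 9, 30)))
```

This only works because the transaction details are still available at decision time. Once they have been flattened away, there is nothing left to iterate over.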
The issue that is not easy to fix, and yet quite frequent, arises when the data model used for model development does not resemble the operational one. Your Data Warehouse very likely flattened the object model and pre-computed some aggregations, which makes the mapping very hard to restore.
It is clearly not an impossible project, as many organizations do that today. It comes with a significant overhead, though, which forces modelers to involve IT resources to extract the right data for the model to be operationalized. It is a heavy process that is well justified for heavy-duty models that were developed over a period of time, with a significant ROI.
This is a show-stopper, though, for other initiatives that do not have the same ROI, or that would require too frequent a model refresh to be viable. Here, I refer to a “real” model refresh that involves re-engineering the model, not just re-weighting the same variables.
For those initiatives where time is of the essence, the challenge will be to bring those two worlds, the modelers and the business rules experts, closer together in order to streamline the development AND deployment of analytics beyond the model formula. The great opportunity I see is the potential for a better and coordinated tuning of the cut-off rules in the context of the model refinement. In other words: the opportunity to refine the strategy as a whole. Very ambitious? I don’t think so.
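One way to picture tuning the cut-off rules together with the model is a small sketch like the one below. Everything in it is hypothetical: the `prior_fraud` flag, the three outcomes, the default cut-off of 0.6 and the sample scores are invented for illustration, and a higher score is assumed to mean a better risk.

```python
def decide(score, applicant, cutoff=0.6):
    """Combine a model score with a business rule and a cut-off rule."""
    # Business rule: a prior fraud flag always sends the case to a person.
    if applicant.get("prior_fraud", False):
        return "refer"
    # Cut-off rule: the threshold is a strategy parameter, not part of the model.
    if score >= cutoff:
        return "approve"
    return "decline"


def approval_rate(cases, cutoff):
    """Share of cases approved under a given cut-off."""
    decisions = [decide(score, applicant, cutoff) for score, applicant in cases]
    return decisions.count("approve") / len(decisions)


# Hypothetical scored sample: (model score, applicant attributes).
CASES = [
    (0.9, {"prior_fraud": False}),
    (0.7, {"prior_fraud": True}),
    (0.65, {"prior_fraud": False}),
    (0.4, {"prior_fraud": False}),
]

for cutoff in (0.6, 0.8):
    print(cutoff, approval_rate(CASES, cutoff))
```

Re-running `approval_rate` whenever the model or the cut-off changes is the simplest form of refining the strategy as a whole, rather than the formula alone.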
About Carole-Ann Matignon
http://my.sparklinglogic.com/index.php/company/management-team
PMML Augustus
Here is a new-old open source system for building and scoring statistical models, designed to work with data sets that are too large to fit into memory.
http://code.google.com/p/augustus/
Augustus is an open source software toolkit for building and scoring statistical models. It is written in Python and its
most distinctive features are:
• Ability to be used on sets of big data; these are data sets that exceed either memory capacity or disk capacity, so
that existing solutions like R or SAS cannot be used. Augustus is also perfectly capable of handling problems
that can fit on one computer.
• PMML compliance and the ability to both:
– produce models with PMML-compliant formats (saved with extension .pmml).
– consume models from files with the PMML format.
Augustus has been tested and deployed on several operating systems. It is intended for developers who work in the
financial or insurance industry, information technology, or in the science and research communities.
Usage
Augustus produces and consumes Baseline, Cluster, Tree, and Ruleset models. Currently, it uses an event-based
approach to building Tree, Cluster and Ruleset models that is non-standard.
New to PMML?
Read on http://code.google.com/p/augustus/wiki/PMML
The Predictive Model Markup Language or PMML is a vendor-driven XML markup language for specifying statistical and data mining models. In other words, it is an XML language so that (more…)
Text Analytics World in New York
There is a 15% discount if you register for Text Analytics World next month.
Use Discount Code AJAYNY11
October 19-20, 2011 at The Hilton New York
http://www.textanalyticsworld.com/newyork/2011
Text Analytics World Topics & Case Studies - Oct 19-20 in NYC

Text Analytics World NYC (tawgo.com) is the business-focused event for text analytics professionals, managers and commercial practitioners. This conference delivers case studies, expertise and resources to leverage unstructured data for business impact.

Text Analytics World NYC is packed with the top predictive analytics experts, practitioners, authors and business thought leaders, including keynote addresses from Thomas Davenport, author of Competing on Analytics: The New Science of Winning, David Gondek from IBM Research on their Jeopardy-Winning Watson and DeepQA, and PAW Program Chair Eric Siegel, plus special sessions from industry heavyweights Usama Fayyad and John Elder.

CASE STUDIES: TAW New York City will feature over 25 sessions with case studies from leading enterprises in automotive, educational, e-commerce, financial services, government, high technology, insurance, retail, social media, and telecom such as: Accident Fund, Amdocs, Bundle.com, Citibank, Florida State College, Google, Intuit, MetLife, Mitchell1, PayPal, Snap-on, Socialmediatoday, Topsy, a Fortune 500 global technology company, plus special examples from U.S. government agencies DoD, DHS, and SSA.

HOT TOPICS: TAW New York City's agenda covers hot topics and advanced methods such as churn risk detection, customer service and call centers, decision support, document discovery, document filtering, financial indicators from social media, fraud detection, government applications, insurance applications, knowledge discovery, open question-answering, parallelized text analysis, risk profiling, sentiment analysis, social media applications, survey analysis, topic discovery, and voice of the customer and other innovative applications that benefit organizations in new and creative ways.

WORKSHOPS: TAW also features a full-day, hands-on text analytics workshop, plus several other pre- and post-conference workshops in analytics that complement the core conference program. For more info: www.tawgo.com/newyork/2011/analytics-workshops

- For more information: tawgo.com
- Download the conference preview: http://www.textanalyticsworld.com/newyork/2011/preview
- View the agenda at-a-glance: textanalyticsworld.com/newyork/2011/agenda
- Register by September 2nd for Early Bird Rates (save up to $200): textanalyticsworld.com/newyork/2011/registration
- If you'd like our informative event updates, sign up at: http://www.textanalyticsworld.com/subscription.php
- To sign up for TAW group on LinkedIn: www.linkedin.com/e/gis/3869759
- For inquiries e-mail regsupport@risingmedia.com or call (717) 798-3495.

OTHER ANALYTICS EVENTS:
- Predictive Analytics World for Government: Sept 12-13 in DC – www.pawgov.com
- Predictive Analytics World New York City: Oct 16-21 – www.pawcon.com/nyc
- Text Analytics World New York City: Oct 19-20 – www.tawgo.com/nyc
- Predictive Analytics World London: Nov 30-Dec 1 – www.pawcon.com/london
- Predictive Analytics World San Francisco: March 4-10, 2012 – www.pawcon.com/sanfrancisco
- Predictive Analytics World Videos: Available on-demand – www.pawcon.com/video
The agenda also has two sessions on R:
Sunday, October 16, 2011
Half-day Workshop
Room: Madison
R Bootcamp
- Workshop starts at 1:00pm
- Afternoon Coffee Break at 2:30pm – 3:00pm
- End of the Workshop: 5:00pm
Instructor: Max Kuhn, Director, Nonclinical Statistics, Pfizer
Monday, October 17, 2011
Full-day Workshop
Room: Madison
R for Predictive Modeling: A Hands-On Introduction
- Workshop starts at 9:00am
- Morning Coffee Break at 10:30am – 11:00am
- Lunch provided at 12:30 – 1:15pm
- Afternoon Coffee Break at 2:30pm – 3:00pm
- End of the Workshop: 4:30pm
Instructor: Max Kuhn, Director, Nonclinical Statistics, Pfizer
#SAS 9.3 and #Rstats 2.13.1 Released
A bit early but the latest editions of both SAS and R were released last week.
SAS 9.3 is clearly a major release, with multiple enhancements to keep SAS relevant in enterprise software in the age of big data. There are also many enhancements specific to R, to JMP, and to partners such as Teradata.
http://support.sas.com/software/93/index.html
Features
Data management
- Enhanced manageability for improved performance
- In-database processing (EL-T pushdown)
- Enhanced performance for loading Oracle data
- New ET-L transforms
- Data access
Data quality
- SAS® Data Integration Server includes DataFlux® Data Management Platform for enhanced data quality
- Master Data Management (DataFlux® qMDM)
- Provides support for master hub of trusted entity data.
Analytics
- SAS® Enterprise Miner™
- New survival analysis predicts when an event will happen, not just if it will happen.
- New rate making capability for insurance predicts optimal insurance premium for individuals based on attributes known at application time.
- Time Series Data Mining node (experimental) applies data mining techniques to transactional, time-stamped data.
- Support Vector Machines node (experimental) provides a supervised machine learning method for prediction and classification.
- SAS® Forecast Server
- SAS Forecast Server is integrated with the SAP APO Demand Planning module to provide SAP users with access to a superior forecasting engine and automatic forecasting capabilities.
- SAS® Model Manager
- Seamless integration of R models with the ability to register and manage R models in SAS Model Manager.
- Ability to perform champion/challenger side-by-side comparisons between SAS and R models to see which model performs best for a specific need.
- SAS/OR® and SAS® Simulation Studio
- Optimization
- Simulation
- Automatic input distribution fitting using JMP with SAS Simulation Studio.
Text analytics
- SAS® Text Miner
- SAS® Enterprise Content Categorization
- SAS® Sentiment Analysis
Scalability and high-performance
- SAS® Analytics Accelerator for Teradata (new product)
- SAS® Grid Manager
LICENCE:
• No parts of R are now licensed solely under GPL-2. The licences for packages rpart and survival have been changed, which means that the licence terms for R as distributed are GPL-2 | GPL-3.
This is a maintenance release to consolidate various minor fixes to 2.13.0.
CHANGES IN R VERSION 2.13.1:
NEW FEATURES:
• iconv() no longer translates NA strings as "NA".
• persp(box = TRUE) now warns if the surface extends outside the
box (since occlusion for the box and axes is computed assuming
the box is a bounding box). (PR#202.)
• RShowDoc() can now display the licences shipped with R, e.g.
RShowDoc("GPL-3").
• New wrapper function showNonASCIIfile() in package tools.
• nobs() now has a "mle" method in package stats4.
• trace() now deals correctly with S4 reference classes and
corresponding reference methods (e.g., $trace()) have been added.
• xz has been updated to 5.0.3 (very minor bugfix release).
• tools::compactPDF() gets more compression (usually a little,
sometimes a lot) by using the compressed object streams of PDF
1.5.
• cairo_ps(onefile = TRUE) generates encapsulated EPS on platforms
with cairo >= 1.6.
• Binary reads (e.g. by readChar() and readBin()) are now supported
on clipboard connections. (Wish of PR#14593.)
• as.POSIXlt.factor() now passes ... to the character method
(suggestion of Joshua Ulrich). [Intended for R 2.13.0 but
accidentally removed before release.]
• vector() and its wrappers such as integer() and double() now warn
if called with a length argument of more than one element. This
helps track down user errors such as calling double(x) instead of
as.double(x).
INSTALLATION:
• Building the vignette PDFs in packages grid and utils is now part
of running make from an SVN checkout on a Unix-alike: a separate
make vignettes step is no longer required.
These vignettes are now made with keep.source = TRUE and hence
will be laid out differently.
• make install-strip failed under some configuration options.
• Packages can customize non-standard installation of compiled code
via a src/install.libs.R script. This allows packages that have
architecture-specific binaries (beyond the package's shared
objects/DLLs) to be installed in a multi-architecture setting.
SWEAVE & VIGNETTES:
• Sweave() and Stangle() gain an encoding argument to specify the
encoding of the vignette sources if the latter do not contain a
\usepackage[]{inputenc} statement specifying a single input
encoding.
• There is a new Sweave option figs.only = TRUE to run each figure
chunk only for each selected graphics device, and not first using
the default graphics device. This will become the default in R
2.14.0.
• Sweave custom graphics devices can have a custom function
foo.off() to shut them down.
• Warnings are issued when non-portable filenames are found for
graphics files (and chunks if split = TRUE). Portable names are
regarded as alphanumeric plus hyphen, underscore, plus and hash
(periods cause problems with recognizing file extensions).
• The Rtangle() driver has a new option show.line.nos which is by
default false; if true it annotates code chunks with a comment
giving the line number of the first line in the sources (the
behaviour of R >= 2.12.0).
• Package installation tangles the vignette sources: this step now
converts the vignette sources from the vignette/package encoding
to the current encoding, and records the encoding (if not ASCII)
in a comment line at the top of the installed .R file.
DEPRECATED AND DEFUNCT:
• The internal functions .readRDS() and .saveRDS() are now
deprecated in favour of the public functions readRDS() and
saveRDS() introduced in R 2.13.0.
• Switching off lazy-loading of code _via_ the LazyLoad field of
the DESCRIPTION file is now deprecated. In future all packages
will be lazy-loaded.
• The off-line help() types "postscript" and "ps" are deprecated.
UTILITIES:
• R CMD check on a multi-architecture installation now skips the
user's .Renviron file for the architecture-specific tests (which
do read the architecture-specific Renviron.site files). This is
consistent with single-architecture checks, which use
--no-environ.
• R CMD build now looks for DESCRIPTION fields BuildResaveData and
BuildKeepEmpty for per-package overrides. See ‘Writing R
Extensions’.
BUG FIXES:
• plot.lm(which = 5) was intended to order factor levels in
increasing order of mean standardized residual. It ordered the
factor labels correctly, but could plot the wrong group of
residuals against the label. (PR#14545)
• mosaicplot() could clip the factor labels, and could overlap them
with the cells if a non-default value of cex.axis was used.
(Related to PR#14550.)
• dataframe[[row,col]] now dispatches on [[ methods for the
selected column (spotted by Bill Dunlap).
• sort.int() would strip the class of an object, but leave its
object bit set. (Reported by Bill Dunlap.)
• pbirthday() and qbirthday() did not implement the algorithm
exactly as given in their reference and so were unnecessarily
inaccurate.
pbirthday() now solves the approximate formula analytically
rather than using uniroot() on a discontinuous function.
The description of the problem was inaccurate: the probability is
a tail probability (‘2 _or more_ people share a birthday’)
• Complex arithmetic sometimes warned incorrectly about producing
NAs when there were NaNs in the input.
• seek(origin = "current") incorrectly reported it was not
implemented for a gzfile() connection.
• c(), unlist(), cbind() and rbind() could silently overflow the
maximum vector length and cause a segfault. (PR#14571)
• The fonts argument to X11(type = "Xlib") was being ignored.
• Reading (e.g. with readBin()) from a raw connection was not
advancing the pointer, so successive reads would read the same
value. (Spotted by Bill Dunlap.)
• Parsed text containing embedded newlines was printed incorrectly
by as.character.srcref(). (Reported by Hadley Wickham.)
• decompose() used with a series of a non-integer number of periods
returned a seasonal component shorter than the original series.
(Reported by Rob Hyndman.)
• fields = list() failed for setRefClass(). (Reported by Michael
Lawrence.)
• Reference classes could not redefine an inherited field which had
class "ANY". (Reported by Janko Thyson.)
• Methods that override previously loaded versions will now be
installed and called. (Reported by Iago Mosqueira.)
• addmargins() called numeric(apos) rather than
numeric(length(apos)).
• The HTML help search sometimes produced bad links. (PR#14608)
• Command completion will no longer be broken if tail.default() is
redefined by the user. (Problem reported by Henrik Bengtsson.)
• LaTeX rendering of markup in titles of help pages has been
improved; in particular, \eqn{} may be used there.
• isClass() used its own namespace as the default of the where
argument inadvertently.
• Rd conversion to latex mis-handled multi-line titles (including
cases where there was a blank line in the \title section).
Analytics 2011 Conference
From http://www.sas.com/events/analytics/us/
The Analytics 2011 Conference Series combines the power of SAS’s M2010 Data Mining Conference and F2010 Business Forecasting Conference into one conference covering the latest trends and techniques in the field of analytics. Analytics 2011 Conference Series brings the brightest minds in the field of analytics together with hundreds of analytics practitioners. Join us as these leading conferences change names and locations. At Analytics 2011, you’ll learn through a series of case studies, technical presentations and hands-on training. If you are in the field of analytics, this is one conference you can’t afford to miss.
Conference Details
October 24-25, 2011
Grande Lakes Resort
Orlando, FL
Analytics 2011 topic areas include:
- Data Mining
- Forecasting
- Text Analytics
- Fraud Detection
- Data Visualization (more…)
Predictive Analytics World
Here is an announcement from Predictive Analytics World, the world's largest vendor-neutral conference dedicated to Predictive Analytics alone. Decisionstats has been a blog partner of PAWCON since inception. This is cool stuff!
Tom Davenport to Keynote at PAW New York
A message from Predictive Analytics World. If you are NY-based, you may want to drop in and listen.
Take advantage of Super Early Bird Pricing by May 20th and realize savings of $400. Additional savings when you bring the team*
Join your peers October 17-21, 2011 at the Hilton New York for Predictive Analytics World, the business event for predictive analytics professionals, managers and commercial practitioners, covering today’s commercial deployment of predictive analytics, across industries and across software vendors.

RAVE REVIEWS:

“I came to PAW because it provides case studies relevant to my industry. It has lived up to the expectation and I think it’s the best analytics conference I’ve ever attended!”
Shaohua Zhang, Senior Data Mining Analyst, Rogers Telecommunications

“Hands down, best applied analytics conference I have ever attended. Great exposure to cutting-edge predictive techniques and I was able to turn around and apply some of those learnings to my work immediately. I’ve never been able to say that after any conference I’ve attended before!”
Jon Francis, Senior Statistician, T-Mobile

PAW NYC’s agenda covers black box trading, churn modeling, crowdsourcing, demand forecasting, ensemble models, fraud detection, healthcare, insurance applications, law enforcement, litigation, market mix modeling, mobile analytics, online marketing, risk management, social data, supply chain management, targeting direct marketing, uplift modeling (net lift), and other innovative applications that benefit organizations in new and creative ways.

Note: Each additional attendee from the same company registered at the same time receives an extra $200 off the Conference Pass.


Carole-Ann Matignon – Co-Founder, President & Chief Executive Officer









