#Rstats gets into Enterprise Cloud Software


Here is an excellent example of how a website should help, rather than hinder, new customers who want to take a demo of the software without being overwhelmed by sweet-talking marketing guys who don't know the difference between heteroskedasticity, probability, odds and likelihood.

It is made by Zementis (Dr Michael Zeller has been a frequent guest here), and Revolution Analytics is still the best shot in enterprise software for #Rstats.

Now if only Revo could get into the lucrative Department of Energy or Department of Defense business, they could change the world AND earn more revenue than they have been earning. But seriously.

Check out http://deployr.revolutionanalytics.com/zementis/ and play with it. Or better still, mash it up with some data viz and ROC curves, or extend it with some APIs 😉

Tom Davenport to Keynote at PAW New York


A message from Predictive Analytics World. If you are NY based you may want to drop in and listen.

Tom Davenport to Keynote at Predictive Analytics World New York

Take advantage of Super Early Bird Pricing by May 20th and recognize savings of $400. Additional savings when you bring the team*

Announcing Tom Davenport Keynote:
Every Day Analytics: Making Leading Edge Commonplace
Thomas Davenport
President’s Distinguished Prof, Babson College
Author, Competing on Analytics & Analytics at Work

Join your peers October 17-21, 2011 at the Hilton New York for Predictive Analytics World, the business event for predictive analytics professionals, managers and commercial practitioners, covering today’s commercial deployment of predictive analytics, across industries and across software vendors.

PAW NYC promises to once again break records as the biggest cross-vendor predictive analytics event ever. The conference program is packed with the top predictive analytics experts, practitioners, authors and business thought leaders, including keynote addresses from Thomas Davenport, author of Competing on Analytics: The New Science of Winning, and PAW Program Chair Eric Siegel, plus special sessions from industry heavyweights Usama Fayyad and John Elder.

RAVE REVIEWS:

I came to PAW because it provides case studies relevant to my industry. It has lived up to the expectation and I think it’s the best analytics conference I’ve ever attended!

Shaohua Zhang, Senior Data Mining Analyst
Rogers Telecommunications

Hands down, best applied analytics conference I have ever attended. Great exposure to cutting-edge predictive techniques and I was able to turn around and apply some of those learnings to my work immediately. I’ve never been able to say that after any conference I’ve attended before!

Jon Francis, Senior Statistician
T-Mobile

PAW NYC’s agenda covers black box trading, churn modeling, crowdsourcing, demand forecasting, ensemble models, fraud detection, healthcare, insurance applications, law enforcement, litigation, market mix modeling, mobile analytics, online marketing, risk management, social data, supply chain management, targeting direct marketing, uplift modeling (net lift), and other innovative applications that benefit organizations in new and creative ways.


Take advantage of Super Early Bird Pricing and realize
$400 in savings before May 20, 2011.

Note:  Each additional attendee from the same company registered at the same time receives an extra $200 off the Conference Pass.


Browsing update- Dear Decisionstats.com Reader


In view of the recent root-level breach of WordPress, which may have included viewing source code for hidden hacks or Trojans, please note that, effective immediately, Decisionstats.com takes no responsibility for any viruses or Trojans that you may inadvertently download while on this website. I will be responsible for any deliberate malicious honey traps I put up, but anybody posting an interesting comment with a link on this website can, and may, direct you to a phishing site.

All disputes will be subject to the jurisdiction of Tis Hazari Court, Delhi, India, as already mentioned.

New book on BigData Analytics and Data mining using #Rstats with a GUI


I am hoping to put this on my pre-order or Amazon wish list. It is the book for the common people who wanted to do data mining but were unable to say aloud that they didn't know much. It is written by the seminal Australian authority on data mining, Dr Graham Williams, whom I interviewed here at https://decisionstats.com/2009/01/13/interview-dr-graham-williams/

Data Mining for the masses using an ergonomically designed Graphical User Interface.

Thank you Springer. Thank you Dr Graham Williams

http://www.springer.com/statistics/physical+%26+information+science/book/978-1-4419-9889-7

Data Mining with Rattle and R

The Art of Excavating Data for Knowledge Discovery

Series: Use R

Williams, Graham

1st Edition., 2011, XX, 409 p. 150 illus. in color.

  • Softcover, ISBN 978-1-4419-9889-7

    Due: August 29, 2011

    54,95 €
  • Encourages the concept of programming with data – more than just pushing data through tools, but learning to live and breathe the data
  • Accessible to many readers and not necessarily just those with strong backgrounds in computer science or statistics
  • Details some of the more popular algorithms for data mining, as well as covering model evaluation and model deployment

Data mining is the art and science of intelligent data analysis. By building knowledge from information, data mining adds considerable value to the ever increasing stores of electronic data that abound today. In performing data mining many decisions need to be made regarding the choice of methodology, the choice of data, the choice of tools, and the choice of algorithms.

Throughout this book the reader is introduced to the basic concepts and some of the more popular algorithms of data mining. With a focus on the hands-on end-to-end process for data mining, Williams guides the reader through various capabilities of the easy to use, free, and open source Rattle Data Mining Software built on the sophisticated R Statistical Software. The focus on doing data mining rather than just reading about data mining is refreshing.

The book covers data understanding, data preparation, data refinement, model building, model evaluation,  and practical deployment. The reader will learn to rapidly deliver a data mining project using software easily installed for free from the Internet. Coupling Rattle with R delivers a very sophisticated data mining environment with all the power, and more, of the many commercial offerings.

Content Level » Research

Keywords » Data mining

Related subjects » Physical & Information Science

Related- https://decisionstats.com/2009/01/13/interview-dr-graham-williams/

High Performance Analytics

Marry Big Data Analytics to High Performance Computing, and you get the buzzword of this season: High Performance Analytics.

It basically consists of parallelized code running on custom hardware, in-database analytics for speed, and cloud computing or high performance computing environments. On an operational level, it consists of software (as in analytics) partnering with software (as in databases, MapReduce, Hadoop) plus some hardware (HP or IBM mostly). It is considered a high-margin, highly profitable business with a small number of deals compared to, say, desktop licenses.

As per HPCwire, which is a great newsletter for keeping up to date on HPC, SAS Institute has been busy on this front, partnering with EMC Greenplum and Teradata (which also acquired SAS partner Aster Data to gain a much-needed foothold in the MR/SQL space).

Predictive Analytics World Conference –New York City and London, UK

Please use the following code  to get a 15% discount on the 2 Day Conference Pass:  AJAYNY11.


October 17-21, 2011 – New York City, NY (pawcon.com/nyc)
Nov 30 – Dec 1, 2011 – London, UK (pawcon.com/london)

Predictive Analytics World (pawcon.com) is the business-focused event for predictive analytics
professionals, managers and commercial practitioners, covering today’s commercial deployment of
predictive analytics, across industries and across software vendors. The conference delivers case
studies, expertise, and resources to achieve two objectives:

1) Bigger wins: Strengthen the business impact delivered by predictive analytics

2) Broader capabilities: Establish new opportunities with predictive analytics

Case Studies: How the Leading Enterprises Do It

Predictive Analytics World focuses on concrete examples of deployed predictive analytics. The leading
enterprises have signed up to tell their stories, so you can hear from the horse’s mouth precisely how
Fortune 500 analytics competitors and other top practitioners deploy predictive modeling, and what
kind of business impact it delivers.

PAW NEW YORK CITY 2011

PAW’s NYC program is the richest and most diverse yet, featuring over 40 sessions across three tracks
– including both X and Y tracks, and an “Expert/Practitioner” track — so you can witness how predictive
analytics is applied at major companies.

PAW NYC’s agenda covers hot topics and advanced methods such as ensemble models, social data,
search marketing, crowdsourcing, blackbox trading, fraud detection, risk management, survey analysis,
and other innovative applications that benefit organizations in new and creative ways.

WORKSHOPS: PAW NYC also features five full-day pre- and post-conference workshops that
complement the core conference program. Workshop agendas include advanced predictive modeling
methods, hands-on training, an intro to R (the open source analytics system), and enterprise decision
management.

For more see http://www.predictiveanalyticsworld.com/newyork/2011/

PAW LONDON 2011

PAW London’s agenda covers hot topics and advanced methods such as risk management, uplift
(incremental lift) modeling, open source analytics, and crowdsourcing data mining. Case study
presentations cover campaign targeting, churn modeling, next-best-offer, selecting marketing channels,
global analytics deployment, email marketing, HR candidate search, and other innovative applications
that benefit organizations in new and creative ways.

Join PAW and access the best keynotes, sessions, workshops, exposition, expert panel, live demos,
networking coffee breaks, reception, birds-of-a-feather lunches, brand-name enterprise leaders, and
industry heavyweights in the business.

For more see http://www.predictiveanalyticsworld.com/london

CROSS-INDUSTRY APPLICATIONS

Predictive Analytics World is the only conference of its kind, delivering vendor-neutral sessions across
verticals such as banking, financial services, e-commerce, education, government, healthcare, high
technology, insurance, non-profits, publishing, social gaming, retail and telecommunications.

And PAW covers the gamut of commercial applications of predictive analytics, including response
modeling, customer retention with churn modeling, product recommendations, fraud detection, online
marketing optimization, human resource decision-making, law enforcement, sales forecasting, and
credit scoring.

Why bring together such a wide range of endeavors? No matter how you use predictive analytics, the
story is the same: Predictively scoring customers optimizes business performance. Predictive analytics
initiatives across industries leverage the same core predictive modeling technology, share similar project
overhead and data requirements, and face common process challenges and analytical hurdles.

RAVE REVIEWS:

“Hands down, best applied analytics conference I have ever attended. Great exposure to cutting-edge
predictive techniques and I was able to turn around and apply some of those learnings to my work
immediately. I’ve never been able to say that after any conference I’ve attended before!”

Jon Francis
Senior Statistician
T-Mobile

Read more: Articles and blog entries about PAW can be found at
http://www.predictiveanalyticsworld.com/pressroom.php

VENDORS. Meet the vendors and learn about their solutions, software and service. Discover the best
predictive analytics vendors available to serve your needs – learn what they do and see how they
compare.

COLLEAGUES. Mingle, network and hang out with your best and brightest colleagues. Exchange
experiences over lunch, coffee breaks and the conference reception connecting with those professionals
who face the same challenges as you.

GET STARTED. If you’re new to predictive analytics, kicking off a new initiative, or exploring new ways
to position it at your organization, there’s no better place to get your bearings than Predictive Analytics
World. See what other companies are doing, witness vendor demos, participate in discussions with the
experts, network with your colleagues and weigh your options!

For more information:
http://www.predictiveanalyticsworld.com

View videos of PAW Washington DC, Oct 2010 — now available on-demand:
http://www.predictiveanalyticsworld.com/online-video.php

What is predictive analytics? See the Predictive Analytics Guide:
http://www.predictiveanalyticsworld.com/predictive_analytics.php

If you’d like our informative event updates, sign up at:
http://www.predictiveanalyticsworld.com/signup-us.php

To sign up for the PAW group on LinkedIn, see:
http://www.linkedin.com/e/gis/1005097

For inquiries e-mail regsupport@risingmedia.com or call (717) 798-3495.

Augustus- a PMML model producer and consumer. Scoring engine.


I just checked out this new software for making PMML models. It is called Augustus and is created by the Open Data Group (http://opendatagroup.com/), which is headed by Robert Grossman, who was the first proponent of using R on Amazon EC2.

Probably someone like Zementis (http://adapasupport.zementis.com/) can use this to further test, enhance or benchmark on EC2. They did have a joint webinar with Revolution Analytics recently.

https://code.google.com/p/augustus/

Recent News

  • Augustus v 0.4.3.1 has been released
  • Added a guide (pdf) for including Augustus in the Windows System Properties.
  • Updated the install documentation.
  • Augustus 2010.II (Summer) release is available. This is v 0.4.2.0. More information is here.
  • Added performance discussion concerning the optional cyclic garbage collection.

See Recent News for more details and all recent news.

Augustus

Augustus is a PMML 4-compliant scoring engine that works with segmented models. Augustus is designed for use with statistical and data mining models. The new release provides Baseline, Tree and Naive-Bayes producers and consumers.

There is also a version for use with PMML 3 models. It is able to produce and consume models with 10,000s of segments and conforms to a PMML draft RFC for segmented models and ensembles of models. It supports Baseline, Regression, Tree and Naive-Bayes.

Augustus is written in Python and is freely available under the GNU General Public License, version 2.

See the page Which version is right for me for more details regarding the different versions.

PMML

Predictive Model Markup Language (PMML) is an XML mark up language to describe statistical and data mining models. PMML describes the inputs to data mining models, the transformations used to prepare data for data mining, and the parameters which define the models themselves. It is used for a wide variety of applications, including applications in finance, e-business, direct marketing, manufacturing, and defense. PMML is often used so that systems which create statistical and data mining models (“PMML Producers”) can easily inter-operate with systems which deploy PMML models for scoring or other operational purposes (“PMML Consumers”).
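
To make the description above concrete, here is a minimal sketch of what a consumer does with a PMML file, using only the Python standard library. The PMML fragment is hand-written for illustration (it is not Augustus output and has not been validated against the PMML schema), and the helper names `load_model` and `score` are my own assumptions, not part of Augustus or any PMML library.

```python
import xml.etree.ElementTree as ET

# Hand-written, illustrative PMML 4.0 fragment: a data dictionary (the inputs)
# plus a one-predictor regression model (the parameters).
PMML_TEXT = """
<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0">
  <Header description="Toy regression model"/>
  <DataDictionary numberOfFields="2">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="spend" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression" modelName="toy">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="spend" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="10.0">
      <NumericPredictor name="age" coefficient="2.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
""".strip()

# PMML elements live in a versioned XML namespace.
NS = {"pmml": "http://www.dmg.org/PMML-4_0"}


def load_model(pmml_text):
    """Read the declared fields and the regression parameters from PMML text."""
    root = ET.fromstring(pmml_text)
    fields = [f.get("name")
              for f in root.findall("pmml:DataDictionary/pmml:DataField", NS)]
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefs = {p.get("name"): float(p.get("coefficient"))
             for p in table.findall("pmml:NumericPredictor", NS)}
    return fields, intercept, coefs


def score(record, intercept, coefs):
    """Apply the linear model: intercept + sum(coefficient * input value)."""
    return intercept + sum(c * record[name] for name, c in coefs.items())


fields, intercept, coefs = load_model(PMML_TEXT)
prediction = score({"age": 30.0}, intercept, coefs)
```

The point is only that the model travels as plain XML: whichever tool produced the file, a consumer needs nothing but the document itself to score new records.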

Change Detection using Augustus

For information regarding using Augustus with Change Detection and Health and Status Monitoring, please see change-detection.

Open Data

Open Data Group provides management consulting services, outsourced analytical services, analytic staffing, and expert witnesses broadly related to data and analytics. It has experience with customer data, supplier data, financial and trading data, and data from internal business processes.

It has staff in Chicago and San Francisco and clients throughout the U.S. Open Data Group began operations in 2002.


Overview

The above example contains plots generated in R of scoring results from Augustus. Each point on the graph represents a use of the scoring engine and a chart is an aggregation of multiple Augustus runs. A Baseline (Change Detection) model was used to score data with multiple segments.

Typical Use

Augustus is typically used to construct models and score data with models. Augustus includes a dedicated application for creating, or producing, predictive models rendered as PMML-compliant files. Scoring is accomplished by consuming PMML-compliant files describing an appropriate model. Augustus provides a dedicated application for scoring data with four classes of models: Baseline (Change Detection) Models, Tree Models, Regression Models and Naive Bayes Models. The typical model development and use cycle with Augustus is as follows:

  1. Identify suitable data with which to construct a new model.
  2. Provide a model schema which prescribes the requirements for the model.
  3. Run the Augustus producer to obtain a new model.
  4. Run the Augustus consumer on new data to effect scoring.
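
The four steps above can be sketched as a toy producer and consumer in plain Python. This is not the Augustus API: the function names, the `ToyBaselineModel` element and the schema dictionary are all hypothetical, and the "baseline" here is just a mean and standard deviation used to flag unusual values, which is one simple reading of change detection.

```python
import statistics
import xml.etree.ElementTree as ET

# Step 1: identify suitable training data (made-up numbers for illustration).
training_rows = [{"latency_ms": v} for v in (10.0, 12.0, 11.0, 9.0, 13.0)]

# Step 2: a hypothetical model schema naming the field the model is built on.
schema = {"field": "latency_ms"}


def produce_model(rows, schema):
    """Step 3: fit a toy baseline and serialize it as XML (not real PMML)."""
    values = [row[schema["field"]] for row in rows]
    model = ET.Element("ToyBaselineModel", field=schema["field"])
    model.set("mean", repr(statistics.mean(values)))
    model.set("stdev", repr(statistics.stdev(values)))
    return ET.tostring(model, encoding="unicode")


def consume_model(model_xml, rows, threshold=3.0):
    """Step 4: score new rows as z-scores against the stored baseline."""
    model = ET.fromstring(model_xml)
    field = model.get("field")
    mean = float(model.get("mean"))
    stdev = float(model.get("stdev"))
    results = []
    for row in rows:
        z = (row[field] - mean) / stdev
        results.append({"value": row[field], "z": z, "alert": abs(z) > threshold})
    return results


model_xml = produce_model(training_rows, schema)
scores = consume_model(model_xml, [{"latency_ms": 11.0}, {"latency_ms": 25.0}])
```

Keeping the producer and consumer as separate functions that share only the serialized model mirrors the split described above: the model can be built in one place and scored somewhere else.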

Separate consumer and producer applications are supplied for Baseline (Change Detection) models, Tree models, Regression models and for Naive Bayes models. The producer and consumer applications require configuration with XML-formatted files. The specifications of the configuration files and model schema are detailed below. The consumers provide for some configurability of the output but users will often provide additional post-processing to render the output according to their needs. A variety of mechanisms exist for transmitting data but users may need to provide their own preprocessing to accommodate their particular data source.

In addition to the producer and consumer applications, Augustus is conceptually structured and provided with libraries which are relevant to the development and use of Predictive Models. Broadly speaking, these consist of components that address the use of PMML and components that are specific to Augustus.

Post Processing

Augustus can accommodate a post-processing step. While not necessary, it is often useful to

  • Re-normalize the scoring results or perform an additional transformation.
  • Supplement the results with global meta-data such as timestamps.
  • Format the results.
  • Select certain interesting values from the results.
  • Restructure the data for use with other applications.
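
As a rough illustration of the post-processing steps listed above, the sketch below takes scoring results as a list of dictionaries and applies each step in turn. The input shape, the function names and the 0.5 cutoff are assumptions made for the example; Augustus's actual output format is not shown in this post.

```python
import csv
import io
from datetime import datetime, timezone


def post_process(results, run_time, cutoff=0.5):
    """Re-normalize scores to [0, 1], keep the interesting ones, add a timestamp."""
    raw_scores = [r["score"] for r in results]
    lo, hi = min(raw_scores), max(raw_scores)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all scores are equal
    rows = []
    for r in results:
        norm = (r["score"] - lo) / span          # re-normalize
        if norm >= cutoff:                       # select interesting values
            rows.append({
                "id": r["id"],
                "score": round(norm, 3),         # format
                "scored_at": run_time.isoformat(),  # global meta-data
            })
    return rows


def to_csv(rows):
    """Restructure the rows as CSV text for use with other applications."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "score", "scored_at"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up scoring results for illustration.
raw = [{"id": "a", "score": 2.0}, {"id": "b", "score": 8.0}, {"id": "c", "score": 5.0}]
run_time = datetime(2011, 5, 1, tzinfo=timezone.utc)
rows = post_process(raw, run_time)
csv_text = to_csv(rows)
```

The CSV text could then be handed to R or a spreadsheet for plotting.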