Using Views in R and comparing functions across multiple packages

Some RDF hacking relating to updating probabil... — Image via Wikipedia

R has almost 2923 available packages

This makes the task of searching among these packages and comparing functions for the same analytical task across different packages a bit tedious and prone to manual searching (of reading multiple Pdfs of help /vignette of packages) or sending an email to the R help list.

However using R Views is a slightly better way of managing all your analytical requirements for software rather than the large number of packages (see Graphics view below).

CRAN Task Views allow you to browse packages by topic and provide tools to automatically install all packages for special areas of interest. Currently, 28 views are available. http://cran.r-project.org/web/views/

Bayesian Bayesian Inference

ChemPhys Chemometrics and Computational Physics

ClinicalTrials Clinical Trial Design, Monitoring, and Analysis

Cluster Cluster Analysis & Finite Mixture Models

Distributions Probability Distributions

Econometrics Computational Econometrics

Environmetrics Analysis of Ecological and Environmental Data

ExperimentalDesign Design of Experiments (DoE) & Analysis of Experimental Data

Finance Empirical Finance

Genetics Statistical Genetics

Graphics Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

gR gRaphical Models in R

HighPerformanceComputing High-Performance and Parallel Computing with R

MachineLearning Machine Learning & Statistical Learning

MedicalImaging Medical Image Analysis

Multivariate Multivariate Statistics

NaturalLanguageProcessing Natural Language Processing

OfficialStatistics Official Statistics & Survey Methodology

Optimization Optimization and Mathematical Programming

Pharmacokinetics Analysis of Pharmacokinetic Data

Phylogenetics Phylogenetics, Especially Comparative Methods

Psychometrics Psychometric Models and Methods

ReproducibleResearch Reproducible Research

Robust Robust Statistical Methods

SocialSciences Statistics for the Social Sciences

Spatial Analysis of Spatial Data

Survival Survival Analysis

TimeSeries Time Series Analysis

To automatically install these views, the ctv package needs to be installed, e.g., via
install.packages("ctv")
library("ctv")
Created by Pretty R at inside-R.org
and then the views can be installed via install.views or update.views (which first assesses which of the packages are already installed and up-to-date), e.g.,
install.views("Econometrics")
 update.views("Econometrics")
 Created by Pretty R at inside-R.org

and

http://cran.r-project.org/web/views/Graphics.html

CRAN Task View: Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

Maintainer:	Nicholas Lewin-Koh
Contact:	nikko at hailmail.net
Version:	2009-10-28

R is rich with facilities for creating and developing interesting graphics. Base R contains functionality for many plot types including coplots, mosaic plots, biplots, and the list goes on. There are devices such as postscript, png, jpeg and pdf for outputting graphics as well as device drivers for all platforms running R. lattice and grid are supplied with R’s recommended packages and are included in every binary distribution. lattice is an R implementation of William Cleveland’s trellis graphics, while grid defines a much more flexible graphics environment than the base R graphics.

R’s base graphics are implemented in the same way as in the S3 system developed by Becker, Chambers, and Wilks. There is a static device, which is treated as a static canvas and objects are drawn on the device through R plotting commands. The device has a set of global parameters such as margins and layouts which can be manipulated by the user using par() commands. The R graphics engine does not maintain a user visible graphics list, and there is no system of double buffering, so objects cannot be easily edited without redrawing a whole plot. This situation may change in R 2.7.x, where developers are working on double buffering for R devices. Even so, the base R graphics can produce many plots with extremely fine graphics in many specialized instances.

One can quickly run into trouble with R’s base graphic system if one wants to design complex layouts where scaling is maintained properly on resizing, nested graphs are desired or more interactivity is needed. grid was designed by Paul Murrell to overcome some of these limitations and as a result packages like lattice, ggplot2, vcd or hexbin (on Bioconductor ) use grid for the underlying primitives. When using plots designed with grid one needs to keep in mind that grid is based on a system of viewports and graphic objects. To add objects one needs to use grid commands, e.g., grid.polygon() rather than polygon(). Also grid maintains a stack of viewports from the device and one needs to make sure the desired viewport is at the top of the stack. There is a great deal of explanatory documentation included with grid as vignettes.

The graphics packages in R can be organized roughly into the following topics, which range from the more user oriented at the top to the more developer oriented at the bottom. The categories are not mutually exclusive but are for the convenience of presentation:

Plotting : Enhancements for specialized plots can be found in plotrix, for polar plotting, vcd for categorical data, hexbin (on Bioconductor ) for hexagon binning, gclus for ordering plots and gplots for some plotting enhancements. Some specialized graphs, like Chernoff faces are implemented in aplpack, which also has a nice implementation of Tukey’s bag plot. For 3D plots lattice, scatterplot3d and misc3d provide a selection of plots for different kinds of 3D plotting. scatterplot3d is based on R’s base graphics system, while misc3d is based on rgl. The package onion for visualizing quaternions and octonions is well suited to display 3D graphics based on derived meshes.
Graphic Applications : This area is not much different from the plotting section except that these packages have tools that may not for display, but can aid in creating effective displays. Also included are packages with more esoteric plotting methods. For specific subject areas, like maps, or clustering the excellent task views contributed by other dedicated useRs is an excellent place to start.
- Effect ordering : The gclus package focuses on the ordering of graphs to accentuate cluster structure or natural ordering in the data. While not for graphics directly cba and seriation have functions for creating 1 dimensional orderings from higher dimensional criteria. For ordering an array of displays, biclust can be useful.
- Large Data Sets : Large data sets can present very different challenges from moderate and small datasets. Aside from overplotting, rendering 1,000,000 points can tax even modern GPU’s. For univariate datalvplot produces letter value boxplots which alleviate some of the problems that standard boxplots exhibit for large data sets. For bivariate data ash can produce a bivariate smoothed histogram very quickly, and hexbin, on Bioconductor , can bin bivariate data onto a hexagonal lattice, the advantage being that the irregular lines and orientation of hexagons do not create linear artifacts. For multivariate data, hexbin can be used to create a scatterplot matrix, combined with lattice. An alternative is to use scagnostics to produce a scaterplot matrix of “data about the data”, and look for interesting combinations of variables.
- Trees and Graphs : ape and ade4 have functions for plotting phylogenetic trees, which can be used for plotting dendrograms from clustering procedures. While these packages produce decent graphics, they do not use sophisticated algorithms for node placement, so may not be useful for very large trees. igraph has the Tilford-Rheingold algorithm implementead and is useful for plotting larger trees. diagram as facilities for flow diagrams and simple graphs. For more sophisticated graphs Rgraphviz and igraph have functions for plotting and layout, especially useful for representing large networks.
Graphics Systems : lattice is built on top of the grid graphics system and is an R implementation of William Cleveland’s trellis system for S-PLUS. lattice allows for building many types of plots with sophisticated layouts based on conditioning. ggplot2 is an R implementation of the system described in “A Grammar of Graphics” by Leland Wilkinson. Like lattice, ggplot (also built on top of grid) assists in trellis-like graphics, but allows for much more. Since it is built on the idea of a semantics for graphics there is much more emphasis on reshaping data, transformation, and assembling the elements of a plot.
Devices : Whereas grid is built on top of the R graphics engine, many in the R community have found the R graphics engine somewhat inflexible and have written separate device drivers that either emphasize interactivity or plotting in various graphics formats. R base supplies devices for PostScript, PDF, JPEG and other formats. Devices on CRAN include cairoDevice which is a device based libcairo, which can actually render to many device types. The cairo device is desgned to work with RGTK2, which is an interface to the Gimp Tool Kit, similar to pyGTK2. GDD provides device drivers for several bitmap formats, including GIF and BMP. RSvgDevice is an SVG device driver and interfaces well with with vector drawing programs, or R web development packages, such as Rpad. When SVG devices are for web display developers should be aware that internet explorer does not support SVG, but has their own standard. Trust Microsoft. rgl provides a device driver based on OpenGL, and is good for 3D and interactive development. Lastly, the Augsburg group supplies a set of packages that includes a Java-based device, JavaGD.
Colors : The package colorspace provides a set of functions for transforming between color spaces and mixcolor() for mixing colors within a color space. Based on the HCL colors provided in colorspace, vcdprovides a set of functions for choosing color palettes suitable for coding categorical variables ( rainbow_hcl()) and numerical information ( sequential_hcl(), diverge_hcl()). Similar types of palettes are provided in RColorBrewer and dichromat is focused on palettes for color-impaired viewers.
Interactive Graphics : There are several efforts to implement interactive graphics systems that interface well with R. In an interactive system the user can interactively query the graphics on the screen with the mouse, or a moveable brush to zoom, pan and query on the device as well as link with other views of the data. rggobi embeds the GGobi interactive graphics system within R, so that one can display a data frame or several in GGobi directly from R. The package has functions to support longitudinal data, and graphs using GGobi’s edge set functionality. The RoSuDA repository maintained and developed by the University of Augsburg group has two packages, iplots and iwidgets as well as their Java development environment including a Java device, JavaGD. Their interactive graphics tools contain functions for alpha blending, which produces darker shading around areas with more data. This is exceptionally useful for parallel coordinate plots where many lines can quickly obscure patterns. playwith has facilities for building interactive versions of R graphics using the cairoDevice and RGtk2. Lastly, the rgl package has mechanisms for interactive manipulation of plots, especially 3D rotations and surfaces.
Development : For development of specialized graphics packages in R, grid should probably be the first consideration for any new plot type. rgl has better tools for 3D graphics, since the device is interactive, though it can be slow. An alternative is to use Java and the Java device in the RoSuDA packages, though Java has its own drawbacks. For porting plotting code to grid, using the package gridBase presents a nice intermediate step to embed base graphics in grid graphics and vice versa.

CRAN Task View: Machine Learning & Statistical Learning (cran.r-project.org)
The R-Files: Dirk Eddlebuettel (revolutionanalytics.com)
R Commander Plugins-20 and growing! (decisionstats.com)
R Node- and other Web Interfaces to R (decisionstats.com)
Packages for By-Group Processing in R (revolutionanalytics.com)
R ready to Deduce you (ekonometrics.blogspot.com)

Google Snappy

Diagram of how a 32-bit integer is arranged in... — Image via Wikipedia

a cool sounding software- yet again by the guys from California, this one enables to zip and unzip Big Data much much faster

http://news.ycombinator.com/item?id=2356735

and

https://code.google.com/p/snappy/

Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more.

Snappy is widely used inside Google, in everything from BigTable and MapReduce to our internal RPC systems. (Snappy has previously been referred to as “Zippy” in some presentations and the likes.)

For more information, please see the README. Benchmarks against a few other compression libraries (zlib, LZO, LZF, FastLZ, and QuickLZ) are included in the source code distribution.

Introduction

============

Snappy is a compression/decompression library. It does not aim for maximum

compression, or compatibility with any other compression library; instead,

it aims for very high speeds and reasonable compression. For instance,

compared to the fastest mode of zlib, Snappy is an order of magnitude faster

for most inputs, but the resulting compressed files are anywhere from 20% to

100% bigger. (For more information, see “Performance”, below.)

Snappy has the following properties:

* Fast: Compression speeds at 250 MB/sec and beyond, with no assembler code.

See “Performance” below.

* Stable: Over the last few years, Snappy has compressed and decompressed

petabytes of data in Google’s production environment. The Snappy bitstream

format is stable and will not change between versions.

* Robust: The Snappy decompressor is designed not to crash in the face of

corrupted or malicious input.

* Free and open source software: Snappy is licensed under the Apache license,

version 2.0. For more information, see the included COPYING file.

Snappy has previously been called “Zippy” in some Google presentations

and the like.

Performance

===========

Snappy is intended to be fast. On a single core of a Core i7 processor

in 64-bit mode, it compresses at about 250 MB/sec or more and decompresses at

about 500 MB/sec or more. (These numbers are for the slowest inputs in our

benchmark suite; others are much faster.) In our tests, Snappy usually

is faster than algorithms in the same class (e.g. LZO, LZF, FastLZ, QuickLZ,

etc.) while achieving comparable compression ratios.

Typical compression ratios (based on the benchmark suite) are about 1.5-1.7x

for plain text, about 2-4x for HTML, and of course 1.0x for JPEGs, PNGs and

other already-compressed data. Similar numbers for zlib in its fastest mode

are 2.6-2.8x, 3-7x and 1.0x, respectively. More sophisticated algorithms are

capable of achieving yet higher compression rates, although usually at the

expense of speed. Of course, compression ratio will vary significantly with

the input.

Although Snappy should be fairly portable, it is primarily optimized

for 64-bit x86-compatible processors, and may run slower in other environments.

In particular:

– Snappy uses 64-bit operations in several places to process more data at

once than would otherwise be possible.

– Snappy assumes unaligned 32- and 64-bit loads and stores are cheap.

On some platforms, these must be emulated with single-byte loads

and stores, which is much slower.

– Snappy assumes little-endian throughout, and needs to byte-swap data in

several places if running on a big-endian platform.

Experience has shown that even heavily tuned code can be improved.

Performance optimizations, whether for 64-bit x86 or other platforms,

are of course most welcome; see “Contact”, below.

Usage

=====

Note that Snappy, both the implementation and the interface,

is written in C++.

To use Snappy from your own program, include the file “snappy.h” from

your calling file, and link against the compiled library.

There are many ways to call Snappy, but the simplest possible is

snappy::Compress(input, &output);

and similarly

snappy::Uncompress(input, &output);

where “input” and “output” are both instances of std::string.

Google releases snappy, the compression library used in Bigtable (code.google.com)
Maximizing Search Engine Visitors The Correct Way (ronmedlin.com)
MapReduce from the basics to the actually useful (in under 30 minutes) (cloudant.com)

Heritage Health Prize- Data Mining Contest for 3mill USD

An animation of the quicksort algorithm sortin... — Image via Wikipedia

If Netflix was about 1 mill USD to better online video choices, here is a chance to earn serious money, write great code, and save lives!

From http://www.heritagehealthprize.com/

Heritage Health Prize
Launching April 4

Laptop

More than 71 Million individuals in the United States are admitted to
hospitals each year, according to the latest survey from the American
Hospital Association. Studies have concluded that in 2006 well over
$30 billion was spent on unnecessary hospital admissions. Each of
these unnecessary admissions took away one hospital bed from someone
else who needed it more.

http://www.heritagehealthprize.com/competition.php

Prize Goal & Participation

The goal of the prize is to develop a predictive algorithm that can identify patients who will be admitted to the hospital within the next year, using historical claims data.

Official registration will open in 2011, after the launch of the prize. At that time, pre-registered teams will be notified to officially register for the competition. Teams must consent to be bound by final competition rules.

Registered teams will develop and test their algorithms. The winning algorithm will be able to predict patients at risk for an unplanned hospital admission with a high rate of accuracy. The first team to reach the accuracy threshold will have their algorithms confirmed by a judging panel. If confirmed, a winner will be declared.

The competition is expected to run for approximately two years. Registration will be open throughout the competition.

Data Sets

Registered teams will be granted access to two separate datasets of de-identified patient claims data for developing and testing algorithms: a training dataset and a quiz/test dataset. The datasets will be comprised of de-identified patient data. The datasets will include:

Outpatient encounter data
Hospitalization encounter data
Medication dispensing claims data, including medications
Outpatient laboratory data, including test outcome values

The data for each de-identified patient will be organized into two sections: “Historical Data” and “Admission Data.” Historical Data will represent three years of past claims data. This section of the dataset will be used to predict if that patient is going to be admitted during the Admission Data period. Admission Data represents previous claims data and will contain whether or not a hospital admission occurred for that patient; it will be a binary flag.

Data The training dataset includes several thousand anonymized patients and will be made available, securely and in full, to any registered team for the purpose of developing effective screening algorithms.

The quiz/test dataset is a smaller set of anonymized patients. Teams will only receive the Historical Data section of these datasets and the two datasets will be mixed together so that teams will not be aware of which de-identified patients are in which set. Teams will make predictions based on these data sets and submit their predictions to HPN through the official Heritage Health Prize web site. HPN will use the Quiz Dataset for the initial assessment of the Team’s algorithms. HPN will evaluate and report back scores to the teams through the prize website’s leader board.

Scores from the final Test Dataset will not be made available to teams until the accuracy thresholds are passed. The test dataset will be used in the final judging and results will be kept hidden. These scores are used to preserve the integrity of scoring and to help validate the predictive algorithms.

Teams can begin developing and testing their algorithms as soon as they are registered and ready. Teams will log onto the official Heritage Health Prize website and submit their predictions online. Comparisons will be run automatically and team accuracy scores will be posted on the leader board. This score will be only on a portion of the predictions submitted (the Quiz Dataset), the additional results will be kept back (the Test Dataset).

Form

Once a team successfully scores above the accuracy thresholds on the online testing (quiz dataset), final judging will occur. There will be three parts to this judging. First, the judges will confirm that the potential winning team’s algorithm accurately predicts patient admissions in the Test Dataset (again, above the thresholds for accuracy).

Next, the judging panel will confirm that the algorithm does not identify patients and use external data sources to derive its predictions. Lastly, the panel will confirm that the team’s algorithm is authentic and derives its predictive power from the datasets, not from hand-coding results to improve scores. If the algorithm meets these three criteria, it will be declared the winner.

Failure to meet any one of these three parts will disqualify the team and the contest will continue. The judges reserve the right to award second and third place prizes if deemed applicable.

HPN Health Prize: The X-Prize of Health Care (medicineandtechnology.com)
$3 million machine learning prize (heritagehealthprize.com)
Heritage Providers Continues to Promote $3 Million Dollar Prize to Create An Algorithm To Predict and Prevents Hospitalizations (ducknetweb.blogspot.com)
Netflix Prize-Style Competition Predicts Hospitalizations (fastcompany.com)
For Data Crunchers, A Glittering Prize (online.wsj.com)
The American Hospital Association Awards Its Exclusive Endorsement to HR Solutions’ Physician Engagement Survey (prweb.com)

HIGHLIGHTS from REXER Survey :R gives best satisfaction

A Summary report from Rexer Analytics Annual Survey

HIGHLIGHTS from the 4^th Annual Data Miner Survey (2010):

• FIELDS & GOALS: Data miners work in a diverse set of fields. CRM / Marketing has been the #1 field in each of the past four years. Fittingly, “improving the understanding of customers”, “retaining customers” and other CRM goals are also the goals identified by the most data miners surveyed.

• ALGORITHMS: Decision trees, regression, and cluster analysis continue to form a triad of core algorithms for most data miners. However, a wide variety of algorithms are being used. This year, for the first time, the survey asked about Ensemble Models, and 22% of data miners report using them.
A third of data miners currently use text mining and another third plan to in the future.

• MODELS: About one-third of data miners typically build final models with 10 or fewer variables, while about 28% generally construct models with more than 45 variables.

• TOOLS: After a steady rise across the past few years, the open source data mining software R overtook other tools to become the tool used by more data miners (43%) than any other. STATISTICA, which has also been climbing in the rankings, is selected as the primary data mining tool by the most data miners (18%). Data miners report using an average of 4.6 software tools overall. STATISTICA, IBM SPSS Modeler, and R received the strongest satisfaction ratings in both 2010 and 2009.

• TECHNOLOGY: Data Mining most often occurs on a desktop or laptop computer, and frequently the data is stored locally. Model scoring typically happens using the same software used to develop models. STATISTICA users are more likely than other tool users to deploy models using PMML.

• CHALLENGES: As in previous years, dirty data, explaining data mining to others, and difficult access to data are the top challenges data miners face. This year data miners also shared best practices for overcoming these challenges. The best practices are available online.

• FUTURE: Data miners are optimistic about continued growth in the number of projects they will be conducting, and growth in data mining adoption is the number one “future trend” identified. There is room to improve: only 13% of data miners rate their company’s analytic capabilities as “excellent” and only 8% rate their data quality as “very strong”.

Please contact us if you have any questions about the attached report or this annual research program. The 5^th Annual Data Miner Survey will be launching next month. We will email you an invitation to participate.

Information about Rexer Analytics is available at www.RexerAnalytics.com. Rexer Analytics continues their impressive journey see http://www.rexeranalytics.com/Clients.html

|My only thought- since most data miners are using multiple tools including free tools as well as paid software, Perhaps a pie chart of market share by revenue and volume would be handy.

Also some ideas on comparing diverse data mining projects by data size, or complexity.

Skills of a good data miner (zyxo.wordpress.com)
7 Data Blogs To Explore (readwriteweb.com)
FBI Data-Mining Program:Total Information Awareness (alitarhini.wordpress.com)

Is Random Poetry Click Fraud

Is poetry when randomized

Tweaked, meta tagged , search engine optimized

Violative of unseen terms and conditional clauses

Is random poetry or aggregated prose farmed for click fraud uses

I dont know, you tell me, says the blog boy,

Tapping away at the keyboard like a shiny new toy,

Geeks unfortunately too often are men too many,

Forgive the generalization, but the tech world is yet to be equalized.

If a New York Hot Dog is a slice of heaven at four bucks a piece

Then why is prose and poetry at five bucks an hour considered waste

Ah I see, you have grown old and cynical,

Of the numerous stupid internet capers and cyber ways

The clicking finger clicks on

swiftly but mostly delightfully virally moves on

While people collect its trails and

ponder its aggregated merry ways

All people are equal but all links are not,

Thus overturning two centuries of psychology had you been better taught,

But you chose to drop out of school, and create that search engine so big

It is now a fraud catchers head ache that millions try to search engine optimize and rig

Once again, people are different, in so many ways so prettier

Links are the same hyper linked code number five or earlier

People think like artificial artificial (thus natural) neural nets

Biochemically enhanced Harmonically possessed.

rather than analyze forensically and quite creepily

where people have been

Gentic Algorithms need some chaos

To see what till now hasnt been seen.

Again this was a random poem,

inspired by a random link that someone clicked

To get here, on a carbon burning cyber machine,

Having digested poem, moves on, unheard , unseen.

(Inspired by the Hyper Link at http://goo.gl/a8ijW )

Also-

Getting to the Bottom of Facebook Click Fraud (actionableinsights.covario.com)
Click Fraud: A Legal Look (firstrate.co.nz)
Microsoft says Google used click fraud to orchestrate Bing Sting (gabriellahiresit.com)
Vile poetry virtuous poetry for which my mind aches and soul groans lol (ask.metafilter.com)
Click fraud on the rise again? (silvertailsystems.wordpress.com)
Study: Click Fraud Drops Slightly in Q4 (pcworld.com)
Click Fraud: What Is Your Ad Network Doing About It? (searchenginewatch.com)

SAS to R Challenge: Unique benchmarking

Flag of Town of Cary — Image via Wikipedia

An interesting announcemnet from Revolution Analytics promises to convert your legacy code in SAS language not only cheaper but faster. It’ s a very very interesting challenge and I wonder how SAS users ,corporates, customers as well as the Institute itself reacts

http://www.revolutionanalytics.com/sas-challenge/

Are you paying for expensive software licenses and hardware to run time-consuming statistical analyses on big data sets?

If you’re doing linear regressions, logistic regressions, predictions, or multivariate crosstabulations* there’s something you should know: Revolution Analytics can get the same results for a substantially lower cost and faster than SAS®.

Quick Link:
Revolution R Enterprise 4.2
Top 10 Reasons to Buy

For a limited time only, Revolution Analytics invites you take the SAS to R Challenge. Let us prove that we can deliver on our promise of replicating your results in R, faster and cheaper than SAS.

Here’s how it works:

Fill out the short form below, and one of our conversion experts will contact you to discuss the SAS code you want to convert. If we think Revolution R Enterprise can get the same results faster than SAS, we’ll convert your code to R free of charge. Our goal is to demonstrate that Revolution R Enterprise will produce the same results in less time. There’s no obligation, but if you choose to convert, we guarantee that your license cost for Revolution R Enterprise will be less than half what you’re currently paying for the equivalent SAS software.**

It’s that simple.

We’ll show you that you don’t need expensive hardware and software to do high quality statistical analysis of big data. And we’ll show that you don’t need to tie up your computing resources with long running operations. With Revolution R Enterprise, you can run analyses on commodity hardware using Linux or Windows, scale to terabyte-class data problems and do it at processing speeds you would never have thought possible.

Sign up now, and we will be in touch shortly.

—————————-

SAS is a registered trademark of the SAS Institute, Cary, NC, in the US and other countries.

*Additional statistical algorithms are being rapidly added to Revolution R Enterprise. Custom development services are also available.

**Revolution Analytics retains the right to determine eligibility for this offer. Offer available until March 31, 2011.

Revolution R Enterprise 4.2 now available (revolutionanalytics.com)
Live from Strata (revolutionanalytics.com)
Revolution Analytics in 2010 (revolutionanalytics.com)
UBIT: SAS for Windows (ubit.buffalo.edu)
What’s Next for Revolution R and Hadoop? (revolutionanalytics.com)
A simple test to predict coronary artery disease (r-bloggers.com)

Carole-Ann’s 2011 Predictions for Decision Management

Carole-Ann’s 2011 Predictions for Decision Management

For Ajay Ohri on DecisionStats.com

What were the top 5 events in 2010 in your field?

Maturity: the Decision Management space was made up of technology vendors, big and small, that typically focused on one or two aspects of this discipline. Over the past few years, we have seen a lot of consolidation in the industry – first with Business Intelligence (BI) then Business Process Management (BPM) and lately in Business Rules Management (BRM) and Advanced Analytics. As a result the giant Platform vendors have helped create visibility for this discipline. Lots of tiny clues finally bubbled up in 2010 to attest of the increasing activity around Decision Management. For example, more products than ever were named Decision Manager; companies advertised for Decision Managers as a job title in their job section; most people understand what I do when I am introduced in a social setting!
Boredom: unfortunately, as the industry matures, inevitably innovation slows down… At the main BRMS shows we heard here and there complaints that the technology was stalling. We heard it from vendors like Red Hat (Drools) and we heard it from bored end-users hoping for some excitement at Business Rules Forum’s vendor panel. They sadly did not get it
Scrum: I am not thinking about the methodology there! If you have ever seen a rugby game, you can probably understand why this is the term that comes to mind when I look at the messy & confusing technology landscape. Feet blindly try to kick the ball out while superhuman forces are moving randomly the whole pack – or so it felt when I played! Business Users in search of Business Solutions are facing more and more technology choices that feel like comparing apples to oranges. There is value in all of them and each one addresses a specific aspect of Decision Management but I regret that the industry did not simplify the picture in 2010. On the contrary! Many buzzwords were created or at least made popular last year, creating even more confusion on a muddy field. A few examples: Social CRM, Collaborative Decision Making, Adaptive Case Management, etc. Don’t take me wrong, I *do* like the technologies. I sympathize with the decision maker that is trying to pick the right solution though.
Information: Analytics have been used for years of course but the volume of data surrounding us has been growing to unparalleled levels. We can blame or thank (depending on our perspective) Social Media for that. Sites like Facebook and LinkedIn have made it possible and easy to publish relevant (as well as fluffy) information in real-time. As we all started to get the hang of it and potentially over-publish, technology evolved to enable the storage, correlation and analysis of humongous volumes of data that we could not dream of before. 25 billion tweets were posted in 2010. Every month, over 30 billion pieces of data are shared on Facebook alone. This is not just about vanity and marketing though. This data can be leveraged for the greater good. Carlos pointed to some fascinating facts about catastrophic event response team getting organized thanks to crowd-sourced information. We are also seeing, in the Decision management world, more and more applicability for those very technology that have been developed for the needs of Big Data – I’ll name for example Hadoop that Carlos (yet again) discussed in his talks at Rules Fest end of 2009 and 2010.
Self-Organization: it may be a side effect of the Social Media movement but I must admit that I was impressed by the success of self-organizing initiatives. Granted, this last trend has nothing to do with Decision Management per se but I think it is a great evolution worth noting. Let me point to a couple of examples. I usually attend traditional conferences and tradeshows in which the content can be good but is sometimes terrible. I was pleasantly surprised by the professionalism and attendance at *un-conferences* such as P-Camp (P stands for Product – an event for Product Managers). When you think about it, it is already difficult to get a show together when people are dedicated to the tasks. How crazy is it to have volunteers set one up with no budget and no agenda? Well, people simply show up to do their part and everyone has fun voting on-site for what seems the most appealing content at the time. Crowdsourcing applied to shows: it works! Similar experience with meetups or tweetups. I also enjoyed attending some impromptu Twitter jam sessions on a given topic. Social Media is certainly helping people reach out and get together in person or virtually and that is wonderful!

Image via Wikipedia

What are the top three trends you see in 2011?

Performance: I might be cheating here. I was very bullish about predicting much progress for 2010 in the area of Performance Management in your Decision Management initiatives. I believe that progress was made but Carlos did not give me full credit for the right prediction… Okay, I am a little optimistic on timeline… I admit it… If it did not fully happen in 2010, can I predict it again in 2011? I think that companies want to better track their business performance in order to correct the trajectory of course but also to improve their projections. I see that it is turning into reality already here and there. I expect it to become a trend in 2011!
Insight: Big Data being available all around us with new technologies and algorithms will continue to propagate in 2011 leading to more widely spread Analytics capabilities. The buzz at Analytics shows on Social Network Analysis (SNA) is a sign that there is interest in those kinds of things. There is tremendous information that can be leveraged for smart decision-making. I think there will be more of that in 2011 as initiatives launches in 2010 will mature into material results.

Image by Intersection Consulting via Flickr
Collaboration: Social Media for the Enterprise is a discipline in the making. Social Media was initially seen for the most part as a Marketing channel. Over the years, companies have started experimenting with external communities and ideation capabilities with moderate success. The few strategic initiatives started in 2010 by “old fashion” companies seem to be an indication that we are past the early adopters. This discipline may very well materialize in 2011 as a core capability, well, or at least a new trend. I believe that capabilities such Chatter, offered by Salesforce, will transform (slowly) how people interact in the workplace and leverage the volumes of social data captured in LinkedIn and other Social Media sites. Collaboration is of course a topic of interest for me personally. I even signed up for Kare Anderson’s collaboration collaboration site – yes, twice the word “collaboration”: it is really about collaborating on collaboration techniques. Even though collaboration does not require Social Media, this medium offers perspectives not available until now.

Brief Bio-

Carole-Ann is a renowned guru in the Decision Management space. She created the vision for Decision Management that is widely adopted now in the industry. Her claim to fame is the strategy and direction of Blaze Advisor, the then-leading BRMS product, while she also managed all the Decision Management tools at FICO (business rules, predictive analytics and optimization). She has a vision for Decision Management both as a technology and a discipline that can revolutionize the way corporations do business, and will never get tired of painting that vision for her audience. She speaks often at Industry conferences and has conducted university classes in France and Washington DC.

Leveraging her Masters degree in Applied Mathematics / Computer Science from a “Grande Ecole” in France, she started her career building advanced systems using all kinds of technologies — expert systems, rules, optimization, dashboarding and cubes, web search, and beta version of database replication – as well as conducting strategic consulting gigs around change management.

She now tweets as @CMatignon, blogs at blog.sparklinglogic.com and interacts at community.sparklinglogic.com.

She started her career building advanced systems using all kinds of technologies — expert systems, rules, optimization, dashboarding and cubes, web search, and beta version of database replication. At Cleversys (acquired by Kurt Salmon & Associates), she also conducted strategic consulting gigs mostly around change management.

While playing with advanced software components, she found a passion for technology and joined ILOG (acquired by IBM). She developed a growing interest in Optimization as well as Business Rules. At ILOG, she coined the term BRMS while brainstorming with her Sales counterpart. She led the Presales organization for Telecom in the Americas up until 2000 when she joined Blaze Software (acquired by Brokat Technologies, HNC Software and finally FICO).

Her 360-degree experience allowed her to gain appreciation for all aspects of a software company, giving her a unique perspective on the business. Her technical background kept her very much in touch with technology as she advanced.

She also became addicted to Twitter in the process. She is active on all kinds of social media, always looking for new digital experience!

Outside of work, Carole-Ann loves spending time with her two boys. They grow fruits in their Northern California home and cook all together in the French tradition.

profile on LinkedIn

Follow me on Twitter

Filtering to Gain Social Network Value — Image by Intersection Consulting via Flickr

Social Networks Hype Cycle — Image by fredcavazza via Flickr

Business Analytics Predictions from Gartner and Forrester (readwriteweb.com)
5 Big Themes in BI for 2011 (informationweek.com)
Unfinished Business: Questions Social Media Must Answer In 2011 – Liza With A “Z” (lizasperling.com)
Gartner Predicts Top 10 Technologies for 2011 (marketingtechblog.com)
How Will Technology Disrupt the Enterprise in 2011? (readwriteweb.com)
Social ecosystems tracked by Xeesm social graphs (wendysoucie.com)
Social media can enhance the buying decision (customerthink.com)
Top Ten Predictions for 2011 (enterpriseirregulars.com)
Open Data: Why the Crowd Can Be Your Best Analytics Tool (mashable.com)
Closing the Loop: NCDM, the Future of Customer Interactions and Maturing Social Marketing Practices (customerthink.com)
Your 2011 Social Media Quick Start! (socialmediadudes.com)
A Sure Sign News Teases Don’t Work in Social Media (journalistics.com)
Social Media Strategists Look Hard At ROI This Year (webguild.org)

Bayesian	Bayesian Inference
ChemPhys	Chemometrics and Computational Physics
ClinicalTrials	Clinical Trial Design, Monitoring, and Analysis
Cluster	Cluster Analysis & Finite Mixture Models
Distributions	Probability Distributions
Econometrics	Computational Econometrics
Environmetrics	Analysis of Ecological and Environmental Data
ExperimentalDesign	Design of Experiments (DoE) & Analysis of Experimental Data
Finance	Empirical Finance
Genetics	Statistical Genetics
Graphics	Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization
gR	gRaphical Models in R
HighPerformanceComputing	High-Performance and Parallel Computing with R
MachineLearning	Machine Learning & Statistical Learning
MedicalImaging	Medical Image Analysis
Multivariate	Multivariate Statistics
NaturalLanguageProcessing	Natural Language Processing
OfficialStatistics	Official Statistics & Survey Methodology
Optimization	Optimization and Mathematical Programming
Pharmacokinetics	Analysis of Pharmacokinetic Data
Phylogenetics	Phylogenetics, Especially Comparative Methods
Psychometrics	Psychometric Models and Methods
ReproducibleResearch	Reproducible Research
Robust	Robust Statistical Methods
SocialSciences	Statistics for the Social Sciences
Spatial	Analysis of Spatial Data
Survival	Survival Analysis
TimeSeries	Time Series Analysis

CRAN Task View: Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

Related Articles

Please share:

Related Articles

Please share:

Heritage Health Prize Launching April 4

Prize Goal & Participation

Data Sets

Related Articles

Please share:

Related Articles

Please share:

Related Articles

Please share:

Here’s how it works:

It’s that simple.

Related Articles

Please share:

Related Articles

Please share:

Heritage Health Prize
Launching April 4