Big Data and R: New Product Release by Revolution Analytics

Press Release by the Guys in Revolution Analytics- this time claiming to enable terabyte level analytics with R. Interesting stuff but techie details are awaited.

Revolution Analytics Brings

Big Data Analysis to R

The world’s most powerful statistics language can now tackle terabyte-class data sets using

Revolution R Enterpriseat a fraction of the cost of legacy analytics products


JSM 2010 – VANCOUVER (August 3, 2010) — Revolution Analytics today introduced ‘Big Data’ analysis to its Revolution R Enterprise software, taking the popular R statistics language to unprecedented new levels of capacity and performance for analyzing very large data sets. For the first time, R users will be able to process, visualize and model terabyte-class data sets in a fraction of the time of legacy products—without employing expensive or specialized hardware.

The new version of Revolution R Enterprise introduces an add-on package called RevoScaleR that provides a new framework for fast and efficient multi-core processing of large data sets. It includes:

  • The XDF file format, a new binary ‘Big Data’ file format with an interface to the R language that provides high-speed access to arbitrary rows, blocks and columns of data.
  • A collection of widely-used statistical algorithms optimized for Big Data, including high-performance implementations of Summary Statistics, Linear Regression, Binomial Logistic Regressionand Crosstabs—with more to be added in the near future.
  • Data Reading & Transformation tools that allow users to interactively explore and prepare large data sets for analysis.
  • Extensibility, expert R users can develop and extend their own statistical algorithms to take advantage of Revolution R Enterprise’s new speed and scalability capabilities.

“The R language’s inherent power and extensibility has driven its explosive adoption as the modern system for predictive analytics,” said Norman H. Nie, president and CEO of Revolution Analytics. “We believe that this new Big Data scalability will help R transition from an amazing research and prototyping tool to a production-ready platform for enterprise applications such as quantitative finance and risk management, social media, bioinformatics and telecommunications data analysis.”

Sage Bionetworks is the nonprofit force behind the open-source collaborative effort, Sage Commons, a place where data and disease models can be shared by scientists to better understand disease biology. David Henderson, Director of Scientific Computing at Sage, commented: “At Sage Bionetworks, we need to analyze genomic databases hundreds of gigabytes in size with R. We’re looking forward to using the high-speed data-analysis features of RevoScaleR to dramatically reduce the times it takes us to process these data sets.”

Take Hadoop and Other Big Data Sources to the Next Level

Revolution R Enterprise fits well within the modern ‘Big Data’ architecture by leveraging popular sources such as Hadoop, NoSQL or key value databases, relational databases and data warehouses. These products can be used to store, regularize and do basic manipulation on very large datasets—while Revolution R Enterprise now provides advanced analytics at unparalleled speed and scale: producing speed on speed.

“Together, Hadoop and R can store and analyze massive, complex data,” said Saptarshi Guha, developer of the popular RHIPE R package that integrates the Hadoop framework with R in an automatically distributed computing environment. “Employing the new capabilities of Revolution R Enterprise, we will be able to go even further and compute Big Data regressions and more.”

Platforms and Availability

The new RevoScaleR package will be delivered as part of Revolution R Enterprise 4.0, which will be available for 32-and 64-bit Microsoft Windows in the next 30 days. Support for Red Hat Enterprise Linux (RHEL 5) is planned for later this year.

On its website (http://www.revolutionanalytics.com/bigdata), Revolution Analytics has published performance and scalability benchmarks for Revolution R Enterprise analyzing a 13.2 gigabyte data set of commercial airline information containing more than 123 million rows, and 29 columns.

Additionally, the company will showcase its new Big Data solution in a free webinar on August 25 at 9:00 a.m. Pacific.

Additional Resources

•      Big Data Benchmark whitepaper

•      The Revolution Analytics Roadmap whitepaper

•      Revolutions Blog

•      Download free academic copy of Revolution R Enterprise

•      Visit Inside-R.org for the most comprehensive set of information on R

•      Spread the word: Add a “Download R!” badge on your website

•      Follow @RevolutionR on Twitter

About Revolution Analytics

Revolution Analytics (http://www.revolutionanalytics.com) is the leading commercial provider of software and support for the popular open source R statistics language. Its Revolution R products help make predictive analytics accessible to every type of user and budget. The company is headquartered in Palo Alto, Calif. and backed by North Bridge Venture Partners and Intel Capital.

Media Contact

Chantal Yang
Page One PR, for Revolution Analytics
Tel: +1 415-875-7494

Email:  revolution@pageonepr.com

R Oracle Data Mining

Here is a new package called R ODM and it is an interface to do Data Mining via Oracle Tables through R. You can read more here http://www.oracle.com/technetwork/database/options/odm/odm-r-integration-089013.html and here http://cran.fhcrc.org/web/packages/RODM/RODM.pdf . Also there is a contest for creative use of R and ODM.

R Interface to Oracle Data Mining

The R Interface to Oracle Data Mining ( R-ODM) allows R users to access the power of Oracle Data Mining’s in-database functions using the familiar R syntax. R-ODM provides a powerful environment for prototyping data analysis and data mining methodologies.

R-ODM is especially useful for:

  • Quick prototyping of vertical or domain-based applications where the Oracle Database supports the application
  • Scripting of “production” data mining methodologies
  • Customizing graphics of ODM data mining results (examples: classificationregressionanomaly detection)

The R-ODM interface allows R users to mine data using Oracle Data Mining from the R programming environment. It consists of a set of function wrappers written in source R language that pass data and parameters from the R environment to the Oracle RDBMS enterprise edition as standard user PL/SQL queries via an ODBC interface. The R-ODM interface code is a thin layer of logic and SQL that calls through an ODBC interface. R-ODM does not use or expose any Oracle product code as it is completely an external interface and not part of any Oracle product. R-ODM is similar to the example scripts (e.g., the PL/SQL demo code) that illustrates the use of Oracle Data Mining, for example, how to create Data Mining models, pass arguments, retrieve results etc.

R-ODM is packaged as a standard R source package and is distributed freely as part of the R environment’s Comprehensive R Archive Network ( CRAN). For information about the R environment, R packages and CRAN, see www.r-project.org.

and

Present and win an Apple iPod Touch!
The BI, Warehousing and Analytics (BIWA) SIG is giving an Apple iPOD Touch to the best new presenter. Be part of the TechCast series and get a chance to win!

Consider highlighting a creative use of R and ODM.

BIWA invites all Oracle professionals (experts, end users, managers, DBAs, developers, data analysts, ISVs, partners, etc.) to submit abstracts for 45 minute technical webcasts to our Oracle BIWA (IOUG SIG) Community in our Wednesday TechCast series. Note that the contest is limited to new presenters to encourage fresh participation by the BIWA community.

Also an interview with Oracle Data Mining head, Charlie Berger https://decisionstats.wordpress.com/2009/09/02/oracle/

Protected: Analyzing SAS Institute-WPS Lawsuit

This content is password protected. To view it please enter your password below:

Using Red R- R with a Visual Interface

For people complaining about the GUI on R, here is the ah Enterprise Version of R called Red R.

It is available at the website at http://www.red-r.org/

 

You can read more there or just go through the short video created by them at

Basically it is a click and point method of using R with the ability to store schemas and thus very good for repeatable operations as well.


Not bad for epic software, huh?