R on Windows HPC Server

From HPC Wire, the newsletter/site for all HPC news:

Source- Link

PALO ALTO, Calif., Sept. 20 — Revolution Analytics, the leading commercial provider of software and support for the popular open source R statistics language, today announced it will deliver Revolution R Enterprise for Microsoft Windows HPC Server 2008 R2, released today, enabling users to analyze very large data sets in high-performance computing environments.

R is a powerful open source statistics language and the modern system for predictive analytics. Revolution Analytics recently introduced RevoScaleR, new “Big Data” analysis capabilities, to its R distribution, Revolution R Enterprise. RevoScaleR addresses the performance and capacity limitations of the R language with parallelized algorithms that stream data across multiple cores on a laptop, workstation or server. Users can now process, visualize and model terabyte-class data sets at top speeds, without the need for specialized hardware.

“Revolution Analytics is pleased to support Microsoft’s Technical Computing initiative, whose efforts will benefit scientists, engineers and data analysts,” said David Champagne, CTO at Revolution. “We believe the engineering we have done for Revolution R Enterprise, in particular our work on big-data statistics and multicore computing, along with Microsoft’s HPC platform for technical computing, makes an ideal combination for high-performance large scale statistical computing.”

“Processing and analyzing this ‘big data’ is essential to better prediction and decision making,” said Bill Hamilton, director of technical computing at Microsoft Corp. “Revolution R Enterprise for Windows HPC Server 2008 R2 gives customers an extremely powerful tool that handles analysis of very large data and high workloads.”

To learn more about Revolution R Enterprise and its Big Data capabilities, download the white paper. Revolution Analytics also has an on-demand webcast, “High-performance analytics with Revolution R and Windows HPC Server,” available online.

AND from Microsoft’s website

http://www.microsoft.com/hpc/en/us/solutions/hpc-for-life-sciences.aspx

REvolution R Enterprise »

REvolution Computing

REvolution R Enterprise is designed for both novice and experienced R users looking for a production-grade R distribution to perform mission-critical predictive analytics tasks right from the desktop and scale across multiprocessor environments. It features RPE™, REvolution’s R Productivity Environment for Windows.

Of course, R Enterprise is available on Linux, but only on Red Hat Enterprise Linux. It would be nice to see Amazon Machine Images and Ubuntu versions as well.

An Amazon Machine Image (AMI) is a special type of virtual appliance which is used to instantiate (create) a virtual machine within the Amazon Elastic Compute Cloud. It serves as the basic unit of deployment for services delivered using EC2.[1]

Like all virtual appliances, the main component of an AMI is a read-only filesystem image which includes an operating system (e.g., Linux, UNIX, or Windows) and any additional software required to deliver a service or a portion of it.[2]

The AMI filesystem is compressed, encrypted, signed, split into a series of 10MB chunks and uploaded into Amazon S3 for storage. An XML manifest file stores information about the AMI, including name, version, architecture, default kernel id, decryption key and digests for all of the filesystem chunks.
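
The chunk-and-digest idea in the paragraph above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it skips the compression, encryption and signing steps, the XML element names are made up for the example and are not Amazon's manifest schema, and SHA-1 is an assumed digest algorithm.

```python
import hashlib
import xml.etree.ElementTree as ET

CHUNK_SIZE = 10 * 1024 * 1024  # 10 MB, the chunk size quoted in the text


def split_into_chunks(image: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split an image into fixed-size chunks; the last one may be shorter."""
    return [image[i:i + chunk_size] for i in range(0, len(image), chunk_size)]


def build_manifest(name: str, arch: str, chunks: list[bytes]) -> str:
    """Build a toy XML manifest with one digest per chunk.

    Element and attribute names are hypothetical, not the real AMI schema.
    """
    root = ET.Element("manifest")
    ET.SubElement(root, "name").text = name
    ET.SubElement(root, "architecture").text = arch
    parts = ET.SubElement(root, "parts", count=str(len(chunks)))
    for index, chunk in enumerate(chunks):
        part = ET.SubElement(parts, "part", index=str(index))
        digest = ET.SubElement(part, "digest", algorithm="SHA1")
        digest.text = hashlib.sha1(chunk).hexdigest()
    return ET.tostring(root, encoding="unicode")


def verify(manifest_xml: str, chunks: list[bytes]) -> bool:
    """Check that every chunk matches the digest recorded in the manifest."""
    root = ET.fromstring(manifest_xml)
    digests = [d.text for d in root.iter("digest")]
    return len(digests) == len(chunks) and all(
        hashlib.sha1(chunk).hexdigest() == digest
        for chunk, digest in zip(chunks, digests)
    )
```

Storing a digest per chunk means a corrupted or tampered chunk can be detected on its own, without re-reading the whole image.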

An AMI does not include a kernel image, only a pointer to the default kernel id, which can be chosen from an approved list of safe kernels maintained by Amazon and its partners (e.g., RedHat, Canonical, Microsoft). Users may choose kernels other than the default when booting an AMI.[3]

Types of images

  • Public: an AMI image that can be used by anyone.
  • Paid: a for-pay AMI image that is registered with Amazon DevPay and can be used by anyone who subscribes to it. DevPay allows developers to mark up Amazon’s usage fees and optionally add monthly subscription fees.

Big Data and R: New Product Release by Revolution Analytics

Press release by the guys at Revolution Analytics, this time claiming to enable terabyte-level analytics with R. Interesting stuff, but techie details are awaited.

Revolution Analytics Brings Big Data Analysis to R

The world’s most powerful statistics language can now tackle terabyte-class data sets using Revolution R Enterprise at a fraction of the cost of legacy analytics products


JSM 2010 – VANCOUVER (August 3, 2010) — Revolution Analytics today introduced ‘Big Data’ analysis to its Revolution R Enterprise software, taking the popular R statistics language to unprecedented new levels of capacity and performance for analyzing very large data sets. For the first time, R users will be able to process, visualize and model terabyte-class data sets in a fraction of the time of legacy products—without employing expensive or specialized hardware.

The new version of Revolution R Enterprise introduces an add-on package called RevoScaleR that provides a new framework for fast and efficient multi-core processing of large data sets. It includes:

  • The XDF file format, a new binary ‘Big Data’ file format with an interface to the R language that provides high-speed access to arbitrary rows, blocks and columns of data.
  • A collection of widely-used statistical algorithms optimized for Big Data, including high-performance implementations of Summary Statistics, Linear Regression, Binomial Logistic Regression and Crosstabs, with more to be added in the near future.
  • Data Reading & Transformation tools that allow users to interactively explore and prepare large data sets for analysis.
  • Extensibility: expert R users can develop and extend their own statistical algorithms to take advantage of Revolution R Enterprise’s new speed and scalability capabilities.
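
The release gives no technical detail on how RevoScaleR works internally, so the following is only a generic Python sketch of the block-wise idea behind such algorithms: read one block of rows at a time, keep a handful of running sums, and derive summary statistics and a one-predictor linear regression from those sums at the end. The names `RunningSums` and `blocks` are made up for this illustration and are not part of any Revolution Analytics API. A production implementation would likely use more numerically stable update formulas than raw sums of squares.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Sequence, Tuple

Row = Tuple[float, float]  # (x, y) pair; a stand-in for one row of a large table


@dataclass
class RunningSums:
    """Sufficient statistics for the mean, variance, and a one-predictor fit."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def update(self, block: Iterable[Row]) -> "RunningSums":
        """Fold one block of rows into the sums; only this block is in memory."""
        for x, y in block:
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y
        return self

    def merge(self, other: "RunningSums") -> "RunningSums":
        """Combine sums computed independently, e.g. on different cores."""
        return RunningSums(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.sxy + other.sxy,
        )

    def mean_x(self) -> float:
        return self.sx / self.n

    def var_x(self) -> float:
        """Sample variance of x (n - 1 denominator)."""
        return (self.sxx - self.sx * self.sx / self.n) / (self.n - 1)

    def fit(self) -> Tuple[float, float]:
        """Least-squares (intercept, slope) for y = a + b * x."""
        slope = (self.n * self.sxy - self.sx * self.sy) / (
            self.n * self.sxx - self.sx ** 2
        )
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope


def blocks(rows: Sequence[Row], size: int) -> Iterator[List[Row]]:
    """Yield consecutive blocks of rows, as a reader over a large file might."""
    for start in range(0, len(rows), size):
        yield list(rows[start:start + size])
```

Because the sums from separate blocks simply add, each core can process its own blocks and the partial results can be combined with `merge`, which is the property that makes this kind of algorithm suitable for both out-of-memory and multi-core processing.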

“The R language’s inherent power and extensibility has driven its explosive adoption as the modern system for predictive analytics,” said Norman H. Nie, president and CEO of Revolution Analytics. “We believe that this new Big Data scalability will help R transition from an amazing research and prototyping tool to a production-ready platform for enterprise applications such as quantitative finance and risk management, social media, bioinformatics and telecommunications data analysis.”

Sage Bionetworks is the nonprofit force behind the open-source collaborative effort, Sage Commons, a place where data and disease models can be shared by scientists to better understand disease biology. David Henderson, Director of Scientific Computing at Sage, commented: “At Sage Bionetworks, we need to analyze genomic databases hundreds of gigabytes in size with R. We’re looking forward to using the high-speed data-analysis features of RevoScaleR to dramatically reduce the times it takes us to process these data sets.”

Take Hadoop and Other Big Data Sources to the Next Level

Revolution R Enterprise fits well within the modern ‘Big Data’ architecture by leveraging popular sources such as Hadoop, NoSQL or key value databases, relational databases and data warehouses. These products can be used to store, regularize and do basic manipulation on very large datasets—while Revolution R Enterprise now provides advanced analytics at unparalleled speed and scale: producing speed on speed.

“Together, Hadoop and R can store and analyze massive, complex data,” said Saptarshi Guha, developer of the popular RHIPE R package that integrates the Hadoop framework with R in an automatically distributed computing environment. “Employing the new capabilities of Revolution R Enterprise, we will be able to go even further and compute Big Data regressions and more.”

Platforms and Availability

The new RevoScaleR package will be delivered as part of Revolution R Enterprise 4.0, which will be available for 32- and 64-bit Microsoft Windows in the next 30 days. Support for Red Hat Enterprise Linux (RHEL 5) is planned for later this year.

On its website (http://www.revolutionanalytics.com/bigdata), Revolution Analytics has published performance and scalability benchmarks for Revolution R Enterprise analyzing a 13.2 gigabyte data set of commercial airline information containing more than 123 million rows, and 29 columns.

Additionally, the company will showcase its new Big Data solution in a free webinar on August 25 at 9:00 a.m. Pacific.

Additional Resources

  • Big Data Benchmark whitepaper
  • The Revolution Analytics Roadmap whitepaper
  • Revolutions Blog
  • Download free academic copy of Revolution R Enterprise
  • Visit Inside-R.org for the most comprehensive set of information on R
  • Spread the word: Add a “Download R!” badge on your website
  • Follow @RevolutionR on Twitter

About Revolution Analytics

Revolution Analytics (http://www.revolutionanalytics.com) is the leading commercial provider of software and support for the popular open source R statistics language. Its Revolution R products help make predictive analytics accessible to every type of user and budget. The company is headquartered in Palo Alto, Calif. and backed by North Bridge Venture Partners and Intel Capital.

Media Contact

Chantal Yang
Page One PR, for Revolution Analytics
Tel: +1 415-875-7494

Email:  revolution@pageonepr.com

Interview: Richard Schultz, CEO, REvolution Computing

Here is an interview with the CEO of REvolution Computing, Richard Schultz. Mr. Schultz offers his perspectives on open source, predictive analytics and cloud computing, as well as his vision for R Commercial.

Note from Ajay- As I blogged previously, commercial establishments now have an option to use R commercially with a full service contract and all the guarantees they expect and get from existing analytics software vendors.

Ajay- Linux has not really succeeded in capturing the Windows/desktop operating system market. What are the technical and business reasons that you think R will succeed in the analytics desktop software market?

Richard- To start, Linux was never really targeted at the Windows desktop market, but rather at unseating proprietary Unix deployments (particularly in finance), which it did quite successfully. This is a similar trend to what we’re seeing in the R world – it’s not that R is generally replacing Excel, for instance. In addition, with the large and growing base of both users and contributors, the vibrancy of the R community has taken on a life of its own.

As to R and Windows, two things are worth noting:

1. Microsoft has moved rapidly to embrace R and REvolution for that matter.

2. Windows is still the predominant operating system in large commercial enterprises. Because we deploy R on multiprocessors, which are now common on all computers including those pre-loaded with Windows, REvolution R is very much at home in Windows, Mac, and Linux environments.

Ajay- What are the biggest challenges for REvolution Computing in explaining R Pro to users of traditional statistics software? What are the biggest advantages?

Richard- The biggest challenge is getting the word out that there now exist validated and supported R products designed for commercial use. But that’s changing rapidly, as your own interest in REvolution Computing demonstrates. Our biggest advantages are several:

1. we are focused on building a close and collegial relationship with the open source R community;

2. our company has a deep history in super computing and parallelization;

3. with, by Intel’s estimate, over 1 million R users and growing, there is a large community eager to adopt our products as its members advance their careers in the business and research worlds.

Ajay- Which software do you think will be affected the most by R’s spread across colleges and companies? What do you believe their strategies to compete will be?


Richard – I want to be politic here. Let me say that the programming software likely most affected by the rise of R is probably proprietary.

We see many opportunities to partner and leverage the strengths of REvolution’s products specifically – high performance, handling of large data, validation, IDE / user interface.

Ajay- How do you intend to incorporate cloud computing and the Software as a Service model for R Pro? When, if at all, do you think it will be possible for a person to simply upload a zipped CSV file, work on a remote cloud computer for analytics and forecasting, and just pay for the hired software, hardware and bandwidth?

Richard – We were thinking of something based on the Ohri framework. ;-) (Ajay- Touché!)

In fact, we have deployed, and are deploying cloud-based REvolution R for clients, and it’s something we expect to evolve as those technologies evolve.


Ajay- Asian countries have huge demand for analytics and are more price-conscious about software. What would your strategy be for selling in Asia, particularly China and India?

Richard – Open source can be a tremendous win for users in Asia / China / India.  The upfront costs are low, the technology is leading-edge, and there is a distribution network for support.  REvolution has partners, and is continuing to build its partner network to be able to reach these markets.  We expect to accelerate our efforts in these regions toward the end of 2009.

Ajay- What has been the story of your career so far? What prompted you to join/start REvolution Computing? What advice would you give to young science graduates in today’s recession?

Richard – My own background is in computer science, business… and music. Through school I held various positions at IBM, and after graduate school, I worked at Dun & Bradstreet in a product management role and developed a taste for entrepreneurship. I’ve started two companies so far: MetaServer, a business intelligence middleware company that catered to the insurance industry, and REvolution Computing. Today, MetaServer is part of Oracle. And I continue to play music – guitar and piano. One of these days we’ll get a REvolution Computing band together.

My advice to young science graduates is the same, recession or no: follow your enthusiasms; find a passion outside of work, like playing music; master open source programming languages, because that is the future and the future is here.

About Richard Schultz – Chief Executive Officer, REvolution Computing

Richard guides REvolution’s long-range business strategy and leads the company’s teams on a daily basis. His experience developing and growing Business Intelligence software companies includes founding and leading Metaserver, Inc., now a part of Oracle, from inception to sale. Richard has been named Innovator of the Year by Business New Haven; served on the board of the Connecticut Venture Group; and been the keynote speaker for CIO Forum and other technology industry events. A graduate of Washington University with degrees in Computer Science, Business and Music, Richard also holds a Master’s degree in Computer Science from the State University of New York at Stony Brook and has held senior positions at Dun and Bradstreet and IBM.

Ajay- REvolution Computing has been a leader in this field, and going by the latest product launch, well, you can try it yourself and see here: http://www.revolution-computing.com