Revolution R Enterprise 6.0 launched!

Just got the email-more software is good news!

Revolution R Enterprise 6.0 for 32-bit and 64-bit Windows and 64-bit Red Hat Enterprise Linux (RHEL 5.x and RHEL 6.x) features an updated release of the RevoScaleR package that provides fast, scalable data management and data analysis: the same code scales from data frames to local, high-performance .xdf files to data distributed across a Windows HPC Server cluster or IBM Platform Computing LSF cluster.  RevoScaleR also allows distribution of the execution of essentially any R function across cores and nodes, delivering the results back to the user.

Detailed information on what’s new in 6.0 and known issues:
http://www.revolutionanalytics.com/doc/README_RevoEnt_Windows_6.0.0.pdf

and from the manual-lots of function goodies for Big Data

 

  • IBM Platform LSF Cluster support [Linux only]. The new RevoScaleR function, RxLsfCluster, allows you to create a distributed compute context for the Platform LSF workload manager.
  •  Azure Burst support added for Microsoft HPC Server [Windows only]. The new RevoScaleR function, RxAzureBurst, allows you to create a distributed compute context to have computations performed in the cloud using Azure Burst
  • The rxExec function allows distributed execution of essentially any R function across cores and nodes, delivering the results back to the user.
  • functions RxLocalParallel and RxLocalSeq allow you to create compute context objects for local parallel and local sequential computation, respectively.
  • RxForeachDoPar allows you to create a compute context using the currently registered foreach parallel backend (doParallel, doSNOW, doMC, etc.). To execute rxExec calls, simply register the parallel backend as usual, then set your compute context as follows: rxSetComputeContext(RxForeachDoPar())
  • rxSetComputeContext and rxGetComputeContext simplify management of compute contexts.
  • rxGlm, provides a fast, scalable, distributable implementation of generalized linear models. This expands the list of full-featured high performance analytics functions already available: summary statistics (rxSummary), cubes and cross tabs (rxCube,rxCrossTabs), linear models (rxLinMod), covariance and correlation matrices (rxCovCor),
    binomial logistic regression (rxLogit), and k-means clustering (rxKmeans)example: a Tweedie family with 1 million observations and 78 estimated coefficients (categorical data)
    took 17 seconds with rxGlm compared with 377 seconds for glm on a quadcore laptop

     

    and easier working with R’s big brother SAS language

     

    RevoScaleR high-performance analysis functions will now conveniently work directly with a variety of external data sources (delimited and fixed format text files, SAS files, SPSS files, and ODBC data connections). New functions are provided to create data source objects to represent these data sources (RxTextData, RxOdbcData, RxSasData, and RxSpssData), which in turn can be specified for the ‘data’ argument for these RevoScaleR analysis functions: rxHistogramrxSummary, rxCube, rxCrossTabs, rxLinMod, rxCovCor, rxLogit, and rxGlm.


    example, 

    you can analyze a SAS file directly as follows:


    # Create a SAS data source with information about variables and # rows to read in each chunk

    sasDataFile <- file.path(rxGetOption(“sampleDataDir”),”claims.sas7bdat”)
    sasDS <- RxSasData(sasDataFile, stringsAsFactors = TRUE,colClasses = c(RowNum = “integer”),rowsPerRead = 50)

    # Compute and draw a histogram directly from the SAS file
    rxHistogram( ~cost|type, data = sasDS)
    # Compute summary statistics
    rxSummary(~., data = sasDS)
    # Estimate a linear model
    linModObj <- rxLinMod(cost~age + car_age + type, data = sasDS)
    summary(linModObj)
    # Import a subset into a data frame for further inspection
    subData <- rxImport(inData = sasDS, rowSelection = cost > 400,
    varsToKeep = c(“cost”, “age”, “type”))
    subData

 

The installation instructions and instructions for getting started with Revolution R Enterprise & RevoDeployR for Windows: http://www.revolutionanalytics.com/downloads/instructions/windows.php

Product Review – Revolution R 5.0

So I got the email from Revolution R. Version 5.0 is ready for download, and unlike half hearted attempts by many software companies they make it easy for the academics and researchers to get their free copy. Free as in speech and free as in beer.

Some thoughts-

1) R ‘s memory problem is now an issue of marketing and branding. Revolution Analytics has definitely bridged this gap technically  beautifully and I quote from their documentation-

The primary advantage 64-bit architectures bring to R is an increase in the amount of memory available to a given R process.
The first benefit of that increase is an increase in the size of data objects you can create. For example, on most 32-bit versions of R, the largest data object you can create is roughly 3GB; attempts to create 4GB objects result in errors with the message “cannot allocate vector of length xxxx.”
On 64-bit versions of R, you can generally create larger data objects, up to R’s current hard limit of 231 􀀀 1 elements in a vector (about 2 billion elements). The functions memory.size and memory.limit help you manage the memory used byWindows versions of R.
In 64-bit Revolution R Enterprise, R sets the memory limit by default to the amount of physical RAM minus half a gigabyte, so that, for example, on a machine with 8GB of RAM, the default memory limit is 7.5GB:

2) The User Interface is best shown as below or at  https://docs.google.com/presentation/pub?id=1V_G7r0aBR3I5SktSOenhnhuqkHThne6fMxly_-4i8Ag&start=false&loop=false&delayms=3000

-(but I am still hoping for the GUI ,Revolution Analytics promised us for Christmas)

3) The partnership with Microsoft HPC is quite awesome given Microsoft’s track record in enterprise software penetration

but I am also interested in knowing more about the Oracle version of R and what it will do there.

Interview Markus Schmidberger ,Cloudnumbers.com

Here is an interview with Markus Schmidberger, Senior Community Manager for cloudnumbers.com. Cloudnumbers.com is the exciting new cloud startup for scientific computing. It basically enables transition to a R and other platforms in the cloud and makes it very easy and secure from the traditional desktop/server model of operation.

Ajay- Describe the startup story for setting up Cloudnumbers.com

Markus- In 2010 the company founders Erik Muttersbach (TU München), Markus Fensterer (TU München) and Moritz v. Petersdorff-Campen (WHU Vallendar) started with the development of the cloud computing environment. Continue reading “Interview Markus Schmidberger ,Cloudnumbers.com”

Amazon Ec2 goes Red Hat

message from Amazing Amazon’s cloud team- this will also help for #rstats users given that revolution Analytics full versions on RHEL.

—————————————————-

on-demand instances of Amazon EC2 running Red Hat Enterprise Linux (RHEL) for as little as $0.145 per instance hour. The offering combines the cost-effectiveness, scalability and flexibility of running in Amazon EC2 with the proven reliability of Red Hat Enterprise Linux.

Highlights of the offering include:

  • Support is included through subscription to AWS Premium Support with back-line support by Red Hat
  • Ongoing maintenance, including security patches and bug fixes, via update repositories available in all Amazon EC2 regions
  • Amazon EC2 running RHEL currently supports RHEL 5.5, RHEL 5.6, RHEL 6.0 and RHEL 6.1 in both 32 bit and 64 bit formats, and is available in all Regions.
  • Customers who already own Red Hat licenses will continue to be able to use those licenses at no additional charge.
  • Like all services offered by AWS, Amazon EC2 running Red Hat Enterprise Linux offers a low-cost, pay-as-you-go model with no long-term commitments and no minimum fees.

For more information, please visit the Amazon EC2 Red Hat Enterprise Linux page.

which is

Amazon EC2 Running Red Hat Enterprise Linux

Amazon EC2 running Red Hat Enterprise Linux provides a dependable platform to deploy a broad range of applications. By running RHEL on EC2, you can leverage the cost effectiveness, scalability and flexibility of Amazon EC2, the proven reliability of Red Hat Enterprise Linux, and AWS premium support with back-line support from Red Hat.. Red Hat Enterprise Linux on EC2 is available in versions 5.5, 5.6, 6.0, and 6.1, both in 32-bit and 64-bit architectures.

Amazon EC2 running Red Hat Enterprise Linux provides seamless integration with existing Amazon EC2 features including Amazon Elastic Block Store (EBS), Amazon CloudWatch, Elastic-Load Balancing, and Elastic IPs. Red Hat Enterprise Linux instances are available in multiple Availability Zones in all Regions.

Sign Up

Pricing

Pay only for what you use with no long-term commitments and no minimum fee.

On-Demand Instances

On-Demand Instances let you pay for compute capacity by the hour with no long-term commitments.

Region:US – N. VirginiaUS – N. CaliforniaEU – IrelandAPAC – SingaporeAPAC – Tokyo
Standard Instances Red Hat Enterprise Linux
Small (Default) $0.145 per hour
Large $0.40 per hour
Extra Large $0.74 per hour
Micro Instances Red Hat Enterprise Linux
Micro $0.08 per hour
High-Memory Instances Red Hat Enterprise Linux
Extra Large $0.56 per hour
Double Extra Large $1.06 per hour
Quadruple Extra Large $2.10 per hour
High-CPU Instances Red Hat Enterprise Linux
Medium $0.23 per hour
Extra Large $0.78 per hour
Cluster Compute Instances Red Hat Enterprise Linux
Quadruple Extra Large $1.70 per hour
Cluster GPU Instances Red Hat Enterprise Linux
Quadruple Extra Large $2.20 per hour

Pricing is per instance-hour consumed for each instance type. Partial instance-hours consumed are billed as full hours.

↑ Top

and

Available Instance Types

Standard Instances

Instances of this family are well suited for most applications.

Small Instance – default*

1.7 GB memory
1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit)
160 GB instance storage
32-bit platform
I/O Performance: Moderate
API name: m1.small

Large Instance

7.5 GB memory
4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
850 GB instance storage
64-bit platform
I/O Performance: High
API name: m1.large

Extra Large Instance

15 GB memory
8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
1,690 GB instance storage
64-bit platform
I/O Performance: High
API name: m1.xlarge

Micro Instances

Instances of this family provide a small amount of consistent CPU resources and allow you to burst CPU capacity when additional cycles are available. They are well suited for lower throughput applications and web sites that consume significant compute cycles periodically.

Micro Instance

613 MB memory
Up to 2 EC2 Compute Units (for short periodic bursts)
EBS storage only
32-bit or 64-bit platform
I/O Performance: Low
API name: t1.micro

High-Memory Instances

Instances of this family offer large memory sizes for high throughput applications, including database and memory caching applications.

High-Memory Extra Large Instance

17.1 GB of memory
6.5 EC2 Compute Units (2 virtual cores with 3.25 EC2 Compute Units each)
420 GB of instance storage
64-bit platform
I/O Performance: Moderate
API name: m2.xlarge

High-Memory Double Extra Large Instance

34.2 GB of memory
13 EC2 Compute Units (4 virtual cores with 3.25 EC2 Compute Units each)
850 GB of instance storage
64-bit platform
I/O Performance: High
API name: m2.2xlarge

High-Memory Quadruple Extra Large Instance

68.4 GB of memory
26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: m2.4xlarge

High-CPU Instances

Instances of this family have proportionally more CPU resources than memory (RAM) and are well suited for compute-intensive applications.

High-CPU Medium Instance

1.7 GB of memory
5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units each)
350 GB of instance storage
32-bit platform
I/O Performance: Moderate
API name: c1.medium

High-CPU Extra Large Instance

7 GB of memory
20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: c1.xlarge

Cluster Compute Instances

Instances of this family provide proportionally high CPU resources with increased network performance and are well suited for High Performance Compute (HPC) applications and other demanding network-bound applications. Learn more about use of this instance type for HPC applications.

Cluster Compute Quadruple Extra Large Instance

23 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
1690 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cc1.4xlarge

Cluster GPU Instances

Instances of this family provide general-purpose graphics processing units (GPUs) with proportionally high CPU and increased network performance for applications benefitting from highly parallelized processing, including HPC, rendering and media processing applications. While Cluster Compute Instances provide the ability to create clusters of instances connected by a low latency, high throughput network, Cluster GPU Instances provide an additional option for applications that can benefit from the efficiency gains of the parallel computing power of GPUs over what can be achieved with traditional processors. Learn more about use of this instance type for HPC applications.

Cluster GPU Quadruple Extra Large Instance

22 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
2 x NVIDIA Tesla “Fermi” M2050 GPUs
1690 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cg1.4xlarge

 


Getting Started

To get started using Red Hat Enterprise Linux on Amazon EC2, perform the following steps:

  • Open and log into the AWS Management Console
  • Click on Launch Instance from the EC2 Dashboard
  • Select the Red Hat Enterprise Linux AMI from the QuickStart tab
  • Specify additional details of your instance and click Launch
  • Additional details can be found on each AMI’s Catalog Entry page

The AWS Management Console is an easy tool to start and manage your instances. If you are looking for more details on launching an instance, a quick video tutorial on how to use Amazon EC2 with the AWS Management Console can be found here .
A full list of Red Hat Enterprise Linux AMIs can be found in the AWS AMI Catalog.

↑ Top


Support

All customers running Red Hat Enterprise Linux on EC2 will receive access to repository updates from Red Hat. Moreover, AWS Premium support customers can contact AWS to get access to a support structure from both Amazon and Red Hat.

↑ Top


Resources

↑ Top


About Red Hat

Red Hat, the world’s leading open source solutions provider, is headquartered in Raleigh, NC with over 50 satellite offices spanning the globe. Red Hat provides high-quality, low-cost technology with its operating system platform, Red Hat Enterprise Linux, together with applications, management and Services Oriented Architecture (SOA) solutions, including the JBoss Enterprise Middleware Suite. Red Hat also offers support, training and consulting services to its customers worldwide.

 

also from Revolution Analytics- in case you want to #rstats in the cloud and thus kill all that talk of RAM dependency, slow R than other softwares (just increase the RAM above in the instances to keep it simple)

,or Revolution not being open enough

http://www.revolutionanalytics.com/downloads/gpl-sources.php

GPL SOURCES

Revolution Analytics uses an Open-Core Licensing model. We provide open- source R bundled with proprietary modules from Revolution Analytics that provide additional functionality for our users. Open-source R is distributed under the GNU Public License (version 2), and we make our software available under a commercial license.

Revolution Analytics respects the importance of open source licenses and has contributed code to the open source R project and will continue to do so. We have carefully reviewed our compliance with GPLv2 and have worked with Mark Radcliffe of DLA Piper, the outside General Legal Counsel of the Open Source Initiative, to ensure that we fully comply with the obligations of the GPLv2.

For our Revolution R distribution, we may make some minor modifications to the R sources (the ChangeLog file lists all changes made). You can download these modified sources of open-source R under the terms of the GPLv2, using either the links below or those in the email sent to you when you download a specific version of Revolution R.

Download GPL Sources

Product Version Platform Modified R Sources
Revolution R Community 3.2 Windows R 2.10.1
Revolution R Community 3.2 MacOS R 2.10.1
Revolution R Enterprise 3.1.1 RHEL R 2.9.2
Revolution R Enterprise 4.0 Windows R 2.11.1
Revolution R Enterprise 4.0.1 RHEL R 2.11.1
Revolution R Enterprise 4.1.0 Windows R 2.11.1
Revolution R Enterprise 4.2 Windows R 2.11.1
Revolution R Enterprise 4.2 RHEL R 2.11.1
Revolution R Enterprise 4.3 Windows & RHEL R 2.12.2

 

 

 

Revolution releases R Windows for Academics for free

Logo for R
Image via Wikipedia

Based on the official email from them, God bless the merry coders at Revo-

Revolution Analytics has just released Revolution R Enterprise 4.3 for 32-bit and 64-bit Windows, a significant step forward in enterprise data analytics.  It features an updated RevoScaleR package for scalable, fast (multicore), and extensible data analysis with R. Revolution R Enterprise 4.3 for Windows also provides R 2.12.2, and includes an enhanced R Productivity Environment (RPE), a full-featured integrated development environment with visual debugging capabilities. Also available is an updated Windows release of our deployment server solution, RevoDeployR 1.2, designed to help you deliver R analytics via the Web.

As a registered user of the Academic version of Revolution R Enterprise for Windows, you can take advantage of these improvements by downloading and installing Revolution R Enterprise 4.3 today. You can install Revolution R Enterprise 4.3 side-by-side with your existing Revolution R Enterprise installations; there is no need to uninstall previous versions.

 

High Performance Analytics

Marry Big Data Analytics to High Performance Computing, and you get the buzzword of this season- High Performance Analytics.

It basically consists of Parallelized code to run in parallel on custom hardware, in -database analytics for speed, and cloud computing /high performance computing environments. On an operational level, it consists of software (as in analytics) partnering with software (as in databases, Map reduce, Hadoop) plus some hardware (HP or IBM mostly). It is considered a high margin , highly profitable, business with small number of deals compared to say desktop licenses.

As per HPC Wire- which is a great tool/newsletter to keep updated on HPC , SAS Institute has been busy on this front partnering with EMC Greenplum and TeraData (who also acquired  SAS Partner AsterData to gain a much needed foot in the MR/SQL space) Continue reading “High Performance Analytics”

Amazon goes HPC and GPU: Dirk E to revise his R HPC book

Looking south above Interstate 80, the Eastsho...
Image via Wikipedia

Amazon just did a cluster Christmas present for us tech geek lizards- before Google could out doogle them with end of the Betas (cough- its on NDA)

Clusters used by Academic Departments now have a great chance to reduce cost without downsizing- but only if the CIO gets the email.

While Professor Goodnight of SAS / North Carolina University is still playing time sharing versus mind sharing games with analytical birdies – his 70 mill server farm set in Feb last is about to get ready

( I heard they got public subsidies for environment- but thats historic for SAS– taking public things private -right Prof as SAS itself began as a publicly funded project. and that was in the 1960s and they didnt even have no lobbyists as well. )

In realted R news, Dirk E has been thinking of a R HPC book without paying attention to Amazon but would now have to include Amazon

(he has been thinking of writing that book for 5 years, but hey he’s got a day job, consulting gigs with revo, photo ops at Google, a blog, packages to maintain without binaries, Dirk E we await thy book with bated holes.

Whos Dirk E – well http://dirk.eddelbuettel.com/ is like the Terminator of R project (in terms of unpronounceable surnames)

Back to the cause du jeure-

 

From http://aws.amazon.com/ec2/hpc-applications/ but minus corporate buzz words.

 

Unique to Cluster Compute and Cluster GPU instances is the ability to group them into clusters of instances for use with HPC

applications. This is particularly valuable for those applications that rely on protocols like Message Passing Interface (MPI) for tightly coupled inter-node communication.

Cluster Compute and Cluster GPU instances function just like other Amazon EC2 instances but also offer the following features for optimal performance with HPC applications:

  • When run as a cluster of instances, they provide low latency, full bisection 10 Gbps bandwidth between instances. Cluster sizes up through and above 128 instances are supported.
  • Cluster Compute and Cluster GPU instances include the specific processor architecture in their definition to allow developers to tune their applications by compiling applications for that specific processor architecture in order to achieve optimal performance.

The Cluster Compute instance family currently contains a single instance type, the Cluster Compute Quadruple Extra Large with the following specifications:

23 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
1690 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cc1.4xlarge

The Cluster GPU instance family currently contains a single instance type, the Cluster GPU Quadruple Extra Large with the following specifications:

22 GB of memory
33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
2 x NVIDIA Tesla “Fermi” M2050 GPUs
1690 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
API name: cg1.4xlarge

.

Sign Up for Amazon EC2