R now part of Amazon Linux AMI

According to this post, Amazon has decided to bundle R with the Amazon Linux AMI:


R 2.15: Also coming from your requests, we have added the R language to the Amazon Linux AMI.  We are here to serve your statistical analysis needs!  Simply yum install R and off you go.
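
Once the package is installed, a quick check from the R prompt shows which build you actually got. This is only a minimal sketch: the 2.15 floor is taken from the announcement above, and the variable names are mine, purely for illustration.

```r
# Report the version of the running R interpreter.
installed <- getRversion()
print(R.version.string)

# The announcement mentions R 2.15; check that this build is at least that recent.
meets_announcement <- installed >= "2.15.0"
print(meets_announcement)
```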

PS: Back to work. Sorry for the delayed posts. I am working on book 2 for Springer, “R for Cloud Computing”. If you have any case studies of R on the Amazon, Google, Oracle or Azure clouds, please let me know.

PPS: At 48 MB, is R too big to bundle in the default install of most Linux distros? Thoughts?


Running R on Windows Azure #rstats #cloud

Here is a brief tutorial on running R on the Windows Azure cloud (the OS is Windows in this case, but four kinds of Linux are also available).

There is a free 90-day trial, so you can run R on the cloud at no cost (since Google Cloud Compute is still in closed, hush-hush beta).

Go to https://www.windowsazure.com/en-us/pricing/free-trial/

New Amazon Instance: High I/O for NoSQL

Latest from the Amazon cloud:

hi1.4xlarge instances come with eight virtual cores that can deliver 35 EC2 Compute Units (ECUs) of CPU performance, 60.5 GiB of RAM, and 2 TiB of storage capacity across two SSD-based storage volumes. Customers using hi1.4xlarge instances for their applications can expect over 120,000 4 KB random read IOPS, and as many as 85,000 random write IOPS (depending on active LBA span). These instances are available on a 10 Gbps network, with the ability to launch instances into cluster placement groups for low-latency, full-bisection bandwidth networking.

High I/O instances are currently available in three Availability Zones in US East (N. Virginia) and two Availability Zones in EU West (Ireland) regions. Other regions will be supported in the coming months. You can launch hi1.4xlarge instances as On Demand instances starting at $3.10/hour, and purchase them as Reserved Instances.


High I/O Instances

Instances of this family provide very high instance storage I/O performance and are ideally suited for many high performance database workloads. Example applications include NoSQL databases like Cassandra and MongoDB. High I/O instances are backed by Solid State Drives (SSD), and also provide high levels of CPU, memory and network performance.

High I/O Quadruple Extra Large Instance

60.5 GB of memory
35 EC2 Compute Units (8 virtual cores with 4.4 EC2 Compute Units each)
2 SSD-based volumes each with 1024 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
Storage I/O Performance: Very High*
API name: hi1.4xlarge

*Using Linux paravirtual (PV) AMIs, High I/O Quadruple Extra Large instances can deliver more than 120,000 4 KB random read IOPS and between 10,000 and 85,000 4 KB random write IOPS (depending on active logical block addressing span) to applications. For hardware virtual machines (HVM) and Windows AMIs, performance is approximately 90,000 4 KB random read IOPS and between 9,000 and 75,000 4 KB random write IOPS. The maximum sequential throughput on all AMI types (Linux PV, Linux HVM, and Windows) per second is approximately 2 GB read and 1.1 GB write.
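
To put those IOPS figures next to the sequential throughput numbers, it helps to convert them to bandwidth. A rough back-of-the-envelope sketch, assuming "4 KB" means 4096 bytes per operation; the function name is mine, not anything from Amazon.

```r
# Convert an IOPS figure at a given block size into MiB per second.
# Assumption: a "4 KB" operation moves 4096 bytes.
iops_to_mib_per_sec <- function(iops, block_bytes = 4096) {
  iops * block_bytes / 1024^2
}

read_pv   <- iops_to_mib_per_sec(120000)           # Linux PV random reads
write_pv  <- iops_to_mib_per_sec(c(10000, 85000))  # Linux PV random write range
read_hvm  <- iops_to_mib_per_sec(90000)            # HVM / Windows random reads

print(read_pv)    # 468.75
print(write_pv)   # 39.0625 332.03125
print(read_hvm)   # 351.5625
```

So even the best-case random-read figure works out to roughly 469 MiB per second, well under the approximately 2 GB per second quoted for sequential reads.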

Revolution R Enterprise 6.0 launched!

Just got the email. More software is good news!

Revolution R Enterprise 6.0 for 32-bit and 64-bit Windows and 64-bit Red Hat Enterprise Linux (RHEL 5.x and RHEL 6.x) features an updated release of the RevoScaleR package that provides fast, scalable data management and data analysis: the same code scales from data frames to local, high-performance .xdf files to data distributed across a Windows HPC Server cluster or IBM Platform Computing LSF cluster.  RevoScaleR also allows distribution of the execution of essentially any R function across cores and nodes, delivering the results back to the user.

Detailed information on what’s new in 6.0 and known issues:

And from the manual, lots of function goodies for Big Data:


  • IBM Platform LSF Cluster support [Linux only]. The new RevoScaleR function, RxLsfCluster, allows you to create a distributed compute context for the Platform LSF workload manager.
  • Azure Burst support added for Microsoft HPC Server [Windows only]. The new RevoScaleR function, RxAzureBurst, allows you to create a distributed compute context to have computations performed in the cloud using Azure Burst.
  • The rxExec function allows distributed execution of essentially any R function across cores and nodes, delivering the results back to the user.
  • The functions RxLocalParallel and RxLocalSeq allow you to create compute context objects for local parallel and local sequential computation, respectively.
  • RxForeachDoPar allows you to create a compute context using the currently registered foreach parallel backend (doParallel, doSNOW, doMC, etc.). To execute rxExec calls, simply register the parallel backend as usual, then set your compute context as follows: rxSetComputeContext(RxForeachDoPar())
  • rxSetComputeContext and rxGetComputeContext simplify management of compute contexts.
  • rxGlm provides a fast, scalable, distributable implementation of generalized linear models. This expands the list of full-featured high performance analytics functions already available: summary statistics (rxSummary), cubes and cross tabs (rxCube, rxCrossTabs), linear models (rxLinMod), covariance and correlation matrices (rxCovCor), binomial logistic regression (rxLogit), and k-means clustering (rxKmeans). Example: a Tweedie family with 1 million observations and 78 estimated coefficients (categorical data) took 17 seconds with rxGlm compared with 377 seconds for glm on a quad-core laptop.
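
If you do not have Revolution R Enterprise at hand, the idea behind rxExec (run an ordinary R function on several workers and collect the results) can be tried with the `parallel` package that ships with open source R. To be clear, this is not the RevoScaleR API, just a minimal sketch of the same pattern; the task function, the seeds and the choice of two workers are all made up for illustration.

```r
library(parallel)

# Hypothetical task: mean of a simulated sample, reproducible via a per-task seed.
simulate_mean <- function(seed) {
  set.seed(seed)
  mean(rnorm(1000))
}

seeds <- 1:4

# Start two local worker processes (an arbitrary number for this sketch).
cl <- makeCluster(2)
parallel_results <- parLapply(cl, seeds, simulate_mean)
stopCluster(cl)

# The same work done sequentially, for comparison.
sequential_results <- lapply(seeds, simulate_mean)

# Compare the two result lists.
print(all.equal(parallel_results, sequential_results))
```

Seeding inside the task keeps each result independent of which worker happens to run it.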


    And it is now easier to work with R’s big brother, the SAS language:


    RevoScaleR high-performance analysis functions will now conveniently work directly with a variety of external data sources (delimited and fixed format text files, SAS files, SPSS files, and ODBC data connections). New functions are provided to create data source objects to represent these data sources (RxTextData, RxOdbcData, RxSasData, and RxSpssData), which in turn can be specified for the ‘data’ argument for these RevoScaleR analysis functions: rxHistogram, rxSummary, rxCube, rxCrossTabs, rxLinMod, rxCovCor, rxLogit, and rxGlm.


    You can analyze a SAS file directly as follows:

    # Create a SAS data source with information about variables and
    # rows to read in each chunk
    sasDataFile <- file.path(rxGetOption("sampleDataDir"), "claims.sas7bdat")
    sasDS <- RxSasData(sasDataFile, stringsAsFactors = TRUE,
                       colClasses = c(RowNum = "integer"), rowsPerRead = 50)

    # Compute and draw a histogram directly from the SAS file
    rxHistogram(~cost | type, data = sasDS)
    # Compute summary statistics
    rxSummary(~., data = sasDS)
    # Estimate a linear model
    linModObj <- rxLinMod(cost ~ age + car_age + type, data = sasDS)
    # Import a subset into a data frame for further inspection
    subData <- rxImport(inData = sasDS, rowSelection = cost > 400,
                        varsToKeep = c("cost", "age", "type"))


Installation and getting-started instructions for Revolution R Enterprise & RevoDeployR for Windows: http://www.revolutionanalytics.com/downloads/instructions/windows.php

Oracle R Updated!

Interesting message from https://blogs.oracle.com/R/, the latest R blog:



Oracle just released the latest update to Oracle R Enterprise, version 1.1. This release includes the Oracle R Distribution (based on open source R, version 2.13.2), an improved server installation, and much more.  The key new features include:

  • Extended Server Support: New support for Windows 32 and 64-bit server components, as well as continuing support for Linux 64-bit server components
  • Improved Installation: Linux 64-bit server installation now provides robust status updates and prerequisite checks
  • Performance Improvements: Improved performance for embedded R script execution calculations

In addition, the updated ROracle package, which is used with Oracle R Enterprise, now reads date data by conversion to character strings.

We encourage you to download Oracle software for evaluation from the Oracle Technology Network. See these links for R-related software: Oracle R Distribution, Oracle R Enterprise, ROracle, Oracle R Connector for Hadoop.  As always, we welcome comments and questions on the Oracle R Forum.



Oracle R Distribution 2-13.2 Update Available

Oracle has released an update to the Oracle R Distribution, an Oracle-supported distribution of open source R. Oracle R Distribution 2-13.2 now contains the ability to dynamically link the following libraries on both Windows and Linux:

  • The Intel Math Kernel Library (MKL) on Intel chips
  • The AMD Core Math Library (ACML) on AMD chips


To take advantage of the performance enhancements provided by Intel MKL or AMD ACML in Oracle R Distribution, simply add the MKL or ACML shared library directory to the LD_LIBRARY_PATH system environment variable. This automatically enables MKL or ACML to make use of all available processors, vastly speeding up linear algebra computations and eliminating the need to recompile R.  Even on a single core, the optimized algorithms in the Intel MKL libraries are faster than using R’s standard BLAS library.

Open-source R is linked to NetLib’s BLAS libraries, but they are not multi-threaded and only use one core. While R’s internal BLAS are efficient for most computations, it’s possible to recompile R to link to a different, multi-threaded BLAS library to improve performance on eligible calculations. Compiling and linking to R yourself can be involved, but for many, the significantly improved calculation speed justifies the effort. Oracle R Distribution notably simplifies the process of using external math libraries by enabling R to auto-load MKL or ACML. For R commands that don’t link to BLAS code, taking advantage of database parallelism using embedded R execution in Oracle R Enterprise is the route to improved performance.

For more information about rebuilding R with different BLAS libraries, see the linear algebra section in the R Installation and Administration manual. As always, the Oracle R Distribution is available as a free download to anyone. Questions and comments are welcome on the Oracle R Forum.
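
To see whether a different BLAS makes any difference on your own machine, you can time a BLAS-heavy operation before and after changing LD_LIBRARY_PATH. A minimal base R sketch; the matrix size and the variable names are arbitrary choices of mine, and the timing you get will depend entirely on your hardware and libraries.

```r
# Show the library search path this R session was started with (may be empty).
print(Sys.getenv("LD_LIBRARY_PATH"))

# Build a random square matrix; n is an arbitrary size for illustration.
set.seed(42)
n <- 500
a <- matrix(rnorm(n * n), nrow = n)

# crossprod(a) computes t(a) %*% a, which R hands off to the BLAS.
timing <- system.time(b <- crossprod(a))
print(timing[["elapsed"]])
```

Run it once with the stock setup and once with the MKL or ACML directory on the path, then compare the elapsed times. Increase n if the operation finishes too quickly to measure.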

How to add or change the %PATH% variable in Windows 7

See this for a simple 5-step way to add or change the %PATH% variable in Windows 7, in case you need to install an application that shows an error while installing (because that app was built for Linux-based systems...).
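
Since this is mostly an R blog, here is a small sketch for inspecting the same variable from inside R, and for extending it for the current session only. The extra directory is hypothetical, so substitute the real folder of your application; the permanent, system-wide change still has to be made in Windows itself.

```r
# Split the PATH of the current R session into its directories.
# .Platform$path.sep is ";" on Windows and ":" on Unix-alikes.
path_dirs <- strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed = TRUE)[[1]]
print(path_dirs)

# Hypothetical directory to add; replace with the application's real folder.
extra_dir <- file.path(tempdir(), "myapp", "bin")

# Append it for this R session only (the system-wide setting is untouched).
Sys.setenv(PATH = paste(c(path_dirs, extra_dir), collapse = .Platform$path.sep))
```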