Revolution Analytics has just released Revolution R Enterprise 4.0.1 for Red Hat Enterprise Linux, a significant step forward in enterprise data analytics. Revolution R Enterprise 4.0.1 is built on R 2.11.1, the latest release of the open-source environment for data analysis and graphics. Also available is the initial release of our deployment server solution, RevoDeployR 1.0, designed to help you deliver R analytics via the Web. And coming soon to Linux: RevoScaleR, a new package for fast and efficient multi-core processing of large data sets.
As a registered user of the Academic version of Revolution R Enterprise for Linux, you can take advantage of these improvements by downloading and installing Revolution R Enterprise 4.0.1 today. You can install Revolution R Enterprise 4.0.1 side-by-side with your existing Revolution R Enterprise installations; there is no need to uninstall previous versions.
The following information is all you will need to download and install the Academic Edition.
Revolution R Enterprise Academic edition and RevoDeployR are supported on Red Hat® Enterprise Linux® 5.4 or greater (64-bit processors).
Approximately 300 MB of free disk space is required for a full install of Revolution R Enterprise. We recommend at least 1 GB of RAM to use Revolution R Enterprise.
For the full list of system requirements for RevoDeployR, refer to the RevoDeployR™ Installation Guide for Red Hat® Enterprise Linux®.
You will first need to download the Revolution R Enterprise installer.
Installation Instructions for Revolution R Enterprise Academic Edition
After downloading the installer, do the following to install the software:
Unpack the installer using the following command:
tar -xzf Revo-Ent-4.0.1-RHEL5-desktop.tar.gz
Change to the RevolutionR_4.0.1 directory created in the previous step.
Run the installer by typing ./install.py and following the on-screen prompts.
Getting Started with Revolution R Enterprise
After you have installed the software, launch Revolution R Enterprise by typing Revo64 at the shell prompt.
Documentation is available in the form of PDF documents installed as part of the Revolution R Enterprise distribution. Type Revo.home("doc") at the R prompt to locate the directory containing the manuals Getting Started with Revolution R (RevoMan.pdf) and the ParallelR User's Guide (parRman.pdf).
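For example, you can locate and list the installed manuals from within R (a minimal sketch; Revo.home() is part of Revolution R, and the pattern argument simply filters for PDF files):

doc.dir <- Revo.home("doc")               # directory holding the PDF manuals
list.files(doc.dir, pattern = "\\.pdf$")  # e.g., RevoMan.pdf, parRman.pdf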
Installation Instructions for RevoDeployR (and RServe)
After downloading the RevoDeployR distribution, use the following steps to install the software:
Note: These instructions are for an automatic install. For more details or for manual install instructions, refer to RevoDeployR_Installation_Instructions_for_RedHat.pdf.
Log into the operating system as root.
Change directory to the directory containing the downloaded distribution for RevoDeployR and RServe.
Unpack the contents of the RevoDeployR tar file. At the prompt, type:
tar -xzf deployrRedHat.tar.gz
Change to the directory created by unpacking the tar file.
Launch the automated installation script and follow the on-screen prompts. At the prompt, type:
./installRedHat.sh
Note: Red Hat installs MySQL without a password.
Getting Started with RevoDeployR
After installing RevoDeployR, you will be directed to the RevoDeployR landing page. The landing page has links to documentation, the RevoDeployR management console, the API Explorer development tool, and sample code.
The simple R-benchmark-25.R test script is a quick-running survey of general R performance. The Community-developed test consists of three sets of small benchmarks, referred to in the script as Matrix Calculation, Matrix Functions, and Program Control.
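To try the script yourself, download R-benchmark-25.R and source it at the R prompt (a minimal sketch; it assumes the file has been saved to the current working directory):

source("R-benchmark-25.R")   # runs the benchmark sets and prints timings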
Revolution Analytics has created its own tests to simulate common real-world computations; they are described below.
Linear Algebra Computation       Base R 2.9.2   Revolution R (1-core)   Revolution R (4-core)   Speedup (4-core)
Singular Value Decomposition     …              …                       …                       …
Principal Components Analysis    …              …                       …                       …
Linear Discriminant Analysis     …              …                       …                       …
Speedup = (Slower time / Faster time) - 1; for example, a test that drops from 30 seconds to 10 seconds has a speedup of 30/10 - 1 = 2.
This routine creates a random uniform 10,000 x 5,000 matrix A, and then times the computation of the matrix product transpose(A) * A.
# Matrix multiply: time the computation of t(A) %*% A
m <- 10000
n <- 5000
A <- matrix(runif(m*n), m, n)    # random uniform 10,000 x 5,000 matrix
system.time(B <- crossprod(A))   # crossprod(A) computes t(A) %*% A
The system will respond with a message in this format:
   user  system elapsed
  37.22    0.40    9.68
The “elapsed” times indicate total wall-clock time to run the timed code.
The table above reflects the elapsed time for this and the other benchmark tests. The test system was an Intel® Xeon® 8-core CPU (model X55600) at 2.5 GHz with 18 GB of system RAM, running the Windows Server 2008 operating system. For the Revolution R benchmarks, the computations were limited to 1 core and 4 cores by calling setMKLthreads(1) and setMKLthreads(4), respectively. Note that Revolution R performs very well even in single-threaded tests; this is a result of the optimized algorithms in the Intel MKL library linked into Revolution R. The slightly greater-than-linear speedup may be due to the greater total cache available to all CPU cores, or simply to better OS CPU scheduling; no attempt was made to pin execution threads to physical cores. Consult Revolution R's documentation to learn how to run benchmarks that use fewer cores than your hardware offers.
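For instance, the 1-core and 4-core timings of the matrix multiply test can be reproduced along these lines (a sketch; setMKLthreads() is the thread-control function mentioned above, available in Revolution R):

setMKLthreads(1)                 # restrict the MKL BLAS to a single core
system.time(B <- crossprod(A))
setMKLthreads(4)                 # allow four cores
system.time(B <- crossprod(A))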
The Cholesky matrix factorization may be used to compute the solution of linear systems of equations with a symmetric positive definite coefficient matrix, to compute correlated sets of pseudo-random numbers, and for other tasks. We re-use the matrix B computed in the example above:
system.time (C <- chol(B))
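As an illustration of the first use mentioned above, the factor C can solve the system B %*% x = b with two triangular solves (a sketch using base R's forwardsolve() and backsolve(); the right-hand side b is invented for the example):

b <- runif(n)                 # arbitrary right-hand side (n is 5000 here)
y <- forwardsolve(t(C), b)    # chol() returns upper-triangular C with B = t(C) %*% C
x <- backsolve(C, y)          # x now solves B %*% x = b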
Singular Value Decomposition with Applications
The Singular Value Decomposition (SVD) is a numerically stable and very useful matrix decomposition. The SVD is often used to compute Principal Components Analysis and Linear Discriminant Analysis.
# Singular Value Decomposition
m <- 10000
n <- 2000
A <- matrix(runif(m*n), m, n)
system.time(S <- svd(A, nu = 0, nv = 0))
# Principal Components Analysis
m <- 10000
n <- 2000
A <- matrix(runif(m*n), m, n)
system.time(P <- prcomp(A))
# Linear Discriminant Analysis
require('MASS')
g <- 5
k <- round(m/2)
A <- data.frame(A, fac = sample(LETTERS[1:g], m, replace = TRUE))
train <- sample(1:m, k)
system.time(L <- lda(fac ~ ., data = A, prior = rep(1, g)/g, subset = train))
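As a follow-up (not part of the timed benchmark; a hypothetical sketch using the predict method from MASS), the fitted model can score the held-out rows:

pred <- predict(L, A[-train, ])     # classify the rows not used for training
mean(pred$class == A$fac[-train])   # expect roughly 1/g = 0.2, since the labels are random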
Just go to Users > Personal Settings and check the options shown. That's it: every time you write a post, it suggests links and tags. Links are helpful for your readers (like Wikipedia links to explain dense technical jargon, or associated websites). Tags help classify your content so that all visitors to the website, including spiders, search engines, and your readers, can search it better.
The bad thing is I need to go back to all 1,025 posts on this site and auto-generate tags for the archives! Oh well. Great collaboration between Zemanta and Automattic for this new feature.
October's R meet-up will be co-located with the Predictive Analytics World Conference (http://www.predictive…) taking place in Washington, DC, October 19-20. PAW is the premier business-focused event for predictive analytics professionals, managers, and commercial practitioners.
Important Registration Instructions:
You are welcome to RSVP here at Meetup. The PAW organizers have requested that we also register on the PAW site for the R meet-up so they can provide badges to members, which will give you access to the reception. There is no charge to register on the PAW site. Please click here to register.
Harlan D. Harris, PhD, is a statistical data scientist working for Kaplan Test Prep and Admissions in New York City. He has degrees from the University of Wisconsin-Madison and the University of Illinois at Urbana-Champaign. Prior to turning to the private sector, he worked as a researcher and lecturer in various areas of Artificial Intelligence and Cognitive Science at the University of Illinois, Columbia University, the University of Connecticut, and New York University.
Harlan's talk is titled "How to speak ggplot2 like a native." One of the most innovative ideas in data visualization in recent years is that graphical images can be described using a grammar. Just as a fluent speaker of a language can talk more precisely and clearly than someone using a tourist phrasebook, graphics based on a grammar can yield more insights than graphics based on a limited set of templates (bar chart, pie graph, etc.). There are at least two implementations of the Grammar of Graphics idea in R, of which the most popular is the ggplot2 package written by Prof. Hadley Wickham. Just as with natural languages, ggplot2 has a surface structure made up of R vocabulary elements, as well as a deep structure that mediates the link between the vocabulary and the "semantic" representation of the data shown on a computer screen. In this introductory presentation, the links among these levels of representation are demonstrated, so that new ggplot2 users can build the mental models necessary for fluent and creative visualization of their data.
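For readers who have not yet seen the package, here is a minimal example of that surface structure (illustrative only, not taken from the talk; it uses R's built-in mtcars data):

library(ggplot2)
# Grammar elements: data, aesthetic mappings, and layered geoms
ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)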
Michael Milton is a Client Manager at Blue State Digital. When he’s not saving the world by designing interactive marketing strategies that connect passionate users with causes and organizations, he writes about data and analytics. For O’Reilly Media, he wrote Head First Data Analysis and Head First Excel and has created the videos Great R: Level 1 and Getting the Most Out of Google Apps for Business.
Michael’s talk is called “How to Save the World Using R.” In this wide-ranging discussion, Michael will highlight individuals and organizations who are using R to help others as well as ways in which R can be used to promote good statistical thinking.
Early Bird Special
Register for M2010 before Sept. 17 and save $200 on conference fees!
Additional Data Mining Resources
Find additional data mining resources, including links to whitepapers, webinars, audio seminars, videos, blogs, and online communities.
Las Vegas, NV
Conference: October 25-26
Pre-conference workshops: October 24
Post-conference training: October 27-29
The M2010 Data Mining Conference is an international educational conference and exhibition for data mining practitioners, including analysts, statisticians, programmers, consultants, and anyone involved with data management within their organization. Hosted by SAS, M2010 is now in its 13th year and has become the world's largest data mining conference, attracting over 600 people from various industries including Financial Services, Retail, Insurance, Technology, Education, Healthcare, Pharmaceutical, Government, and more.
This conference is the top choice for serious education and career networking. Conference highlights include
PSPP is a program for statistical analysis of sampled data. It is a Free replacement for the proprietary program SPSS, and appears very similar to it with a few exceptions.
The most important of these exceptions are that there are no "time bombs": your copy of PSPP will not "expire" or deliberately stop working in the future. Neither are there any artificial limits on the number of cases or variables which you can use. There are no additional packages to purchase in order to get "advanced" functions; all functionality that PSPP currently supports is in the core package.
PSPP can perform descriptive statistics, T-tests, linear regression and non-parametric tests. Its backend is designed to perform its analyses as fast as possible, regardless of the size of the input data. You can use PSPP with its graphical interface or the more traditional syntax commands.
A brief list of some of the features of PSPP follows:
Supports over 1 billion cases.
Supports over 1 billion variables.
Syntax and data files are compatible with SPSS.
Choice of terminal or graphical user interface.
Choice of text, PostScript, or HTML output formats.
Cross-platform: runs on many different computers and many different operating systems.
PSPP is particularly aimed at statisticians, social scientists and students requiring fast convenient analysis of sampled data.
This software provides a basic set of capabilities: frequencies, cross-tabs, comparison of means (t-tests and one-way ANOVA), linear regression, reliability (Cronbach's Alpha, not failure or Weibull), re-ordering of data, non-parametric tests, factor analysis, and more.
The PSPP project (originally called "Fiasco") is a free, open-source alternative to the proprietary statistics package SPSS. SPSS is closed-source and includes a restrictive licence and digital rights management. The author of PSPP considered this ethically unacceptable, and decided to write a program which might, with time, become functionally identical to SPSS, except that there would be no licence expiry, and everyone would be permitted to copy, modify, and share the program.
In the book "SPSS For Dummies", the author discusses PSPP under the heading of "Ten Useful Things You Can Find on the Internet". In 2006, the South African Statistical Association presented a conference which included an analysis of how PSPP can be used as a free replacement for SPSS.