Data Mining to change the world of health care

http://www.heritagehealthprize.com/c/hhp

Heritage Health Prize, a $3 million competition to predict who will go to hospital and for how long.

So as not to overwhelm anyone, we will be releasing the data in three waves. Today’s launch allows people to register and download the first instalment, which includes enough data for people to start trying out models. It includes claims data from Y1, information on members and the details of hospitalizations recorded in Y2.

The next instalment will be released on May 4 and will involve the release a more comprehensive dataset, including claims for later years as well as the test dataset against which entries will be judged. It is at this point that we will open up the competition to entries, reveal the performance threshold and begin posting the leaderboard.

Finally, the last release happens on June 4 and will include some ancillary data of prescriptions and lab tests.

Kaggle members don’t sign up again. To register, simply login and accept the rules before downloading the data.

Finally the Twitter hashtag for the competition is #drflix. Help spread the word.

http://www.heritagehealthprize.com/c/hhp

Heritage Provider Network Announces the Heritage Health Prize Will Include $230,000 in Progress Prizes (prnewswire.com)
$3.2M in prizes for predicting hospitalization (revolutionanalytics.com)
Heritage Health Prize: Is $3 million enough to improve the U.S. health care system? (slate.com)

PSPP – SPSS 's Open Source Counterpart

New Website for Windows Installers for PSPP– try at your own time if you are dedicated to either SPSS or free statistical computing.

http://pspp.awardspace.com/

This page is intended to give a stable root for downloading the PSPP-for-Windows setup from free mirrors.

Highlights of the current PSPP-for-Windows setup

PSPP info:

Current version:	Master version = 0.7.6
Release date:	See filenames
Information about PSPP:	http://www.gnu.org/software/pspp
PSPP Manual:	PDF or HTML (current version will be installed on your PC by the installer package)

Package info:

Windows version:	Windows XP and newer
Package Size:	15 Mb
Size on disk:	34 Mb
Technical:	MinGW based Cross-compiled on openSUSE 11.3

Downloads:
There are issues with the latest build. Some users report crashes on their systems on other systems it works fine.

Version	Installer for multi-user installation. Administrator privileges required. Recommended version.	Installer for single-user installation. No administrator privileges required
0.7.6-g38ba1e-blp-build20101116 0.7.5-g805e7e-blp-build20100908 0.7.5-g7803d3-blp-build20100820 0.7.5-g333ac4-blp-build20100727	PSPP-Master-2010-11-16 PSPP-Master-2010-09-08 PSPP-Master-2010-08-20 PSPP-Master-2010-07-27	PSPP-Master-single-user-2010-11-16 PSPP-Master-single-user-2010-09-08 PSPP-Master-single-user-2010-08-20 PSPP-Master-single-user-2010-07-27

Sources can be found here.

Also see http://en.wikipedia.org/wiki/PSPP

At the user’s choice, statistical output and graphics are done in ASCII, PDF, PostScript or HTML formats. A limited range of statistical graphs can be produced, such as histograms, pie-charts and np-charts.

PSPP can import Gnumeric, OpenDocument and Excel spreadsheets, Postgres databases, comma-separated values– and ASCII-files. It can export files in the SPSS ‘portable’ and ‘system’ file formats and to ASCII files. Some of the libraries used by PSPP can be accessed programmatically; PSPP-Perl provides an interface to the libraries used by PSPP.

and

http://www.gnu.org/software/pspp/

A brief list of some of the features of PSPP follows:

Supports over 1 billion cases.
Supports over 1 billion variables.
Syntax and data files are compatible with SPSS.
Choice of terminal or graphical user interface.
Choice of text, postscript or html output formats.
Inter-operates with Gnumeric, OpenOffice.Org and other free software.
Easy data import from spreadsheets, text files and database sources.
Fast statistical procedures, even on very large data sets.
No license fees.
No expiration period.
No unethical “end user license agreements”.
Fully indexed user manual.
Free Software; licensed under GPLv3 or later.
Cross platform; Runs on many different computers and many different operating systems.

R ready to Deduce you (ekonometrics.blogspot.com)
SPSS Guru embraces the freeware, R (ekonometrics.blogspot.com)
Create a Web Crawler in R (r-bloggers.com)

Comparing Bit Torrent Downloaders

Tux, as originally drawn by Larry Ewing — Image via Wikipedia

I personally like UTorrent on Windows and KTorrent on Linux.

While no experts on this, anything that gets the data down faster while maximizing my pipes efficiency.

I also like Torrenting than any of the sudo-apt get method of downloading software or the zip unzip,tar untar, install/make file

Torrenting is a simpler way of sharing applications but sadly not used much by the stats computing community to share downloads.

Also I think any dashboard or visualization should be sorted (but not alphabetically but numerically/categorically)

SORT THE DASHBOARD —-KEEP IT SORTED

So I am partially recreating after sorting the data viz from http://en.wikipedia.org/wiki/Comparison_of_BitTorrent_clients

BitTorrent client	Magnet URI	Super-seeding	Embedded tracker	UPnP [81]	NAT Port Mapping Protocol	NAT traversal [82]	DHT [83]	Peer exchange	Encryption	UDP tracker	LPD
µTorrent	Yes	Yes[95]	Yes[96]	Yes[97]	Yes	Yes[98]	Yes[99]	Yes[85]	Yes[100]	Yes	Yes[101]
BitSpirit [11]	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	No
BitTorrent 6	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes[85]	Yes	Yes	Yes
OneSwarm	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	No
qBittorrent	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
SoMud	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Vuze (formerly Azureus)	Yes	Yes	Yes	Yes	Yes	Yes[102]	Yes[87]	Yes	Yes	Yes	No
BitComet	Yes	Yes	Separate download	Yes	Yes	Yes	Yes	Yes	Yes	Yes	No
Tixati [43]	Yes	Yes	No	Yes	No	No	Yes	Yes	Yes	Yes	Partial
Aria2	Yes	No	Yes	No	No	No	Yes	Yes	Yes	Yes	Yes
Tribler	Yes	No	Yes	Yes	Yes	No	Yes	Yes	Yes	No	No
Bitflu	Yes	No	No	No	No	No	Yes	Yes	No	Yes	No
Deluge	Yes	No	No	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Flush	Yes	No	No	Yes	Yes	No	Yes	Yes	No	No	Yes
KTorrent	Yes	No	No	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Partial
Shareaza	Yes	No	No	Yes	Yes	No	Yes[93]	Yes	No	No	No
Transmission	Yes	No	No	Yes	Yes	Yes	Yes	Yes[94]	Yes	No	Yes
LimeWire	Partial	Yes	Yes	Yes	Yes	No	Yes	Yes	Yes	Yes	No
BitTyrant	No	Yes[citation needed]	Yes	Yes	Yes	Yes[86]	Yes[87]	Yes	Yes	No	No
BitTornado	No	Yes	Yes[84]	Yes	No	No	No	No	Yes	No	No
Torrent Swapper	No	Yes	Yes[84]	Yes	No	No	No	Yes	No	No	No
Localhost	No	Yes	Yes	Yes	No	Yes	Yes [89]	No	No	No	No
Meerkat Bittorrent Client	No	Yes	No	Yes	Yes	Yes	Yes	No	Yes	No	No
rTorrent	No	Yes	No	No	No	No	Yes	Yes	Yes	Yes	No[92]
TorrentFlux	No	Yes	No	Yes	No	No	No	No	Yes	No	No
TorrentVolve	No	Partial [76]	No	Partial[76]	Partial [76]	Partial [76]	Partial[76]	Partial [76]	Partial [76]	Partial [76]	No
Opera	No	No	Yes[90]	No	No	No	No	Yes[91]	No	No	No
BitTorrent 5 / Mainline	No	No	Yes[84]	Yes	Yes	No	Yes	Yes	Yes	No	No
ABC	No	No	Yes	Yes	No	No	No	No	No	No	No
Blog Torrent	No	No	Yes	No	No	No	No	No	No	No	No
MLDonkey	No	No	Yes	Yes	Yes	No	No	No	No	Yes	No
Tomato Torrent	No	No	Yes	No	No	No	Yes	No	No	No	No
Acquisition	No	No	No	No	Yes	No	No	No	No	No	No
Arctic Torrent	No	No	No	No	No	No	No	Yes	No	No	No
BitLet	No	No	No	Yes	No	No	No	No	No	No	No
BitLord	No	No	No	Yes	No	Yes	No	Yes	No	Yes	No
BitThief	No	No	No	No	No	No	No	No	No	No	No
Bits on Wheels	No	No	No	No	No	No	No	No	No	No	No
BTG	No	No	No	Yes	Yes	No	Yes	Yes	Yes	Yes	No
BTPD	No	No	No	No	No	No	No	No	No	No	No
FlashGet	No	No	No	No	No	No	Yes	No	Yes	No	No
Folx	No	No	No	Yes	Yes	No	Yes	Yes	No	Yes	No
Free Download Manager	No	No	No	No	No	No	Yes	Yes	No	No	No
G3 Torrent	No	No	No	No	No	No	No	No	No	No	No
Gnome BitTorrent	No	No	No	No	No	No	No	No	No	No	No
Halite	No	No	No	Yes	Yes	No	Yes	No	Yes	No[88]	No
QTorrent	No	No	No	No	No	No	No	No	No	No	No
Rufus	No	No	No	No	No	No	No	No	No	No	No
SymTorrent	No	No	No	N/A	N/A	N/A	No	No	No	No	No
Tonido Torrent	No	No	No	Yes	Yes	Yes	Yes	No	No	No	No
Torium	No	No	No	Yes	No	No	Yes	No	No	No	No
ZipTorrent	No	No	No	Yes	Yes	No	No	Yes	No	No	No

uTorrent Falcon Remote Controls Your BitTorrent Downloads from Any Browser [Downloads] (lifehacker.com)
Transmission 2.0 Adds a Whole Lot of Stability to the Popular BitTorrent Client [Downloads] (lifehacker.com)
Put uTorrent On Steroids By Installing Extensions On It [Windows] (makeuseof.com)
uTorrent Outpaces Vuze in BitTorrent Download Speed by 16% [File Sharing] (lifehacker.com)
uTorrent Adds Great iPhone (and Android) Remote Torrent Control Interface [Utorrent] (lifehacker.com)
Dropbox + uTorrent “Watched Folders” FTW (benjaminste.in)
BitTorrent’s Mainline and uTorrent clients reach 100 million active monthly users (downloadsquad.switched.com)
5 Best μTorrent Apps (maketecheasier.com)
Top 10 Cross-Platform BitTorrent Clients (tesarn.blogspot.com)
The 5 Best Torrent Clients For Linux (makeuseof.com)
You: Tribler BitTorrent Client Searches and Downloads Files, No Unreliable Tracker Required [Downloads] (lifehacker.com)
The Next Big DDOS Attack May Come via BitTorrent (gigaom.com)
BitTorrent Inc. To Launch All-In-One BitTorrent Ecosystem (torrentfreak.com)
Bittorrent Inc Launching All In One Application: Vuze Competitor (crenk.com)
BitTorrent Client Offers P2P Without Central Tracking (tech.slashdot.org)
How to Share Your Own Files Using BitTorrent [UltraNewb] (lifehacker.com)
Install apps on uTorrent with App Studio (madrasgeek.com)
Vuze 4.6 adds uTP support, speeds up torrent downloads (downloadsquad.switched.com)

Cloud Computing with R

Illusion of Depth and Space (4/22) - Rotating ... — Image by Dominic's pics via Flickr

Here is a short list of resources and material I put together as starting points for R and Cloud Computing It’s a bit messy but overall should serve quite comprehensively.

Cloud computing is a commonly used expression to imply a generational change in computing from desktop-servers to remote and massive computing connections,shared computers, enabled by high bandwidth across the internet.

As per the National Institute of Standards and Technology Definition,
Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.

(Citation: The NIST Definition of Cloud Computing

Authors: Peter Mell and Tim Grance
Version 15, 10-7-09
National Institute of Standards and Technology, Information Technology Laboratory
http://csrc.nist.gov/groups/SNS/cloud-computing/cloud-def-v15.doc)

R is an integrated suite of software facilities for data manipulation, calculation and graphical display.

From http://cran.r-project.org/doc/FAQ/R-FAQ.html#R-Web-Interfaces

R Web Interfaces

Rweb is developed and maintained by Jeff Banfield. The Rweb Home Page provides access to all three versions of Rweb—a simple text entry form that returns output and graphs, a more sophisticated JavaScript version that provides a multiple window environment, and a set of point and click modules that are useful for introductory statistics courses and require no knowledge of the R language. All of the Rweb versions can analyze Web accessible datasets if a URL is provided.
The paper “Rweb: Web-based Statistical Analysis”, providing a detailed explanation of the different versions of Rweb and an overview of how Rweb works, was published in the Journal of Statistical Software (http://www.jstatsoft.org/v04/i01/).

Ulf Bartel has developed R-Online, a simple on-line programming environment for R which intends to make the first steps in statistical programming with R (especially with time series) as easy as possible. There is no need for a local installation since the only requirement for the user is a JavaScript capable browser. See http://osvisions.com/r-online/ for more information.

Rcgi is a CGI WWW interface to R by MJ Ray. It had the ability to use “embedded code”: you could mix user input and code, allowing the HTMLauthor to do anything from load in data sets to enter most of the commands for users without writing CGI scripts. Graphical output was possible in PostScript or GIF formats and the executed code was presented to the user for revision. However, it is not clear if the project is still active.

Currently, a modified version of Rcgi by Mai Zhou (actually, two versions: one with (bitmap) graphics and one without) as well as the original code are available from http://www.ms.uky.edu/~statweb/.

CGI-based web access to R is also provided at http://hermes.sdu.dk/cgi-bin/go/. There are many additional examples of web interfaces to R which basically allow to submit R code to a remote server, see for example the collection of links available from http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/StatCompCourse.

David Firth has written CGIwithR, an R add-on package available from CRAN. It provides some simple extensions to R to facilitate running R scripts through the CGI interface to a web server, and allows submission of data using both GET and POST methods. It is easily installed using Apache under Linux and in principle should run on any platform that supports R and a web server provided that the installer has the necessary security permissions. David’s paper “CGIwithR: Facilities for Processing Web Forms Using R” was published in the Journal of Statistical Software (http://www.jstatsoft.org/v08/i10/). The package is now maintained by Duncan Temple Lang and has a web page athttp://www.omegahat.org/CGIwithR/.

Rpad, developed and actively maintained by Tom Short, provides a sophisticated environment which combines some of the features of the previous approaches with quite a bit of JavaScript, allowing for a GUI-like behavior (with sortable tables, clickable graphics, editable output), etc.
Jeff Horner is working on the R/Apache Integration Project which embeds the R interpreter inside Apache 2 (and beyond). A tutorial and presentation are available from the project web page at http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/RApacheProject.

Rserve is a project actively developed by Simon Urbanek. It implements a TCP/IP server which allows other programs to use facilities of R. Clients are available from the web site for Java and C++ (and could be written for other languages that support TCP/IP sockets).

OpenStatServer is being developed by a team lead by Greg Warnes; it aims “to provide clean access to computational modules defined in a variety of computational environments (R, SAS, Matlab, etc) via a single well-defined client interface” and to turn computational services into web services.

Two projects use PHP to provide a web interface to R. R_PHP_Online by Steve Chen (though it is unclear if this project is still active) is somewhat similar to the above Rcgi and Rweb. R-php is actively developed by Alfredo Pontillo and Angelo Mineo and provides both a web interface to R and a set of pre-specified analyses that need no R code input.

webbioc is “an integrated web interface for doing microarray analysis using several of the Bioconductor packages” and is designed to be installed at local sites as a shared computing resource.

Rwui is a web application to create user-friendly web interfaces for R scripts. All code for the web interface is created automatically. There is no need for the user to do any extra scripting or learn any new scripting techniques. Rwui can also be found at http://rwui.cryst.bbk.ac.uk.

Finally, the R.rsp package by Henrik Bengtsson introduces “R Server Pages”. Analogous to Java Server Pages, an R server page is typically HTMLwith embedded R code that gets evaluated when the page is requested. The package includes an internal cross-platform HTTP server implemented in Tcl, so provides a good framework for including web-based user interfaces in packages. The approach is similar to the use of the brew package withRapache with the advantage of cross-platform support and easy installation.

Also additional R Cloud Computing Use Cases
http://wwwdev.ebi.ac.uk/Tools/rcloud/

ArrayExpress R/Bioconductor Workbench

Remote access to R/Bioconductor on EBI’s 64-bit Linux Cluster

Start the workbench by downloading the package for your operating system (Macintosh or Windows), or via Java Web Start, and you will get access to an instance of R running on one of EBI’s powerful machines. You can install additional packages, upload your own data, work with graphics and collaborate with colleagues, all as if you are running R locally, but unlimited by your machine’s memory, processor or data storage capacity.

Most up-to-date R version built for multicore CPUs
Access to all Bioconductor packages
Access to our computing infrastructure
Fast access to data stored in EBI’s repositories (e.g., public microarray data in ArrayExpress)

Using R Google Docs
http://www.omegahat.org/RGoogleDocs/run.pdf
It uses the XML and RCurl packages and illustrates that it is relatively quick and easy
to use their primitives to interact with Web services.

Using R with Amazon
Citation
http://rgrossman.com/2009/05/17/running-r-on-amazons-ec2/

Amazon’s EC2 is a type of cloud that provides on demand computing infrastructures called an Amazon Machine Images or AMIs. In general, these types of cloud provide several benefits:

Simple and convenient to use. An AMI contains your applications, libraries, data and all associated configuration settings. You simply access it. You don’t need to configure it. This applies not only to applications like R, but also can include any third-party data that you require.
On-demand availability. AMIs are available over the Internet whenever you need them. You can configure the AMIs yourself without involving the service provider. You don’t need to order any hardware and set it up.
Elastic access. With elastic access, you can rapidly provision and access the additional resources you need. Again, no human intervention from the service provider is required. This type of elastic capacity can be used to handle surge requirements when you might need many machines for a short time in order to complete a computation.
Pay per use. The cost of 1 AMI for 100 hours and 100 AMI for 1 hour is the same. With pay per use pricing, which is sometimes called utility pricing, you simply pay for the resources that you use.

Connecting to R on Amazon EC2- Detailed tutorials
Ubuntu Linux version
https://decisionstats.com/2010/09/25/running-r-on-amazon-ec2/
and Windows R version
https://decisionstats.com/2010/10/02/running-r-on-amazon-ec2-windows/

Connecting R to Data on Google Storage and Computing on Google Prediction API
https://github.com/onertipaday/predictionapirwrapper
R wrapper for working with Google Prediction API

This package consists in a bunch of functions allowing the user to test Google Prediction API from R.
It requires the user to have access to both Google Storage for Developers and Google Prediction API:
see http://code.google.com/apis/storage/ and http://code.google.com/apis/predict/ for details.

Example usage:

#This example requires you had previously created a bucket named data_language on your Google Storage and you had uploaded a CSV file named language_id.txt (your data) into this bucket – see for details
library(predictionapirwrapper)

and Elastic R for Cloud Computing
http://user2010.org/tutorials/Chine.html

Abstract

Elastic-R is a new portal built using the Biocep-R platform. It enables statisticians, computational scientists, financial analysts, educators and students to use cloud resources seamlessly; to work with R engines and use their full capabilities from within simple browsers; to collaborate, share and reuse functions, algorithms, user interfaces, R sessions, servers; and to perform elastic distributed computing with any number of virtual machines to solve computationally intensive problems.
Also see Karim Chine’s http://biocep-distrib.r-forge.r-project.org/

R for Salesforce.com

At the point of writing this, there seem to be zero R based apps on Salesforce.com This could be a big opportunity for developers as both Apex and R have similar structures Developers could write free code in R and charge for their translated version in Apex on Salesforce.com

Force.com and Salesforce have many (1009) apps at
http://sites.force.com/appexchange/home for cloud computing for
businesses, but very few forecasting and statistical simulation apps.

Example of Monte Carlo based app is here
http://sites.force.com/appexchange/listingDetail?listingId=a0N300000016cT9EAI#

These are like iPhone apps except meant for business purposes (I am
unaware if any university is offering salesforce.com integration
though google apps and amazon related research seems to be on)

Force.com uses a language called Apex and you can see
http://wiki.developerforce.com/index.php/App_Logic and
http://wiki.developerforce.com/index.php/An_Introduction_to_Formulas
Apex is similar to R in that is OOPs

SAS Institute has an existing product for taking in Salesforce.com data.

A new SAS data surveyor is
available to access data from the Customer Relationship Management
(CRM) software vendor Salesforce.com. at
http://support.sas.com/documentation/cdl/en/whatsnew/62580/HTML/default/viewer.htm#datasurveyorwhatsnew902.htm)

Personal Note-Mentioning SAS in an email to a R list is a big no-no in terms of getting a response and love. Same for being careless about which R help list to email (like R devel or R packages or R help)

For python based cloud see http://pi-cloud.com

Downloading your Facebook Photos

Here is a three step process for Downloading your entire Facebook Info-

1) Go to top right hand corner- Account.

2) Under Account- click on Account Settings

3)Under Account Settings -Click on Download Information

Facebook creates a ZIP file for downloading your information- and sends you an email when the info is ready.

Why should you download your Facebook Info-

1) As a local backup of your profile

2) To move to Orkut – or better to share Photos on Picassa or Flickr to your non Facebook friends. Note FB does have a unique URL under each Photo that you can copy and paste and share, but its not really convenient besides being huge and not a small URL at all.

3) In case you want to delete your Facebook account but dont want to lose your memories and friends.

Nice step from FB- they are taking user privacy and empowerment more seriously and at 500 Million users can afford to be a bit more generous.

Facebook Adds Download Your Information Feature (ghacks.net)
How to download Personal Facebook Profile Information To Computer (mydigitallife.info)
Download the Missing Pieces of Data from Facebook (labnol.org)

Revolution R for Linux

New software just released from the guys in California (@RevolutionR) so if you are a Linux user and have academic credentials you can download it for free (@Cmastication doesnt), you can test it to see what the big fuss is all about (also see http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php) –

Revolution Analytics has just released Revolution R Enterprise 4.0.1 for Red Hat Enterprise Linux, a significant step forward in enterprise data analytics. Revolution R Enterprise 4.0.1 is built on R 2.11.1, the latest release of the open-source environment for data analysis and graphics. Also available is the initial release of our deployment server solution, RevoDeployR 1.0, designed to help you deliver R analytics via the Web. And coming soon to Linux: RevoScaleR, a new package for fast and efficient multi-core processing of large data sets.

As a registered user of the Academic version of Revolution R Enterprise for Linux, you can take advantage of these improvements by downloading and installing Revolution R Enterprise 4.0.1 today. You can install Revolution R Enterprise 4.0.1 side-by-side with your existing Revolution R Enterprise installations; there is no need to uninstall previous versions.

Download Information

The following information is all you will need to download and install the Academic Edition.

Supported Platforms:

Revolution R Enterprise Academic edition and RevoDeployR are supported on Red Hat® Enterprise Linux® 5.4 or greater (64-bit processors).

Approximately 300MB free disk space is required for a full install of Revolution R Enterprise. We recommend at least 1GB of RAM to use Revolution R Enterprise.

For the full list of system requirements for RevoDeployR, refer to the RevoDeployR™ Installation Guide for Red Hat® Enterprise Linux®.

Download Links:

You will first need to download the Revolution R Enterprise installer.

Installation Instructions for Revolution R Enterprise Academic Edition

After downloading the installer, do the following to install the software:

Log in as root if you have not already.

Change directory to the directory containing the downloaded installer.

Unpack the installer using the following command:
tar -xzf Revo-Ent-4.0.1-RHEL5-desktop.tar.gz

Change directory to the RevolutionR_4.0.1 directory created.

Run the installer by typing ./install.py and following the on-screen prompts.

Getting Started with the Revolution R Enterprise

After you have installed the software, launch Revolution R Enterprise by typing Revo64 at the shell prompt.

Documentation is available in the form of PDF documents installed as part of the Revolution R Enterprise distribution. Type Revo.home(“doc”) at the R prompt to locate the directory containing the manuals Getting Started with Revolution R (RevoMan.pdf) and the ParallelR User’s Guide(parRman.pdf).

Installation Instructions for RevoDeployR (and RServe)

After downloading the RevoDeployR distribution, use the following steps to install the software:

Note: These instructions are for an automatic install. For more details or for manual install instructions, refer to RevoDeployR_Installation_Instructions_for_RedHat.pdf.

Log into the operating system as root.
su –

Change directory to the directory containing the downloaded distribution for RevoDeployR and RServe.

Unzip the contents of the RevoDeployR tar file. At prompt, type:
tar -xzf deployrRedHat.tar.gz

Change directories. At the prompt, type:
cd installFiles

Launch the automated installation script and follow the on-screen prompts. At the prompt, type:
./installRedHat.sh
Note: Red Hat installs MySQL without a password.

Getting Started with RevoDeployR

After installing RevoDeployR, you will be directed to the RevoDeployR landing page. The landing page has links to documentation, the RevoDeployR management console, the API Explorer development tool, and sample code.

Support

For help installing this Academic Edition, please email support@revolutionanalytics.com

Also interestingly some benchmarks on Revolution R vs R.

http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php

R-25 Benchmarks

The simple R-benchmark-25.R test script is a quick-running survey of general R performance. The Community-developed test consists of three sets of small benchmarks, referred to in the script as Matrix Calculation, Matrix Functions, and Program Control.

R-25 Benchmarks	Base R 2.9.2	Revolution R (1-core)	Revolution R (4-core)	Speedup (4 core)
Matrix Calculation	34 sec	6.6 sec	4.4 sec	7.7x
Matrix Functions	20 sec	4.4 sec	2.1 sec	9.5x
Program Control	4.7 sec	4 sec	4.2 sec	Not Appreciable

Speedup = Slower time / Faster Time – 1 Test descriptions available at http://r.research.att.com/benchmarks

Additional Benchmarks

Revolution Analytics has created its own tests to simulate common real-world computations. Their descriptions are explained below.

Linear Algebra Computation	Base R 2.9.2	Revolution R (1-core)	Revolution R (4-core)	Speedup (4 core)
Matrix Multiply	243 sec	22 sec	5.9 sec	41x
Cholesky Factorization	23 sec	3.8 sec	1.1 sec	21x
Singular Value Decomposition	62 sec	13 sec	4.9 sec	12.6x
Principal Components Analysis	237 sec	41 sec	15.6 sec	15.2x
Linear Discriminant Analysis	142 sec	49 sec	32.0 sec	4.4x

Speedup = Slower time / Faster Time – 1

Matrix Multiply

This routine creates a random uniform 10,000 x 5,000 matrix A, and then times the computation of the matrix product transpose(A) * A.

set.seed (1)
m <- 10000
n <- 5000
A <- matrix (runif (m*n),m,n)
system.time (B <- crossprod(A))

The system will respond with a message in this format:

User system elapsed
37.22 0.40 9.68

The “elapsed” times indicate total wall-clock time to run the timed code.

The table above reflects the elapsed time for this and the other benchmark tests. The test system was an INTEL® Xeon® 8-core CPU (model X55600) at 2.5 GHz with 18 GB system RAM running Windows Server 2008 operating system. For the Revolution R benchmarks, the computations were limited to 1 core and 4 cores by calling setMKLthreads(1) and setMKLthreads(4) respectively. Note that Revolution R performs very well even in single-threaded tests: this is a result of the optimized algorithms in the Intel MKL library linked to Revolution R. The slight greater than linear speedup may be due to the greater total cache available to all CPU cores, or simply better OS CPU scheduling–no attempt was made to pin execution threads to physical cores. Consult Revolution R’s documentation to learn how to run benchmarks that use less cores than your hardware offers.

Cholesky Factorization

The Cholesky matrix factorization may be used to compute the solution of linear systems of equations with a symmetric positive definite coefficient matrix, to compute correlated sets of pseudo-random numbers, and other tasks. We re-use the matrix B computed in the example above:

system.time (C <- chol(B))

Singular Value Decomposition with Applications

The Singular Value Decomposition (SVD) is a numerically-stable and very useful matrix decompisition. The SVD is often used to compute Principal Components and Linear Discriminant Analysis.

# Singular Value Deomposition
m <- 10000
n <- 2000
A <- matrix (runif (m*n),m,n)
system.time (S <- svd (A,nu=0,nv=0))

# Principal Components Analysis
m <- 10000
n <- 2000
A <- matrix (runif (m*n),m,n)
system.time (P <- prcomp(A))

# Linear Discriminant Analysis
require (‘MASS’)
g <- 5
k <- round (m/2)
A <- data.frame (A, fac=sample (LETTERS[1:g],m,replace=TRUE))
train <- sample(1:m, k)
system.time (L <- lda(fac ~., data=A, prior=rep(1,g)/g, subset=train))

Revolution Analytics Introduces Enterprise-Class Application Integration, Deployment & Administration for R (eon.businesswire.com)
R on Windows HPC Server (r-bloggers.com)
Revolution links R stats package to apps (go.theregister.com)
Introducing RevoDeployR: Web Services for R (r-bloggers.com)

Interfaces to R

This is a fairly long post and is a basic collection of material for a book/paper. It is on interfaces to use R. If you feel I need to add more on a particular R interface, or if there is an error in this- please feel to contact me on twitter @decisionstats or mail ohri2007 on google mail.

● R Interfaces

There are multiple ways to use the R statistical language.

Command Line- The default method is using the command prompt by the installed software on download from http://r-project.org
For windows users there is a simple GUI which has an option for Packages (loading package, installing package, setting CRAN mirror for downloading packages) , Misc (useful for listing all objects loaded in workspace as well as clearing objects to free up memory), and Help Menu.

Using Click and Point- Besides the command prompt, there are many Graphical User Interfaces which enable the analyst to use click and point methods to analyze data without getting into the details of learning complex and at times overwhelming R syntax. R GUIs are very popular both as mode of instruction in academia as well as in actual usage as it cuts down considerably on time taken to adapt to the language. As with all command line and GUI software, for advanced tweaks and techniques, command prompt will come in handy as well.

Advantages and Limitations of using Visual Programming Interfaces to R as compared to Command Line.

Advantages	Limitations
Faster learning for new programmers	Can create junk analysis by clicking menus in GUI
Easier creation of advanced models or graphics	Cannot create custom functions unless you use command line
Repeatability of analysis is better	Advanced techniques and custom flexibility of data handling R can be done in command line
Syntax is auto-generated	Can limit scope and exposure in learning R syntax

A brief list of the notable Graphical User Interfaces is below-

1) R Commander- Basic statistics
2) Rattle- Data Mining
3) Deducer- Graphics (including GGPlot Integration) and also uses JGR (a Jave based GUI)
4) RKward- Comprehensive R GUI for customizable graphs
5) Red-R – Dataflow programming interface using widgets

1) R Commander- R Commander was primarily created by Professor John Fox of McMaster University to cover the content of a basic statistics course. However it is extensible and many other packages can be added in menu form to it- in the form R Commander Plugins. Quite noticeably it is one of the most widely used R GUI and it also has a script window so you can write R code in combination with the menus.
As you point and click a particular menu item, the corresponding R code is automatically generated in the log window and executed.

It can be found on CRAN at http://cran.r-project.org/web/packages/Rcmdr/index.html

Advantages of Using R Commander-
1) Useful for beginner in R language to do basic graphs and analysis and building models.
2) Has script window, output window and log window (called messages) in same screen which helps user as code is auto-generated on clicking on menus, and can be customized easily. For example in changing labels and options in Graphs. Graphical output is shown in seperate window from output window.
3) Extensible for other R packages like qcc (for quality control), Teaching Demos (for training), survival analysis and Design of Experiments (DoE)
4) Easy to understand interface even for first time user.
5) Menu items which are not relevant are automatically greyed out- if there are only two variables, and you try to build a 3D scatterplot graph, that menu would simply not be available and is greyed out.

Comparative Disadvantages of using R Commander-
1) It is basically aimed at a statistical audience( originally students in statistics) and thus the terms as well as menus are accordingly labeled. Hence it is more of a statistical GUI rather than an analytics GUI.
2) Has limited ability to evaluate models from a business analysts perspective (ROC curve is not given as an option) even though it has extensive statistical tests for model evaluation in model sub menu. Indeed creating a Model is treated as a subsection of statistics rather than a separate menu item.
3) It is not suited for projects that do not involve advanced statistical testing and for users not proficient in statistics (particularly hypothesis testing), and for data miners.

Menu items in the R Commander window:
File Menu – For loading script files and saving Script files, Output and Workspace
It is also needed for changing the present working directory and for exiting R.
Edit Menu – For editing scripts and code in the script window.
Data Menu – For creating new dataset, inputting or importing data and manipulating data through variables. Data Import can be from text,comma separated values,clipboard, datasets from SPSS, Stata,Minitab, Excel ,dbase, Access files or from url.
Data manipulation included deleting rows of data as well as manipulating variables.
Also this menu has the option for merging two datasets by row or columns.
Statistics Menu-This menu has options for descriptive statistics, hypothesis tests, factor analysis and clustering and also for creating models. Note there is a separate menu for evaluating the model so created.
Graphs Menu-It has options for creating various kinds of graphs including box-plot, histogram, line, pie charts and x-y plots.
The first option is color palette- it can be used for customizing the colors. It is recommended you adjust colors based on your need for publication or presentation.
A notable option is 3 D graphs for evaluating 3 variables at a time- this is really good and impressive feature and exposes the user to advanced graphs in R all at few clicks. You may want to dazzle a presentation using this graph.
Also consider scatterplot matrix graphs for graphical display of variables.
Graphical display of R surpasses any other statistical software in appeal as well as ease of creation- using GUI to create graphs can further help the user to get the most of data insights using R at a very minimum effort.
Models Menu-This is somewhat of a labeling peculiarity of R Commander as this menu is only for evaluating models which have been created using the statistics menu-model sub menu.
It includes options for graphical interpretation of model results,residuals,leverage and confidence intervals and adding back residuals to the data set.
Distributions Menu- is for cumulative probabilities, probability density, graphs of distributions, quantiles and features for standard distributions and can be used in lieu of standard statistical tables for the distributions. It has 13 standard statistical continuous distributions and 5 discrete distributions.
Tools Menu- allows you to load other packages and also load R Commander plugins (which are then added to the Interface Menu after the R Commander GUI is restarted). It also contains options sub menu for fine tuning (like opting to send output to R Menu)
Help Menu- Standard documentation and help menu. Essential reading is the short 25 page manual in it called Getting “Started With the R Commander”.

R Commander Plugins- There are twenty extensions to R Commander that greatly enhance it’s appeal -these include basic time series forecasting, survival analysis, qcc and more.

see a complete list at

DoE – http://cran.r-project.org/web/packages/RcmdrPlugin.DoE/RcmdrPlugin.DoE.pdf

doex

EHESampling

epack- http://cran.r-project.org/web/packages/RcmdrPlugin.epack/RcmdrPlugin.epack.pdf

Export- http://cran.r-project.org/web/packages/RcmdrPlugin.Export/RcmdrPlugin.Export.pdf

FactoMineR

HH

IPSUR

MAc- http://cran.r-project.org/web/packages/RcmdrPlugin.MAc/RcmdrPlugin.MAc.pdf

MAd

orloca

PT

qcc- http://cran.r-project.org/web/packages/RcmdrPlugin.qcc/RcmdrPlugin.qcc.pdf and http://cran.r-project.org/web/packages/qcc/qcc.pdf

qual

SensoMineR

SLC

sos

survival-http://cran.r-project.org/web/packages/RcmdrPlugin.survival/RcmdrPlugin.survival.pdf

SurvivalT

Teaching Demos

Note the naming convention for above e plugins is always with a Prefix of “RCmdrPlugin.” followed by the names above
Also on loading a Plugin, it must be already installed locally to be visible in R Commander’s list of load-plugin, and R Commander loads the e-plugin after restarting.Hence it is advisable to load all R Commander plugins in the beginning of the analysis session.

However the notable E Plugins are
1) DoE for Design of Experiments-
Full factorial designs, orthogonal main effects designs, regular and non-regular 2-level fractional
factorial designs, central composite and Box-Behnken designs, latin hypercube samples, and simple D-optimal designs can currently be generated from the GUI. Extensions to cover further latin hypercube designs as well as more advanced D-optimal designs (with blocking) are planned for the future.
2) Survival- This package provides an R Commander plug-in for the survival package, with dialogs for Cox models, parametric survival regression models, estimation of survival curves, and testing for differences in survival curves, along with data-management facilities and a variety of tests, diagnostics and graphs.
3) qcc -GUI for Shewhart quality control charts for continuous, attribute and count data. Cusum and EWMA charts. Operating characteristic curves. Process capability analysis. Pareto chart and cause-and-effect chart. Multivariate control charts
4) epack- an Rcmdr “plug-in” based on the time series functions. Depends also on packages like , tseries, abind,MASS,xts,forecast. It covers Log-Exceptions garch
and following Models -Arima, garch, HoltWinters
5)Export- The package helps users to graphically export Rcmdr output to LaTeX or HTML code,
via xtable() or Hmisc::latex(). The plug-in was originally intended to facilitate exporting Rcmdr
output to formats other than ASCII text and to provide R novices with an easy-to-use,
easy-to-access reference on exporting R objects to formats suited for printed output. The
package documentation contains several pointers on creating reports, either by using
conventional word processors or LaTeX/LyX.
6) MAc- This is an R-Commander plug-in for the MAc package (Meta-Analysis with
Correlations). This package enables the user to conduct a meta-analysis in a menu-driven,
graphical user interface environment (e.g., SPSS), while having the full statistical capabilities of
R and the MAc package. The MAc package itself contains a variety of useful functions for
conducting a research synthesis with correlational data. One of the unique features of the MAc
package is in its integration of user-friendly functions to complete the majority of statistical steps
involved in a meta-analysis with correlations.
You can read more on R Commander Plugins at http://wp.me/p9q8Y-1Is
—————————————————————————————————————————-
Rattle- R Analytical Tool To Learn Easily (download from http://rattle.togaware.com/)
Rattle is more advanced user Interface than R Commander though not as popular in academia. It has been designed explicitly for data mining and it also has a commercial version for sale by Togaware. Rattle has a Tab and radio button/check box rather than Menu- drop down approach towards the graphical design. Also the Execute button needs to be clicked after checking certain options, just the same as submit button is clicked after writing code. This is different from clicking on a drop down menu.

Advantages of Using Rattle
1) Useful for beginner in R language to do building models,cluster and data mining.
2) Has separate tabs for data entry,summary, visualization,model building,clustering, association and evaluation. The design is intuitive and easy to understand even for non statistical background as the help is conveniently explained as each tab, button is clicked. Also the tabs are placed in a very sequential and logical order.
3) Uses a lot of other R packages to build a complete analytical platform. Very good for correlation graph,clustering as well decision trees.
4) Easy to understand interface even for first time user.
5) Log for R code is auto generated and time stamp is placed.
6) Complete solution for model building from partitioning datasets randomly for testing,validation to building model, evaluating lift and ROC curve, and exporting PMML output of model for scoring.
7) Has a well documented online help as well as in-software documentation. The help helps explain terms even to non statistical users and is highly useful for business users.

Example Documentation for Hypothesis Testing in Test Tab in Rattle is ”
Distribution of the Data
* Kolomogorov-Smirnov     Non-parametric Are the distributions the same?
* Wilcoxon Signed Rank    Non-parametric Do paired samples have the same distribution?
Location of the Average
* T-test               Parametric     Are the means the same?
* Wilcoxon Rank-Sum    Non-parametric Are the medians the same?
Variation in the Data
* F-test Parametric Are the variances the same?
Correlation
* Correlation    Pearsons Are the values from the paired samples correlated?”

Comparative Disadvantages of using Rattle-
1) It is basically aimed at a data miner. Hence it is more of a data mining GUI rather than an analytics GUI.
2) Has limited ability to create different types of graphs from a business analysts perspective Numeric variables can be made into Box-Plot, Histogram, Cumulative as well Benford Graphs. While interactivity using GGobi and Lattiticist is involved- the number of graphical options is still lesser than other GUI.
3) It is not suited for projects that involve multiple graphical analysis and which do not have model building or data mining.For example Data Plot is given in clustering tab but not in general Explore tab.
4) Despite the fact that it is meant for data miners, no support to biglm packages, as well as parallel programming is enabled in GUI for bigger datasets, though these can be done by R command line in conjunction with the Rattle GUI. Data m7ining is typically done on bigger datsets.
5) May have some problems installing it as it is dependent on GTK and has a lot of packages as dependencies.

Top Row-
This has the Execute Button (shown as two gears) and which has keyboard shortcut F2. It is used to execute the options in Tabs-and is equivalent of submit code button.
Other buttons include new Projects,Save and Load projects which are files with extension to .rattle an which store all related information from Rattle.
It also has a button for exporting information in the current Tab as an open office document, and buttons for interrupting current process as well as exiting Rattle.

Data Tab-
It has the following options.
●        Data Type- These are radio buttons between Spreadsheet (and Comma Separated Values), ARFF files (Weka), ODBC (for Database Connections),Library (for Datasets from Packages),R Dataset or R datafile, Corpus (for Text Mining) and Script for generating the data by code.
●        The second row-in Data Tab in Rattle is Detail on Data Type- and its apperance shifts as per the radio button selection of data type in previous step. For Spreadsheet, it will show Path of File, Delimiters, Header Row while for ODBC it will show DSN, Tables, Rows and for Library it will show you a dropdown of all datasets in all R packages installed locally.
●        The third row is a Partition field for splitting dataset in training,testing,validation and it shows ratio. It also specifies a Random seed which can be customized for random partitions which can be replicated. This is very useful as model building requires model to be built and tested on random sub sets of full dataset.
●        The fourth row is used to specify the variable type of inputted data. The variable types are
○        Input: Used for modeling as independent variables
○        Target: Output for modeling or the dependent variable. Target is a categoric variable for classification, numeric for regression and for survival analysis both Time and Status need to be defined
○        Risk: A variable used in the Risk Chart
○        Ident: An identifier for unique observations in the data set like AccountId or Customer Id
○        Ignore: Variables that are to be ignored.
●        In addition the weight calculator can be used to perform mathematical operations on certain variables and identify certain variables as more important than others.

Explore Tab-
Summary Sub-Tab has Summary for brief summary of variables, Describe for detailed summary and Kurtosis and Skewness for comparing them across numeric variables.
Distributions Sub-Tab allows plotting of histograms, box plots, and cumulative plots for numeric variables and for categorical variables Bar Plot and Dot Plot.
It also has Benford Plot for Benford’s Law on probability of distribution of digits.
Correlation Sub-Tab– This displays corelation between variables as a table and also as a very nice plot.
Principal Components Sub-Tab– This is for use with Principal Components Analysis including the SVD (singular value decomposition) and Eigen methods.
Interactive Sub-Tab- Allows interactive data exploration using GGobi and Lattice software. It is a powerful visual tool.

Test Tab-This has options for hypothesis testing of data for two sample tests.
Transform Tab-This has options for rescaling data, missing values treatment, and deleting invalid or missing values.
Cluster Tab-It gives an option to KMeans, Hierarchical and Bi-Cluster clustering methods with automated graphs,plots (including dendogram, discriminant plot and data plot) and cluster results available. It is highly recommended for clustering projects especially for people who are proficient in clustering but not in R.

Associate Tab-It helps in building association rules between categorical variables, which are in the form of “if then”statements. Example. If day is Thursday, and someone buys Milk, there is 80% chance they will buy Diapers. These probabilities are generated from observed frequencies.

Model Tab-The Model tab makes Rattle one of the most advanced data mining tools, as it incorporates decision trees(including boosted models and forest method), linear and logistic regression, SVM,neural net,survival models.
Evaluate Tab-It as functionality for evaluating models including lift,ROC,confusion matrix,cost curve,risk chart,precision, specificity, sensitivity as well as scoring datasets with built model or models. Example – A ROC curve generated by Rattle for Survived Passengers in Titanic (as function of age,class,sex) This shows comparison of various models built.

Log Tab- R Code is automatically generated by Rattle as the respective operation is executed. Also timestamp is done so it helps in reviewing error as well as evaluating speed for code optimization.
—————————————————————————————————————————-
JGR- Deducer- (see http://www.deducer.org/pmwiki/pmwiki.php?n=Main.DeducerManual
JGR is a Java Based GUI. Deducer is recommended for use with JGR.
Deducer has basically been made to implement GGPLOT in a GUI- an advanced graphics package based on Grammer of Graphics and was part of Google Summer of Code project.

It first asks you to either open existing dataset or load a new dataset with just two icons. It has two initial views in Data Viewer- a Data view and Variable view which is quite similar to Base SPSS. The other Deducer options are loaded within the JGR console.

Advantages of Using Deducer
1.      It has an option for factor as well as reliability analysis which is missing in other graphical user interfaces like R Commander and Rattle.
2.      The plot builder option gives very good graphics -perhaps the best in other GUIs. This includes a color by option which allows you to shade the colors based on variable value. An addition innovation is the form of templates which enables even a user not familiar with data visualization to choose among various graphs and click and drag them to plot builder area.
3.      You can set the Java Gui for R (JGR) menu to automatically load some packages by default using an easy checkbox list.
4.      Even though Deducer is a very young package, it offers a way for building other R GUIs using Java Widgets.
5.      Overall feel is of SPSS (Base GUI) to it’s drop down menu, and selecting variables in the sub menu dialogue by clicking to transfer to other side.SPSS users should be more comfortable at using this.
6.      A surprising thing is it rearranges the help documentation of all R in a very presentable and organized manner
7.      Very convenient to move between two or more datasets using dropdown.
8.      The most convenient GUI for merging two datasets using common variable.

Dis Advantages of Using Deducer
1.      Not able to save plots as images (only options are .pdf and .eps), you can however copy as image.
2.      Basically a data viualization GUI – it does offer support for regression, descriptive statistics in the menu item Extras- however the menu suggests it is a work in progress.
3.      Website for help is outdated, and help documentation specific to Deducer lacks detail.

Components of Deducer-
Data Menu-Gives options for data manipulation including recoding variables,transform variables (binning, mathematical operation), sort dataset, transpose dataset ,merge two datasets.
Analysis Menu-Gives options for frequency tables, descriptive statistics,cross tabs, one sample tests (with plots) ,two sample tests (with plots),k sample tests, correlation,linear and logistic models,generalized linear models.
Plot Builder Menu- This allows plots of various kinds to be made in an interactive manner.

Correlation using Deducer.

————————————————————————————————————————–
Red-R – A dataflow user interface for R (see http://red-r.org/

Red R uses dataflow concepts as a user interface rather than menus and tabs. Thus it is more similar to Enterprise Miner or Rapid Miner in design. For repeatable analysis dataflow programming is preferred by some analysts. Red-R is written in Python.

Advantages of using Red-R
1) Dataflow style makes it very convenient to use. It is the only dataflow GUI for R.
2) You can save the data as well as analysis in the same file.
3) User Interface makes it easy to read R code generated, and commit code.
4) For repeatable analysis-like reports or creating models it is very useful as you can replace just one widget and other widget/operations remain the same.
5) Very easy to zoom into data points by double clicking on graphs. Also to change colors and other options in graphs.
6) One minor feature- It asks you to set CRAN location just once and stores it even for next session.
7) Automated bug report submission.

Disadvantages of using Red-R
1) Current version is 1.8 and it needs a lot of improvement for building more modeling types as well as debugging errors.
2) Limited features presently.
———————————————————————————————————————-
RKWard (see http://rkward.sourceforge.net/)

It is primarily a KDE GUI for R, so it can be used on Ubuntu Linux. The windows version is available but has some bugs.

Advantages of using RKWard
1) It is the only R GUI for time series at present.
In addition it seems like the only R GUI explicitly for Item Response Theory (which includes credit response models,logistic models) and plots contains Pareto Charts.
2) It offers a lot of detail in analysis especially in plots(13 types of plots), analysis and distribution analysis ( 8 Tests of normality,14 continuous and 6 discrete distributions). This detail makes it more suitable for advanced statisticians rather than business analytics users.
3) Output can be easily copied to Office documents.

Disadvantages of using RKWard
1) It does not have stable Windows GUI. Since a graphical user interface is aimed at making interaction easier for users- this is major disadvantage.
2) It has a lot of dependencies so may have some issues in installing.
3) The design categorization of analysis,plots and distributions seems a bit unbalanced considering other tabs are File, Edit, View, Workspace,Run,Settings, Windows,Help.
Some of the other tabs can be collapsed, while the three main tabs of analysis,plots,distributions can be better categorized (especially into modeling and non-modeling analysis).
4) Not many options for data manipulation (like subset or transpose) by the GUI.
5) Lack of detail in documentation as it is still on version 0.5.3 only.

Components-
Analysis, Plots and Distributions are the main components and they are very very extensive, covering perhaps the biggest range of plots,analysis or distribution analysis that can be done.
Thus RKWard is best combined with some other GUI, when doing advanced statistical analysis.

Image via Wikipedia

GrapherR

GrapheR is a Graphical User Interface created for simple graphs.

Depends: R (>= 2.10.0), tcltk, mgcv
Description: GrapheR is a multiplatform user interface for drawing highly customizable graphs in R. It aims to be a valuable help to quickly draw publishable graphs without any knowledge of R commands. Six kinds of graphs are available: histogram, box-and-whisker plot, bar plot, pie chart, curve and scatter plot.
License: GPL-2
LazyLoad: yes
Packaged: 2011-01-24 17:47:17 UTC; Maxime
Repository: CRAN
Date/Publication: 2011-01-24 18:41:47

More information about GrapheR at CRAN
Path: /cran/new | permanent link

Advantages of using GrapheR

It is bi-lingual (English and French) and can import in text and csv files

The intention is for even non users of R, to make the simple types of Graphs.

The user interface is quite cleanly designed. It is thus aimed as a data visualization GUI, but for a more basic level than Deducer.

Easy to rename axis ,graph titles as well use sliders for changing line thickness and color

Disadvantages of using GrapheR

Lack of documentation or help. Especially tips on mouseover of some options should be done.

Some of the terms like absicca or ordinate axis may not be easily understood by a business user.

Default values of color are quite plain (black font on white background).

Can flood terminal with lots of repetitive warnings (although use of warnings() function limits it to top 50)

Some of axis names can be auto suggested based on which variable s being chosen for that axis.

Package name GrapheR refers to a graphical calculator in Mac OS – this can hinder search engine results

Using GrapheR

Data Input -Data Input can be customized for CSV and Text files.

GrapheR gives information on loaded variables (numeric versus Factors)

It asks you to choose the type of Graph

It then asks for usual Graph Inputs (see below). Note colors can be customized (partial window). Also number of graphs per Window can be easily customized

Graph is ready for publication

Related Articles

How would a graph of negative and positive acceleration differ(wiki.answers.com)

Contest to build an R package recommendation engine(dataists.com)

Dracula Graph Library(graphdracula.net)

Why does a cumulative frequency graph never go down(wiki.answers.com)

Heatmap tables (r-bloggers.com)

Graph.tk – Online Graphing Utility (freetech4teachers.com)

Graph Generator(topcoder.com)

Summary of R GUIs

Using R from other software- Please note that interfaces to R exist from other software as well. These include software from SAS Institute, IBM SPSS, Rapid Miner,Knime and Oracle.

A brief list is shown below-

1) SAS/IML Interface to R- You can read about the SAS Institute’s SAS/ IML Studio interface to R at http://www.sas.com/technologies/analytics/statistics/iml/index.html
2) Rapid Miner Extension to R-You can view integration with Rapid Miner’s extension to R here at http://www.youtube.com/watch?v=utKJzXc1Cow
3) IBM SPSS plugin for R-SPSS software has R integration in the form of a plugin. This was one of the earliest third party software offering interaction with R and you can read more at http://www.spss.com/software/statistics/developer/
4) Knime- Konstanz Information Miner also has R integration. You can view this on
http://www.knime.org/downloads/extensions
5) Oracle Data Miner- Oracle has a data mining offering to it’s very popular database software which is integrated with the R language. The R Interface to Oracle Data Mining ( R-ODM) allows R users to access the power of Oracle Data Mining’s in-database functions using the familiar R syntax. http://www.oracle.com/technetwork/database/options/odm/odm-r-integration-089013.html
6) JMP- JMP version 9 is the latest to offer interface to R. You can read example scripts here at http://blogs.sas.com/jmp/index.php?/archives/298-JMP-Into-R!.html

R Excel- Using R from Microsoft Excel

Microsoft Excel is the most widely used spreadsheet program for data manipulation, entry and graphics. Yet as dataset sizes have increased, Excel’s statistical capabilities have lagged though it’s design has moved ahead in various product versions.

R Excel basically works at adding a .xla plugin to
Excel just like other Plugins. It does so by connecting to R through Rpackages.

Basically it offers the functionality of R
functions and capabilities to the most widely distributed spreadsheet program. Alldata summaries, reports and analysis end up in a spreadsheet-
R Excel enables R to be very useful for people not
knowing R. In addition it adds (by option) the menus of R Commander as menus inExcel spreadsheet.

Advantages-
Enables R and Excel to communicate thus tieing an advanced statistical tool to the most widely used business analytics tool.

Disadvantages-
No major disadvatage at all to a business user. For a data statistical user, Microsoft Excel is limited to 100,000 rows, so R data needs to be summarized or reduced.

Graphical capabilities of R are very useful, but to a new user, interactive graphics in Excel may be easier than say using Ggplot ot Ggobi.
You can read more on this at http://rcom.univie.ac.at/ or the complete Springer Book http://www.springer.com/statistics/computanional+statistics/book/978-1-4419-0051-7

The combination of cloud computing and internet offers a new kind of interaction possible for scientists as well analysts.

Here is a way to use R on an Amazon EC2 machine, thus renting by hour hardware and computing resources which are scaleable to massive levels , whereas the software is free.

Here is how you can connect to Amazon EC2 and run R.
Running R for Cloud Computing.
1) Logging onto Amazon Console http://aws.amazon.com/ec2/
Note you need your Amazon Id (even the same id which you use for buying books).Note we are into Amazon EC2 as shown by the upper tab. Click upper tab to get into the Amazon EC2
2) Choosing the right AMI-On the left margin, you can click AMI -Images. Now you can search for the image-I chose Ubuntu images (linux images are cheaper) and latest Ubuntu Lucid in the search .You can choose whether you want 32 bit or 64 bit image. 64 bit images will lead to faster processing of data.Click on launch instance in the upper tab ( near the search feature). A pop up comes up, which shows the 5 step process to launch your computing.
3) Choose the right compute instance- – there are various compute instances and they all are at different multiples of prices or compute units. They differ in terms of RAM memory and number of processors.After choosing the compute instance of your choice (extra large is highlighted)- click on continue-
4) Instance Details-Do not choose cloudburst monitoring if you are on a budget as it has a extra charge. For critical production it would be advisable to choose cloudburst monitoring once you have become comfortable with handling cloud computing..
5) Add Tag Details- If you are running a lot of instances you need to create your own tags to help you manage them. It is advisable if you are going to run many instances.
6) Create a key pair- A key pair is an added layer of encryption. Click on create new pair and name it (note the name will be handy in coming steps)
7) After clicking and downloading the key pair- you come into security groups. Security groups is just a set of instructions to help keep your data transfer secure. You want to enable access to your cloud instance to certain IP addresses (if you are going to connect from fixed IP address and to certain ports in your computer. It is necessary in security group to enable SSH using Port 22.
Last step- Review Details and Click Launch
8) On the Left margin click on instances ( you were in Images.>AMI earlier)
It will take some 3-5 minutes to launch an instance. You can see status as pending till then.
9) Pending instance as shown by yellow light-
10) Once the instance is running -it is shown by a green light.
Click on the check box, and on upper tab go to instance actions. Click on connect-
You see a popup with instructions like these-
· Open the SSH client of your choice (e.g., PuTTY, terminal).
· Locate your private key, nameofkeypair.pem
· Use chmod to make sure your key file isn’t publicly viewable, ssh won’t work otherwise:
chmod 400 decisionstats.pem
· Connect to your instance using instance’s public DNS [ec2-75-101-182-203.compute-1.amazonaws.com].
Example
Enter the following command line:
ssh -i decisionstats2.pem root@ec2-75-101-182-203.compute-1.amazonaws.com

Note- If you are using Ubuntu Linux on your desktop/laptop you will need to change the above line to ubuntu@… from root@..

ssh -i yourkeypairname.pem -X ubuntu@ec2-75-101-182-203.compute-1.amazonaws.com

(Note X11 package should be installed for Linux users- Windows Users will use Remote Desktop)

12) Install R Commander on the remote machine (which is running Ubuntu Linux) using the command

Related Articles

Please share:

Related Articles

Please share:

Related Articles (Ps the Related Articles is auto generated by Zementa- a software embedded within WordPress.com in case you are wondering what the deal with the linking is)

Please share:

(Citation: The NIST Definition of Cloud Computing

R Web Interfaces

ArrayExpress R/Bioconductor Workbench

Remote access to R/Bioconductor on EBI’s 64-bit Linux Cluster

Abstract

Related Articles

Please share:

Related Articles

Please share:

Download Information

Supported Platforms:

Download Links:

Installation Instructions for Revolution R Enterprise Academic Edition

Getting Started with the Revolution R Enterprise

Installation Instructions for RevoDeployR (and RServe)

Getting Started with RevoDeployR

Support

R-25 Benchmarks

Additional Benchmarks

Matrix Multiply

Cholesky Factorization

Singular Value Decomposition with Applications

Related Articles

Please share:

Related Articles

sudo apt-get install r-cran-rcmdr

Please share: