The R Online WikiBook

I came across the R Programming Wikibook at http://en.wikibooks.org/wiki/R_Programming

It is surprisingly good: easy to read for a beginner, and a handy, concise reference for intermediate users. Some chapters, like the one on clustering, could do with more support from the community - see http://en.wikibooks.org/wiki/R_Programming/Clustering

But I really liked the pages on Graphics, Modeling and Maths (including Matrix)

See

http://en.wikibooks.org/wiki/R_Programming/Graphics

and http://en.wikibooks.org/wiki/R_Programming/Linear_Models

I really believe that consolidated, one-book online documentation can be achieved for R, but only if we follow a moderated, wiki-like structure. This would be of great use, since the online help documents for R are currently neither concise nor professional looking (the documentation comes in multiple formats and styles), and they rarely compare multiple packages. All this has made R books the top selling statistics books on Amazon, but a project like R deserves at least one comprehensive and concise online book that can be used readily without going through all the scattered documentation, a bit like an R Online Doc. This could help the next stage of the project by getting more users comfortable with it.

Any volunteers 🙂 ?

The Top Statistical Softwares (GUI)

The list of top Statistical Softwares (GUI) is continued below. You can see the earlier post here

6. R Commander – While it was initially aimed at being a basic statistics GUI, the tremendous popularity of R Commander and its extensions in the form of plugins have helped make it one of the most widely used GUIs. In short, if you don’t know ANY R and still want to do basic descriptive stats and modeling, this will come in handy. It adds a script window where advanced users can write custom code, and the e-plugins, such as those for the DoE (design of experiments) and QCC (quality control) packages, are a great way to extend it. I suspect the only thing holding it back is the reluctance of Dr Fox and the rest of R Core to fully embrace the GUI as a software medium. You can read his earlier interview here - https://decisionstats.wordpress.com/2009/09/14/interview-professor-john-fox-creator-r-commander/

Technically it is possible to convert just about any package to a GUI menu in R Commander using the e-plugins.

7. SAS GUIs

Enterprise (Guide)

SAS Enterprise Guide was the higher end (and higher priced) answer to the Enhanced Editor’s lack of menu-driven commands. It works, but many people I know prefer the text editor just as well.


The Enterprise Miner is a separate software and works more like Red R or SPSS Modeler does. Again EM is one of the major DM softwares out there, but the similarity in names is a bit confusing.

Even the Base SAS Enhanced Editor does have some menus for importing data, or querying etc, but it is rarely confused for being a GUI.

8. Oracle Data Miner and Knime

I like both ODM and Knime, but I find the lack of advertising or promotional support puzzling. Both these softwares would do well to combine technical excellence with some marketing. And since they are both free, you can check them out yourself here

Oracle Data Mining

You can download it here-(note- the Oracle Web Site itself is a bit aging 🙂 )

http://www.oracle.com/technology/products/bi/odm/odminer.html

Knime is the open source GUI which can be found here-

http://www.knime.org/introduction/features

9. RKWard

Another R GUI - it stands out for the comprehensive ways you can customize your code through menus rather than writing it all out or learning the syntax by rote.

From http://sourceforge.net/apps/mediawiki/rkward/index.php?title=Main_Page

you can see it below. I recommend this GUI over other GUIs, especially if you are new to R and do more data visualization that needs custom graphics.

10. Red R and R JGR/ Deducer

Red R and RJGR/Deducer are both up-and-coming GUIs for R. While Red R is the R version of Enterprise Miner, Deducer is coming up with a new GUI for ggplot, the powerful graphics package in R.

Some GUIs excluded from this list are Statistica, MatLab and EViews(?), because I don’t really work with them and thought it best to turn them over to someone who knows them better.

Hope this list of GUIs helps you. Note that most of these softwares can be learnt within a quick hour or two if you have basic software and data manipulation skills, so going through the GUI list is a faster way of adding value to your resume and knowledge base as well.


CommeRcial R- Integration in software

Some updates to R on the commercial side.

Revolution Computing is apparently now renamed Revolution Analytics. Hopefully this and the GUI development will help bring more focused attention to working with R in a mainstream office situation. I am still waiting for David Smith’s cheery hey-guys-we-changed-again blog post though, either at a new site called inside-r.org/ or at his old blog site at blog.revolution-computing.com

They probably need to hire more people now – Curt Monash, noted all-things-data software guru has the inside dope here

Techworld writes more here at http://www.techworld.com.au/article/345288/startup_wants_r_alternative_ibm_sas

The company’s software is priced “aggressively” versus IBM and SAS. A single supported workstation costs $2,000 for an annual subscription. Pricing for server-based licenses varies depending on the implementation.

But Revolution Analytics faces a tough challenge from those larger vendors, as well as the likes of XLSolutions, which offers R training and a competing software package, R-Plus.

SPSS though continues to integrate R solidly and also march ahead with Python (which is likely to be the next gen in statistical programming if it keeps up) http://insideout.spss.com/

With the release of Version 18 of IBM SPSS Statistics and the Developer product, easy-to-install versions of the Python and R materials are posted.  In particular, look for the R Essentials link on the main page or from the Plugins page.  It installs the R Plugin, the correct version of R, and a bunch of example R integrations as bundles.  It’s much easier to get going with this now.

Netezza, a business intelligence vendor, promises more integration and even a training class in R based analytics here

“R Modeling for TwinFin i-Class

Objective
Learn how to use TwinFin i-Class for scaling up the R language.

Description
In this class, you’ll learn how to use R to create models using huge data and how to create R algorithms that exploit our asymmetric massively parallel (AMPP®) architecture. Netezza has seamlessly integrated with R to offload the heavy lifting of the computational processing on TwinFin i-Class. This results in higher performance and increased scalability for R. Sign up for this class to learn how to take advantage of TwinFin i-Class for your R modeling. Topics include:

  1. R CRAN package installation on TwinFin i-Class
  2. Creating models using R on TwinFin i-Class
  3. Creating R algorithms for TwinFin i-Class

Format
Hands-on classroom lecture, lab exercises, tour

Audience
Knowledgeable R users – modelers, analytic developers, data miners

Course Length
0.5 day: 12pm-4pm Wednesday, June 23 OR 8am-12pm Thursday, June 24 OR 1pm-5pm Thursday, June 24, 2010

Delivery
Enzee Universe 2010, Boston, MA

Student Prerequisites

  • Working knowledge of R and parallel computing
  • Have analytic, compute-intensive challenges
  • Understanding of data mining and analytics”

My favourite GUI in stats, JMP (also from SAS Institute), is going to deploy R integration as soon as this September. Read more here - http://www.sas.com/news/preleases/JMP-to-R-integrationSGF10.html

Also SAS-IML studio is not lagging behind

The next release of SAS/IML will extend R integration to the server environment – enabling users to deploy results in batch mode and access R from SAS on additional platforms, such as UNIX and Linux.

I am kind of happy about one of the best GUIs integrating with one of the most innovative stats softwares. It’s like two of your best friends getting married. (see screenshots of the softwares)

All in all, R as a platform is making good overall progress on all sides of the corporate software spectrum, which can only be good for R developers as well as users and students.

Top 10 Graphical User Interfaces in Statistical Software

Here is a list of the top 10 GUIs in statistical software. The overall criteria are:

  • User Friendly Nature for a New User to begin click and point and learn.
  • Cleanliness of Automated Code or Log generated.
  • Practical application in consulting and corporate world.
  • Cost and Ease of Ownership (including purchase,install,training,maintainability,renewal)
  • Aesthetics (or just plain pretty)

However, this list is not in order of ranking (as beauty, of a GUI, lies in the eyes of the beholder). For a list of the top 10 GUIs for the R language only, please see –

https://rforanalytics.wordpress.com/graphical-user-interfaces-for-r/

This is a GUI-only list, so it excludes notable softwares driven by the command line or by submitting commands from a text editor, which are also very powerful and user friendly.

  1. JMP –

While critics of SAS Institute often complain about the premium pricing of the basic model (especially AFTER the entry of another SAS language software, WPS, from http://www.teamwpc.co.uk/products/wps), they should try out JMP from http://jmp.com. It has a 1 month free evaluation, is much less expensive, and the GUI makes it very, very easy to do basic statistical analysis and testing. The learning curve is surprisingly quick (as it should be for well designed interfaces), and it produces very good quality output graphics as well.

2. SPSS

The original GUI in this class of softwares, it has now expanded to a big portfolio of products. SPSS 18 is nice, with its increasing focus on Python, and SPSS was an early adopter of R compatible interfaces. SPSS also offers a much more affordable solution, with a free evaluation. See especially http://www.spss.com/statistics/ and http://www.spss.com/software/modeling/modeler-pro/

the screenshot here is of SPSS Modeler

3. WPS

While it offers an alternative to Base SAS and SAS/Access software, I really like the affordability (1 month free evaluation and overall lower cost, especially for multiple CPU servers), the speed (on the desktop but not on the IBM OS version) and the intuitive design as well as extensibility of the Workbench. It may look like an integrated development environment and not a proper GUI, but with all the menu features it does qualify as a GUI in my opinion.

Norman Nie: R GUI and More

Here is an interview with Norman Nie, SPSS founder and CEO of REvolution Computing (R platform).

Some notable thoughts

For example, SPSS was really among the first to deliver rich GUIs that make it easier to use by more people. This is why one of the first things you’ll see from REvolution is a GUI for R – to make R more accessible and hereby further accelerate adoption.

This is good news if executed. I have often written (in agony actually, because I use it) about the need for GUIs for R. My last post on that was here. Indeed, the one reason SPSS was easily adopted by business school students (like me) in India in 2001-3 was its much better GUI compared with SAS’s GUIs.

However, some self delusion / PR / cognitive dissonance seems to be at play in Dr Nie’s words

If you look at the last 40 years of university curriculum, SPSS – the product I helped build – has been the dominant player, even becoming the common thread uniting a diverse range of disciplines, which have in turn been applied to business. Data is ubiquitous: tools and data warehouses allow you to query a given set of data repeatedly. R does these things better than the alternatives out there; it is indeed the wave of the future.

SPSS has been a strong number 2, but it has never overtaken SAS. Part of the reason is that SAS handles much bigger datasets much more easily than SPSS did (and that is where R’s limit of holding data in RAM can be a concern). Given the decreasing price of RAM, packages like biglm, and the shift to cloud based computing (with memory that can be ramped up on demand), this can be less of an issue, but analysts generally like to have a straightforward way of handling bigger datasets. Indeed SAS, with its vertical focus and its recent social media analytics, continues to innovate both by itself and through its alliance partnerships in the enterprise software world. REvolution Computing would further need to tie up with these analytical partners, especially data warehousing or BI providers, to ensure R’s analytical functions can be used where they deliver maximum value to the corporate customer as well as the academic customer.
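
To make the point about RAM a little more concrete, here is a minimal sketch of what the biglm-style packages mentioned above do, assuming the biglm package from CRAN is installed. The built-in mtcars data set stands in for a file too large to load at once, and the chunk size of 8 rows and the model formula are arbitrary choices for illustration only.

```r
# Assumes the biglm package is installed: install.packages("biglm")
library(biglm)

# Pretend mtcars is too big for memory and arrives in chunks of 8 rows.
chunks <- split(mtcars, ceiling(seq_len(nrow(mtcars)) / 8))

# Fit the model on the first chunk, then fold in the other chunks one by one.
# Only one chunk needs to be held in memory at any time.
fit <- biglm(mpg ~ wt + hp, data = chunks[[1]])
for (chunk in chunks[-1]) {
  fit <- update(fit, chunk)
}

# Coefficients after all chunks have been processed.
coef(fit)
```

In real use each chunk would be read from disk or a database instead of being split from a data frame already in memory. The coefficients should agree with an ordinary lm() fit on the full data, up to numerical rounding.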

Part 2 of Nie’s interview should be interesting .

2010-2011 would likely see

Round 2 : Red Corner ( Nie)                             Gray Corner (Goodnight)

if

Norman Nie can truly deliver a REvolution in Computing

or else

he becomes number two again the second time around to Jim Goodnight’s software giant.

Using Red R- R with a Visual Interface

For people complaining about the GUI on R, here is the, ah, Enterprise Version of R, called Red R.

It is available at the website at http://www.red-r.org/

 

You can read more there or just go through the short video created by them.

Basically it is a point-and-click method of using R with the ability to store schemas, which makes it very good for repeatable operations as well.


Not bad for epic software, huh?

R for Stats : Updated

Here is the new website for statistical analysis using the free analytical software called R. R is enabled for cloud computing as well: see http://bit.ly/OhriCloud

or http://rgrossman.com/2009/05/17/running-r-on-amazons-ec2/

for the R tutorial on running it on Amazon’s EC2 with pay per demand RAM.

It is called R 4 stats or simply http://www.r4stats.com/

Hosted on Google’s updated Google Sites platform, it offers a preview of the update to Bob’s earlier runaway hit, R for SAS and SPSS Users, as well as his upcoming work, R for Stata Users.

In Bob’s own words –

I have substantially expanded the table that compares SAS and SPSS
add-on modules to somewhat equivalent R packages. This new version is
at:
http://r4stats.com/add-on-modules
and I would very much appreciate any feedback you might have on it.

The site http://r4stats.com is the replacement to
http://RforSASandSPSSusers.com and includes the support files for both
“R for SAS and SPSS Users” and the new “R for Stata Users”, due out in
March from Springer.

| Topic | SAS Product | SPSS Product | R Package |
|---|---|---|---|
| Advanced Models | SAS/STAT | IBM SPSS Advanced Statistics | R, MASS, many others |
| Association Analysis | Enterprise Miner | IBM SPSS Association | arules, arulesNBMiner, arulesSequences |
| Basics | Base SAS | IBM SPSS Statistics Base | R |
| Bootstrapping | SAS/STAT | IBM SPSS Bootstrapping | BootCL, BootPR, boot, bootRes, BootStepAIC, bootspecdens, bootstrap, FRB, gPdtest, meboot, multtest, pvclust, rqmcmb2, scaleboot, simpleboot |
| Classification Analysis | Enterprise Miner | IBM SPSS Classification | rattle, see the neural networks and trees entries in this table. |
| Conjoint Analysis | SAS/STAT: PROC TRANSREG | IBM SPSS Conjoint | homals, psychoR, bayesm |
| Correspondence Analysis | SAS/STAT: PROC CORRESP | IBM SPSS Categories | ade4, cocorresp, FactoMineR, homals, made4, MASS, psychoR, PTAk, vegan |
| Custom Tables | Base SAS, PROC REPORT, PROC SQL, PROC TABULATE, Enterprise Reporter | IBM SPSS Custom Tables | reshape |
| Data Access | SAS/ACCESS | SPSS Data Access Pack | DBI, foreign, Hmisc: sas.get, sasxport.get, RODBC |
| Data Collection | SAS/FSP | IBM SPSS Data Collection Family | RSQLite, and the other open source programs MySQL or PostgreSQL are popular among R users for this purpose. |
| Data Mining | Enterprise Miner | IBM SPSS Modeler (formerly Clementine) | arules, FactoMineR, rattle, various functions |
| Data Mining, In-database Processing | SAS In-Database Initiative with Teradata | IBM SPSS Modeler | PL/R |
| Data Preparation | Various procedures | IBM SPSS Data Preparation, various commands | dprep, plyr, reshape, sqldf, various functions |
| Developer Tools | SAS/AF, SAS/FSP, SAS Integration Technologies, SAS/TOOLKIT | IBM SPSS Statistics Developer, IBM SPSS Statistics Programmability Extension | StatET, R links to most popular compilers, scripting languages, and databases. |
| Direct Marketing | Nothing quite like it | IBM SPSS Direct Marketing | Nothing quite like it |
| Exact Tests | SAS/STAT various | IBM SPSS Exact Tests | coin, elrm, exactLoglinTest, exactmaxsel, and options in many others |
| Excel Integration | SAS Enterprise BI Server | IBM SPSS Advantage for Excel 2007 | RExcel |
| Forecasting | SAS/ETS | IBM SPSS Forecasting | Over 40 packages that do time series are described at the Task View link above under Time Series. |
| Forecasting, Automated | Forecast Server | IBM SPSS Forecasting | forecast |
| Genetics | JMP Genomics | None | http://www.bioconductor.org |
| Geographic Information Systems | SAS/GIS, SAS/GRAPH | None (Maps is defunct) | maps, mapdata, mapproj, GRASS via spgrass6, RColorBrewer, see Spatial in Task Views at link at top |
| Graphical user interfaces | Enterprise Guide, IML Studio, SAS/ASSIST, Analyst, Insight | IBM SPSS Statistics Base | Deducer, JGR, R Commander, pmg, rattle, many others at http://www.sciviews.org/_rgui/ |
| Graphics, Interactive | SAS/IML Studio, SAS/INSIGHT, JMP | None | GGobi via rggobi, iPlots, latticist, playwith |
| Graphics, Static | SAS/GRAPH | SPSS Base, Graphics Production Language | ggplot2, gplots, graphics, grid, gridBase, hexbin, lattice, plotrix, scatterplot3d, vcd, vioplot, geneplotter, Rgraphics |
| Graphics, Template Builder | Doesn’t use Grammar of Graphics model that forms the core of IBM SPSS Viz Designer or R’s ggplot2 | IBM SPSS Viz Designer | Doesn’t use templates, but this GUI for ggplot2 http://www.stat.ucla.edu/~jeroen/ggplot2.html works similarly to IBM SPSS Viz Designer. |
| Guided Analytics | SAS/LAB | None | None |
| Matrix/linear Algebra | SAS/IML Studio | IBM SPSS Matrix | R, matlab, Matrix, sparseM |
| Missing Values Imputation | SAS/STAT: PROC MI | IBM SPSS Missing Values | amelia, Hmisc: aregImpute, EMV, rms (replaces Design): fit.mult.impute, mice, mitools, mvnmle, VIM |
| Neural Networks | Enterprise Miner | IBM SPSS Neural Networks | AMORE, grnnR, neuralnet, nnet, rattle |
| Operations Research | SAS/OR | None | glpk, linprog, LowRankQP, TSP |
| Power Analysis | SAS Power and Sample Size Application, SAS/STAT: PROC POWER, PROC GLMPOWER | SamplePower | asypow, powerpkg, pwr, MBESS |
| Quality Control | SAS/QC | IBM SPSS Statistics Base | qcc, spc |
| Regression Models | SAS/STAT | IBM SPSS Regression | R, Hmisc, lasso, VGAM, pda, rms (replaces Design) |
| Sampling, Complex | SAS/STAT: PROC SURVEY SELECT, SURVEYMEANS, etc. | IBM SPSS Complex Samples | pps, sampfling, sampling, spsurvey, survey |
| Segmentation Analysis | Enterprise Miner | IBM Modeler Segmentation | cluster, rattle, som, see CRAN Task Views under Cluster for over 70 packages |
| Server Version | SAS for your particular server | IBM SPSS Statistics Server, IBM SPSS Modeler Server | rapache, R(D)COM Server, Rserve, StatET |
| Structural Equation Modeling | SAS/STAT: PROC CALIS | Amos | OpenMX, sem |
| Text Analysis/Mining | Text Miner | IBM SPSS Text Analytics, IBM SPSS Text Analysis for Surveys | Rstem, las, tm |
| Trees, Decision, Classification or Regression | Enterprise Miner | IBM SPSS Decision Trees, IBM SPSS AnswerTree, IBM SPSS Modeler (formerly Clementine) | ada, adabag, BayesTree, boost, GAMboost, gbev, gbm, maptree, mboost, mvpart, party, pinktoe, quantregForest, rpart, rpart.permutation, randomForest, rattle, tree |

All SAS and SPSS product names are registered trademarks of their respective companies.

Disclaimer- Bob Muenchen and I work for the same University. While we do have interesting conflicts often, his interview was one of the earliest where this blog began.

See- http://sites.google.com/site/r4statistics/interview