One area where clearly GUI methods are preferable to command line methods in R, is data input. There is no need of learning read.csv or read.table when these options are only two clicks away in any R GUI. For academics/students there is a definite need to easily access
datasets from attached packages just as it is a need for business analysts to access databases with a few clicks than learn or read pages of pdf on RODBC. However some GUI (like Rattle) need data only in data frames, rather than list or arrays-this limits R’s flexibility. These are my views but you can see and compare views of data input in R Commander, Rattle and Deducer.
Track 2: Social Data and Telecom
Case Study: Major North American Telecom
Social Networking Data for Churn Analysis
A North American Telecom found that it had a window into social contacts – who has been calling whom on its network. This data proved to be predictive of churn. Using SQL, and GAM in R, we explored how to use this data to improve the identification of likely churners. We will present many dimensions of the lessons learned on this engagement.
Speaker: David Katz, Senior Analyst, Dataspora, and President, David Katz Consulting
R Commander Extensions: Enhancing a Statistical Graphical User Interface by extending menus to statistical packages
R Commander ( see paper by Prof J Fox at http://www.jstatsoft.org/v14/i09/paper ) is a well known and established graphical user interface to the R analytical environment.
While the original GUI was created for a basic statistics course, the enabling of extensions (or plug-ins http://www.r-project.org/doc/Rnews/Rnews_2007-3.pdf ) has greatly enhanced the possible use and scope of this software. Here we give a list of all known R Commander Plugins and their uses along with brief comments.
Note the naming convention for above e plugins is always with a Prefix of “RCmdrPlugin.” followed by the names above
Also on loading a Plugin, it must be already installed locally to be visible in R Commander’s list of load-plugin, and R Commander loads the e-plugin after restarting.Hence it is advisable to load all R Commander plugins in the beginning of the analysis session.
However the notable E Plugins are
1) DoE for Design of Experiments-
Full factorial designs, orthogonal main effects designs, regular and non-regular 2-level fractional
factorial designs, central composite and Box-Behnken designs, latin hypercube samples, and simple D-optimal designs can currently be generated from the GUI. Extensions to cover further latin hypercube designs as well as more advanced D-optimal designs (with blocking) are planned for the future.
2) Survival- This package provides an R Commander plug-in for the survival package, with dialogs for Cox models, parametric survival regression models, estimation of survival curves, and testing for differences in survival curves, along with data-management facilities and a variety of tests, diagnostics and graphs.
3) qcc -GUI for Shewhart quality control charts for continuous, attribute and count data. Cusum and EWMA charts. Operating characteristic curves. Process capability analysis. Pareto chart and cause-and-effect chart. Multivariate control charts
4) epack- an Rcmdr “plug-in” based on the time series functions. Depends also on packages like , tseries, abind,MASS,xts,forecast. It covers Log-Exceptions garch
and following Models -Arima, garch, HoltWinters
5)Export- The package helps users to graphically export Rcmdr output to LaTeX or HTML code,
via xtable() or Hmisc::latex(). The plug-in was originally intended to facilitate exporting Rcmdr
output to formats other than ASCII text and to provide R novices with an easy-to-use,
easy-to-access reference on exporting R objects to formats suited for printed output. The
package documentation contains several pointers on creating reports, either by using
conventional word processors or LaTeX/LyX.
6) MAc- This is an R-Commander plug-in for the MAc package (Meta-Analysis with
Correlations). This package enables the user to conduct a meta-analysis in a menu-driven,
graphical user interface environment (e.g., SPSS), while having the full statistical capabilities of
R and the MAc package. The MAc package itself contains a variety of useful functions for
conducting a research synthesis with correlational data. One of the unique features of the MAc
package is in its integration of user-friendly functions to complete the majority of statistical steps
involved in a meta-analysis with correlations. It uses recommended procedures as described in
The Handbook of Research Synthesis and Meta-Analysis (Cooper, Hedges, & Valentine, 2009).
A query to help for ??Rcmdrplugins reveals the following information which can be quite overwhelming given that almost 20 plugins are now available-
Glossary for DoE terminology as used in
RcmdrPlugin.DoE Linear Model Dialog for
RcmdrPlugin.DoE response surface model Dialog
for experimental data
R-Commander plugin package that implements
design of experiments facilities from packages
DoE.base, FrF2 and DoE.wrapper into the
Functions used in menus
Internal RcmdrPlugin.doex objects
Install the DOEX Rcmdr Plug-In
Internal functions for menu system of
Help with EHES sampling
Graphically export objects to LaTeX or HTML
Internal RcmdrPlugin.FactoMineR objects
Graphical User Interface for FactoMineR
An IPSUR Plugin for the R Commander
Meta-Analysis with Correlations (MAc) Rcmdr
Meta-Analysis with Mean Differences (MAd) Rcmdr
RcmdrPlugin.orloca: A GUI for orloca-package
RcmdrPlugin.orloca: A GUI for orloca-package
RcmdrPlugin.orloca.es: Una interfaz grafica
para el paquete orloca
Install the Demos Rcmdr Plug-In
Internal RcmdrPlugin.qual objects
Install the quality Rcmdr Plug-In
Internal RcmdrPlugin.SensoMineR objects
Graphical User Interface for SensoMineR
RcmdrPlugin.SLC: A GUI for slc-package
RcmdrPlugin.SLC: A GUI for SLC R package
Efficiently search R Help pages
RcmdrPlugin.steepness: A GUI for
steepness-package (internal functions)
RcmdrPlugin.steepness: A GUI for steepness R
Internal RcmdrPlugin.survival Objects
Rcmdr Plug-In Package for the survival Package
Install the Demos Rcmdr Plug-In
Often I am asked by clients, friends and industry colleagues on the suitability or unsuitability of particular software for analytical needs. My answer is mostly-
It depends on-
1) Cost of Type 1 error in purchase decision versus Type 2 error in Purchase Decision. (forgive me if I mix up Type 1 with Type 2 error- I do have some weird childhood learning disabilities which crop up now and then)
Here I define Type 1 error as paying more for a software when there were equivalent functionalities available at lower price, or buying components you do need , like SPSS Trends (when only SPSS Base is required) or SAS ETS, when only SAS/Stat would do.
The first kind is of course due to the presence of free tools with GUI like R, R Commander and Deducer (Rattle does have a 500$ commercial version).
The emergence of software vendors like WPS (for SAS language aficionados) which offer similar functionality as Base SAS, as well as the increasing convergence of business analytics (read predictive analytics), business intelligence (read reporting) has led to somewhat brand clutter in which all softwares promise to do everything at all different prices- though they all have specific strengths and weakness. To add to this, there are comparatively fewer business analytics independent analysts than say independent business intelligence analysts.
2) Type 2 Error- In this case the opportunity cost of delayed projects, business models , or lower accuracy – consequences of buying a lower priced software which had lesser functionality than you required.
To compound the magnitude of error 2, you are probably in some kind of vendor lock-in, your software budget is over because of buying too much or inappropriate software and hardware, and still you could do with some added help in business analytics. The fear of making a business critical error is a substantial reason why open source software have to work harder at proving them competent. This is because writing great software is not enough, we need great marketing to sell it, and great customer support to sustain it.
As Business Decisions are decisions made in the constraints of time, information and money- I will try to create a software purchase matrix based on my knowledge of known softwares (and unknown strengths and weakness), pricing (versus budgets), and ranges of data handling. I will add in basically an optimum approach based on known constraints, and add in flexibility for unknown operational constraints.
I will restrain this matrix to analytics software, though you could certainly extend it to other classes of enterprise software including big data databases, infrastructure and computing.
Noted Assumptions- 1) I am vendor neutral and do not suffer from subjective bias or affection for particular software (based on conferences, books, relationships,consulting etc)
2) All software have bugs so all need customer support.
3) All software have particular advantages , strengths and weakness in terms of functionality.
4) Cost includes total cost of ownership and opportunity cost of business analytics enabled decision.
5) All software marketing people will praise their own software- sometimes over-selling and mis-selling product bundles.
Software compared are SPSS, KXEN, R,SAS, WPS, Revolution R, SQL Server, and various flavors and sub components within this. Optimized approach will include parallel programming, cloud computing, hardware costs, and dependent software costs.
1) It is slower with bigger datasets than SPSS language and SAS language .If you use bigger datasets, then you should either consider more hardware , or try and wait for some of the ODBC connect packages.
2) It needs more time to learn than SAS language .Much more time to learn how to do much more.
3) R programmers are lesser paid than SAS programmers.They prefer it that way.It equates the satisfaction of creating a package in development with a world wide community with the satisfaction of using a package and earning much more money per hour.
4) It forces you to learn the exact details of what you are doing due to its object oriented structure. Thus you either get no answer or get an exact answer. Your customer pays you by the hour not by the correct answers.
5) You can not push a couple of buttons or refer to a list of top ten most commonly used commands to finish the project.
6) It is free. And open for all. It is socialism expressed in code. Some of the packages are built by university professors. It is free.Free is bad. Who pays for the mortgage of the software programmers if all softwares were free ? Who pays for the Friday picnics. Who pays for the Good Night cruises?
7) It is free. Your organization will not commend you for saving them money- they will question why you did not recommend this before. And why did you approve all those packages that expire in 2011.R is fReeeeee. Customers feel good while spending money.The more software budgets you approve the more your salary is. R thReatens all that.
8) It is impossible to install a package you do not need or want. There is no one calling you on the phone to consider one more package or solution. R can make you lonely.
9) R uses mostly Command line. Command line is from the Seventies. Or the Eighties. The GUI’s RCmdr and Rattle are there but still…..
10) R forces you to learn new stuff by the month. You prefer to only earn by the month. Till the day your job got offshored…