Revolution #Rstats Webinar

David Smith of Revo presents a nice webinar on the capabilities and abilities of Revolution R- if you are R curious and wonder how the commercial version has matured- you may want to take a look.

click below to view an executive Webinar

——————————————————————————————-

Revolution R Enterprise—presented by author and blogger David Smith:

Revolution R: 100% R and More
On-Demand Webinar

This Webinar covers how R users can upgrade to:

  • Multi-processor speed improvements and parallel processing
  • Productivity and debugging with an integrated development environment (IDE) for the R language
  • “Big Data” analysis, with out-of-memory storage of multi-gigabyte data sets
  • Web Services for R, to integrate R computations and graphics into 3rd-Party applications like Excel and BI Dashboards
  • Expert technical support and consulting services for R

This webinar will be of value to current R users who want to learn more about the additional capabilities of Revolution R Enterprise to enhance the productivity, ease of use, and enterprise readiness of open source R. R users in academia will also find this webinar valuable: we will explain how all members of the academic community can obtain Revolution R Enterprise free of charge.

—————————————————————————————

contact -1-855-GET-REVO or via online form.
info@revolutionanalytics.com | (650) 330-0553 | Twitter @RevolutionR

Top Ten Graphs for Business Analytics -Pie Charts (1/10)

I have not been really posting or writing worthwhile on the website for some time, as I am still busy writing ” R for Business Analytics” which I hope to get out before year end. However while doing research for that, I came across many types of graphs and what struck me is the actual usage of some kinds of graphs is very different in business analytics as compared to statistical computing.

The criterion of top ten graphs is as follows-

1) Usage-The order in which they appear is not strictly in terms of desirability but actual frequency of usage. So a frequently used graph like box plot would be recommended above say a violin plot.

2) Adequacy- Data Visualization paradigms change over time- but the need for accurate conveying of maximum information in a minium space without overwhelming reader or misleading data perceptions.

3) Ease of creation- A simpler graph created by a single function is more preferrable to writing 4-5 lines of code to create an elaborate graph.

4) Aesthetics– Aesthetics is relative and  in addition studies have shown visual perception varies across cultures and geographies. However , beauty is universally appreciated and a pretty graph is sometimes and often preferred over a not so pretty graph. Here being pretty is in both visual appeal without compromising perceptual inference from graphical analysis.

 

so When do we use a bar chart versus a line graph versus a pie chart? When is a mosaic plot more handy and when should histograms be used with density plots? The list tries to capture most of these practicalities.

Let me elaborate on some specific graphs-

1) Pie Chart- While Pie Chart is not really used much in stats computing, and indeed it is considered a misleading example of data visualization especially the skewed or two dimensional charts. However when it comes to evaluating market share at a particular instance, a pie chart is simple to understand. At the most two pie charts are needed for comparing two different snapshots, but three or more pie charts on same data at different points of time is definitely a bad case.

In R you can create piechart, by just using pie(dataset$variable)

As per official documentation, pie charts are not  recommended at all.

http://stat.ethz.ch/R-manual/R-patched/library/graphics/html/pie.html

Pie charts are a very bad way of displaying information. The eye is good at judging linear measures and bad at judging relative areas. A bar chart or dot chart is a preferable way of displaying this type of data.

Cleveland (1985), page 264: “Data that can be shown by pie charts always can be shown by a dot chart. This means that judgements of position along a common scale can be made instead of the less accurate angle judgements.” This statement is based on the empirical investigations of Cleveland and McGill as well as investigations by perceptual psychologists.

—-

Despite this, pie charts are frequently used as an important metric they inevitably convey is market share. Market share remains an important analytical metric for business.

The pie3D( ) function in the plotrix package provides 3D exploded pie charts.An exploded pie chart remains a very commonly used (or misused) chart.

From http://lilt.ilstu.edu/jpda/charts/chart%20tips/Chartstip%202.htm#Rules

we see some rules for using Pie charts.

 

  1. Avoid using pie charts.
  2. Use pie charts only for data that add up to some meaningful total.
  3. Never ever use three-dimensional pie charts; they are even worse than two-dimensional pies.
  4. Avoid forcing comparisons across more than one pie chart

 

From the R Graph Gallery (a slightly outdated but still very comprehensive graphical repository)

http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=4

par(bg="gray")
pie(rep(1,24), col=rainbow(24), radius=0.9)
title(main="Color Wheel", cex.main=1.4, font.main=3)
title(xlab="(test)", cex.lab=0.8, font.lab=3)
(Note adding a grey background is quite easy in the basic graphics device as well without using an advanced graphical package)

 

Zementis partners with R Analytics Vendor- Revo

Logo for R
Image via Wikipedia

Just got a  PR email from Michael Zeller,CEO , Zementis annoucing Zementis (ADAPA) and Revolution  Analytics just partnered up.

Is this something substantial or just time-sharing http://bi.cbronline.com/news/sas-ceo-says-cep-open-source-and-cloud-bi-have-limited-appeal or a Barney Partnership (http://www.dbms2.com/2008/05/08/database-blades-are-not-what-they-used-to-be/)

Summary- Thats cloud computing scoring of models on EC2 (Zementis) partnering with the actual modeling software in R (Revolution Analytics RevoDeployR)

See previous interviews with both Dr Zeller at https://decisionstats.com/2009/02/03/interview-michael-zeller-ceozementis/ ,https://decisionstats.com/2009/05/07/interview-ron-ramos-zementis/ and https://decisionstats.com/2009/10/05/interview-michael-zellerceo-zementis-on-pmml/)

and Revolution guys at https://decisionstats.com/2010/08/03/q-a-with-david-smith-revolution-analytics/

and https://decisionstats.com/2009/05/29/interview-david-smith-revolution-computing/

strategic partnership with Revolution Analytics, the leading commercial provider of software and support for the popular open source R statistics language. With this partnership, predictive models developed on Revolution R Enterprise are now accessible for real-time scoring through the ADAPA Decisioning Engine by Zementis. 

ADAPA is an extremely fast and scalable predictive platform. Models deployed in ADAPA are automatically available for execution in real-time and batch-mode as Web Services. ADAPA allows Revolution R Enterprise to leverage the Predictive Model Markup Language (PMML) for better decision management. With PMML, models built in R can be used in a wide variety of real-world scenarios without requiring laborious or expensive proprietary processes to convert them into applications capable of running on an execution system.

partnership

“By partnering with Zementis, Revolution Analytics is building an end-to-end solution for moving enterprise-level predictive R models into the execution environment,” said Jeff Erhardt, Revolution Analytics Chief Operation Officer. “With Zementis, we are eliminating the need to take R applications apart and recode, retest and redeploy them in order to obtain desirable results.”

 

Got demo? 

Yes, we do! Revolution Analytics and Zementis have put together a demo which combines the building of models in R with automatic deployment and execution in ADAPA. It uses Revolution Analytics’ RevoDeployR, a new Web Services framework that allows for data analysts working in R to publish R scripts to a server-based installation of Revolution R Enterprise.

Action Items:

  1. Try our INTERACTIVE DEMO
  2. DOWNLOAD the white paper
  3. Try the ADAPA FREE TRIAL

RevoDeployR & ADAPA allow for real-time analysis and predictions from R to be effectively used by existing Excel spreadsheets, BI dashboards and Web-based applications, all in real-time.

RevoADAPAPredictive analytics with RevoDeployR from Revolution Analytics and ADAPA from Zementis put model building and real-time scoring into a league of their own. Seriously!

The Latest GUI for R- BioR

Once more a spanking new shiny software –

Bio7 is a integrated development environment for ecological modelling based on the Rich-Client-Platformconcept of the Java IDE Eclipse. The Bio7 platform contains several perspectives which arrange several views for a special purpose useful for the development and analysis of ecological models. One special perspective bundles a feature rich GUI (Graphical User Interface) for the statistical software R.
For the bidirectional communication between Java and R the Rserve application is used (as a backend to evaluate R code and transfer data from and to Java).
The Bio7 R perspective (see figure below) is divided into a R-Shell view on the left side (conceptual the R side) and a Table view on the right side (conceptual the Java side).
Data can be imported to a spreadsheet, edited and then transferred to the R workspace. Vice versa data from R can be transferred to a sheet of the Table view and then exported e.g. to an Excel or OpenOffice file.

and

General:

Built upon Eclipse 3.6.1.

Now works with the latest Java version! (Windows version bundled with the latest JRE release).

Removed the Soil perspective (now soils can be modeled with ImageJ (float precision). Active images can be displayed in the 3D discrete view (new example available).

Removed the database perspective and the plant layer. You can now built any discrete models without any plant layer.

Removed several controls in the Control view. Added the “Custom Controls” view. In addition ported the Swing component of the Time panel to Swt.

Deleted the avi to swf converter in the ImageJ menu.

Now patterns can be saved with opened Java editor source. If this file is reopened and dragged on Bio7 the pattern is loaded, the source is compiled and the setup method (if available) is executed. In this way model files can be used for presentations ->drag, setup and run. The save actions are located in the Speadsheet view toolbar.

More options available to disable panel painting and recording of values (if not needed for speed!).

New Setup button in the toolbar of Bio7 to trigger a compiled setup method if available.

Removed the load and save pattern buttons from the toolbar of Bio7. Discrete patterns can now be stored with the available action in the spreadsheet view menu.

New P2 Update Manager available in Bio7.

Updated the Janino Compiler.

New HTML perspective added with a view which embeds the TinyMC editor.

New options to disable painting operations for the discrete panels.

New option to explicitly enable scripts at startup (for a faster startup).

Quadgrid (Hexgrid)

Only states are now available which can be created in the “Spreadsheet” view menu easily. Patterns can be stored and restored as usual but are now stored in an *.exml file.

New method to transfer the quadgrid pattern as a matrix to R.

New method to transfer the population data of all quadgrid states to R.

ImageJ:

Update to the latest version (with additional fixes).

Fixed a bug to rename the image.

Thumbnail browser can now open images recursevely(limited to 1000 pics), the magnifiyng glass can be disabled, too.

Plugins can be installed dynamically with a drag and drop operation on the ImageJ view or toolbar (as known from ImageJ).

Installed plugins now extend the plugin menu as submenus or subsubmenus (not finished yet!).

Plugins can now be created with the Java editor. New Bio7 Wizard available to create a plugin template.

Compiled Java files can be added to a *.jar file with a new available action in the Navigator view (if you rightclick on the files in the Navigator). In this way ImageJ plugins can be packaged in a *.jar.

Floweditor:

Fixed a repaint bug in the debug mode of a flow (now draws correctly the active shape in the flow).

Resize with Strg+Scrollwheel works again.

Comments with more than one line works again.

New Test action to verify connections in a flow.

Debug mode now shows all executed Shapes.

Integrated more default tests (for the verification of a regular flow).

A mouse-click now deletes colored shapes in a flow (e.g. in debug mode).

Points panel:

Integrated (dynamic) Voronoi, Delauney visualization (with area and clip to rectangle action).

Points coordinates can now be set in double precision.

Transfer of point coordinates to R now in double precision.

Bio7 Table:

New import and export of Excel 2007 OOXML.

Row headers can now be resized with the mouse device.

R:

Updated R (2.12.1) and Rserve (0.6.3) to the latest version.

New help action in the R-Shell view.

New action to display help for R specific commands in the embedded Bio7 browser (which opens automatically).

New Key actions to copy the selected variable names to the expression dialog (c=cocatenate (+), a=add (,)).

New action to transfer character or numeric vectors horizontally or vertically in an opened spread (Table view) at selection coordinates.

Empty spaces in the filepath are now allowed under Windows if Rserve is started with a system shell or the RGUI (for the tempfile select a location in the Preferences dialog which is writeable) is started.This works also for the RGUI action.

Improved the search for the “Install packages” action (option “Case Sensitive” added).

API:

New API methods available!

And:

Many fixes since the last version!

 

Installation

Important information:

A certain firewall software can corrupt the Bio7 *.zip file (as well as other files).
Please ensure that you have downloaded a functioning Bio7 1.5 version. In addition it is also reported that a certain antivirus software detects the bundled R software (on Windows) as malware. Often the R specific “open.exe” is detected as malware. Please use a different scanner to make sure that the software is not infected if you have any doubts. For more details see:

http://r.789695.n4.nabble.com/trojan-at-current-development-version-td3244348.html

 

R Commercial Software

Revolution Analytics

http://www.revolutionanalytics.com/ Download- http://www.revolutionanalytics.com/downloads/ Official Screenshot-

• XL Solutions

http://www.experience-rplus.com/ Download-http://www.experience-rplus.com/down.asp Official Screenshot-

Information Builder

http://www.informationbuilders.com/products/webfocus/PredictiveModeling Official Screenshot-

Blue Reference- Inference for R

http://inferenceforr.com/default.aspx Download-http://inferenceforr.com/freetrial/default.aspx Official Screenshot-

R for Excel

http://www.statconn.com/

Download- http://rcom.univie.ac.at/download.html

Also integrates R with Word, Open Office and Excel with Scilab

Choosing R for business – What to consider?

A composite of the GNU logo and the OSI logo, ...
Image via Wikipedia

Additional features in R over other analytical packages-

1) Source Code is given to ensure complete custom solution and embedding for a particular application. Open source code has an advantage that is extensively peer- reviewed in Journals and Scientific Literature.  This means bugs will found, shared and corrected transparently.

2) Wide literature of training material in the form of books is available for the R analytical platform.

3) Extensively the best data visualization tools in analytical software (apart from Tableau Software ‘s latest version). The extensive data visualization available in R is of the form a variety of customizable graphs, as well as animation. The principal reason third-party software initially started creating interfaces to R is because the graphical library of packages in R is more advanced as well as rapidly getting more features by the day.

4) Free in upfront license cost for academics and thus budget friendly for small and large analytical teams.

5) Flexible programming for your data environment. This includes having packages that ensure compatibility with Java, Python and C++.

 

6) Easy migration from other analytical platforms to R Platform. It is relatively easy for a non R platform user to migrate to R platform and there is no danger of vendor lock-in due to the GPL nature of source code and open community.

Statistics are numbers that tell (descriptive), advise ( prescriptive) or forecast (predictive). Analytics is a decision-making help tool. Analytics on which no decision is to be made or is being considered can be classified as purely statistical and non analytical. Thus ease of making a correct decision separates a good analytical platform from a not so good analytical platform. The distinction is likely to be disputed by people of either background- and business analysis requires more emphasis on how practical or actionable the results are and less emphasis on the statistical metrics in a particular data analysis task. I believe one clear reason between business analytics is different from statistical analysis is the cost of perfect information (data costs in real world) and the opportunity cost of delayed and distorted decision-making.

Specific to the following domains R has the following costs and benefits

  • Business Analytics
    • R is free per license and for download
    • It is one of the few analytical platforms that work on Mac OS
    • It’s results are credibly established in both journals like Journal of Statistical Software and in the work at LinkedIn, Google and Facebook’s analytical teams.
    • It has open source code for customization as per GPL
    • It also has a flexible option for commercial vendors like Revolution Analytics (who support 64 bit windows) as well as bigger datasets
    • It has interfaces from almost all other analytical software including SAS,SPSS, JMP, Oracle Data Mining, Rapid Miner. Existing license holders can thus invoke and use R from within these software
    • Huge library of packages for regression, time series, finance and modeling
    • High quality data visualization packages
    • Data Mining
      • R as a computing platform is better suited to the needs of data mining as it has a vast array of packages covering standard regression, decision trees, association rules, cluster analysis, machine learning, neural networks as well as exotic specialized algorithms like those based on chaos models.
      • Flexibility in tweaking a standard algorithm by seeing the source code
      • The RATTLE GUI remains the standard GUI for Data Miners using R. It was created and developed in Australia.
      • Business Dashboards and Reporting
      • Business Dashboards and Reporting are an essential piece of Business Intelligence and Decision making systems in organizations. R offers data visualization through GGPLOT, and GUI like Deducer and Red-R can help even non R users create a metrics dashboard
        • For online Dashboards- R has packages like RWeb, RServe and R Apache- which in combination with data visualization packages offer powerful dashboard capabilities.
        • R can be combined with MS Excel using the R Excel package – to enable R capabilities to be imported within Excel. Thus a MS Excel user with no knowledge of R can use the GUI within the R Excel plug-in to use powerful graphical and statistical capabilities.

Additional factors to consider in your R installation-

There are some more choices awaiting you now-
1) Licensing Choices-Academic Version or Free Version or Enterprise Version of R

2) Operating System Choices-Which Operating System to choose from? Unix, Windows or Mac OS.

3) Operating system sub choice- 32- bit or 64 bit.

4) Hardware choices-Cost -benefit trade-offs for additional hardware for R. Choices between local ,cluster and cloud computing.

5) Interface choices-Command Line versus GUI? Which GUI to choose as the default start-up option?

6) Software component choice- Which packages to install? There are almost 3000 packages, some of them are complimentary, some are dependent on each other, and almost all are free.

7) Additional Software choices- Which additional software do you need to achieve maximum accuracy, robustness and speed of computing- and how to use existing legacy software and hardware for best complementary results with R.

1) Licensing Choices-
You can choose between two kinds of R installations – one is free and open source from http://r-project.org The other R installation is commercial and is offered by many vendors including Revolution Analytics. However there are other commercial vendors too.

Commercial Vendors of R Language Products-
1) Revolution Analytics http://www.revolutionanalytics.com/
2) XL Solutions- http://www.experience-rplus.com/
3) Information Builder – Webfocus RStat -Rattle GUI http://www.informationbuilders.com/products/webfocus/PredictiveModeling.html
4) Blue Reference- Inference for R http://inferenceforr.com/default.aspx

  1. Choosing Operating System
      1. Windows

 

Windows remains the most widely used operating system on this planet. If you are experienced in Windows based computing and are active on analytical projects- it would not make sense for you to move to other operating systems. This is also based on the fact that compatibility problems are minimum for Microsoft Windows and the help is extensively documented. However there may be some R packages that would not function well under Windows- if that happens a multiple operating system is your next option.

        1. Enterprise R from Revolution Analytics- Enterprise R from Revolution Analytics has a complete R Development environment for Windows including the use of code snippets to make programming faster. Revolution is also expected to make a GUI available by 2011. Revolution Analytics claims several enhancements for it’s version of R including the use of optimized libraries for faster performance.
      1. MacOS

 

Reasons for choosing MacOS remains its considerable appeal in aesthetically designed software- but MacOS is not a standard Operating system for enterprise systems as well as statistical computing. However open source R claims to be quite optimized and it can be used for existing Mac users. However there seem to be no commercially available versions of R available as of now for this operating system.

      1. Linux

 

        1. Ubuntu
        2. Red Hat Enterprise Linux
        3. Other versions of Linux

 

Linux is considered a preferred operating system by R users due to it having the same open source credentials-much better fit for all R packages and it’s customizability for big data analytics.

Ubuntu Linux is recommended for people making the transition to Linux for the first time. Ubuntu Linux had an marketing agreement with revolution Analytics for an earlier version of Ubuntu- and many R packages can  installed in a straightforward way as Ubuntu/Debian packages are available. Red Hat Enterprise Linux is officially supported by Revolution Analytics for it’s enterprise module. Other versions of Linux popular are Open SUSE.

      1. Multiple operating systems-
        1. Virtualization vs Dual Boot-

 

You can also choose between having a VMware VM Player for a virtual partition on your computers that is dedicated to R based computing or having operating system choice at the startup or booting of your computer. A software program called wubi helps with the dual installation of Linux and Windows.

  1. 64 bit vs 32 bit – Given a choice between 32 bit versus 64 bit versions of the same operating system like Linux Ubuntu, the 64 bit version would speed up processing by an approximate factor of 2. However you need to check whether your current hardware can support 64 bit operating systems and if so- you may want to ask your Information Technology manager to upgrade atleast some operating systems in your analytics work environment to 64 bit operating systems.

 

  1. Hardware choices- At the time of writing this book, the dominant computing paradigm is workstation computing followed by server-client computing. However with the introduction of cloud computing, netbooks, tablet PCs, hardware choices are much more flexible in 2011 than just a couple of years back.

Hardware costs are a significant cost to an analytics environment and are also  remarkably depreciated over a short period of time. You may thus examine your legacy hardware, and your future analytical computing needs- and accordingly decide between the various hardware options available for R.
Unlike other analytical software which can charge by number of processors, or server pricing being higher than workstation pricing and grid computing pricing extremely high if available- R is well suited for all kinds of hardware environment with flexible costs. Given the fact that R is memory intensive (it limits the size of data analyzed to the RAM size of the machine unless special formats and /or chunking is used)- it depends on size of datasets used and number of concurrent users analyzing the dataset. Thus the defining issue is not R but size of the data being analyzed.

    1. Local Computing- This is meant to denote when the software is installed locally. For big data the data to be analyzed would be stored in the form of databases.
      1. Server version- Revolution Analytics has differential pricing for server -client versions but for the open source version it is free and the same for Server or Workstation versions.
      2. Workstation
    2. Cloud Computing- Cloud computing is defined as the delivery of data, processing, systems via remote computers. It is similar to server-client computing but the remote server (also called cloud) has flexible computing in terms of number of processors, memory, and data storage. Cloud computing in the form of public cloud enables people to do analytical tasks on massive datasets without investing in permanent hardware or software as most public clouds are priced on pay per usage. The biggest cloud computing provider is Amazon and many other vendors provide services on top of it. Google is also coming for data storage in the form of clouds (Google Storage), as well as using machine learning in the form of API (Google Prediction API)
      1. Amazon
      2. Google
      3. Cluster-Grid Computing/Parallel processing- In order to build a cluster, you would need the RMpi and the SNOW packages, among other packages that help with parallel processing.
    3. How much resources
      1. RAM-Hard Disk-Processors- for workstation computing
      2. Instances or API calls for cloud computing
  1. Interface Choices
    1. Command Line
    2. GUI
    3. Web Interfaces
  2. Software Component Choices
    1. R dependencies
    2. Packages to install
    3. Recommended Packages
  3. Additional software choices
    1. Additional legacy software
    2. Optimizing your R based computing
    3. Code Editors
      1. Code Analyzers
      2. Libraries to speed up R

citation-  R Development Core Team (2010). R: A language and environment for statistical computing. R Foundation for Statistical Computing,Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

(Note- this is a draft in progress)

%d bloggers like this: