Analytics Consulting made much more affordable thanks to R #rstats

Here are some ways I suggest potential clients can start with R immediately, even if they don't have trained staff:

 

1) Get RExcel

RExcel lets you use R functions as functions within Excel.

http://rcom.univie.ac.at/download.html

Our Wiki has instructions on how R needs to be installed to work with our tools: http://homepage.univie.ac.at/erich.neuwirth/php/rcomwiki/doku.php?id=wiki:how_to_install

RExcel

The current version is RExcel 3.2.14. The Home and Student version and the Educational version work only with 32bit versions of Excel (which can be installed on 64bit versions of Windows). The Excel versions supported are 2003, 2007, 2010, and 2013.

RExcel 3.2.14 (and later versions) require R 2.15.0 (or later).

You will have to install:

  • a suitable version of R
  • a matching version of rscproxy
  • statconnDCOM or rcom with statconnDCOM
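Before installing RExcel, it can help to confirm from the R console that your R meets the stated minimum of R 2.15.0. The following is a minimal sketch; the helper names `meets_min_r_version` and `has_rscproxy` are hypothetical, and the check covers only the R version and whether the rscproxy package is present. statconnDCOM is a Windows COM component, not an R package, so it cannot be checked this way.

```r
# Hypothetical helper: does the running R meet RExcel 3.2.14's minimum (R 2.15.0)?
meets_min_r_version <- function(min_version = "2.15.0") {
  # getRversion() returns a package_version object, which compares version
  # components numerically (so "3.0.10" > "3.0.9", unlike plain string comparison)
  getRversion() >= min_version
}

# Hypothetical helper: is the rscproxy package installed?
# (statconnDCOM itself is not an R package and cannot be detected here)
has_rscproxy <- function() {
  requireNamespace("rscproxy", quietly = TRUE)
}

meets_min_r_version()   # TRUE on R 2.15.0 or later
has_rscproxy()
```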


Detailed instructions for installing can be found in our Wiki

Also see the book R Through Excel:

http://www.amazon.com/Through-Excel-Spreadsheet-Interface-Statistics/dp/1441900519

R, a free and open source program, is one of the most powerful and the fastest-growing statistics programs. Microsoft Excel is the most widely used spreadsheet program, but many statisticians consider its statistical tools too limited.

In this book, the authors build on RExcel, a free add-in for Excel that can be downloaded from the R distribution network. RExcel seamlessly integrates the entire set of R’s statistical and graphical methods into Excel, allowing students to focus on statistical methods and concepts and minimizing the distraction of learning a new programming language.

Data can be transferred between R and Excel “the Excel way” by selecting worksheet ranges and using Excel menus. R’s basic statistical functions and selected advanced methods are available from an Excel menu. Results of the computations and statistical graphics can be returned to Excel worksheet ranges. RExcel allows the use of Excel scroll bars and check boxes to create and animate R graphics as an interactive analysis tool.

2) Use R Commander

A very easy-to-use GUI: start working with R without writing a single line of code! See http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/

The R-Commander GUI consists of a window containing several menus, buttons, and information fields. In addition, the Commander window contains script and output text windows. The R-Commander menus are easily configurable through a text file or, preferably, through plug-in packages, of which many are now available on CRAN.

The menus lead to simple dialog boxes, the general contents of which are more or less obvious from the names of the menu items. These boxes have a common structure, including a help button leading to the help page for a relevant function, and a reset button to reset the dialog to its original state.

By default, commands generated via the dialogs are posted to the output window, along with printed output, and to the script window. Lines in the script window can be edited and (re)submitted for execution. Error messages, warnings, and “notes” appear in a messages window.

Note that you can do time series forecasting using the epack plugin (RcmdrPlugin.epack) for R Commander.

3) Use Rattle

A very comprehensive data mining GUI in R:

 http://rattle.togaware.com/

Version 3.0.2 release 169 dated 2014-02-20. Install it from within R:
> install.packages("rattle", repos="http://rattle.togaware.com", type="source")
or download the source package from the shell:
$ wget http://togaware.com.au/access/rattle_3.0.2.tar.gz

Rattle (the R Analytical Tool To Learn Easily) presents statistical and visual summaries of data, transforms data into forms that can be readily modelled, builds both unsupervised and supervised models from the data, presents the performance of models graphically, and scores new datasets.


4) Don't use R, but use WPS, a SAS language clone

Up to 50% cheaper by some estimates; you can try this SAS language clone for free here: http://www.teamwpc.co.uk/tryorbuy/evaluations

5) Okay, use Oracle R or Revolution R Community

R by Oracle is free

http://www.oracle.com/technetwork/database/options/advanced-analytics/r-enterprise/ore-downloads-1502823.html

So is Revolution R Community by Revolution Analytics (basically everything except the RevoScaleR package):

http://info.revolutionanalytics.com/download-revolution-r-community.html

 

6) Use Revolution R in the cloud and pay by the hour; a 14-day free trial waives the software charges (AWS infrastructure charges still apply)

https://aws.amazon.com/marketplace/pp/B00GHXJZVY

 

————————————————————————

Setting up an analytics shop has never been this easy in the past 50 years!

The Amazingly Sexy rMaps package #rstats

The amazing rMaps package makes maps even more data-scientisty (or sexy 😉) than before by bringing incredible JavaScript-style interactivity to people who can just type (or copy and paste) a few lines of R code.

A great analysis of Mexico

http://rcharts.io/viewer/?9223554#.Uw4hOPmSySp

 

Now if I could only persuade the very busy creator of the package to honor his promise of sending the interview answers 🙂 !!

 

Also see:

https://decisionstats.com/2013/08/19/the-wonderful-ggmap-package-for-spatial-analysis-in-r-rstats/

Using R with Twitter – great tutorial in #rstats

A great tutorial from one of my students, Kaify Rais, founder of http://vabida.com/, an analytics company.

It is about using Twitter and R together for political sentiment analysis, which is going to be this year’s analytics buzzword in India, since 2014 is an election year.
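The tutorial itself is not reproduced here, but a common starting point for this kind of analysis is lexicon-based scoring: count the positive words minus the negative words in each tweet. The sketch below uses only base R; the word lists, the `score_sentiment` function and the example tweets are all hypothetical placeholders. A real analysis would fetch tweets through the Twitter API and use a proper sentiment lexicon.

```r
# Minimal lexicon-based sentiment scorer (base R only).
# The word lists are tiny illustrative placeholders, not a real lexicon.
positive_words <- c("good", "great", "win", "hope", "progress")
negative_words <- c("bad", "corrupt", "loss", "fail", "scam")

score_sentiment <- function(texts, pos = positive_words, neg = negative_words) {
  vapply(texts, function(txt) {
    # Lower-case, turn anything that is not a letter or space into a space,
    # then split on runs of whitespace
    words <- strsplit(gsub("[^a-z ]", " ", tolower(txt)), "\\s+")[[1]]
    # Score = positive matches minus negative matches
    sum(words %in% pos) - sum(words %in% neg)
  }, numeric(1), USE.NAMES = FALSE)
}

tweets <- c("Great rally, real hope for progress!",
            "Another corrupt scam, total fail",
            "Polling day is Thursday")
score_sentiment(tweets)   # 3 -3 0
```

Scores above zero lean positive and below zero lean negative; simple word counting misses sarcasm and negation ("not good"), which is why serious work uses richer lexicons or trained classifiers.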

Using R and RapidMiner together #rstats

I just came across this interesting corporate blog from Simafore, an analytics company, and I must confess I really like both its design and its content. The post, of course, was on combining R with RapidMiner.

There are many packages and libraries in R, specifically tailored to handle time series forecasting in the “traditional” manner. RapidMiner integrates really well with R by providing two mechanisms:

  • an interactive console, similar to the native R console and somewhat less sophisticated than RStudio
  • and a more powerful full integration of R capabilities within the RapidMiner process design perspective.

The first option is fairly easy to put to work, assuming you have successfully added the R extension to RapidMiner. The second option, however, requires some initial planning. The key is to understand how to pass data from RapidMiner to R and back. Once you understand this simple but important aspect, R essentially becomes another powerful “operator” within the vast library of existing RapidMiner operators.

You can read the complete article here: http://www.simafore.com/blog/bid/204923/combining-power-of-r-and-rapidminer-for-time-series-forecasting
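The crux of the excerpt is the data hand-off: RapidMiner passes an example set into R as a data frame, and R has to hand a data frame back. The sketch below shows what the R side of such a step might look like, using base R's `arima()` and `predict()` as a stand-in for the forecasting packages the article mentions. The function name `forecast_step`, the `value` column and the ARIMA(1,0,0) order are assumptions for illustration; how RapidMiner's R extension names its inputs and outputs depends on how the operator is configured.

```r
# Hypothetical R-side forecasting step: data frame in, data frame out.
# Assumes the input has one numeric column, `value`, ordered in time.
forecast_step <- function(df, horizon = 3, order = c(1, 0, 0)) {
  y <- ts(df$value)                        # treat the column as a regular time series
  fit <- arima(y, order = order)           # fit an ARIMA model (base R stats)
  pred <- predict(fit, n.ahead = horizon)  # point forecasts plus standard errors
  data.frame(
    step     = seq_len(horizon),
    forecast = as.numeric(pred$pred),
    lower_95 = as.numeric(pred$pred - 1.96 * pred$se),  # approximate 95% band
    upper_95 = as.numeric(pred$pred + 1.96 * pred$se)
  )
}

# Usage with simulated data standing in for a RapidMiner example set
set.seed(42)
input <- data.frame(value = as.numeric(arima.sim(list(ar = 0.6), n = 60)) + 10)
out <- forecast_step(input, horizon = 5)
print(out)
```

Returning a plain data frame keeps the contract simple: whatever happens inside R, the next RapidMiner operator only ever sees rows and columns.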

Is the non-Chinese Internet America’s cyber colony?

The world’s largest Internet companies are in either China or the USA (except for a few in Japan):

http://en.wikipedia.org/wiki/List_of_largest_Internet_companies

China’s strategy has helped it to:

  1. protect its citizens’ data from foreign data collection
  2. safeguard the industrial secrets of its own corporations
  3. create a domestic ecosystem that benefits its own entrepreneurs and investors
  4. create basic infrastructure for resisting cyber warfare (e.g. the Stuxnet virus)

The rest of the non-Chinese world now suffers the following ignominy:

  1. Most of the profits from IPOs and ads flow to US companies
  2. The US government’s spying arm works in active collaboration with US companies, including gathering industrial secrets for trade negotiations
  3. There is almost no non-Chinese, non-American Internet infrastructure
  4. California-led companies have built a monopoly across the rest of the world
  5. There has been a brain drain of top technical talent to the USA
  6. A great ecosystem of investors and tech has grown in the USA, to the detriment of all other countries

The Internet is the new opium, and after getting the rest of the world addicted, US lawmakers are abandoning core principles like net neutrality, opting for unrestricted warrantless spying, and preparing inroads for cyber attacks.

China is correctly safeguarding the future interests of its citizens, while the rest of the world is now a cyber colony mostly occupied by the USA.

Quo Vadis?

Countries have no friends, only interests

Book Review – Big Data Analytics with R and Hadoop

I have written before about Vignesh’s impressive work in R, including helping update the RGoogleAnalytics package for API changes while at Tatvic. He is quite young and very eager to contribute to open source and knowledge.

This is a timely and impressive book, given that both R and Hadoop are hot topics with a lot of noise and hoopla around them, and both need a straightforward explanation of how to actually do things with them. It demystifies R and Hadoop enough that you need not be intimidated at the thought of learning multiple languages (R/Java/MapReduce), multiple paradigms (distributed computing and analysis) and multiple installations (R/Hadoop/RHadoop). Suffice it to say, the future may well belong to Big Data and Hadoop. Linux users will have it easier than Windows users.
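For readers new to the MapReduce paradigm the book covers, the core idea fits in a few lines of base R: a map step emits key/value pairs, a shuffle groups the values by key, and a reduce step aggregates each group. This is a local, single-machine simulation of the classic word count for illustration only; it is not the RHadoop (rmr2) API, which runs the same three steps distributed across a Hadoop cluster. The documents and function names are hypothetical.

```r
# Local simulation of a MapReduce word count (no Hadoop involved).
documents <- c("big data with R", "R and Hadoop", "big R")

# Map: each document emits (word, 1) pairs, stored as a named integer vector
map_step <- function(doc) {
  words <- strsplit(tolower(doc), "\\s+")[[1]]
  setNames(rep(1L, length(words)), words)   # names are keys, values are counts
}
mapped <- unlist(lapply(documents, map_step))

# Shuffle: group the emitted values by key
grouped <- split(unname(mapped), names(mapped))

# Reduce: sum the values for each key
word_counts <- vapply(grouped, sum, integer(1))
word_counts
```

On a real cluster the map and reduce functions look much the same; Hadoop takes care of running the map step next to the data and moving each key's values to one reducer.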

My main criticism is that, for the lay reader, everything is written in bullet points, which can hurt readability if you are trying to get the big picture. For the technical reader, however, this is a brilliant approach, as everything is neatly laid out as “do this, then do that”.

The book thus aims to be more of a tutorial, and it has many nice examples too. I do wish, however, that a few more examples from industry had been included to add some juice. I therefore hope for a companion site with all the R code and datasets for testing and trying out the business analytics examples.

One wishes the author had written more about the biglm and ff packages, or even the RevoScaleR package. Chapter 5, on data analytics, should have been more elaborate, which could be done with more references. The section on visualizing data is just 2 pages and ignores packages like googleVis or even bigvis. The section on MongoDB and other data types is very useful, but again it is much more technical and much less analytical: for example, when does one typically encounter MongoDB versus other data types, and what are the drawbacks?
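The appeal of biglm is that it fits a linear model one chunk of data at a time (you call `biglm()` on the first chunk and `update()` with each subsequent one), so the full dataset never has to sit in memory. The sketch below illustrates that chunking idea in base R by accumulating the cross-products X'X and X'y across chunks and solving the normal equations at the end. It is a simplified stand-in, not biglm's algorithm: biglm uses an incremental QR decomposition, which is numerically more stable. The function name `chunked_lm`, the simulated data and the chunk size are assumptions for illustration.

```r
# Hypothetical chunked least squares: accumulate X'X and X'y one chunk at a time.
# Assumes numeric predictors, so every chunk yields the same design-matrix columns.
chunked_lm <- function(formula, data, chunk_size = 1000) {
  xtx <- NULL
  xty <- NULL
  for (s in seq(1, nrow(data), by = chunk_size)) {
    chunk <- data[s:min(s + chunk_size - 1, nrow(data)), , drop = FALSE]
    X <- model.matrix(formula, chunk)                  # design matrix for this chunk only
    y <- model.response(model.frame(formula, chunk))   # response for this chunk
    # Running sums: only a p x p matrix and a p x 1 vector are kept between chunks
    if (is.null(xtx)) {
      xtx <- crossprod(X)
      xty <- crossprod(X, y)
    } else {
      xtx <- xtx + crossprod(X)
      xty <- xty + crossprod(X, y)
    }
  }
  drop(solve(xtx, xty))   # solve the normal equations (X'X) b = X'y
}

set.seed(1)
n <- 10000
sim <- data.frame(x1 = rnorm(n), x2 = runif(n))
sim$y <- 2 + 3 * sim$x1 - 1.5 * sim$x2 + rnorm(n)

chunked <- chunked_lm(y ~ x1 + x2, sim, chunk_size = 2500)
full    <- coef(lm(y ~ x1 + x2, data = sim))
all.equal(chunked, full)   # the chunked fit matches lm() on the full data
```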

This is thus a very practical handbook for the tech-minded, and the ebook is quite affordable (the Indian version is just $3.50).

I highly recommend this book for people who aim to implement Big Data Analytics in practice. It is not for statisticians or business users, but for people who actually want to set up the whole thing.

Please take a look at http://www.packtpub.com/big-data-analytics-with-r-and-hadoop/book and try it out for less than the price of a (Starbucks!) latte or a movie DVD.

 

R in the cloud – Revolution takes to AWS

Finally, the people at Revolution Analytics have made their software available on AWS. It is an interesting development, and it remains to be seen whether other providers of statistical software will follow.

http://blog.revolutionanalytics.com/2014/02/revolution-r-enterprise-in-the-amazon-cloud.html

Users now have the opportunity to perform statistical analysis and advanced analytics on data sets they might have stored in Amazon’s cloud-based object store Simple Storage Service (S3) or access data from Amazon’s Relational Database Service (RDS).

The cloud offers many benefits to the user, and the AWS Marketplace is no exception. The ability to spin up pre-installed versions of RRE 7 takes all the guesswork out of deployment and provides for a consistent and reliable experience with the software.  Within minutes a user can gain access to R-based analysis from anywhere he or she has an Internet connection.

The Windows version is accessed via Windows Remote Desktop and leverages the RRE DevelopR IDE. The Linux version is browser-based and leverages RStudio Server Pro to provide a multi-user IDE. Both versions are available on instances from 2 to 32 vCPUs and can handle data sets of up to 1 TB for RRE ScaleR analysis. The solution is single-instance only and does not currently offer support for grids or clusters.

 

http://www.revolutionanalytics.com/revolution-r-enterprise-aws-marketplace

Technical Details

  • General, Compute, Memory and Storage instances available, 2-32 vCPUs
  • Instances with attached storage recommended. Long-term storage requires EBS or backup to S3
  • Single-server instances only (no cluster or grid support).
  • Revolution R Enterprise DeployR not included.
  • Tech support forums monitored from Sunday, 5:00 PM PDT to Friday, 5:00 PM PDT. Tech support provided in English to registered users only.

Windows Instances

Platform: Windows Server 2008 R2
Revolution R Enterprise version: 7.0.0 (includes R 3.0.0)
Client Requirements: Windows Remote Desktop to access the Revolution R Enterprise DevelopR IDE

Linux Instances

Platform: Red Hat Enterprise Linux 6.4
Revolution R Enterprise version: 7.0.0 (includes R 3.0.0)
Client Requirements: Compatible browser

https://aws.amazon.com/marketplace/pp/B00GHXJZVY/ref=_ptnr_ISV_aws_web

Try one instance of this product for 14 days. There will be no software charges but AWS infrastructure charges still apply. Free Trials will automatically convert to a paid subscription upon expiration.
Hourly Fees (includes Windows Server 2008 R2 x64)
Total hourly fees will vary by instance type and EC2 region.
EC2 Instance Type                  Software   EC2        Total
Standard Large (m1.large)          $2.50/hr   $0.364/hr  $2.864/hr
Standard XL (m1.xlarge)            $5.00/hr   $0.728/hr  $5.728/hr
High-Memory 2XL (m2.2xlarge)       $5.00/hr   $1.02/hr   $6.02/hr
High-Memory 4XL (m2.4xlarge)       $10.00/hr  $2.04/hr   $12.04/hr
High-CPU XL (c1.xlarge)            $5.00/hr   $0.90/hr   $5.90/hr
High I/O 4XL (hi1.4xlarge)         $20.00/hr  $3.58/hr   $23.58/hr
Cluster Compute 8XL (cc2.8xlarge)  $20.00/hr  $2.97/hr   $22.97/hr
EBS Storage Fees
$0.05 / GB / Month for Standard EBS Storage
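Each total in the fee table is simply the software fee plus the EC2 fee (for example, $2.50 + $0.364 = $2.864 per hour for m1.large), and EBS storage is billed on top. A rough monthly estimate can be sketched as below. The function name `estimate_cost` and the hours and storage figures in the usage lines are hypothetical, and the sketch ignores data transfer, other AWS charges and regional price differences. During the 14-day trial only the EC2 column applies, since software charges are waived.

```r
# Hourly rates from the fee table (USD per hour)
rates <- data.frame(
  instance = c("m1.large", "m1.xlarge", "m2.2xlarge", "m2.4xlarge",
               "c1.xlarge", "hi1.4xlarge", "cc2.8xlarge"),
  software = c(2.50, 5.00, 5.00, 10.00, 5.00, 20.00, 20.00),
  ec2      = c(0.364, 0.728, 1.02, 2.04, 0.90, 3.58, 2.97)
)
ebs_per_gb_month <- 0.05   # Standard EBS storage, USD per GB per month

# Hypothetical estimate: (software + EC2) * hours + EBS storage for one month.
# trial = TRUE drops the software fee, as during the free trial.
estimate_cost <- function(instance, hours, ebs_gb = 0, trial = FALSE) {
  r <- rates[rates$instance == instance, ]
  stopifnot(nrow(r) == 1)   # instance type must appear exactly once in the table
  hourly <- r$ec2 + if (trial) 0 else r$software
  hourly * hours + ebs_gb * ebs_per_gb_month
}

# 40 hours on m1.large with 100 GB of Standard EBS storage
estimate_cost("m1.large", hours = 40, ebs_gb = 100)                # 119.56
estimate_cost("m1.large", hours = 40, ebs_gb = 100, trial = TRUE)  # 19.56
```

At these rates the software fee dominates: on m1.large it is roughly seven times the EC2 charge, so the trial period mostly saves the software portion.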