Using R with Twitter – great tutorial in #rstats

Here is a great tutorial from one of my students, Kaify Rais, who is the founder of http://vabida.com/, an analytics company.

It is about using Twitter and R together for political sentiment analysis, which is going to be this year’s analytics buzzword in India, since 2014 is an election year.

Using R and RapidMiner together #rstats

I just came across this interesting corporate blog, and I must confess I really like the design as well as the content. Simafore is an analytics company. The post, of course, was on combining R with RapidMiner.

There are many packages and libraries in R, specifically tailored to handle time series forecasting in the “traditional” manner. RapidMiner integrates really well with R by providing two mechanisms:

  • an interactive console, similar to the native R console and somewhat less sophisticated than RStudio
  • and a more powerful full integration of R capabilities within the RapidMiner process design perspective.

The first option is fairly easy to put to work, assuming you have successfully added the R extension to RapidMiner. But the second option requires some initial planning. The key is to understand how to pass data from RapidMiner to R and back. Once you understand this simple but important aspect, then R essentially becomes another powerful “operator” within the vast library of existing RapidMiner operators.

You can read the complete article here: http://www.simafore.com/blog/bid/204923/combining-power-of-r-and-rapidminer-for-time-series-forecasting

Book Review – Big Data Analytics with R and Hadoop

I have written before about Vignesh’s impressive work in R, including helping update the RGoogleAnalytics package for the API changes while at Tatvic. He is quite young and very eager to contribute to open source and knowledge.

This is an impressive and fairly timely book, given that both R and Hadoop are hot topics, have a lot of noise and hoopla around them, and need a straightforward explanation of how to do things using R and Hadoop. It demystifies both R and Hadoop sufficiently that you are not intimidated at the thought of learning multiple languages (R/Java/MapReduce), multiple paradigms (distributed computing and analysis) and multiple installations (R/Hadoop/RHadoop). Suffice it to say that if the future belongs to Big Data/Hadoop, Linux users will have it easier than Windows people.

My main criticism is that everything is written in bullet points, which can hurt readability for the lay reader who is trying to get the big picture. For the technical reader, however, this is really a brilliant approach, as everything is neatly written as "do this and then do that".

The book thus aims to be more of a tutorial and has many nice examples too. I do wish it had a few more examples from industry, which would have added more juice. I therefore hope for a companion site which has all the R code and datasets for testing and trying out the business analytics examples.

One wishes the author had written more about the biglm and ff packages, or even the RevoScaleR package. Chapter 5, on data analytics, should have been more elaborate. This could be done with more references: the section on visualizing data is just 2 pages and ignores packages like googleVis or even bigvis. The section about MongoDB and other data types is very useful, but again it is much more technical and much less analytical. For example, when does one typically encounter MongoDB versus other data types, and what are the drawbacks?

This is thus a very practical handbook for the tech-minded, and the ebook is quite affordable (the Indian version is just $3.50).

I recommend this book highly for people who are aiming to practically implement Big Data Analytics. It is not for statisticians or business users but for people who actually want to set up the whole thing.

Please take a look at http://www.packtpub.com/big-data-analytics-with-r-and-hadoop/book and try it out for less than the price of a (Starbucks!) latte or a movie DVD.

 

R in the cloud – Revolution takes to AWS

Finally, the people at Revolution Analytics have made their software available on AWS. It is an interesting development, and it remains to be seen how other providers of stats software will follow.

http://blog.revolutionanalytics.com/2014/02/revolution-r-enterprise-in-the-amazon-cloud.html

Users now have the opportunity to perform statistical analysis and advanced analytics on data sets they might have stored in Amazon’s cloud-based object store Simple Storage Service (S3) or access data from Amazon’s Relational Database Service (RDS).

The cloud offers many benefits to the user, and the AWS Marketplace is no exception. The ability to spin up pre-installed versions of RRE 7 takes all the guesswork out of deployment and provides for a consistent and reliable experience with the software.  Within minutes a user can gain access to R-based analysis from anywhere he or she has an Internet connection.

The Windows version is accessed via Windows Remote Desktop and leverages RRE DevelopR IDE. The Linux version is browser-based and leverages RStudio Server Pro to provide a multi-user IDE.  Both versions are available on instances from 2 – 32 vCPUs and can handle data sets of up to 1 TB for RRE ScaleR analysis. The solution is single-instance only and does not currently offer support for grids or clusters.

 

http://www.revolutionanalytics.com/revolution-r-enterprise-aws-marketplace

Technical Details

  • General, Compute, Memory and Storage instances available, 2-32 vCPUs
  • Instances with attached storage recommended. Long-term storage requires EBS or backup to S3
  • Single-server instances only (no cluster or grid support).
  • Revolution R Enterprise DeployR not included.
  • Tech support forums monitored from Sunday, 5:00 PM PDT to Friday, 5:00 PM PDT. Tech support provided in English to registered users only.

Windows Instances

Platform: Windows Server 2008 R2
Revolution R Enterprise version: 7.0.0 (includes R 3.0.0)
Client Requirements: Windows Remote Desktop to access Revolution R Enterprise DevelopR IDE

Linux Instances

Platform: Red Hat Enterprise Linux 6.4
Revolution R Enterprise version: 7.0.0 (includes R 3.0.0)
Client Requirements: Compatible browser

https://aws.amazon.com/marketplace/pp/B00GHXJZVY/ref=_ptnr_ISV_aws_web

Try one instance of this product for 14 days. There will be no software charges but AWS infrastructure charges still apply. Free Trials will automatically convert to a paid subscription upon expiration.
Hourly Fees (includes Windows 2008 R2 X64)
Total hourly fees will vary by instance type and EC2 region.
EC2 Instance Type                    Software     EC2          Total
Standard Large (m1.large)            $2.50/hr     $0.364/hr    $2.864/hr
Standard XL (m1.xlarge)              $5.00/hr     $0.728/hr    $5.728/hr
High-Memory 2XL (m2.2xlarge)         $5.00/hr     $1.02/hr     $6.02/hr
High-Memory 4XL (m2.4xlarge)         $10.00/hr    $2.04/hr     $12.04/hr
High-CPU XL (c1.xlarge)              $5.00/hr     $0.90/hr     $5.90/hr
High I/O 4XL (hi1.4xlarge)           $20.00/hr    $3.58/hr     $23.58/hr
Cluster Compute 8XL (cc2.8xlarge)    $20.00/hr    $2.97/hr     $22.97/hr
EBS Storage Fees
$0.05 / GB / Month for Standard EBS Storage
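
To get a feel for what these rates add up to, here is a rough sketch in Python. It only restates the listing's arithmetic: total hourly fee = software fee + EC2 fee, plus EBS storage billed per GB per month. The helper name estimate_cost, the 100 hours and the 200 GB in the usage lines are my own assumptions for illustration, not part of any AWS or Revolution Analytics API, and actual fees vary by instance type and EC2 region as noted above.

```python
# Hourly rates in USD, copied from the Windows pricing table above.
# Each entry maps an EC2 instance type to (software fee, EC2 fee).
HOURLY_FEES = {
    "m1.large": (2.50, 0.364),
    "m1.xlarge": (5.00, 0.728),
    "m2.2xlarge": (5.00, 1.02),
    "m2.4xlarge": (10.00, 2.04),
    "c1.xlarge": (5.00, 0.90),
    "hi1.4xlarge": (20.00, 3.58),
    "cc2.8xlarge": (20.00, 2.97),
}
EBS_FEE_PER_GB_MONTH = 0.05  # Standard EBS storage, USD per GB per month


def estimate_cost(instance_type, hours, ebs_gb=0, months=1, free_trial=False):
    """Rough USD estimate (hypothetical helper, not an AWS or Revolution API)."""
    software_fee, ec2_fee = HOURLY_FEES[instance_type]
    if free_trial:
        # During the 14-day trial only the AWS infrastructure charges apply.
        software_fee = 0.0
    compute = (software_fee + ec2_fee) * hours
    storage = EBS_FEE_PER_GB_MONTH * ebs_gb * months
    return round(compute + storage, 2)


if __name__ == "__main__":
    # Assumed workload: 100 hours on m1.large with 200 GB of EBS for one month.
    print(estimate_cost("m1.large", hours=100, ebs_gb=200))
    print(estimate_cost("m1.large", hours=100, ebs_gb=200, free_trial=True))
```

The free_trial flag reflects the trial terms quoted above: the software fee is waived but the EC2 and EBS charges remain, so even a trial run is not free.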

 

Big Data Evil Empire

  1. Much more progress has been made in data storage, data querying and data analysis of huge amounts of personally identifiable information than in encrypting such information.
  2. Big Data has as much dual use for governments and corporations as uranium has for building bombs or power plants.
  3. There is as much lucre and potential revenue in encrypted data streams in the cloud era as there was in anti-virus software in the PC era.
  4. Total tracking of citizens is evil: the total cost of such programs is not justified by the terrorism plots thwarted by Big Data’s cyber spying. At best, I can understand governments spying on citizens of other countries to gain advantages in trade.
  5. The American dominance of cyber spying and big data threatens to unravel and undermine its credibility as de facto leader of the Internet. It proves that China’s vision of a walled-off internet makes sense, and that is a dangerous precedent which could lead to the break-up of the internet along national boundaries of electronic firewalls.

2013 in review

The WordPress.com stats helper monkeys prepared a 2013 annual report for this blog.

Here’s an excerpt:

The Louvre Museum has 8.5 million visitors per year. This blog was viewed about 150,000 times in 2013. If it were an exhibit at the Louvre Museum, it would take about 6 days for that many people to see it.

Click here to see the complete report.

2013 Thank You Note

I would like to write a thank you note to some of the people who helped make Decisionstats.com possible. We had a total of 150,644 views this year. For that, I have to thank you, dear readers, for putting up with me. It is now our seventh year.

Jan     Feb     Mar     Apr     May     Jun     Jul     Aug     Sep     Oct     Nov     Dec     Total
13,940  12,153  12,948  13,371  12,778  12,085  12,894  11,934  9,914   14,764  12,907  10,956  150,644

I would like to thank Chris (of Mashape) for helping me with some of the interviews I wrote here. I did 26 interviews this year for Programmable Web, and a total of 30+ articles including the interviews in 2013.

Of course, we have now reached 116 excellent interviews on Decisionstats.com alone (see http://goo.gl/V6UsCG). I would like to thank each one of the interviewees who took precious time to fill out the questions.

Sponsors: I would like to thank Dr Eric Siegel (individually as an author and as founder chair of www.pawcon.com), Nadja and Ingo (for RapidMiner), Dr Jonathan (for Datamind), Chris M (for Statace.com), Gergely (author) and many more during all these six years who have kept us afloat and the servers warm in these days of cold reflection, including Gregory (of KDNuggets.com) and the erstwhile AsterData founders.

Training Partners: I would like to thank Lovleen Bhatia (of Edureka) for giving me the opportunity to make http://www.edureka.in/r-for-analytics, which now has 1721 learners as per http://www.edureka.in/.

I would also like to say a special thank you to Jigsaw Academy for giving me the opportunity to create the first affordable and quality R course in Asia: http://analyticstraining.com/2013/jigsaw-completes-training-of-300-students-on-r/

These training courses, including those by Datamind and Coursera, remain a formidable and affordable alternative to the many others catching up in the analytics education game in India (an issue I wrote about here).

I would also like to thank each and every one of my students (past and present) and everyone in the #rstats and SAS-L communities, including people who may have been left out.

Thank you sir, for helping me and Decisionstats.com !

I wish each one of you a very happy and joyous New Year and a great and prosperous 2014!