Book Review – Big Data Analytics with R and Hadoop

I have written before about Vignesh's impressive work in R, including his help updating the RGoogleAnalytics package for the API changes while he was at Tatvic. He is quite young and very eager to contribute to open source and knowledge.

This is a timely and impressive book, given that both R and Hadoop are hot topics, have a lot of noise and hoopla around them, and need a straightforward explanation of how to get things done with them. It demystifies both R and Hadoop enough that you are not intimidated at the thought of learning multiple languages (R, Java, MapReduce), multiple paradigms (distributed computing and analysis) and multiple installations (R, Hadoop, RHadoop). Suffice it to say, if the future belongs to Big Data and Hadoop, Linux users will have it easier than Windows users.

My main criticism is that, for the lay reader, everything is written in bullet points, which can hurt readability if you are trying to get the big picture. For the technical reader, however, this is really a brilliant approach, as everything is neatly laid out as "do this, then do that".

The book thus aims to be more of a tutorial, and it has many nice examples too. I do wish it had a few more examples from industry, which would have added more juice. I therefore hope for a companion site that has all the R code and datasets for testing and trying out the business analytics examples.

One wishes the author had written more about the biglm and ff packages, or even the RevoScaleR package. Chapter 5, on data analytics, should have been more elaborate, and more references would help here: the section on visualizing data is just 2 pages and ignores packages like googleVis or even bigvis. The section about MongoDB and other data types is very useful, but again it is much more technical and much less analytical. For example, when does one typically encounter MongoDB versus other data types, and what are the drawbacks?

This is thus a very practical handbook for the tech-minded, and the ebook is quite affordable (the Indian version is just $3.5).

I highly recommend this book for people who are aiming to implement Big Data analytics in practice. It is not for statisticians or business users, but for people who actually want to set up the whole thing.

Please take a look at the book and try it out for less than the price of a (Starbucks!) latte or a movie DVD.


Author: Ajay Ohri
