#rstats Review of my book R for Cloud Computing in JSS Journal of Statistical Software

A review of R for Cloud Computing is out in the Journal of Statistical Software. It has some kind words:


This is a lively book on a timely topic – or rather, a pair of topics, as the book is as much about R as it is on cloud computing. It should prove useful for those interested in the confluence of the two subject areas.


The book features a number of interviews with prominent figures in data science. Though arguably a bit out of place, I believe that most readers will find them interesting and worth inclusion. This book should be of interest to anyone who is new to data storage and analysis in the cloud, especially with R, and even veteran users will find something new here and there.

It also points out areas where the author needs to work much, much harder:

The book aims to provide step-by-step instructions for painlessly and quickly getting the novice user into the cloud. It does succeed in this for the most part, but any such effort will not be 100% painless after all. Readers who lack background in the cloud may feel overwhelmed at times at the beginning, given all the possible choices and myriad terms. In fact, some terms seem to be undefined, and there is no index (though there is a good bibliography). The figures are inline rather than referenced via numbers, and in some cases they are rather distant from the associated text. The font size in the figures may be too small for comfortable reading for some people.

Read the full review here: http://www.jstatsoft.org/v66/b04/paper

and get a look at the full book here: http://www.springer.com/book/9781493917013



Many thanks to Dr Matloff for the encouragement.

I may have been forced to drop out of the MS Stats program at U Tennessee Knoxville on health grounds in 2010, but I get by with hard work and chutzpah.


Trying to improve the supply of Data Scientists without ripping off young people

In a previous post, I said that many corporates are trying to benefit from the demand for data science as applied to their sector or company, but not many are doing enough to improve the supply of data scientists.


In anecdotal arguments made on behalf of students in India and the USA, many have said that training companies charge exorbitant amounts and make misguided promises, yet essentially teach only tools and techniques. They do not teach the essential analytical mindset for slicing and dicing data, nor give enough information to strike a balance between the three skills a data scientist needs: statistics, programming and business perspective.

Added to this, many people building tools for data scientists have not worked in data science consulting themselves, but are addicted to one platform or product due to commercial or intellectual compulsions.


Here is what I think could be a supply-side solution to the problem of unmet demand for data scientists, which is hindering the actual benefits of data science to humanity across commercial and social sectors alike.

  1. Build up a pool of curated best-practice training courses
  2. Get them validated and verified across different business sectors by industry experts
  3. Add hardware or cloud training to software training
  4. Offer them on accessible platforms like mobile, tablet and web
  5. Offer them in accessible languages like Spanish, Swahili, Chinese and Arabic as well
  6. Gamify some of the content to make it interesting; basically, start creating data science hackers at an earlier age than just postgraduate students
  7. Tie up with industry to offer internships that are fair, balanced and demand equal commitment
  8. Tie in soft skill training for better professionalism
  9. Offer all this for free, but use the data generated to improve it, not only through human intervention but also through computer-adaptive training and testing
  10. Monetize only after you reach a huge scale, not prematurely
  11. Make it interactive using videos, 15-minute weekly personalized help on Skype from support, and webinars, but capture data continuously to drive engagement metrics

Do you want to just make money on the (uncertain) demand for data science, or do you want to make more money on the supply side of data science too?