Louis Aslett makes data science on the cloud a 2 click step away

I was having a few issues configuring the latest version of RStudio Server, and the free help available was not helpful enough. Then I came to this wonderful site, and it turned running R on the cloud for my students into just a 2-click step. The best thing is that lots of goodies come pre-installed.


Why an RStudio AMI?

The RStudio team have done a phenomenal job with making it simplicity itself to install, but there are still several motivating factors which led to me creating this AMI:

  • Although simple, it still takes several minutes to install R and RStudio after the virtual machine is going and this adds up if you do it often.
  • More time consuming is getting all the extras one may want such as LaTeX, Git, etc installed.
  • Of course, ‘simple’ is subjective and there are those who don’t know Linux, but want to use RStudio on a server without ever touching a Linux command line.
  • The EBS-backed AMIs with operating systems on tend to have vast swathes of free space which (as a postdoc of modest means) I don’t like paying to store when putting a machine into a stopped state for hibernation between computational runs! Growing an EBS volume is easier than shrinking one, so having a minimally sized AMI ready-to-go saves effort.
  • Having the full tool stack through to linking a Dropbox account in about 5 seconds means that I can go from zero to having a 36-core machine with over 200GB of RAM with all my code and data synced to a fully functional R environment with all supporting tools in a matter of minutes.
  • At the time of writing I couldn’t find any with the standard Amazon search tools and — in the great open-source tradition — that seems like an itch I should scratch!



Interesting IBM event at Cercles Delhi

I was there on June 10 to attend the hands-on session and event for IBM’s PaaS offering, IBM Bluemix. The event took place at Cercles (http://cercles.co.in/), a relatively new startup coworking space in Hauz Khas Village, Delhi. Since this was just a 5-minute walk from where I currently live, I attended along with an intern and a colleague from my new training company, http://decisionstats.org. The event was nicely organized, the infrastructure was good, and the speakers were quite awesome. To read what really happened, you can see the summary at the clouddelhi hashtag.

One thing I noticed is that R is not really given much attention in Bluemix. I particularly found the IBM Watson APIs (which are RESTful) to be a great case for #rstats packages.

Bluemix has a nice interface, and they are offering 30 days free, which is quite short compared to 1 year on AWS. IBM is focused on hybrid cloud for enterprises, and opportunities for people like us depend on becoming ISVs (Independent Software Vendors) or Partners in the IBM ecosystem. The event hashtag is at https://twitter.com/hashtag/CloudDelhi?src=hash

Fortunately, I didn’t have to speak. I liked Cercles well enough to book a seat for my startup for the next month, something I had not done before despite considering two or three other co-working hubs in Delhi-Gurgaon in the past.

An additional theme was Women in Tech. I found some of the reactions there interesting. Perhaps governments need to adopt the Women in Tech theme, but they seem ignorant and uninformed, while corporations try to tweak their own policies to gain and retain talent rather than advise policy makers on how to create a better ecosystem.

Cloud Computing for Christmas

My second book, R for Cloud Computing: An Approach for Data Scientists, is now ready for sale (ebook). The softcover should be available within a month. Some of you have already booked an online review copy. It has taken me 2 years to write this book, and as always I accept all feedback on how to be a better writer.

I would like to especially thank Hannah Bracken of Springer Publishing for this.

I dedicate this book to my 7-year-old son, Kush.



Everything that is good in me comes from your love, Kush

How cheap is cloud computing anyway?

So I wanted to find out how cheap the cloud really was, but I got confused by the 23 kinds of instances that Amazon has (http://aws.amazon.com/ec2/pricing/) and the 15 kinds of instances at https://developers.google.com/compute/pricing, and wondered whether there is any price collusion between them 😉

Now Amazon has spot pricing, so I can bid for prices as well (http://aws.amazon.com/ec2/purchasing-options/spot-instances/), and up to 60% off for reserved instances (http://aws.amazon.com/ec2/purchasing-options/reserved-instances/), but it charges $2 per hour per region for dedicated instances (which are not dedicated but pay as you go).

Dedicated Per Region Fee

  • $2 per hour – An additional fee is charged once per hour in which at least one Dedicated Instance of any type is running in a Region.

Google has sustained use discounts (though it will not offer Windows on the cloud!)

The table below describes the discount at each usage level. These discounts apply for all instance types.

Usage level (% of month) | % of base rate charged on incremental usage | Example incremental rate (USD per hour) for an n1-standard-1 instance
0%-25% | 100% of base rate | $0.07
25%-50% | 80% of base rate | $0.056
50%-75% | 60% of base rate | $0.042
75%-100% | 40% of base rate | $0.028
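
To see what these tiers mean for a whole bill, here is a minimal Python sketch. It assumes each quarter of the month is billed at the share of the base rate shown in the table; the names TIERS and effective_multiplier are made up for this illustration and are not part of any Google tool.

```python
# Tier boundaries (fraction of the month) and the share of the base rate
# charged for usage inside each tier, copied from the table above.
TIERS = [(0.25, 1.00), (0.50, 0.80), (0.75, 0.60), (1.00, 0.40)]


def effective_multiplier(usage_fraction):
    """Average share of the base rate paid when running for
    `usage_fraction` of the month (hypothetical helper)."""
    if not 0 < usage_fraction <= 1:
        raise ValueError("usage_fraction must be in (0, 1]")
    billed = 0.0
    lower = 0.0
    for upper, multiplier in TIERS:
        # Portion of the usage that falls inside this tier.
        span = min(usage_fraction, upper) - lower
        if span <= 0:
            break
        billed += span * multiplier
        lower = upper
    return billed / usage_fraction


if __name__ == "__main__":
    base_rate = 0.07  # n1-standard-1 example rate from the table
    for usage in (0.25, 0.50, 0.75, 1.00):
        rate = base_rate * effective_multiplier(usage)
        print(f"{usage:.0%} of month: about ${rate:.4f} per hour on average")
```

Under that assumption, an instance left running for the full month pays on average 70% of the base rate, since (100% + 80% + 60% + 40%) / 4 = 70%.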


Anyway, I tried to create this simple table to help me with it. After all, hard disks are cheap; it is memory I want on the cloud!

Or maybe I am wrong and the cloud is not so cheap, or it’s just too complicated for someone to build a pricing calculator that can take in prices from all providers (Amazon, Azure, Google Compute) and show us the money!

Instance | vCPU | RAM (GiB) | $ per hour (Linux usage) | Type | Provider
t2.micro | 1 | 1 | $0.01 | General Purpose – Current Generation | Amazon (North Virginia)
t2.small | 1 | 2 | $0.03 | General Purpose – Current Generation | Amazon (North Virginia)
t2.medium | 2 | 4 | $0.05 | General Purpose – Current Generation | Amazon (North Virginia)
m3.medium | 1 | 3.75 | $0.07 | General Purpose – Current Generation | Amazon (North Virginia)
m3.large | 2 | 7.5 | $0.14 | General Purpose – Current Generation | Amazon (North Virginia)
m3.xlarge | 4 | 15 | $0.28 | General Purpose – Current Generation | Amazon (North Virginia)
m3.2xlarge | 8 | 30 | $0.56 | General Purpose – Current Generation | Amazon (North Virginia)
c3.large | 2 | 3.75 | $0.11 | Compute Optimized – Current Generation | Amazon (North Virginia)
c3.xlarge | 4 | 7.5 | $0.21 | Compute Optimized – Current Generation | Amazon (North Virginia)
c3.2xlarge | 8 | 15 | $0.42 | Compute Optimized – Current Generation | Amazon (North Virginia)
c3.4xlarge | 16 | 30 | $0.84 | Compute Optimized – Current Generation | Amazon (North Virginia)
c3.8xlarge | 32 | 60 | $1.68 | Compute Optimized – Current Generation | Amazon (North Virginia)
g2.2xlarge | 8 | 15 | $0.65 | GPU Instances – Current Generation | Amazon (North Virginia)
r3.large | 2 | 15 | $0.18 | Memory Optimized – Current Generation | Amazon (North Virginia)
r3.xlarge | 4 | 30.5 | $0.35 | Memory Optimized – Current Generation | Amazon (North Virginia)
r3.2xlarge | 8 | 61 | $0.70 | Memory Optimized – Current Generation | Amazon (North Virginia)
r3.4xlarge | 16 | 122 | $1.40 | Memory Optimized – Current Generation | Amazon (North Virginia)
r3.8xlarge | 32 | 244 | $2.80 | Memory Optimized – Current Generation | Amazon (North Virginia)
i2.xlarge | 4 | 30.5 | $0.85 | Storage Optimized – Current Generation | Amazon (North Virginia)
i2.2xlarge | 8 | 61 | $1.71 | Storage Optimized – Current Generation | Amazon (North Virginia)
i2.4xlarge | 16 | 122 | $3.41 | Storage Optimized – Current Generation | Amazon (North Virginia)
i2.8xlarge | 32 | 244 | $6.82 | Storage Optimized – Current Generation | Amazon (North Virginia)
hs1.8xlarge | 16 | 117 | $4.60 | Storage Optimized – Current Generation | Amazon (North Virginia)
n1-standard-1 | 1 | 3.75 | $0.07 | Standard | Google -US
n1-standard-2 | 2 | 7.5 | $0.14 | Standard | Google -US
n1-standard-4 | 4 | 15 | $0.28 | Standard | Google -US
n1-standard-8 | 8 | 30 | $0.56 | Standard | Google -US
n1-standard-16 | 16 | 60 | $1.12 | Standard | Google -US
n1-highmem-2 | 2 | 13 | $0.16 | High Memory | Google -US
n1-highmem-4 | 4 | 26 | $0.33 | High Memory | Google -US
n1-highmem-8 | 8 | 52 | $0.66 | High Memory | Google -US
n1-highmem-16 | 16 | 104 | $1.31 | High Memory | Google -US
n1-highcpu-2 | 2 | 1.8 | $0.09 | High CPU | Google -US
n1-highcpu-4 | 4 | 3.6 | $0.18 | High CPU | Google -US
n1-highcpu-8 | 8 | 7.2 | $0.35 | High CPU | Google -US
n1-highcpu-16 | 16 | 14.4 | $0.70 | High CPU | Google -US
f1-micro | 1 | 0.6 | $0.01 | Shared Core | Google -US
g1-small | 1 | 1.7 | $0.04 | Shared Core | Google -US

Notes: Amazon also has spot instances that can lower prices. Google charges per minute of usage (subject to a minimum of 10 minutes).
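
Since it is memory I care about, one rough way to read the table is price per GiB of RAM per hour. The Python sketch below is only an illustration: it uses a handful of the 2-vCPU rows copied from the table, the names ROWS and price_per_gib are made up for this example, and it looks at on-demand prices only, ignoring spot, reserved and sustained use discounts.

```python
# (instance, vCPU, RAM in GiB, USD per hour): a few rows copied from the table.
ROWS = [
    ("t2.medium", 2, 4.0, 0.05),
    ("m3.large", 2, 7.5, 0.14),
    ("c3.large", 2, 3.75, 0.11),
    ("r3.large", 2, 15.0, 0.18),
    ("n1-standard-2", 2, 7.5, 0.14),
    ("n1-highmem-2", 2, 13.0, 0.16),
]


def price_per_gib(rows):
    """Return (instance, USD per GiB of RAM per hour), cheapest first."""
    priced = [(name, usd / ram) for name, _vcpu, ram, usd in rows]
    return sorted(priced, key=lambda item: item[1])


if __name__ == "__main__":
    for name, rate in price_per_gib(ROWS):
        print(f"{name:15s} ${rate:.4f} per GiB-hour")
```

On these rows the memory-optimized types come out cheapest per GiB, with r3.large at $0.18 / 15 GiB = $0.012 per GiB-hour and n1-highmem-2 just behind it.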

Using Windows Azure Machine Learning as a service with R #rstats

A Brief Tutorial I wrote by playing with the software at manage.windowsazure.com

2013 Thank You Note

I would like to write a thank you note to some of the people who helped make Decisionstats.com possible. We had a total of 150,644 views this year. For that, I have to thank you, dear readers, for putting up with me; it is now our seventh year.

Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec | Total
13,940 | 12,153 | 12,948 | 13,371 | 12,778 | 12,085 | 12,894 | 11,934 | 9,914 | 14,764 | 12,907 | 10,956 | 150,644

I would like to thank Chris (of Mashape) for helping me with some of the interviews I wrote here. I did 26 interviews this year for Programmable Web, and a total of 30+ articles including the interviews in 2013.

Of course, we have now reached 116 excellent interviews on Decisionstats.com alone (see http://goo.gl/V6UsCG). I would like to thank each one of the interviewees who took precious time to fill out the questions.

Sponsors: I would like to thank Dr Eric Siegel (individually as an author and as founder chair of www.pawcon.com), Nadja and Ingo (for Rapid-Miner), Dr Jonathan (for Datamind), Chris M (for Statace.com), Gergely (Author) and many more during all these six years who have kept us afloat and the servers warm in these days of cold reflection, including Gregory (of KDNuggets.com) and the erstwhile AsterData founders.

Training Partners: I would like to thank Lovleen Bhatia (of Edureka) for giving me the opportunity to make http://www.edureka.in/r-for-analytics, which now has 1721 learners as per http://www.edureka.in/.

I would also like to say a special thank you to Jigsaw Academy for giving me the opportunity to create the first affordable and quality R course in Asia: http://analyticstraining.com/2013/jigsaw-completes-training-of-300-students-on-r/

These training courses, including those by Datamind and Coursera, remain a formidable and affordable alternative to many others catching up in the analytics education game in India (an issue I wrote about here).

I would also like to thank each and every one of my students (past and present) and everyone in the #rstats and SAS-L communities, including people who may have been left out.

Thank you sir, for helping me and Decisionstats.com !

I wish each one of you a very happy and joyous New Year and a great and prosperous 2014!