New Delhi R User group meets up

Inspired by David Smith’s blog post, I set up a meetup group for New Delhi (India, to my surprise, had only one R user meetup group before this, in Bangalore). The first meeting was awesome: we met in a cafe, and the plan going forward is to cover cross-domain learning and collaboration on tools, startups, mashups and training.

Hopefully we can reach out to analytics enthusiasts in Mumbai and Chennai to help kickstart R user groups there. Indian companies like Mu Sigma have been using R more and more in analytics (offshoring). You can even use the sponsorship from Revolution Analytics to start your meetup group, and there is a 50% discount if you pay 6 months in advance. Given Oracle’s and IBM/Google’s big Indian presence, I hope they lend a hand to user groups for R in India as well.

Interviews and Reviews: More R #rstats

I was interviewed on moving from Excel to R in Human Resources (HR):

“There is a lot of data out there and it’s stored in different formats. Spreadsheets have their uses but they’re limited in what they can do. The spreadsheet is bad when getting over 5000 or 10000 rows – it slows down. It’s just not designed for that. It was designed for much higher levels of interaction.

In the business world we really don’t need to know every row of data, we need to summarise it, we need to visualise it and put it into a powerpoint to show to colleagues or clients.”

And a more recent interview with my fellow IIML mate, and editor at Analytics India Magazine

AIM: Which R packages do you use the most and which ones are your favorites?

AO: I use R Commander and Rattle a lot, and I use the dependent packages. I use car for regression, and forecast for time series, and many packages for specific graphs. I have not mastered ggplot though but I do use it sometimes. Overall I am waiting for Hadley Wickham to come up with an updated book to his ecosystem of packages as they are very formidable, completely comprehensive and easy to use in my opinion, so much I can get by the occasional copy and paste code.


A surprising review at R- /Intelligent Trading

The good news is that many of the large companies do not view R as a threat, but as a beneficial tool to assist their own software capabilities.

After assisting and helping R users navigate through the dense forest of various GUI interface choices (in order to get R up and running), Mr. Ohri continues to handhold users through step by step approaches (with detailed screen captures) to run R from various simple to more advanced platforms (e.g. CLOUD, EC2) in order to gather, explore, and process data, with detailed illustrations on how to use R’s powerful graphing capabilities on the back-end.

Do you want to write a review too? You can visit the site here


Top Funny Charts

I have recently become a Quora addict, and you can see why it is such a great site. If possible, say hello to me there.

My latest favorite question-

What are the most hilarious pie charts?

I am only showing you some of the answers; you can see the rest yourself.



Obama’s chief data scientist to keynote PAW SF

From Predictive Analytics Conference,


Predictive Analytics World April 14-19 in San Francisco is packed with the top predictive analytics experts, practitioners, authors and business thought leaders, including keynote speakers:

  • Rayid Ghani, Chief Data Scientist, Obama for America 2012 Campaign
  • Anthony Goldbloom, “The $3m Heritage Health Prize: Results and Conclusions”
  • Edward Nazarko, Client Technical Advisor, “Putting IBM Watson to Work”

Education for the common people

Higher education in the West is no longer an exclusive club and a proxy partner to student debt. This amazing article by the NYTimes shows how the following platforms are shaking the football-loving US education giants in their boots while bringing balance back to the Force. This is Big Education for the common people, because you need humans to code in the Big Data era.

The pioneer has the advantages and disadvantages of being the market leader in an innovating field. Sure, it has 1.7 million users, but how many are attending how many lectures? How many newly skilled machine learning people are there after Andrew Ng has completed 2 courses, and does industry feel the same about the employability of these course takers? Again, some courses are common across these platforms, so more public web analytics would be a welcome step.

But I really liked Stanford’s open source solution, which lets anyone, including corporations and companies, start creating their own online courses. It is called class2go.


Class2Go is Stanford’s internal open-source platform for on-line education. A team of eight built the first version over the summer 2012, and it is still under active development. Class2Go launched this Fall for six on-campus classes and two “massive open online courses” (MOOC’s): Computer Networking and Solar Cells, Fuel Cells, and Batteries.

Class2Go was built to be an open platform for learning and research. Professors have access to the classes’ data to learn how their students learn.

Leveraging Others

Projects that help

  • YouTube for video
  • Khan Academy for their HTML-based exercise framework
  • Piazza for forums
  • MySQL is our database
  • The massive Python Django ecosystem: eg. South, Registration
  • Amazon AWS suite for hosting (EC2, S3, RDS, Route53, IAM)
  • Chef from Opscode for configuration management
  • Github for source code management and issues

Good to see technology and the internet bringing skills back to people globally, so that in the future they won’t have to use a skill-shortage excuse to import cyber technology H-1B slaves and coolies from Asia, forced to choose between livelihood, family and undeniable economic arbitrage pressure. Maybe they can try to customize it for Africa, or for women, or for other needy areas as well.




Amazon drops prices of Linux AMIs by ~20%

Amazon cloud gets more exciting. We are still waiting for the Oracle and Google public clouds (compute) to open up out of beta! See their (rather cluttered) blog

Today, we are excited to announce a new generation of the original Amazon EC2 instance family. Second generation Standard instances (M3 instances) provide customers with the same balanced set of CPU and memory resources as first generation Standard instances (M1 instances) while providing customers with 50% more computational capability/core.

M3 instances are currently available in two instance types; extra-large (m3.xlarge) and double extra-large (m3.2xlarge). Examples of applications that can benefit from the additional CPU horsepower of these new instances include media encoding, batch processing, web servers, caching fleets, and many others. Currently, M3 instances are available in the US East (N. Virginia) Region starting at a Linux On-Demand price of $0.58/hr for extra-large instances. Customers can also purchase M3 instances as Reserved Instances or as Spot instances. We will introduce M3 instances in additional regions in the coming months.

To learn more about Amazon EC2 instance types and to find out which instance type might be useful for you, please visit the Amazon EC2 Instance type page.

Pricing Change for M1 Standard Instances
Along with the introduction of the M3 Standard instance family, we are announcing a reduction in Linux On-Demand pricing for M1 Standard instances in the US East (N. Virginia) and US West (Oregon) Regions by almost 19%. The new pricing is effective from November 1 and is described in the following table

Instance Type    Previous Price    New Price
m1.small         $0.080            $0.065
m1.medium        $0.160            $0.130
m1.large         $0.320            $0.260
m1.xlarge        $0.640            $0.520

You can find out more about pricing for all Amazon EC2 instances by visiting the Amazon EC2 pricing page.
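
As a quick sanity check on the “almost 19%” figure, here is a minimal Python sketch that recomputes the cut for each M1 size from the table above. The prices are copied from the announcement; treating them as US dollars per hour is my assumption (the M3 price is quoted per hour), and the helper name percent_reduction is made up for this illustration.

```python
# Previous and new Linux On-Demand prices for M1 Standard instances,
# copied from the table above (assumed to be USD per hour).
prices = {
    "m1.small": (0.080, 0.065),
    "m1.medium": (0.160, 0.130),
    "m1.large": (0.320, 0.260),
    "m1.xlarge": (0.640, 0.520),
}


def percent_reduction(previous, new):
    """Return the price cut as a percentage of the previous price."""
    return (previous - new) / previous * 100


# Hypothetical helper structure: instance type -> percentage reduction.
reductions = {
    name: percent_reduction(previous, new)
    for name, (previous, new) in prices.items()
}

for name, pct in reductions.items():
    print(f"{name}: {pct:.2f}% cheaper")
```

Every size works out to the same 18.75% cut, which matches the “almost 19%” in the announcement and is a little short of the ~20% in my headline.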


R Studio and Training

I really like the design, the course structure and Hadley Wickham (in no particular order) in R Studio’s training suite, which may be new but is much better and open. Again, I think Oracle’s training is awesome for online features, but somebody needs to step up and create a credible R certification here. More power to R 😉

Check it out-