Spring Cleaning – What I wrote


A partial list of writings by me over the years


1) Big Data Initiatives in Developing Nations


Can big data, open data, and programs such as the Aadhaar Project enhance lives in underprivileged segments of society? March 2015


2) Downsides Dampen Open-Source Analytics September 2011 http://www.allanalytics.com/author.asp?section_id=1408&doc_id=233454


3) KDNuggets – Articles on Data Science


  1. Using Python and R together: 3 main approaches December 2015
  2. Interview: Ingo Mierswa, RapidMiner CEO on “Predaction” and Key Turning Points June 2014
  3. Guide to Data Science Cheat Sheets 2014/05/12
  4. Book Review: Data Just Right 2014/04/03
  5. Exclusive Interview: Richard Socher, founder of etcML, Easy Text Classification Startup 2014/03/31
  6. Trifacta – Tackling Data Wrangling with Automation and Machine Learning 2014/03/17
  7. Paxata automates Data Preparation for Big Data Analytics 2014/03/07
  8. etcML Promises to Make Text Classification Easy 2014/03/05
  9. Wolfram Breakthrough Knowledge-based Programming Language – what it means for Data Science? 2014/03/02

Programmable Web- Articles on APIs


  1. Keen IO Helps Developers Solve Custom Analytics Needs 06-09-2014
  2. Scoreoid Aims to Gamify the World Using APIs 01-27-2014
  3. Plot.ly’s Plot to Visualize More Data 01-22-2014
  4. LumenData’s Acquisition of Algorithms.io is a Win-Win 01-08-2014
  5. Yactraq API Sees Huge Growth in 2013 01-06-2014
  6. Scrape.it Describes a Better Way to Extract Data 12-20-2013
  7. Exclusive Interview: App Store Analytics API 12-04-2013
  8. APIs Enter 3d Printing Industry 11-29-2013
  9. PW Interview: José Luis Martinez of Textalytics 11-06-2013
  10. PW Interview: Simon Chan PredictionIO 11-05-2013
  11. PW Interview: Scott Gimpel Founder and CEO FantasyData.com 10-23-2013
  12. PW Interview: Brandon Levy, cofounder and CEO of Stitch Labs 10-08-2013
  13. PW Interview: Jolo Balbin Co-Founder Text Teaser 09-18-2013
  14. PW Interview: Bob Bickel CoFounder Redline13 07-29-2013
  15. PW Interview: Brandon Wirtz CTO Stremor.com 07-04-2013
  16. PW Interview: Andy Bartley, CEO Algorithms.io 06-04-2013
  17. PW Interview: Francisco J Martin, CEO BigML.com 05-30-2013
  18. PW Interview: Tal Rotbart Founder- CTO, SpringSense 05-28-2013
  19. PW Interview: Jeh Daruwala CEO Yactraq API, Behavioral Targeting for videos 05-13-2013
  20. PW Interview: Michael Schonfeld of Dwolla API on Innovation Meeting the Payment Web 05-02-2013
  21. PW Interview: Stephen Balaban of Lambda Labs on the Face Recognition API 04-29-2013
  22. PW Interview: Amber Feng, Stripe API, The Payment Web 04-24-2013
  23. PW Interview: Greg Lamp and Austin Ogilvie of Yhat on Shipping Predictive Models via API 04-22-2013
  24. Google Mirror API documentation is open for developers 04-18-2013
  25. PW Interview: Ricky Robinett, Ordr.in API, Ordering Food meets API 04-16-2013
  26. PW Interview: Jacob Perkins, Text Processing API, NLP meets API 04-10-2013
  27. Amazon EC2 On Demand Windows Instances -Prices reduced by 20% 04-08-2013
  28. Amazon S3 API Requests prices slashed by half 04-03-2013
  29. PW Interview: Stuart Battersby, Chatterbox API, Machine Learning meets Social 04-02-2013
  30. PW Interview: Karthik Ram, rOpenSci, Wrapping all science APIs 03-20-2013
  31. Viralheat Human Intent API- To buy or not to buy 03-13-2013
  32. Interview Tammer Kamel CEO and Founder Quandl 03-07-2013
  33. YHatHQ API: Calling Hosted Statistical Models 03-04-2013
  34. Quandl API: A Wikipedia for Numerical Data 02-25-2013
  35. Amazon Redshift API is out of limited preview and available! 02-18-2013
  36. Windows Azure Media Services REST API 02-14-2013
  37. Data Science Toolkit Wraps Many Data Services in One API 02-11-2013
  38. Diving into Codeacademy’s API Lessons 01-31-2013
  39. Google APIs finetuning Cloud Storage JSON API 01-29-2013
  40. Interview Hilary Mason Chief Scientist bitly 01-28-2013
  41. Interview: Viralheat CEO Raj Kadam on API Growth 01-22-2013
  42. Google Compute API – Affordable Computing at Google Scale 01-17-2013
  43. Ergast API Puts Car Racing Fans in the Driver’s Seat 12-05-2012
  44. Springer APIs- Fostering Innovation via API Contests 11-20-2012
  45. Statistically programming the web – Shiny,HttR and RevoDeploy API 11-19-2012
  46. Google Cloud SQL API- Bigger ,Faster and now Free 11-12-2012
  47. A Look at the Web’s Most Popular API -Google Maps API 10-09-2012
  48. Cloud Storage APIs for the next generation Enterprise 09-26-2012
  49. Last.fm API: Sultan of Musical APIs 09-12-2012
  50. Socrata Data API: Keeping Government Open 08-29-2012
  51. BigML API Gets Bigger 08-22-2012
  52. Bing APIs: the Empire Strikes Back 08-15-2012
  53. Google Cloud SQL: Relational Database on the Cloud 08-13-2012
  54. Google BigQuery API Makes Big Data Analytics Easy 08-07-2012
  55. Your Store in The Cloud -Google Cloud Storage API 08-01-2012
  56. Predict the future with Google Prediction API 07-30-2012
  57. The Romney vs Obama API 07-27-2012






1) Big Data Big Analytics http://krishnarajpm.com/bigdata/abstract.pdf at the Workshop on Statistical Machine Learning and Game Theory Approaches for Large Scale Data Analysis, 9 July 2012 – 14 July 2012, Bangalore, India. Sponsored by Mathematical Sciences, Division of Science and Engineering Research Board, Department of Science & Technology, Government of India (sponsored airfare, hotel accommodation, and honorarium).

SLIDES Big data Big Analytics

2) Data Analytics using the Cloud: Challenges and Opportunities for India, at the 1st International Symposium on Big Data and Cloud Computing Challenges (ISBCC-2014), March 27-28, 2014, VIT University, Chennai, India. Sponsored by BRNS (flight).


SLIDES Data analytics using the cloud challenges and opportunities for india from Ajay Ohri

3) Open Source Analytics at OSSCamp 2014 http://osscamp.in/


SLIDES- Open source analytics from Ajay Ohri

4) Society for Industrial and Applied Mathematics- Delhi Technological University Evolute 2015 : Annual Symposium Speaker

5) Talk on Analytics as a profession at Indian Institute of Technology Delhi

Learning R and Teaching R from Ajay Ohri


Pre-Placement training workshop for Economics Students, Delhi School of Economics.

A Workshop on R from Ajay Ohri


R for Business Analytics http://www.springer.com/us/book/9781461443421

R for Cloud Computing : A Data Science Approach http://www.springer.com/us/book/9781493917013

Revolution Analytics (Microsoft) Corporate Blog




Journal Articles

Journal of Statistical Software



Technometrics, Vol. 55 (3), August, 2013



Major Media

I have been cited by Wired Magazine and ReadWriteWeb for espousing a marketplace for algorithms.




Interviews (of Ajay Ohri)

  1. Big Step Interview July 2015  Expert Interview with Ajay Ohri on the Importance of Big Data http://blog.bigstep.com/big-data-experts-interviews/expert-interview-with-ajay-ohri-on-the-importance-of-big-data/
  2. AnalyticsVidhya Feb 2015 Interview with Industry expert – Ajay Ohri, Founder, decisionstats.com http://www.analyticsvidhya.com/blog/2015/02/interview-expert-ajay-ohri-founder-decisionstats-com/
  3. AnalyticsIndia Magazine Nov 2012 Interview – Ajay Ohri, Author “R for Business Analytics” http://analyticsindiamag.com/interview-ajay-ohri-author-r-for-business-analytics/
  4. HRTechEurope More R in HR Nov 2012 http://blog.hrtecheurope.com/more-r-in-hr/
  5. Data Mining Research Jan 2011 Data Mining Research interview: Ajay Ohri http://www.dataminingblog.com/data-mining-research-interview-ajay-ohri/
  6. AnalyticBridge Apr 2008 Interview with Ajay Ohri, Data Mining Consultant from India http://www.analyticbridge.com/group/interviews/forum/topics/2004291:Topic:11703

Data Science Apps for Plug and Play Data Science

I was reading The Twelve-Factor App and was struck by how much data science practitioners could use these principles too, for example when making a Shiny dashboard app.

Also I hope we can have more plug and play data science for mobile data or data generated by mobile apps (which is increasing)


An example is this app https://gallery.shinyapps.io/CampaignPlanner_v3/ which could possibly be modified to add integration with the Google Analytics API (etc).

This approach can make R more enterprise ready for production environments, where it currently lags behind Python both in appeal and in the number of trained people.


The Twelve Factors

I. Codebase

One codebase tracked in revision control, many deploys

II. Dependencies

Explicitly declare and isolate dependencies

III. Config

Store config in the environment

IV. Backing services

Treat backing services as attached resources

V. Build, release, run

Strictly separate build and run stages

VI. Processes

Execute the app as one or more stateless processes

VII. Port binding

Export services via port binding

VIII. Concurrency

Scale out via the process model

IX. Disposability

Maximize robustness with fast startup and graceful shutdown

X. Dev/prod parity

Keep development, staging, and production as similar as possible

XI. Logs

Treat logs as event streams

XII. Admin processes

Run admin/management tasks as one-off processes
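To make factor III (store config in the environment) concrete, here is a minimal sketch in Python. The setting names (DATABASE_URL, ANALYTICS_API_KEY, PORT) and the AppConfig class are hypothetical, chosen only for illustration; the same idea carries over to a Shiny app reading its settings with Sys.getenv().

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AppConfig:
    """Settings for a hypothetical dashboard app, read from the environment."""
    database_url: str
    analytics_api_key: str
    port: int


def load_config(env=None):
    """Build the config from environment variables (factor III).

    The variable names are illustrative assumptions, not names
    required by any framework.
    """
    env = os.environ if env is None else env
    required = ("DATABASE_URL", "ANALYTICS_API_KEY")
    missing = [name for name in required if name not in env]
    if missing:
        # Fail fast at startup instead of halfway through a request.
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    return AppConfig(
        database_url=env["DATABASE_URL"],
        analytics_api_key=env["ANALYTICS_API_KEY"],
        port=int(env.get("PORT", "8000")),  # optional, with a default
    )
```

Because nothing is hard-coded, the same codebase can be deployed to development, staging, and production with only the environment changing.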

Which tool to learn for a better data science career

Some questions I get from new data scientists

I like R a lot, so should I work towards being better at just that, or should I learn Excel, Python, and SAS as well (like a jack of all trades, master of none)?

I like R so much I wrote two books on it. Then I started writing a book on Python and now I have writer's block.

  1. You need to be good at many things (Python, R, SAS, Excel, SQL)
  2. You need to be really, really good at one thing (I prefer Python, but R or SAS could do. SAS people work in large corporations a lot, R people are more statistically driven, Python people are more Silicon Valley/IT driven. I would go with Python)
  3. You should know how data is stored (in RDBMS and in NoSQL)
  4. You should know how data is processed (cloud computing, server)
  5. You should know how data is visualized (ggplot2, QlikView, Tableau)
  6. You have a limited time to learn all this and again many choices. So try going for more education and more training!

Is it a thumb rule to know advanced analytics with Excel before actually aiming at R?

There are 8 fingers but two thumbs. Thumb rules are shortcuts. They save time (for the instructor to explain). Yes, I would learn how to analyze data in a spreadsheet too, since a lot of employers use spreadsheets. Spreadsheets (not just Excel but OpenOffice and Google Docs) are used more than data science tools in analyzing data. Basic principles remain the same.

If I choose to begin with a job so I can get a feel for the industry and get to know it better, they ask for experience with all these tools?

Of course industry wants people with two years of work experience. Why should they pay you to learn? So learn skills before you expect jobs.

I found out that internships, Kaggle competitions, certifications, and case studies could help, but not yet in India, is what I’ve come to see. Is that true sir? Am I judging it wrong?

Wrong judging. Judging itself is wrong. Stop judging 1.2 billion people and millions of sq km.

I know many Meetups in India (I founded one in New Delhi). Kaggle competitors are great in Mumbai. Bangalore is great for conferences. So yes, you are comparing to the USA (not fair, it's better here because people just are more disciplined in organizing). If you compare to Pakistan, we are better.

Best is to stop asking for help, and just go out and attend. Maybe create a Meetup group yourself (takes only $10 a month, but again someone has to pay it!). Maybe create a desi meetup. Maybe create a Meetup only for women, or only for newcomers. Or ask your online education provider. Stop judging countries. Start volunteering. People like http://www.venturesity.com/ are doing a great job for hackathons in India. Meetups in Bangalore https://goo.gl/1wPta5


Ask not what your country can do for you, ask what you can do for your country. More meetups in data science. More hackathons. Yup. That is good enough work to start doing for your country.



career advice for data science newcomers

Someone I don't know asked this question:

I had a few questions sir. Could I ask them to you? Like mainly based on the direction in which I should work or learn. I don’t mean to bother you. But it’s hard to find the right people who can guide new comers like me.

I first explained why people don’t really give advice for free. I am using principles I learnt from Reader’s Digest about something known as a Fermi Problem. Fermi Problems are common in tech interviews.

  1. there are 3000 newcomers to every such right person (who gives free advice to newcomers he does not know).
  2. out of them only 300 will get the courage to ask the right people.
  3. out of them 250 will write a badly written email.

so by the time the right person has gotten 250 spam emails, he is not responding to the 50 out of the 3000 who

  1. write well, and
  2. are passionate about learning more.

That is an example of how you can use mathematical thinking to understand why things work.
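The arithmetic behind that estimate can be written out as a tiny Python sketch. The function name advice_funnel is made up for illustration, and the inputs are just the rough guesses from the lists above, not measured data.

```python
def advice_funnel(newcomers, askers, badly_written):
    """Walk through the Fermi-style funnel using plain counts.

    All inputs are rough guesses, which is the point of a Fermi estimate.
    """
    if not 0 <= badly_written <= askers <= newcomers:
        raise ValueError("expected 0 <= badly_written <= askers <= newcomers")
    well_written = askers - badly_written
    return {
        "newcomers": newcomers,
        "askers": askers,
        "badly_written": badly_written,
        "well_written": well_written,
        # Fraction of all newcomers who end up sending a good email.
        "well_written_share": well_written / newcomers,
    }


# The guesses from the text: 3000 newcomers, 300 who ask, 250 bad emails.
funnel = advice_funnel(newcomers=3000, askers=300, badly_written=250)
print(funnel["well_written"])                 # 50
print(f"{funnel['well_written_share']:.1%}")  # 1.7%
```

So the 50 good emails are under 2% of the newcomers, and they arrive buried among 250 bad ones.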

Then I gave in and gave her some free advice on what direction a data science newcomer should put efforts in

which direction should you work?-

  1. interest/passion/quality – do something you are good at, because only then will it sustain your interest and you will put in the 10,000 hours to be great at it.
  2. greed (higher salary) versus fear (different skills) – it should make you money, but you make more money if you create your own niche. So should you be like thousands of analysts in credit card analytics (easy route) or should you do analysis on videos (tougher)?
  3. networking – why don't you at least go to data science meetups, and try to take part in a few Kaggle competitions. Also, have you stopped reading r-bloggers.com or kdnuggets.com? Do this for three months and you will find enough opportunities or data. Take decisions based on data, not from anecdotal advice from experts.

I hope I was able to be useful. What do you think?

Adding a 2.4 MB file slows page load on mobile devices, but that is a small cost to learn about this great Italian American, Enrico Fermi.

Enrico Fermi, Italian-American physicist, received the 1938 Nobel Prize in physics for identifying new elements and discovering nuclear reactions by his method of nuclear irradiation and bombardment. The Fermi technique is named after him because he was known for his ability to make good approximate calculations with little or no actual data. Fermi problems typically involve making justified guesses about quantities and their variance or lower and upper bounds. You can probably use it for Big Data analysis of online chatter when your machine learning is not able to process videos (YouTube) or images (Instagram) as efficiently as it analyzes text.
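One common way to work with lower and upper bounds is to multiply the bounds of each factor and take the geometric mean of the resulting range as the point estimate. The sketch below assumes that approach; the function name fermi_estimate and the two factors in the usage example are invented purely for illustration and are not real measurements.

```python
import math


def fermi_estimate(factors):
    """Combine (low, high) guesses for each factor into one estimate.

    Returns (low, point, high), where point is the geometric mean
    of the overall low and high products.
    """
    for name, (lo, hi) in factors.items():
        if not 0 < lo <= hi:
            raise ValueError(f"bad bounds for {name!r}: need 0 < low <= high")
    low = math.prod(lo for lo, _ in factors.values())
    high = math.prod(hi for _, hi in factors.values())
    return low, math.sqrt(low * high), high


# Made-up guesses: posts per day and share of posts that are videos.
low, point, high = fermi_estimate({
    "posts_per_day": (1_000, 100_000),
    "video_share": (0.01, 0.25),
})
print(f"between {low:,.0f} and {high:,.0f} videos per day, roughly {point:,.0f}")
```

The geometric mean is used because the guesses span orders of magnitude, so an ordinary average would be dominated by the upper bound.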
