Data Analytics post Demonetization in India

The demonetisation of ₹500 and ₹1000 banknotes was a policy enacted by the Government of India on 8 November 2016.

The announcement was made by Prime Minister Narendra Modi. He declared that all ₹500 and ₹1000 banknotes would be invalid from midnight and announced the issuance of new ₹500 and ₹2000 banknotes in exchange for the old banknotes.

The government claimed that the demonetisation was an effort to stop counterfeiting of the current banknotes allegedly used for funding terrorism, as well as a crackdown on black money in the country. The move was described as an effort to reduce corruption, the use of drugs, and smuggling.

(source – )


This led to huge lines of people outside banks and ATMs, withdrawing new notes for daily needs and depositing cash to exchange notes of older denominations.

This also creates a huge data analytics opportunity for data science to serve the treasury and tax departments of India. The following data points would be of particular scrutiny for Indian data scientists helping, or on contract to, the Indian Government.

  1. Fraud – This would examine data points where inactive and dormant bank accounts suddenly had a huge inflow of cash. This data would be further matched and merged with income tax records using the PAN card, and the Aadhaar card too, as matching keys. Additional matching keys would be name, date of birth and address.
  2. Terrorism – Terrorism in India is specific to a few geographic areas like Jammu and Kashmir and Naxalite areas. These could be further analyzed in finer detail for unusual currency patterns.
  3. Cashless modes for laundering money (anti-money laundering) – Plastic money and mobile apps saw a huge upsurge in transactions. This could be used as an additional source of information, since the KYC norms of telecom operators require identification and so do bank accounts.
  4. Specific sectors – Land (real estate), jewellery and other high-value, high-ticket items can be scrutinized.
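
The fraud check in point 1 can be sketched in a few lines of Python. This is only a rough illustration: the sample records, the field names, the one-year dormancy rule and the ₹2.5 lakh deposit threshold are all assumptions made up for the example, not anything the tax department is known to use.

```python
from datetime import date

# All data, field names and thresholds below are hypothetical illustrations.
DEMONETISATION_DATE = date(2016, 11, 8)
DORMANT_DAYS = 365        # assumed: no activity for a year counts as dormant
LARGE_DEPOSIT = 250_000   # assumed threshold, in rupees

accounts = [
    {"pan": "AAAPA1111A", "last_activity": date(2015, 3, 1)},
    {"pan": "BBBPB2222B", "last_activity": date(2016, 10, 20)},
    {"pan": "CCCPC3333C", "last_activity": date(2014, 6, 15)},
]

deposits = [
    {"pan": "AAAPA1111A", "date": date(2016, 11, 15), "amount": 900_000},
    {"pan": "BBBPB2222B", "date": date(2016, 11, 16), "amount": 800_000},
    {"pan": "CCCPC3333C", "date": date(2016, 11, 20), "amount": 40_000},
]

# Income tax records keyed by PAN (the matching key).
tax_records = {
    "AAAPA1111A": {"declared_income": 300_000},
    "BBBPB2222B": {"declared_income": 1_200_000},
}


def flag_suspicious(accounts, deposits, tax_records):
    """Return dormant accounts with large post-announcement deposits,
    merged with the declared income found for the same PAN."""
    dormant = {
        a["pan"]
        for a in accounts
        if (DEMONETISATION_DATE - a["last_activity"]).days > DORMANT_DAYS
    }

    # Total cash deposited per PAN on or after the announcement date.
    totals = {}
    for d in deposits:
        if d["date"] >= DEMONETISATION_DATE:
            totals[d["pan"]] = totals.get(d["pan"], 0) + d["amount"]

    flagged = []
    for pan, total in sorted(totals.items()):
        if pan in dormant and total >= LARGE_DEPOSIT:
            # declared_income is None when no tax record matches the PAN.
            declared = tax_records.get(pan, {}).get("declared_income")
            flagged.append(
                {"pan": pan, "deposited": total, "declared_income": declared}
            )
    return flagged


flagged = flag_suspicious(accounts, deposits, tax_records)
for row in flagged:
    print(row)
```

In practice the matching would also need fuzzy joins on name, date of birth and address for accounts without a PAN, which this sketch leaves out.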


Overall the data will be huge, so choosing the right database combination as well as the right analytics (especially Big Data spatial analytics) could be key to helping the current PM’s ambitious vision to transform India’s economy.

Comments are welcome.

Internships for Data Scientists at DecisionStats in New Delhi India


Here is your chance to work in the field of data science as an intern with DecisionStats.

About the internship: it is unpaid.
The data science intern will create and edit data science research and assist in writing.

The intern will be given on the job training for data science and analytics in Python, SAS and R.

The data science intern will create and edit schedules and assist in coordination.

The data science intern will be given on the job training for managing in a start up environment, web analytics and search engine marketing as well as an understanding of digital business.

The data science intern will also proofread, edit and write content including blog posts and social media.

The data science intern will be given on the job training for social media, web analytics and search engine optimization as well as an understanding of digital business.
Number of Internships available: 5
Certificate, Letter of recommendation, Flexible work hours, Informal dress code, 5 days a week.
Who can apply:
Only those candidates can apply who:
1.  are available for a full-time (in-office) internship and can start between 30th Sep’16 and 30th Oct’16.
2.  are available for a minimum duration of 2 months.
3.  are living or staying in Delhi.
4.  are pursuing any degree but have relevant skills and interest.
5.  are currently in any year of study or are recent graduates.

International Students can also apply



Interview Kiran Rama India’s Number One Data Scientist

Here is an interview with Kiran Rama. He is currently Director, Data Sciences & Advanced Analytics at VMware. I have chosen Kiran as India’s number one data scientist for the following reasons:

  1. He has both an impeccable academic record as well as steady work experience across multiple companies
  2. He has demonstrated his expertise in competitions like Kaggle and KDD cup (which is tougher)
  3. He spends more time doing and expanding data science in India


Here is the interview with Kiran Rama, India’s Number One Data Scientist as per 2016 as per

Ajay- Describe your career as a data scientist from corporate job, entrepreneurial work experience, winning competitions, patents and finally back to industry
Kiran- I always had a flair for programming being a computer science engineer and winning inter-college on-the-spot programming and debugging fests. Post computer science engineering graduation, my interesting work was at Motorola using C/Linux for developing features for protocol of Multimedia Messaging Service on several mobile phones. I also owned protocol analytics that involved debugging log files looking for null pointer errors!
After a couple of years of work, I pursued my post-graduation in management from Indian Institute of Management Kozhikode (IIMK), where I finished in the top 5 of the batch, majoring in Information Technology & Systems. I was confused about what to do after engineering, having scored in the 99th percentile or above in both CAT and GATE. I did not want to stop programming or lose my technical orientation, and therefore a role in analytics/data sciences seemed like a natural fit as it involved both technical and business stuff.
I was one of the first hires of the e-business analytics team in Dell in 2006.  I got certified in Base SAS and SAS Enterprise Miner and used SAS primarily for data sciences while I used Omniture tools, Excel, SQL for analytics. At Dell Global Analytics, I took on diverse responsibilities and grew from the equivalent of a senior analyst to a Senior Analytics Manager. I touched all parts of e-commerce and e-business.
Some of my achievements in Dell included:
  • “2012 India Innovator of the Year” Award from Michael Dell
  • 3 patents filed at US PTO on various aspects of e-commerce and marketing analytics
  • World Quality Day Finalist in 2010
  • Won the Best Project Award in Global Consumer & Small Business Analytics for 4 consecutive quarters
While at Dell, in 2010 or so, knowing SAS well, I was frustrated that I could not freelance using SAS owing to the high cost. At that time I picked up R, and it is the best decision I made in my career, as I took to R like a fish to water owing to its many similarities with C.
At Dell, I started participating in data mining competitions on and had several top finishes. I was a “Master Data Miner” on Kaggle. I had great results in the Amazon Employee Access Challenge, Merck Competition to predict Molecular activity, GEFcom competition on load forecasting, on wind forecasting, etc. My Kaggle pursuits were one of the reasons I was recruited by Amazon.
I worked as a Senior Business Analytics Lead at Amazon in their Bangalore office as a Level-6 individual contributor. Level-6 at Amazon in those days was one of the senior-most individual contributor positions that they had in Bangalore in the engineering teams, and very difficult to get laterally on the technical side. However, the role was not to my liking and I decided to leave to head marketing & customer analytics at Flipkart.
I freelanced for several US startups as part of my part-time proprietorship “Chaotic Experiments”. Some projects included:
  • Software Errors: Predict which line in software code is likely to be an error for a US based startup
  • Accident evaluation analysis for a US semi sized startup
  • Predict which music label to recommend to a startup
  • Trying to predict futures prices in the stock market for a US Startup
  • HLA Imputation of Genomic data
At Flipkart, I had the opportunity to lead several data sciences & analytics projects, of which I am proud of the ones below:
  • Leveraging Data Sciences to come up with customer segments for Flipkart’s digital properties
  • Coming up with an email rules engine to determine the best customers to target per category
  • Setting up mobile app analytics at Flipkart
I worked closely with the CTO at Flipkart on data scientist hiring, and was the decision maker for the “data sciences depth” round for data scientists at Flipkart.
I continued to try my hand at Kaggle while at Flipkart, and at one time, for over a year, I was ranked amongst the top 10 data miners in the world on Kaggle. My top rank was 7 out of some 300K data miners competing in data sciences competitions for sport. The icing on the cake came when I finished in the top 3 in KDD Cup 2014, winning the competition to predict which essay was likely to get a donation on
Post Flipkart, I was hired into VMW, where I play the role of “Director, Data Sciences & Advanced Analytics”. I play dual roles: functional (where I lead the data sciences innovations team for VMW globally, working closely with digital analytics, Digital store/e-commerce, Professional Services, Sales, Marketing, Partner, Pricing, Support,… verticals) and dotted-line (where I represent the equivalent of the Enterprise Information Management in India, comprising Master Data Management, Business Intelligence & Advanced Analytics).
At VMW, I have had a unique experience driving B2B data sciences with industry-leading projects like:
  • First ever digital buyer journey data sciences project at VMW
  • “Propensity to Buy” models for several products of VMW, for the Technical Account Manager organization,..
  • “Propensity to Sell” models for the partner organization of VMW
  • “Propensity to Respond” models
  • Deployment models
I am currently at VMW, have been here for almost 2.5 years, and am loving every moment of it.
Ajay- What are the key things you want to say to someone with no work experience who wants to work as a data scientist? What would you say to someone who has a few years of work experience and wants to switch to data science?
Kiran- For both (freshers and experienced), I would say the following are key in the order of priority:
  • Debugging Skills: You cannot give up as a data scientist and should be a person who can sit at one place and continuously debug for hours. Data Science techniques usage will involve installations, OS issues, nitty-gritty aspects of the code,… etc
  • Programming Skills: You cannot be a data scientist if you cannot program. You need to be good at programming. Comments like “code is available on the net and I will copy-paste” do not work. I judge a data scientist by different parameters and one of the most important ones is the quality of the code!
  • Knowledge of a programming language that has a machine learning library (R and Python are examples. R has access to many of the libraries on the CRAN repository, while Python has the world-beating scikit-learn package)
  • Strong understanding of the mathematical, statistical and computer science background behind the data sciences techniques
  •  Ability to translate a business problem into a data sciences problem. This involves key decisions like which is the target, is this a prediction or classification problem, what is the right cross-validation technique, what algorithms to use for data mining, what should be the right evaluation criteria, how the model will likely be deployed,…
  • Strong business/domain understanding can lead to great feature engineering and great success during deployment.
  • Ability to present the results to stakeholders and get buy-in for implementation is very important as well
There is a common misconception that everyone should do data sciences. Not everyone is suited for this. If you cannot sit in one place for a stretch and code for 5-6 hours in R or Python and SQL, this is not the right job for you. If you can, this is the best thing to do.
Ajay- You have used many tools like SAS, Python and R. How would you advise a new data scientist on which tool to learn and how to structure their tools training?
Kiran- I would suggest that a new data scientist use Python > R > SAS, mainly because:
  • Python and R have better and wider machine learning libraries than SAS
  • Most of the academic work and latest advancements are in Python & R
  • Python is better than R because there are more things you can do in Python, including software development. Trust me – there is no money in machine learning libraries. There is money only in applications, and the closer you are to software development + machine learning, the better
  • Most of the high paying startups and young firms use Python/R and not SAS
  • It is easier to learn Python/R and then if you happen to work for an old behemoth that is a SAS shop, pick up SAS as well
  • Python/R are actual programming languages and better than SAS. SAS uses macros and not functions. SAS uses a proprietary dataset format that is largely inefficient. SAS requires you to know different syntax for different methods and also for different types of plots. On the other hand, the interface to call any function in R or Python is the same. Example: the predict function in R. Since everything is returned as an object in R & Python, it is easier to examine results (contrast looking at an object’s sub-objects with running multiple commands in SAS to find the output datasets, such as the infamous “ods trace on”)
A good data scientist should know both R & Python, but it is better to start out in one and master it for a year.
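
The point about a uniform calling interface can be illustrated with a toy Python sketch. The two model classes below are hypothetical and written only for this example; they follow the fit/predict convention that scikit-learn estimators use, so the same loop works on both, and the fitted object can be inspected directly.

```python
class MeanModel:
    """Toy model (hypothetical): always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class NearestNeighbourModel:
    """Toy model (hypothetical): predicts the target of the closest training point."""

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        predictions = []
        for x in X:
            nearest = min(range(len(self.X_)), key=lambda i: abs(self.X_[i] - x))
            predictions.append(self.y_[nearest])
        return predictions


X_train = [1.0, 2.0, 3.0, 4.0]
y_train = [10.0, 20.0, 30.0, 40.0]
X_new = [1.2, 3.9]

results = {}
for model in (MeanModel(), NearestNeighbourModel()):
    fitted = model.fit(X_train, y_train)   # same call for every model
    results[type(model).__name__] = fitted.predict(X_new)
    print(type(model).__name__, vars(fitted))  # inspect the fitted object directly

print(results)
```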
Great books to learn R and Python are:
  • “The Art of R Programming” by Norman Matloff
  • “Python for Data Analysis” by Wes McKinney
For Machine learning fundamentals would recommend:
  • Learning from Data by Yaser Abu-Mostafa
  • Applied Data Mining by Paolo Giudici
  • Machine Learning by Tom Mitchell


Ajay- What are some key best practices you would share with people preparing for data science competitions?
Kiran- Here is the link to my kaggle interview on winning the KDD 2014 cup:
Here is the link to my code repository for the winning solution in KDD 2014 cup:
Best practices I suggest are:
  • Build your own repository of functions and methods that you can re-use
  • Understand what the winners of prior competitions did. For example: my code above
  • Keep yourself current with the latest techniques. For example: xgboost
  • Choose the right cross-validation technique. Else, you will overfit
  • Be paranoid about leakage and look for ways to fix leakage in everything including data preparation, feature engineering and modeling
  • Feature Engineering is the key. Even with lesser data, better features will do better than big data
  • Try different methods that are varied. Example: one learner can be tree-based, one bagging-based, one boosting-based, one a neural network, etc.
  • Always ensemble. It can give 2-5% lift
Last mile optimization is difficult. While you can get a 0.85 AUC easily, taking it to 0.88 AUC can be an uphill task.
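
Three of the practices above (cross-validation, ensembling and tracking AUC) can be sketched in plain Python. This is a minimal illustration, not the winning KDD Cup code: the function names are made up for the example, and the labels and scores are toy values chosen so that averaging two imperfect models visibly helps.

```python
def k_fold_indices(n_rows, k):
    """Yield (train_idx, valid_idx) pairs; every row is held out exactly once."""
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    for i, valid in enumerate(folds):
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, valid


def auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_ensemble(*score_lists):
    """Blend models by averaging their scores row by row."""
    return [sum(row) / len(row) for row in zip(*score_lists)]


# Toy data (made up): each model misranks a different pair of rows.
labels = [1, 0, 1, 0]
model_a = [0.9, 0.8, 0.3, 0.2]
model_b = [0.4, 0.1, 0.9, 0.6]

auc_a = auc(labels, model_a)
auc_b = auc(labels, model_b)
auc_blend = auc(labels, average_ensemble(model_a, model_b))
print(auc_a, auc_b, auc_blend)

# Each of the 6 rows appears in exactly one validation fold.
splits = list(k_fold_indices(6, 3))
print(splits)
```

The interleaved folds here suit rows with no time ordering. For time-ordered data, a split that validates only on later rows is the safer choice, since random or interleaved folds can leak future information.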
Ajay- What are some of the key algorithms that a data scientist should know?
Kiran- Some of the algorithms that one should know are:
  • Regularized Logistic Regression (glmnet in R)
  • Bagging technique: Random Forest
  • Boosting Technique: Gradient Boosting Machine, Extreme Gradient Boosting
  • Collaborative Filtering Techniques: LIBFM
  • Non linear learners like Neural Networks
  • Bayesian Methods like BayesTree, bartMachine
  • Support Vector Machines – LIBSVM library
  • Fast learners like Vowpal Wabbit
One should always ensemble multiple techniques in order to get better results
Ajay- Describe your favourite online learning resources for learning data science languages, algorithms etc
Kiran- – nothing beats it – all the vignettes there
Ajay- How do you keep yourself updated on data science knowledge?
Kiran- I have not participated in data science competitions for the last 2 years; participating was a way in which I pushed myself to stay updated.
I am very keen on making some original contributions to data sciences research and teaching. I am pursuing a part-time doctoral program (PhD) at IIM Lucknow while I do my full-time job. I spend a lot of time these days understanding the existing contributions to data science theory and how I can make original contributions to the same.
I also drive industry-academia interaction with the Data Center and Analytics Lab at IIM Bangalore where I represent VMW on the DCAL board. I am at the forefront of organizing industry events on data sciences to share knowledge and learn about the latest in the industry.
I am thankful to my leaders and my direct team at VMW for giving me so many interesting business problems to solve using data sciences; that pushes me and drives me to keep myself updated.


Kiran is a Data Sciences Leader with more than 12 years of experience across marketing, digital (web/mobile), retail, pricing, partner and sales. He has experience across B2C, e-commerce & B2B data sciences. One of the Top 10 Players on Kaggle, the data mining competition platform, world-wide in 2013 and half of 2014, Kiran is also a KDD 2014 Prize Winner and holder of 3 US patents. He received the 2012 Innovator of the Year award from Michael Dell.

You can read about him here



Latest DecisionStats Intern

Congratulations to our latest intern for completing the intensive internship at DecisionStats. See work done by here here-

Her latest blog post tries to use Python to understand police shootings in the USA


Previous Interns wrote great Python code and R code

see (Sarah Masud and Farheen)


Anshul Gupta

Cricket Analysis –



Chandan Routray


Some points for future interns at DecisionStats-

  1. We normally don’t pay interns anything
  2. 80% of interns drop out or are let go because they cannot keep up with the assignments
  3. The remaining 20% usually learn a lot in the intensive program
  4. Internships are like a free boot camp
  5. No more internships till June 2017 because I am trying to write a book
  6. Some research assistantships might be available in December 2016 to help with some code or LyX formatting for the book
  7. See my LinkedIn profile for reviews given by the 20% of interns who manage to stick around
  8. I usually emphasize writing, polyglot tools (R, SAS and Python), logical thinking and concise communication for my interns
  9. I usually treat them as students since I don’t work for or in a university. That might change as I try to transition out from business to academic research options for a non-PhD


Thanking Yoga

So I made an update on LinkedIn, where I am lucky to have 12000 connections, to talk about Yoga, and got 1100 likes. Connect with me on LinkedIn here 🙂

The post

Screenshot from 2016-06-08 11:54:43

The likes

Screenshot from 2016-06-10 07:24:02

The profile views

Screenshot from 2016-06-10 07:19:25

LOL. The Internet works in strange ways. I just wrote it as a thank you note to myself on my 39th birthday.

Related- Inspired by my favourite American hacker




Living with BiPolar Disorder

A close friend of mine recently discovered that she had bipolar disorder. It is a difficult-to-diagnose disability, and living in India added to the complexity of both diagnosis and treatment. Watching the highs, lows and psychotic episodes that come with bipolar disorder, in a pseudo-conservative society like India, brought me to the humbling fact that more data scientists chase how to make ad clicks better than how to study brain imaging data, and more money is spent making Hollywood movies than on climate change, Mars, or brain imaging. As the Joker said, everybody loses their minds when given one little surprise, and that holds even for healthcare budgeting and funding across the world.

Anyways, my friend is back on her feet and doing well with yoga. Yoga can help with mental disabilities at lower costs, but lol, wait till you have FDA approval for asanas.

Which tool to learn for a better data science career

Some questions I get from new data scientists

I like R a lot, so should I work towards being better at just that, or should I learn Excel, Python and SAS as well (like a jack of all trades, master of none)?

I like R so much I wrote two books on it. Then I started writing a book on Python and now I have writer’s block.

  1. You need to be good at many things (Python, R, SAS, Excel, SQL)
  2. You need to be really, really good at one thing (I prefer Python, but R or SAS could do. SAS people work in large corporations a lot, R people are more statistically driven, Python people are more Silicon Valley/IT driven. I would go with Python)
  3. You should know how data is stored (in RDBMS and in NoSQL)
  4. You should know how data is processed (cloud computing, server)
  5. You should know how data is visualized (ggplot, QlikView, Tableau)
  6. You have limited time to learn all this, and again there are many choices. So try going for more education and more training!

Is it a thumb rule to know advanced analytics with Excel before actually aiming at R?

There are 8 fingers but two thumbs. Thumb rules are shortcuts. They save time (for the instructor to explain). Yes, I would learn how to analyze data in a spreadsheet too, since a lot of employers use spreadsheets. Spreadsheets (not just Excel but OpenOffice and Google Docs) are used more than data science tools in analyzing data. Basic principles remain the same.

If I choose to begin with a job so I can get a feel of the industry and get to know it better they ask for all these tools with?

Of course industry wants people with 2 years of work experience. Why should they pay you to learn? So learn skills before you expect jobs.

I found out that internships, Kaggle competitions, certifications and case studies could help, but not yet in India, is what I’ve come to see. Is that true, sir? Am I judging it wrong?

Wrong judging. Judging itself is wrong. Stop judging 1.2 billion people and millions of sq km.

I know many Meetups in India (I founded one in New Delhi). Kaggle competitors are great in Mumbai. Bangalore is great for conferences. So yes, you are comparing to the USA (not fair, it’s better here because people just are more disciplined in organizing). If you compare to Pakistan, we are better.

Best is to stop asking for help, and just go out and attend. Maybe create a Meetup group yourself (takes only $10 a month, but again someone has to pay it!). Maybe create a desi meetup. Maybe create a Meetup only for women, or only for newcomers. Or ask your online education provider. Stop judging countries. Start volunteering. People like are doing a great job for hackathons in India. Meetups in Bangalore

Ask not what your country can do for you, ask what you can do for your country. More meetups in data science. More hackathons. Yup. That is good enough work to start doing for your country.