Data Science Training can be inexpensive and free

Is it TOUGH to be a DATA SCIENTIST? NO, it is not.

Data Science is not rocket science. But once you are a data scientist, you have to keep learning every day.

Master R and Python basics along with statistics basics.

Then learn Machine Learning.

  • Text Mining Basic and Topic Modeling.
  • Time Series.

  • Then learn Deep Learning: ANN, CNN, RNN, LSTM.

  • Computer Vision.

  • Speech Recognition.

  • Chatbots.

  • Blockchain.

You can learn all of this from the internet for free. Don't let confusion or insecurity push you into paying lakhs of rupees or thousands of dollars to institutes whose certificates are not recognized by corporations.


Intro to R

Intro to Python

Intro to Machine Learning

Here is one more free “kernel”, but in colab format:–essential-machine-learning-and-exploratory-data-analysis-with-python-and-jupyter-notebook

Is KAGGLE a website only for superhuman data scientists? NO NO NO

You can become a Kaggler very easily:

1) Understand how kernels function, especially the input files and the output submission. It is best to use the Notebook method rather than the Script method of running code.

2) Have basic knowledge of EDA and data viz in either R or Python. (If you don't know that EDA means exploratory data analysis, you can start learning it from Kaggle KERNELS themselves.)

3) Have basic knowledge of machine learning algorithms (and how to apply them) and of how to compare models using Area Under the Curve (AUC).

4) Deep Learning is advanced, and preferably done in Python.

5) Practice one hour a day. Kaggle is like a gym for the brain: do this for a year and see where your career zooms.

And one more thing: cross-post your code on GitHub.
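To make the AUC point in step 3 concrete, here is a minimal sketch in plain Python. It treats AUC as the chance that a randomly picked positive case gets a higher score than a randomly picked negative one. The labels and the two sets of model scores are made up for illustration, and the helper name roc_auc is my own, not a Kaggle or library function.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical validation labels and predicted probabilities from two models.
y_valid = [1, 0, 1, 1, 0, 0, 1, 0]
model_a_scores = [0.9, 0.2, 0.8, 0.6, 0.4, 0.1, 0.7, 0.3]
model_b_scores = [0.6, 0.5, 0.9, 0.3, 0.4, 0.2, 0.7, 0.8]

auc_a = roc_auc(y_valid, model_a_scores)
auc_b = roc_auc(y_valid, model_b_scores)
best = "model A" if auc_a >= auc_b else "model B"
print(f"model A AUC = {auc_a:.3f}, model B AUC = {auc_b:.3f}, prefer {best}")
```

The pairwise loop is only meant to show what the number means. On a real Kaggle dataset you would normally call a library routine such as scikit-learn's roc_auc_score on a held-out validation set instead.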

I am sure there are better kernels, but you can find them yourself, and best of all they are free. Tip: the number of votes often points to a better, more popular kernel.


R Basics Here and

basic statistics

and free SAS learning from SAS itself 

Interview Questions

Python basics here and


(Free) Kaggle kernel + IBM Cognitive + edX + Kaggle contest + hackathon > certificate from paid private company ???

40 hours to gain a certificate for X dollars versus 40 hours on Kaggle for free. Which will give you better skills? What will get you a job: skills or certificates?

When we interview data scientist freshers, we always have a coding round as the first step. Certificates from private institutes don't matter, regardless of how long or how expensive the courses are.

I have been asked why I write these articles on free resources for data science, what my agenda is, and why I don't just let things be.

Well, the short answer: if you charge thousands of dollars for content that can be had for free, force young people into debt, and indulge in predatory pricing, then someone needs to expose these merchants of data science certificates.

Someone asked why I charge for my 3 data science books. I write the books; the publisher sells them and gives me a 13% royalty. The books are 1/10th the price of a course.

Most importantly, I write books for academic credentials and because I love writing (as seen by my extensive blogging). Writing books is a great way to share knowledge, in my opinion, but it takes a long time, so writing a blog tutorial, a Kaggle kernel, or GitHub code is faster.

I still do guest lectures, but in all cases I am not responsible for students paying too much, and I balance this by evangelizing free resources that would-be students are completely unaware of.

These free resources are often updated more frequently than the curricula of institutes' courses, and they are often easy to understand.

As the man said- Money for nothing and my MTV


1) What prompted you to make Hyreo?

The concept of Hyreo took shape in our minds as an outcome of the recruiting challenges we faced on a daily basis. All aspects of recruiting are very labor-intensive, and the predictability of the outcome at each stage was quite limited. The amount of time spent sourcing, validating, and assessing candidates was very high and hence pretty expensive. The same challenges existed in companies of all sizes. Hyreo took shape as a possible solution to some of the recruiting challenges we saw around us. We are trying to leverage smart technology and automation to improve the way candidate sourcing, assessment, and engagement are carried out. We also felt that the opportunity was quite large, since globally the recruitment model and process are fairly standard, with limited or minor changes. The availability of technologies including Open NLP and others also helped us decide on building Hyreo as a potential solution to these recruiting problems.

2) In your two-year journey as an entrepreneur with Hyreo, name some learnings and some turning points.

A few learnings from our entrepreneurial journey:

  1. Customers are the most important factor impacting everything – employees, investors & partners
  2. Partner as much as possible rather than building everything in-house, and create a ‘win-win’ for all parties
  3. Be prepared for rejection, it is unavoidable
  4. Hire slow but fire fast
  5. The entrepreneur knows more about the product than the investor, customer or media
  6. Marketing is more important than one might think. Place it early in the lifecycle and use it effectively
  7. Create evangelists and supporters of the cause early in the game, but never on equity

3) Specifically, which need is Hyreo trying to address and solve?

Hyreo is disrupting the way companies ‘Discover’ and ‘Engage’ with talent. Hyreo leverages smart technology to automate the dissemination of job information to prospective candidates, understand their interest level and subject proficiency, and keep the candidates engaged and up to date on the latest status 24/7. Built as a SaaS solution with chatbot technology, the platform is able to integrate with legacy systems or exist as a stand-alone system. Hyreo is built in a modular fashion so that customers can choose the product based on their specific needs. By using the platform, companies are able to reduce overall recruiting effort by 50% and overall cost by 40%, with a substantial improvement in candidate experience and hence talent brand.

4) What are some of the other innovations you see in the HR space?

All aspects of HR and human capital management are being disrupted, with legacy processes being challenged by newer technology, including machine learning and AI-based systems. Some of the areas where we see interesting innovation and proven merit include:

  1. Employee engagement: Be it answering employee queries or addressing issues of the employees, innovative technology solutions including Chatbots are being deployed
  2. Candidate reference checks are being automated to ensure the cycle time and the overall effort is reduced considerably
  3. Digital Learning platforms including micro learning platforms
  4. Intelligent interviewing platforms


5) What are some of the obstacles you see to HR innovations?

The journey has just begun and the initial inertia opposing the change has drastically reduced. There is a lot of exciting new technology in the market now, and it will take time for all stakeholders to evaluate options and adopt best practices. Some of the areas we should look at:

  1. HR should be a CEO’s function, and the focus should be not just on improving process; the mindset should be to invest in success
  2. There is a need to re-brand HR as a growth catalyst rather than a growth support function
  3. More investment is needed in the HR tech space

About Hyreo-


Missing Value Imputation and Dealing With Outliers


These are an important part of data pre-processing, yet they are rarely taught at the DONKEY ACADEMY, which charges you a lot for a certificate that doesn’t get you a job.

So okay, after that violence and double talk (from Dire Straits), here is how you deal with outliers:

1) Replace outliers or missing values with the mean or median, depending on the distribution you see. E.g. if age<20 or age>80 then age=median(age)

2) Replace them by capping at upper and lower limits. E.g. an age distribution of 1-120 for bank customers can be capped like this: if age<20 then age=20; if age>80 then age=80

3) Use the MICE package for imputation (in R) or pandas-mice for Python. A simple group-wise version of the idea: if males have a median age of 50 and females have a median age of 45, replace all missing male ages with 50 and all missing female ages with 45.

4) Use outlierTest in the car package in R.

This is barely the tip of the iceberg in missing values and outliers.
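Methods 1 to 3 above can be sketched in a few lines of plain Python. This is only an illustration: the customer records are made up, the function names are my own, and I assume the median in method 1 is taken over the in-range ages only. Method 3 here is the simple group-wise median fill from the example, not full MICE.

```python
from statistics import median

# Hypothetical bank-customer records; None marks a missing age.
customers = [
    {"sex": "M", "age": 45},
    {"sex": "F", "age": 38},
    {"sex": "M", "age": None},
    {"sex": "F", "age": 112},
    {"sex": "M", "age": 55},
    {"sex": "F", "age": None},
    {"sex": "M", "age": 7},
    {"sex": "F", "age": 52},
]
LOWER, UPPER = 20, 80  # the limits used in the text


def replace_with_median(ages, lower=LOWER, upper=UPPER):
    """Method 1: missing or out-of-range ages become the median of in-range ages."""
    valid = [a for a in ages if a is not None and lower <= a <= upper]
    mid = median(valid)
    return [a if a is not None and lower <= a <= upper else mid for a in ages]


def cap(ages, lower=LOWER, upper=UPPER):
    """Method 2: clip ages to [lower, upper]; missing values are left untouched."""
    return [a if a is None else max(lower, min(upper, a)) for a in ages]


def impute_by_group(rows, key="sex", field="age"):
    """Method 3 (simplified): fill a missing value with the median of its group.

    Assumes every group has at least one observed value.
    """
    observed = {}
    for row in rows:
        if row[field] is not None:
            observed.setdefault(row[key], []).append(row[field])
    medians = {group: median(values) for group, values in observed.items()}
    return [
        {**row, field: medians[row[key]] if row[field] is None else row[field]}
        for row in rows
    ]


ages = [c["age"] for c in customers]
print("median replace:", replace_with_median(ages))
print("capped:        ", cap(ages))
print("group impute:  ", [c["age"] for c in impute_by_group(customers)])
```

Note that the group-wise fill leaves outliers such as 112 in place, so in practice you would decide the order in which to cap and impute based on your data.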


Freshers who want to be data scientists

Questions asked by young data scientists: "Hey Ajay, this is XYZ.

I want to learn data science and pursue my career in it. Currently I am a fresher.

Can you tell me how to enter a company with a data science profile? What should I include in my profile to get an internship or a job?

I want advice from you."

Ajay –



Interview Bank Bazaar

Here is an interview with one of the most successful Indian startups in fintech. Adhil Shetty, CEO of BankBazaar, speaks candidly.

Q1 We last interviewed Adhil Shetty, founder and CEO of BankBazaar, in 2008. Those were early days. Since then, what milestones have you crossed in terms of customer fulfillment and otherwise?

The last ten years were a fantastic time for BankBazaar. We moved completely to the B2B business and are today India’s first neutral online marketplace that offers end-to-end instant services across leading financial institutions of India covering loans, credit cards, and insurance products. Supported by global investors such as Walden International, Sequoia Capital, Fidelity Growth Partners, Mousse Partners, Experian, and Amazon, BankBazaar’s goal is to deliver a marketplace that can help users access the right financial product and provide them a simpler, smoother, end-to-end experience in their financial journey.

With its focus on harnessing mobile technology to deliver paperless transactions, BankBazaar aims to be the leading marketplace for financial products. The company offers the largest number of financial products from more than 85 partner organizations over a highly secure, user-friendly, and intuitive platform. The partner organizations include the biggest nationalized and private banks, NBFCs, and insurance companies in India, providing a never-before range of financial products and services.

BankBazaar conceptualized India’s first and leading world-class digital marketplace. With changing times, online financial services have come into their own and there is a lot of demand from customers for a more holistic range of services. There is also an aggressive push from the government in this direction. BankBazaar has championed presence-less, paperless, cashless initiatives that will go a long way in democratizing personal finance and bringing banking to the large unbanked population of India. We have been actively involved in developing an infrastructure ecosystem similar to the India Stacks framework that can provide an online verification and consent system.

Introduced last year, the proprietary BankBazaar Paperless Stack is one of the biggest innovations, making the experience of purchasing financial products synonymous with online shopping for the first time ever. It is the world’s first multi-brand, paperless e-KYC platform for instant loan approval. The BankBazaar Paperless Stack eliminates the need for physical document submission for loan approvals through the company’s online platform, and customers can now opt to retrieve and submit their documentation online for authentication and KYC purposes. This infrastructure stack, developed completely in-house, brings down the processing time substantially, from as much as one week to one business day.

With its technology automation, BankBazaar can deliver more than 20% savings in the cost of customer acquisition for financial institutions. At the same time, features like auto-submission of applications and e-KYC document verification remove redundancy, on-ground teams, and cumbersome paper-based verification. This reduces the cost of selling a financial product without the financial institutions having to invest heavily in the underlying technology.

Today, BankBazaar sees 100M customers per quarter, of which more than 60% opt for the paperless route. Currently, more than 75% of the traffic is organic. Paperless has also contributed significantly to the conversion rate, with 3X conversions in 1/3 of the time.

BankBazaar’s modular, scalable technology has helped it extend the scope of services rapidly across products and partners, not just in India but in other Asian countries as well. Apart from India, BankBazaar also has offices in Singapore and has commenced operations there and in Malaysia this year.

Q2 What do you think makes BankBazaar stand out in the world of online fintech?

At BankBazaar we recognise the individuality of each of our customers and their unique needs. Our biggest USP is that we have personally experienced the difficulties our customers face while accessing the right financial product in the larger offline personal finance ecosystem regardless of whether it is a loan, credit card or mutual fund product. We enable financial brands to deliver financial products instantly over our platform, thanks to our robust, secure, and scalable technology, so that our customers can access the right financial product seamlessly.

Today, we offer the largest number of financial products in the market as well as the largest number of partner organizations including the biggest nationalized and private banks, NBFCs, and insurance companies. Currently we have partnered with more than 85 institutions to provide more than 100 distinct products.

BankBazaar provides end-to-end financial services. We have a full-fledged application platform where people can search for offers, select the one that suits them best, and then apply for it online. BankBazaar does not close the process with the application. We provide constant support and assistance at every stage of the application process, all the way to the final disbursal.

Unlike our competitors, we do not follow a lead-based system where all customer details are passed on to banks as leads. Only once an application is submitted are the customer details passed on to the respective bank along with the application. We have strict privacy norms in place to make sure that customer data is not passed on to any third party, so there is practically zero chance that our customers will be spammed.

Q3 What are your growth and expansion plans for India and/or other markets?

Our main focus for FY19 is to maintain our aggressive growth. FY18 has been a year in which we grew more than 100 percent in multiple categories, including insurance and mutual funds, and saw more than 1M Experian credit score pulls per month. We are aiming to overshoot this performance in FY19 and are expecting to grow by more than 2X.

Our operating revenue, too, witnessed 90% growth in FY18, while our total costs grew by only 30% compared to the same time last year. Since Q3-18, we have been seeing positive unit economics net of HR and Marketing. In FY19, we are expecting to be EBITDA positive. This is because we are a tech-driven company with a long-term consumer-centric vision of PaperLess financial products accessed over the mobile, and we do not have high overhead expenses such as offline agent commissions.

Our core strategy continues to be very focused on bettering what we do. In FY19, we are looking to consolidate our presence in every category with a bigger variety of products from a larger number of partners, so that customers have the highest number of options ever to choose from.

We are also focused on bringing more and more paperless, presenceless products from a larger-than-ever number of partners, so that our customers can apply for the product that is right for them and get an approval in a matter of minutes. There is a lot of R&D going on in paperless, presenceless processes, and we are working on simplifying and speeding up the process even further.

On the international expansion front, we have plans to eventually expand into the Philippines, UAE, Hong Kong, and Australia. Currently, we are working on narrowing down our options.

Q4 Do you use Machine Learning and AI in your products?

As a fintech company, we deeply care about providing the best shopping experience to our customers. One aspect of this is matching the product to the customer and providing the right mix of options every time the customer visits our site. We use analytics to find the sweet spot of our returning customers and target financial products accordingly, so that the ecosystem expands. This is an important area where analytics and machine learning have made a difference.

One place where our analytics and technology succeeded was in the way they utilized non-financial data to bring financial inclusion, especially in areas where penetration of the financial industry has been hampered by difficult terrain or the remoteness of the location. Our analytics has helped us reach out to tier-2 and tier-3 towns and border areas with products and services suited to their typical demographics.

We are slowly moving towards the use of cognitive computing to improve product matches and profiling for a more personalized and improved customer experience. We are also working on ways to use data to enhance the financial health of our customers.

Q5 What makes BankBazaar a great place to work for current and future employees?

The key to retaining employees is to make sure that they have enough incentives to stay with an organization. While the salary package is important to attract the right employees, retaining them is a slightly different ball game. At BankBazaar, we look for employees who are innovative and like to take challenges head-on. The biggest support we can provide them is to give them a work culture that encourages innovation and disruption. We have a flat organization that makes it possible for employees at all levels to seek out others across teams and hierarchies. We encourage open communication and make sure that employees know they are being heard.

At the same time, we make sure that the employees know where the company stands and the direction in which it is growing. This kind of open two-way communication builds employee confidence. It also eases their concerns about the direction their career is taking. To make this doubly sure, we try to provide a clear career growth path to our employees, which is closely aligned to the company’s growth. So, the employees know how their career will pan out over the years. This makes them confident about staying and growing with the organization.

We make sure that employees do have sufficient time to brush up on their skills. We have training plans and schedules in place that let employees select, plan, and undertake trainings and certifications relevant to their fields. This is a continual process and is highly encouraged. On one hand, it motivates and encourages employees. On the other, we end up with a highly motivated and skilled workforce.

We are a resource-intensive company, and prefer to build our leaders. So trainings help us identify and nurture these leaders as well. The opportunity to learn something new and apply it in your day-to-day work is something that all employees enjoy, and we try to give them this chance.

Above all, we make sure that the individual contributions of our employees are recognized by their peers and the senior management through spot awards and other recognitions.

Data Science for Free

The following articles on LinkedIn gained almost 50000 views and 500 likes collectively

Data Science Education: Some people charge Rs 90000 for a bootcamp. Some charge a few hundred thousand rupees for a diploma. Don't be a donkey and fall for these scams, as they won't get you a job. Learning is free!!

Everything you need to learn in data science is free online. Just be methodical and cover the topics. SAS learning is free through SAS e-learning, SAS OnDemand for Academics, and SAS University Edition.

R and Python learning is free on websites like Kaggle, KDnuggets, Coursera, Codecademy, edX, DataCamp, and Analytics Vidhya. Hee-haw, hee-haw.

if you sign up for free here you get 2 months of free

Don't be a DONKEY. Do your homework before selecting paid study, since a huge amount of material already exists for free. Here are some tricks the DONKEY ACADEMY OF DATA SCIENCE uses to turn students into suckers who hand over money in ignorance of the facts:

1) Telling you about a huge shortage of data scientists while not telling you what % of students got a job within six months. Instead they tell you about a dozen companies that hired their students (amazing, since they also claim thousands of students, which is itself an inflated number).

2) Disguising costs (25000 for six weeks but 90000 for six months, making you think the six-month one is a bargain, BUT IT is a trap).

3) Deceptive discounts: inflating the price and then giving either a 10-15% discount or making some other course free.

4) Using SEO and blog articles to give the impression that they do highly complicated work (they don't).

5) Cross-selling other courses that are irrelevant (like selling IoT and Analytics together).

6) Paying leading blogs (esp. Indian) for ads and getting a genuine-looking mention in an interview, a list of top ten institutes, or a list of top ten data scientists for their instructor.

7) Not teaching things like git, data preprocessing, missing value imputation, and feature selection on real-life datasets instead of iris.
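The pricing trick in point 2 is easy to check with a little arithmetic. The sketch below uses the two fees quoted above; treating "six months" as 26 weeks is my own assumption, and the variable names are just for illustration.

```python
# Fees quoted in the text; "six months" is approximated as 26 weeks (an assumption).
courses = {
    "six weeks": {"fee": 25000, "weeks": 6},
    "six months": {"fee": 90000, "weeks": 26},
}

for name, course in courses.items():
    course["per_week"] = course["fee"] / course["weeks"]
    print(f"{name}: total {course['fee']}, about {course['per_week']:.0f} per week")

# The per-week rate looks cheaper for the long course, but the total outlay is what you pay.
extra_outlay = courses["six months"]["fee"] - courses["six weeks"]["fee"]
fee_ratio = courses["six months"]["fee"] / courses["six weeks"]["fee"]
print(f"extra outlay: {extra_outlay}, fee ratio: {fee_ratio:.1f}x")
```

The longer course wins on the per-week number, which is exactly what makes it feel like a bargain, while the student ends up paying 3.6 times as much for material that exists free.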

From comments-

Most importantly, they tell you that you don’t need mathematics at all, or just a little mathematics, to be a Data Scientist, and they are also getting subsidies under NSDP with little or no regulation.

One more deceptive practice is blogs offering paid content, interviews, or lists of rankings in return for money, as ads on the blog or on other blogs (conference, hackathon, or community pages).

I’ve seen that almost all institutes (and a few e-learning sites) have good reviews/ratings on the net. Interestingly, the institutes reply to “negative reviews” claiming that the name is not present in their db. People even post fake Quora answers. BUT! I believe there are good institutes with quality trainers; one just needs to go through genuine reviews.

If information asymmetry is the problem in data science education, the solution is making the free courses more widely known.

Learn SAS

SAS Studio – SAS OnDemand

Learn R (note link for 2 months free datacamp above or at ) for reference

90 two minute videos on R

Many Courses by

Learn Python

see more courses at

When people smoke (behavior) and they know that smoking causes cancer (cognition), they are in a state of cognitive dissonance. The same is the case with people paying a tonne for courses while knowing the material is available free.

So don't be a donkey, be a racehorse. Learn on free sites and test yourself on Kaggle, building up a profile on Stack Overflow and GitHub.

Shun these DONKEY ACADEMIES that charge you 90000 or 40000 for free content.

Donkeys carry loads and are slow. Racehorses can do many things and are fast. Dear Student, be a racehorse.