Interview Anup Purohit CIO YES BANK #Datathon #Datadriven #YESBANK #datascience #hackathons

Ajay- What is your take on the importance of being ‘data driven’?


Anup- At the risk of sounding clichéd, we believe that data is one of the most important assets we have, even though it doesn’t get reflected on our balance sheet. As a bank we are in the business of customer service, so our ability to provide a seamless experience to every customer depends on our ability to collect, store and analyze relevant data.

  • This has become even more important in the last 1-2 years with increased talk of providing ‘banking-as-a-service’, essentially integrating banking and financial services with every aspect of life, which makes data availability, data management and analytics essential tools
  • An interesting point here is that most industries, especially banks, have always had these data points and at times even used them – the catch is that this use was limited to customer onboarding. While KYC literally spells Know Your Customer, this data was almost never used beyond compliance and risk analytics. Combined with data collected during the lifecycle of the customer and the additional computing ability available today, it is a real differentiator, and hence being ‘data driven’ is a necessity and not a choice.

Ajay- Tell us a bit more about how the outlook on being data led or data driven has changed over time

Anup- For us, the start of this journey of really becoming data driven came at a very opportune moment. If you go back to 2003/04, when we started out building probably the country’s only greenfield bank, our focus was largely on the corporate segment while we started to build a retail franchise. Being the nth entrant in a highly competitive industry, technology was always going to be a differentiator for us. Yet the data technologies available were limited in their advancement, and the data we were collecting was limited in size. Change came about 4-5 years back on 3 fronts:



    • First, we moved far deeper into the retail segment with a greater push on building a granular retail bank, which meant that the volume of data and the sources of data increased manifold
    • Second, this was interestingly also the time when the so-called Big Data technologies like distributed file systems, commodity servers and cloud compute really began to hit their stride
    • Third, the sources for collecting data almost tripled, and this is an understated aspect. For example, today we talk about voice analytics and the rise of Alexa/Siri among others, but since most customer service centers had IVR, voice data was available even then. Similarly, optical technologies like OCR began to gain traction around then, so image data, especially for signature verification, began to take off. With these sources, the need for investing in compute was greater than ever
  • This was like a trifecta, a rare situation where the demand and supply graphs were rising for us at the same time, and it made our decision to invest far more in data management, security and analytics a tad easier

Ajay- What specific business needs / opportunities led to your investment in Big Data technologies?

Anup- As I said earlier, the 3 factors (rising data volumes, a growing variety of sources and the exponential growth in technology availability) meant that the traditional database management technologies we were using extensively were nearing obsolescence.



    • However, we still faced an interesting dilemma: the newer distributed file systems, cognitive computing and cloud computing were being used at that time only by technology-intensive industries. The financial services industry in particular was largely playing the ‘wait and watch’ game, which to an extent made sense – the idea being to wait for the technology and the people expertise in big data to mature, and then be fast followers.
    • While deliberating, we started looking at our global peers, some of whom were also our early investors and partners, and we realized that they had already taken the leap and were almost 4 years ahead of the curve. But among industry peers in India there were still no early/first movers. In other industries, however, some players such as e-commerce had moved beyond RDBMS and invested heavily in their machine learning capabilities.
    • Sensing an opportunity, we reached out to many of these organizations and tried to understand their stories and motivations. 2 clear learnings emerged:


  • While it’s true that any technology takes time to develop, big data, ML systems and statistical learning were already fairly mature, and the very nature of machine learning means that its true value is unlocked only once the machine understands the nuances specific to your industry and your customer set – which offsets the value of being ‘fast followers’


    • Also, in our cross-industry discussions we clearly learned that all customer service industries are essentially similar, and that the extent of success depends on a term that is very commonly used but rarely understood in banking – KYC – Know Your Customer – the better you know them, the more integrated and customized your services are
  • It was clear then that we needed to invest and ‘go big’ on a far more nimble, scalable and flexible solution, which led us to Hadoop. Using commodity servers instead of specialized hardware yields substantial cost savings on the infrastructure alone that is required to analyze these large data sets
  • It also set the foundation for our 3 priorities in data management – data security, which is of paramount importance, data management and data-based decision-making. Of these, I would want to stress Data Management: well-sourced, cleaned and managed data lays the foundation for any machine learning tool, and it’s essentially your data stacks which determine whether your machine learning software becomes Skynet or the Flintstones


Ajay- What does your team look like and how do you interact with business?


Anup- I’m a sports enthusiast, so on this let me give you a football analogy, especially since I hear a lot about the rising importance of stacking teams with data scientists.

  • In football, whatever formation you play essentially there are 3 parts, defense, midfield and attack. Our data management and analytics teams are similarly aligned, though it has little to do with our love for sports
  • The Data Management, Security, Sourcing team are our defenders, setting the rules of data security and sharing, making sure that we have a robust core from which we can build outwards
  • Then there is the midfield, which links defense and offense: the risk analytics team, a centralized core team of 10-15 focusing on optimizing data extraction, standardization, recycling/feedback loops and learning
  • Our offense, Business Analytics, is responsible for interacting with all functions of YES BANK to create innovative solutions that aid business success. This team is spread both vertically and horizontally, with dedicated business intelligence teams for each function and a specialized analytics strategy team layered on top to collaborate with the decentralized teams, identify valuable analytics content and promote it across the bank

Ajay- Why did YES Bank invest in Datathon? How does it fit into your overall strategy?



Anup- To keep pace with the fast changing landscape of data technology, it’s important to also have a broad ‘outside-in’ view through an ecosystem of leaders and learners who can guide us through our data native transformation.

  • YES Datathon is our initiative to crowdsource ideas from a pool of talented data learners and practitioners. We received an overwhelming response from 1700+ teams, making our ecosystem 6000+ strong. We are currently in the last phase of Datathon, where the Top 50 teams have access to curated and anonymized YES Bank data to select a problem statement of their choice and create a PoC
  • While going through their submissions, I realized ideas need not always come from within office walls. It has been interesting to see how we can learn and adapt data practices from across industries to ensure our service continues to remain the best in the industry.
  • While YES Datathon is a step towards building an engaged community of data experts, engineers and scientists to strengthen our data management and analytics practices, the long-term approach is to build a team of experts within the bank: Team TechTONic, a core team of 100 business and technology experts who will drive the digitization and digitalization of the Bank’s technology.


Anup Purohit has been the CIO of YES Bank since 2015 and has a record of more than 23 years of accomplishments in IT management across global and multi-cultural environments.


1) What prompted you to make

The concept of Hyreo took shape in our minds as an outcome of the recruiting challenges we faced on a daily basis. All aspects of recruiting are very labor-intensive, and the predictability of the outcome at each stage was quite limited. The amount of time spent in sourcing, validating and assessing candidates was very high and hence pretty expensive. The same challenges existed in companies of all sizes, and we saw Hyreo as a possible solution to some of them. We are trying to leverage smart technology and automation to improve the way candidate sourcing, assessment and engagement are carried out. We also felt that the opportunity was quite large, since globally the recruitment model and process are fairly standard with only minor variations. The availability of technologies including Open NLP and others also helped us decide on building Hyreo as a potential solution to these recruiting problems.

2) In your two-year journey as an entrepreneur with Hyreo, name some learnings and some turning points.

A few learnings from our entrepreneurial journey:

  1. Customers are the most important factor impacting everything – employees, investors & partners
  2. Partner as much as possible rather than building everything in-house, and create a ‘win-win’ for all parties
  3. Be prepared for rejection, it is unavoidable
  4. Hire slow but fire fast
  5. The entrepreneur knows more about the product than the investor, the customer or the media
  6. Marketing is more important than one might think. Place it early in the lifecycle and use it effectively
  7. Create evangelists and supporters of the cause early in the game, but never on equity

3) Specifically which need is trying to address and solve

Hyreo is disrupting the way companies ‘Discover’ and ‘Engage’ with talent. Hyreo leverages smart technology to automate the dissemination of job information to prospective candidates, understand their interest level and subject proficiency, and keep the candidates engaged and up to date on the latest status 24/7. Built as a SaaS solution with chatbot technology, the platform is able to integrate with legacy systems or exist as a stand-alone system. Hyreo is built in a modular fashion so that customers can choose the product based on their specific needs. By using the platform, companies are able to reduce overall recruiting effort by 50% and overall cost by 40%, with a substantial improvement in candidate experience and hence talent brand.

4) What are some of the other innovations you see in the HR space

All aspects of HR and human capital management are being disrupted, with legacy processes challenged by newer technology including machine learning and AI-based systems. Some of the areas where we see interesting innovation and proven merit include:

  1. Employee engagement: Be it answering employee queries or addressing issues of the employees, innovative technology solutions including Chatbots are being deployed
  2. Candidate reference checks are being automated to ensure the cycle time and the overall effort is reduced considerably
  3. Digital Learning platforms including micro learning platforms
  4. Intelligent interviewing platforms


5) What are some of the obstacles you see to HR innovations.

The journey has just begun and the initial inertia opposing the change has drastically reduced. There is a lot of exciting new technology in the market now, and it will take time for all stakeholders to evaluate options and adopt best practices. Some of the areas we should look at:

  1. HR should be a CEO’s function, and the focus should be not just on improving process; the mindset should be to invest in success
  2. There is a need for re-branding HR as a growth catalyst rather than a growth support function
  3. Need more investments in HR Tech space

About Hyreo-


Missing Value Imputation and Dealing With Outliers


These are an important part of data pre-processing and these are rarely taught in DONKEY ACADEMY who charge you a lot to give you a certificate that doesn’t give you a job.

So okay after that violence and double talk (from Dire Straits) here is how you deal with outliers

1) Replace outliers or missing values with the mean or median, chosen based on the distribution you see, e.g. if age<20 or age>80 then age=median(age)

2) Replace them by capping at upper and lower limits, e.g. an age distribution of 1-120 for bank customers can be capped like this: if age<20 then age=20, if age>80 then age=80

3) Use the MICE package for imputation in R, or pandas-mice for Python (e.g. if males have a median age of 50 and females have a median age of 45, replace all missing male ages with 50 and all missing female ages with 45)

4) Use outlierTest in the car package in R

This is barely the tip of the iceberg in missing values and outliers.
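The first three approaches above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it uses the standard library instead of the MICE packages, the 20/80 bounds come from the age example, the function names are made up for this sketch, and method 1 takes the median of the in-range ages only (one reasonable choice, not the only one).

```python
from statistics import median

LOWER, UPPER = 20, 80  # assumed plausible age bounds, taken from the example above


def in_range(age):
    """True when an age is present and inside [LOWER, UPPER]."""
    return age is not None and LOWER <= age <= UPPER


def replace_with_median(ages):
    """Method 1: swap missing (None) or out-of-range ages for the median of the valid ages."""
    mid = median([a for a in ages if in_range(a)])
    return [a if in_range(a) else mid for a in ages]


def cap(ages):
    """Method 2: clip ages to [LOWER, UPPER]; missing values are left untouched."""
    return [a if a is None else min(max(a, LOWER), UPPER) for a in ages]


def impute_by_group(rows, group_key="gender", value_key="age"):
    """Method 3 (simplified): fill a missing value with the median of the row's own group.

    This is group-wise median imputation, far simpler than real chained-equation
    imputation. Rows whose group has no observed values stay missing.
    """
    observed = {}
    for row in rows:
        if row[value_key] is not None:
            observed.setdefault(row[group_key], []).append(row[value_key])
    medians = {group: median(values) for group, values in observed.items()}
    return [
        {**row, value_key: medians.get(row[group_key])}
        if row[value_key] is None
        else dict(row)
        for row in rows
    ]


ages = [25, 3, 40, None, 119, 55]
print(replace_with_median(ages))  # [25, 40, 40, 40, 40, 55]
print(cap(ages))                  # [25, 20, 40, None, 80, 55]

customers = [
    {"gender": "M", "age": 48},
    {"gender": "M", "age": 52},
    {"gender": "M", "age": None},
    {"gender": "F", "age": 45},
    {"gender": "F", "age": None},
]
for row in impute_by_group(customers):
    print(row)
```

Capping keeps the row count and the ordering of extreme customers, while median replacement pulls them to the center, so which one to use depends on whether the extreme values are errors or genuine but rare cases.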

#machinelearning #algorithms #pythonprogramminglanguage #analytics #datascience #python #rstats

Is Kaggle too tough

Is KAGGLE a website only for super human data scientists? NO NO NO

You can be a kaggler very easily-

1) Understand how kernels function, especially the input files and the output submission. It is best to write your code with the Notebook method rather than the script method

2) Have basic knowledge of EDA and data viz in either R or Python (if you don’t know that EDA means exploratory data analysis, you can start learning from Kaggle KERNELS itself)

3) Have basic knowledge of Machine Learning Algorithms (and how to apply ) and how to compare Area under Curve (AUC)

4) Deep learning is an advanced topic, and preferably done in Python

5) Practice one hour a day. Kaggle is like a gym for the brain: if you do this for a year, see where your career zooms.
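To make point 3 concrete, here is a rough sketch of what comparing models on AUC means. It computes AUC by hand in plain Python using the rank definition (the chance that a randomly picked positive gets a higher score than a randomly picked negative); in a real kernel you would more likely call a library function. The labels, scores and model names below are invented purely for illustration.

```python
def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Every (positive, negative) pair counts 1 if the positive is scored higher,
    0.5 on a tie and 0 otherwise; AUC is the average over all such pairs.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical validation labels and predicted probabilities from two models.
y_valid = [0, 0, 1, 1]
predictions = {
    "model_a": [0.10, 0.40, 0.35, 0.80],
    "model_b": [0.20, 0.30, 0.60, 0.90],
}

results = {name: auc(y_valid, scores) for name, scores in predictions.items()}
best = max(results, key=results.get)
print(results)  # {'model_a': 0.75, 'model_b': 1.0}
print(best)     # model_b
```

An AUC of 0.5 is what random guessing gives and 1.0 is a perfect ranking, so the model with the higher AUC on held-out data is the one ranking positives above negatives more often.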

And one more thing- cross-post your code on GitHub

#bigdata #love #machinelearning #analytics #datascience #deeplearning #python #r #howto #github #datamining #datavisualization