2B | ! 2B
Some time back I started a series of articles advocating
- free and open learning in data science,
- more practical, hands-on practice in the same, as well as
- addressing the gaps in current education industry practices.
I now feel the following:
- content should always be free as an OPTION in a structured learning path,
- any teaching, mentoring or other time commitment from an instructor should be paid (because many people can't learn on their own),
- training institutes should make at least part of their fees dependent on jobs secured by students, and
- books remain a cheap, effective and underutilized part of the data science training curriculum.
- Less Anger and Hate
- More Cardio and Weights
- Lower Medication by the plan
- Higher Fitness by elan
- Finer writing
- Lesser fighting
- Better Code
- More travel on the road
- Plan, Implement and Execute
- Save the money rather than splurge the loot
Ajay- What is your take on the importance of being ‘data driven’?
Anup- At the risk of sounding clichéd, we believe that data is one of the most important assets we have, one that doesn't get reflected on our balance sheet. As a bank we are in the business of customer service, so our ability to provide a seamless experience to every customer depends on our ability to collect, store and analyze relevant data.
- This has become even more important in the last 1-2 years with increased talk of providing ‘banking-as-a-service’, essentially integrating banking and financial services with every aspect of life, which makes the availability of data, data management and analytics essential tools.
- An interesting point here is that, now that we think of it, most industries, especially banks, have always had these data points and at times even used them. The catch, though, is that their use was limited to customer onboarding. While KYC literally stands for Know Your Customer, this data was almost never used beyond compliance and risk analytics. This data, combined with data collected during the lifecycle of the customer and the additional computing ability available today, is a real differentiator, and hence being ‘data driven’ is a necessity and not a choice.
Ajay- Tell us a bit more about how the outlook on being data-led or data-driven has changed over time.
Anup- For us, the start of this journey of really becoming data driven came at a very opportune moment. If you go back to 2003/04, when we started out building probably the country’s only greenfield bank, our focus was largely on the corporate segment while we started to build a retail franchise. As the nth entrant in a highly competitive industry, technology was always going to be a differentiator for us. Yet both the data technologies available and the data we were collecting were limited, in advancement and in size respectively. Change came about 4-5 years back, on three fronts:
- First, we moved far deeper into the retail segment, with a greater push on building a granular retail bank, which meant that the volume of data and the sources of data increased manifold.
- Second, this was interestingly also the time when so-called Big Data technologies like distributed file systems, commodity servers and cloud compute really began to hit their stride.
- Third, the sources for collecting data almost tripled, and this is an understated aspect. For example, today we talk about voice analytics and the rise of Alexa and Siri, among others, but since most customer service centers had IVR, voice data was available even then. Similarly, optical technologies like OCR also began to gain traction then, so image data, especially for signature verification, began to take off. With these sources, the need to invest in compute was greater than ever.
- This was like a trifecta, creating a rare situation where, for us, the demand and supply curves were rising at the same time, and it made our decision to invest far more in data management, security and analytics a tad easier.
Ajay- What specific business needs or opportunities led to your investment in Big Data technologies?
Anup- As I said earlier, the rise in data volumes and in the variety of sources, coupled with the exponential growth in technology availability, meant that the traditional database management technologies we were using extensively were nearing obsolescence.
- However, we still faced an interesting dilemma: the newer distributed file systems and cognitive and cloud computing were at that time being used only by technology-intensive industries. Financial services in particular was largely playing the ‘wait and watch’ game, which to an extent made sense, the idea being to wait for the technology and the people expertise in big data to mature and then be fast followers.
- While deliberating, we started looking at our global peers, some of whom were also our early investors and partners, and we realized that they had already taken the leap and were almost 4 years ahead of the curve. Among industry peers in India, however, there were still no early or first movers, while in other industries some players, like e-commerce companies, had moved beyond RDBMS and invested heavily in their machine learning capabilities.
- Sensing an opportunity, we reached out to many of these organizations and tried to understand their stories and motivations. Two clear learnings emerged:
- While it’s true that any technology takes time to develop, big data systems, ML and statistical learning were already fairly mature, and the very nature of machine learning meant that its true value is unlocked only once the machine understands the nuances specific to your industry and your customer set, offsetting the value of being ‘fast followers’.
- Also, in our cross-industry discussions we clearly learned that all customer service industries are essentially similar, and that the extent of success depended on a term that is very commonly used but rarely understood in banking: KYC, or Know Your Customer. The better you know your customers, the more integrated and customized your services can be.
- It was clear then that we needed to invest and ‘go big’ on a far more nimble, scalable and flexible solution, which led us to Hadoop. Leveraging commodity servers as opposed to specialized hardware results in significant cost savings on the infrastructure alone that is required to analyze these large data sets.
- It also set the foundation for our three priorities in data management: data security, which is of paramount importance, data management, and data-based decision-making. Of these, I would stress data management the most. Well-sourced, cleaned and managed data lays the foundation for any machine learning tool, and it’s essentially your data stacks that determine whether your machine learning software becomes Skynet or the Flintstones.
Ajay- What does your team look like, and how do you interact with the business?
Anup- I’m a sports enthusiast, so on this let me give you a football analogy, especially since I hear a lot about the rising importance of stacking teams with data scientists.
- In football, whatever formation you play, there are essentially three parts: defense, midfield and attack. Our data management and analytics teams are similarly aligned, though this has little to do with our love for sports.
- The Data Management, Security and Sourcing team are our defenders, setting the rules of data security and sharing and making sure that we have a robust core from which we can build outwards.
- Then there is the midfield, which links defense and offense: the risk analytics team, a centralized core team of 10-15 people focusing on optimizing data extraction, standardization, recycling and feedback loops, and learning.
- Our offense, Business Analytics, is responsible for interacting with all functions of YES BANK to create innovative solutions that aid business success. This team is spread both vertically and horizontally, with dedicated business intelligence teams for each function and a specialized analytics strategy team layered on top to collaborate with the decentralized teams, identify valuable analytics content and promote it across the bank.
Ajay- Why did YES Bank invest in Datathon? How does it fit into your overall strategy?
Anup- To keep pace with the fast-changing landscape of data technology, it’s important to also have a broad ‘outside-in’ view through an ecosystem of leaders and learners who can guide us through our data-native transformation.
- YES Datathon is our initiative to crowdsource ideas from a pool of talented data learners and practitioners. We received an overwhelming response from 1,700+ teams, making our ecosystem 6,000+ strong. We are currently in the last phase of Datathon, where the top 50 teams have access to curated, anonymized YES Bank data, from which they select a problem statement of their choice and create a PoC.
- While going through their submissions, I realized that ideas need not always come from within office walls. It has been interesting to see how we can learn and adapt data practices from across industries to ensure our services continue to remain the best in the industry.
- While YES Datathon is a step towards building an engaged community of data experts, engineers and scientists to maximize our data management and analytics practices, the long-term approach is to build a team of experts within the bank: Team TechTONic, a CORE team of 100 business and technology experts who will drive the digitization and digitalization of the Bank’s technology.
Anup Purohit has been the CIO of Yes Bank since 2015 and has a more than 23-year record of accomplishments in IT management across global and multicultural environments.