Here is an interview with Charlie Berger, Oracle Data Mining Product Management. Oracle is a company much respected for its ability to handle and manage data, and with its recent acquisition of Sun, it now has considerable software and financial muscle to take the world of data mining to the next generation.
Ajay- Describe your career in data mining so far from college, jobs, assignments and projects. How would you convince high school students to take up science careers?
Charlie- In my family, we were all encouraged to pursue science and technical fields. My Dad was a Mechanical Engineer and all my siblings are in scientific and medical fields. Early on, I had narrowed my career choices to engineering or medicine; the question when I left for college was which kind. My Freshman Engineering exposed students to 6 weeks of the curriculum for each of the engineering disciplines. I found myself drawn to the field of Operations Research and Industrial Engineering. I liked the applied math and problem solving aspects. While not everyone has an aptitude or an interest in Math or the Sciences, if you do, it can be a fascinating field.
Ajay- Please tell us some technical stuff about Oracle Data Mining and Oracle Data Miner products. How do they compare with other products notably from SAS and SPSS? What is unique in Oracle’s suite of data mining products- and some market share numbers to back these please?
Charlie- Oracle doesn’t share product level revenue numbers. I can say that Oracle is changing the analytics industry. Ten years ago, when Oracle acquired the assets of Thinking Machines, we shared a vision that, as the volumes of data expand, you eventually reach a point where you have to ask whether it makes more sense to “move the data to the algorithms” or to “move the algorithms to the data”. Obviously, you can see the direction that Oracle pursued. Now, after 10 years of investing in in-database analytics, we have 50+ statistical techniques and 12 machine learning algorithms running natively inside the kernel of the Oracle Database. Essentially, we have transformed the database to become an analytical database. Today, you see the traditional statistical software vendors announcing partnering initiatives for in-database processing or, in the case of IBM, acquiring SPSS. Oracle pioneered the concept of using a relational database not only to store data, but to analyze it too. Moving forward, I think that we are close to the tipping point where in-database analytics are accepted as the winning IT architecture.
This trend towards moving the analytics to where the data are stored makes a lot of sense for many reasons. First, you don’t have to move the data. You don’t have to have copies of the data in external analytical sandboxes where it is open to security risks and, over time, becomes more aged and irrelevant.
I know of one major e-tailer who constantly experiments by randomly showing web visitors either offers “A” or a new experimental offer “B”. They would export massive amounts of data to SAS afterwards to perform simple statistical analyses. First, they would calculate the median purchase amounts for the duration of the experiment for customers who were shown both offers. Then, they would perform a t-test hypothesis test to determine whether a statistically valid monetary advantage could be gained. If offer “B” were outperforming offer “A”, the e-tailer would convert the web store to show everyone the “better” offer.
Over the past 10 years, we’ve added 50+ of the most commonly used statistical functions as base features of every Oracle Database. Now “A/B offer” experimentation can be performed 100 percent inside the Oracle Database.
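To make the A/B comparison described above concrete, here is a minimal sketch in plain Python. It is not Oracle's in-database implementation and not the e-tailer's actual procedure. It assumes the comparison is a Welch two-sample t-test on mean purchase amounts; the function name `welch_t_test` and the sample figures are hypothetical, and the p-value uses a normal approximation that is only reasonable for large samples.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t_test(sample_a, sample_b):
    """Compare mean purchase amounts of two offers with Welch's t-test.

    Returns (t statistic, degrees of freedom, approximate two-sided p-value).
    The p-value uses a normal approximation, which is reasonable only for
    large samples; a full analysis would use the t distribution instead.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")

    # Squared standard error contributed by each group (sample variance / n).
    se_a = variance(sample_a) / n_a
    se_b = variance(sample_b) / n_b
    if se_a + se_b == 0:
        raise ValueError("both samples have zero variance")

    # Positive t means offer B has the higher mean purchase amount.
    t_stat = (mean(sample_b) - mean(sample_a)) / math.sqrt(se_a + se_b)

    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))

    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    return t_stat, df, p_value


# Hypothetical purchase amounts (in dollars) for visitors shown each offer.
offer_a = [20.0, 22.5, 19.0, 24.0, 21.0, 23.5, 20.5, 22.0]
offer_b = [25.0, 27.5, 24.0, 29.0, 26.0, 28.5, 25.5, 27.0]

t_stat, df, p_value = welch_t_test(offer_a, offer_b)
print(f"t = {t_stat:.2f}, df = {df:.1f}, p (approx) = {p_value:.4f}")
```

A small p-value together with a positive t statistic would suggest that offer "B" is outperforming offer "A". Note that a t-test compares means, so the sketch works with mean rather than median purchase amounts.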
Ajay- What areas, or data conditions would you not recommend Oracle Data Miner to a customer?
Charlie- If you don’t already have any database expertise in your organization, have modest amounts of data or you are doing a simple “one-off” analysis, Oracle Data Mining may be too much horsepower for your problem. However, if you were a college student learning technical skills, I would learn how to do data analysis in the database. That is where everyone will be doing it in the future.
Ajay- With the slew of Oracle Applications, will we be seeing any data mining appearing in these applications?
Charlie- Yes. You are already seeing it. Oracle announced the CRM OnDemand Sales Prospector application that integrates Oracle Data Mining as a recommendation engine inside the software designed for sales reps. As we already have the data and we know in what context the user wants to use that data, we’ve built automated data mining into Sales Prospector that identifies those prospects and which products have the highest likelihood of closing. We provide the expected length of time to close the deal, the size of the deal, and even offer likely references that should be helpful in closing the sale.
At Oracle Open World (OOW), we have many tens of thousands of customers join us for a week in San Francisco that is filled with hundreds of technical talks that span a wide range of technical topics, product features, industries and user roles. We embed Oracle Data Mining in the Oracle Open World Schedule Builder and mine the abstract text and attendee profiles to find patterns. As Oracle Open World attendees plan their week’s schedule, we suggest recommendations for the Oracle Open World talks in which they are most likely to be interested.
You can imagine a wide range of applications within Oracle where cleverly factory installing a bit of predictive analytics could provide great value to a user, a manager, a decision maker or a customer. Imagine being able to have insight into which employees are most likely to voluntarily leave, which expense report submissions might be anomalous, which products customers are most likely to be interested in, and how to make your IT platform perform better.
The problem in the past has been that each application is different and different vendors were involved with different aspects of any total solution. If you have the data and know the use cases, that makes it a lot easier. Now that Oracle is a major player in everything from the database, to middleware, to applications and industry verticals, it is logical to anticipate customer use cases and build solutions that leverage the data and predictive analytics to add greater value. I am a data miner; everywhere I look I see opportunities to automatically mine the data, gain new insights, and distribute that new information to where and when it is most needed to put that information to competitive gain.
Ajay- Describe some work that Oracle has been doing in predictive analytics, text mining and forecasting algorithms.
Charlie- Oracle started on this journey over 10 years ago. Our first data mining algorithms were the Naïve Bayes classification and Apriori (market basket analysis) algorithms. We did this because it made the most sense at the time. The database is good at counting things, and these algorithms were based on “simple” conditional probabilities. Over time, we extended the database to add new supporting analytical features and expanded our set of machine learning algorithms and capabilities to include clustering, decision trees, support vector machines, special cases for anomaly detection, and the ability to mine text or unstructured data. As the Oracle Data Mining algorithms run in the kernel of the database and can mine tables and views, they also take advantage of all the vast capabilities of the Oracle Database including Partitioning, Security, Text, Real Application Clusters and now Exadata, Oracle’s new offering that combines software and hardware to offer incredible performance and scalability. In all cases, Oracle Data Mining is just another core capability of that same Database so everything is kept simple for IT. Whether you are looking to mine a single table or build a sophisticated application that automatically mines star schema data from a variety of sources and deploys the models and results in real-time applications, it is all built on the same IT platform, albeit now a more analytically capable one. In the just-released Oracle Database 11g Release 2, Oracle Data Mining models are pushed to the Exadata storage layer for significant (several multiples to orders of magnitude) performance improvement during model scoring. Changing your data repository into an “analytical database,” to be a bit trite, changes everything.
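The remark that Naïve Bayes rests on counting and "simple" conditional probabilities can be illustrated with a short Python sketch. This is a generic textbook version with add-one (Laplace) smoothing, not Oracle Data Mining's implementation; the function names, attribute names and sample rows are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Collect the counts a Naive Bayes classifier needs.

    rows   : list of dicts mapping attribute name -> categorical value
    labels : list of class labels, one per row
    """
    class_counts = Counter(labels)
    # value_counts[(label, attribute)][value] = number of matching rows
    value_counts = defaultdict(Counter)
    # Distinct values seen per attribute, used for add-one smoothing.
    attribute_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for attribute, value in row.items():
            value_counts[(label, attribute)][value] += 1
            attribute_values[attribute].add(value)
    return class_counts, value_counts, attribute_values


def predict(model, row):
    """Return the label with the highest posterior score for one row."""
    class_counts, value_counts, attribute_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        # log P(label) + sum of log P(value | label), with add-one smoothing.
        score = math.log(count / total)
        for attribute, value in row.items():
            seen = value_counts[(label, attribute)][value]
            distinct = len(attribute_values[attribute]) or 1
            score += math.log((seen + 1) / (count + distinct))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: customer attributes and whether they bought.
rows = [
    {"region": "west", "segment": "retail"},
    {"region": "west", "segment": "retail"},
    {"region": "east", "segment": "wholesale"},
    {"region": "east", "segment": "wholesale"},
    {"region": "west", "segment": "wholesale"},
]
labels = ["buy", "buy", "no_buy", "no_buy", "buy"]

model = train_naive_bayes(rows, labels)
print(predict(model, {"region": "west", "segment": "retail"}))
print(predict(model, {"region": "east", "segment": "wholesale"}))
```

Training is nothing more than grouped counting, which is the kind of work a database handles well with GROUP BY style aggregation.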
Ajay- Oracle has a huge India research presence. How do you think it can help popularize data mining driven decision making in developing countries?
Charlie- There are a lot of smart people across this planet. Today, with every Oracle Database having the capability to analyze data in place, the possibilities for clever people to create new applications, new use cases, and services are endless. I am always amazed that my bank’s ATM asks me what language I want to use every time I insert my card. Instead, I expect it to say, “Hello Mr. Berger, it is Friday night at 7 pm, did you want to take $150 (again) to go out on the town?” Or on Monday morning at 5:15 am, I would expect it to say, “Would you like to take out $250?” (Because the ATM application has mined your data and “guesses” that you are on your way to a flight for a trip and need to take some cash). These and many other new use cases require data, domain knowledge, and IT and analytical skills. I recommend that anyone interested in new opportunities read the Oracle Database (and Oracle Data Mining option) documentation and start thinking about creative new ways to exploit these new in-database analytical capabilities. The possibilities are endless. If you’ve watched the blockbuster movie Minority Report where Tom Cruise plays the Washington, DC pre-crime chief, John Anderton, who supervises the investigators who rely on 3 scientifically engineered beings who can see murders before they happen, you know what I mean.
Ajay- Could you describe some of your work at BIWA and value proposition for people to join it?
Charlie- Sure. BIWA is a community of like-minded professionals who use Oracle data warehousing, business intelligence and analytical (BIWA) features of Oracle’s database-centric technology. The mission of BIWA (www.oraclebiwa.org) is to share “best practices” and novel and interesting use cases of Oracle’s BIWA technology. As BIWA is free to join and our mission is to share technical knowledge, we’ve grown to thousands of members. We’ve held two BIWA Summits and are now organizing our third BIWA Summit, which will be held next year in April as part of Collaborate 2010 at the Mandalay Bay Hotel in Las Vegas. We’ve started a BIWA Wednesday TechCast Series that is open to anyone to present who is willing to publicly share an interesting use case or his or her knowledge and expertise.
Ajay- For anyone interested in Oracle Data Mining, how would you recommend they quickly get started?
Charlie- I encourage all prospective Oracle Data Mining users to review the content at the Oracle Data Mining site on the Oracle Technology Network (OTN). Aside from it coming up as the first result in any Google Search for “Oracle Data Mining”, they can reach the web site here:
http://www.oracle.com/technology/products/bi/odm/index.html
Highlights include:
- Oracle Data Mining 11g Release 2 overview presentation
- Oracle Data Mining 11g Release 2 white paper
- Oracle Data Mining 11g Release 2 data sheet
- Algorithm technical summary with links to Documentation
- Recording of a recent BIWA TechCast with full presentation and several demos.
- Getting Started with Oracle Data Mining OTN page with instructions to download the optional and free Oracle Data Miner GUI, Step-by-Step Tutorial and demo datasets
- Oracle Data Mining Discussion Forum on OTN (great for posting questions and getting answers)
- Sample Code (examples of ODM SQL and Java APIs in several use cases; great for developers)
I hope this helps people get going putting the data they have to beneficial use. Good luck to all!
Ajay- What does Charlie Berger do when not creating complex math based algorithms?
Charlie- I love to fish and caught a nice sized fish yesterday. I love to play tennis and Frisbee. In the winter, I ski whenever I get a chance. Like most, I love to travel and see new places. I’ve had the opportunity to travel to a number of interesting places either for work or with my family for vacation. This summer, I was at the Carolina beaches boogie boarding in the waves and chillaxing reading a good book. I’m currently simultaneously reading Outliers by Malcolm Gladwell, Predictably Irrational by Dan Ariely, Cryptonomicon by Neal Stephenson, and Neuromancer by William Gibson.
Charles R. Berger is senior director of product management, life sciences and data mining, at Oracle Corporation. You can follow his words here at http://twitter.com/CharlieDataMine