biwa – DECISION STATS

R Interface to Oracle Data Mining

The R Interface to Oracle Data Mining (R-ODM) allows R users to access the power of Oracle Data Mining’s in-database functions using the familiar R syntax. R-ODM provides a powerful environment for prototyping data analysis and data mining methodologies.

R-ODM is especially useful for:

Quick prototyping of vertical or domain-based applications where the Oracle Database supports the application

Scripting of “production” data mining methodologies

Customizing graphics of ODM data mining results (examples: classification, regression, anomaly detection)

The R-ODM interface allows R users to mine data using Oracle Data Mining from the R programming environment. It consists of a set of function wrappers, written in R, that pass data and parameters from the R environment to the Oracle RDBMS Enterprise Edition as standard user PL/SQL queries via an ODBC interface. The R-ODM interface code is a thin layer of logic and SQL that calls through that ODBC interface. R-ODM does not use or expose any Oracle product code; it is entirely an external interface and not part of any Oracle product. R-ODM is similar to the example scripts (e.g., the PL/SQL demo code) that illustrate the use of Oracle Data Mining: for example, how to create data mining models, pass arguments, and retrieve results.
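The "thin layer of logic and SQL" idea can be sketched as follows. This is a minimal illustration, not R-ODM's actual code: it is written in Python rather than R, the function name `build_create_model_call` and the table and column names are hypothetical, and it assumes the wrapper ends up calling Oracle's `DBMS_DATA_MINING.CREATE_MODEL` procedure. It only assembles the PL/SQL text; a real wrapper would then send that text to the database over ODBC.

```python
# Mining function constants assumed to be exposed by the DBMS_DATA_MINING package.
MINING_FUNCTIONS = {
    "CLASSIFICATION", "REGRESSION", "CLUSTERING",
    "ASSOCIATION", "ATTRIBUTE_IMPORTANCE", "FEATURE_EXTRACTION",
}


def build_create_model_call(model_name, mining_function, data_table,
                            case_id, target=None, settings_table=None):
    """Assemble an anonymous PL/SQL block that calls CREATE_MODEL (hypothetical helper)."""
    if mining_function not in MINING_FUNCTIONS:
        raise ValueError(f"unknown mining function: {mining_function}")

    def quote(value):
        # Render a value as a PL/SQL string literal, or NULL when it is absent.
        if value is None:
            return "NULL"
        return "'" + str(value).replace("'", "''") + "'"

    args = [
        ("model_name", quote(model_name)),
        ("mining_function", f"DBMS_DATA_MINING.{mining_function}"),
        ("data_table_name", quote(data_table)),
        ("case_id_column_name", quote(case_id)),
        ("target_column_name", quote(target)),
        ("settings_table_name", quote(settings_table)),
    ]
    body = ",\n".join(f"    {name} => {value}" for name, value in args)
    return f"BEGIN\n  DBMS_DATA_MINING.CREATE_MODEL(\n{body});\nEND;"


# Hypothetical model, table, and column names, for illustration only.
sql = build_create_model_call("NB_MODEL", "CLASSIFICATION",
                              "CUSTOMERS", "CUST_ID", target="BUYER")
print(sql)
```

The wrapper itself does no mining. All model building happens inside the database, which is why the client-side layer can stay this small.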

R-ODM is packaged as a standard R source package and is distributed freely as part of the R environment’s Comprehensive R Archive Network (CRAN). For information about the R environment, R packages and CRAN, see www.r-project.org.

Here is an interview with Charlie Berger, Oracle Data Mining Product Management. Oracle is a company much respected for its ability to handle and manage data, and with its recent acquisition of Sun, it now has considerable software and financial muscle to take the world of data mining to the next generation.

Ajay- Describe your career in data mining so far from college, jobs, assignments and projects. How would you convince high school students to take up science careers?

Charlie- In my family, we were all encouraged to pursue science and technical fields. My Dad was a Mechanical Engineer and all my siblings are in scientific and medical fields. Early on, I had narrowed my career choices to engineering or medicine; the question when I left for college was which kind. My Freshman Engineering exposed students to 6 weeks of the curriculum for each of the engineering disciplines. I found myself drawn to the field of Operations Research and Industrial Engineering. I liked the applied math and problem solving aspects. While not everyone has an aptitude or an interest in Math or the Sciences, if you do, it can be a fascinating field.

Ajay- Please tell us some technical stuff about the Oracle Data Mining and Oracle Data Miner products. How do they compare with other products, notably those from SAS and SPSS? What is unique in Oracle’s suite of data mining products, and can you share some market share numbers to back this up?

Charlie- Oracle doesn’t share product-level revenue numbers. I can say that Oracle is changing the analytics industry. Ten years ago, when Oracle acquired the assets of Thinking Machines, we shared a vision that, as the volumes of data expand, you eventually reach a point where you have to ask whether it makes more sense to “move the data to the algorithms” or to “move the algorithms to the data”. Obviously, you can see the direction that Oracle pursued. Now, after 10 years of investing in in-database analytics, we have 50+ statistical techniques and 12 machine learning algorithms running natively inside the kernel of the Oracle Database. Essentially, we have transformed the database into an analytical database. Today, you see the traditional statistical software vendors announcing partnering initiatives for in-database processing or, in the case of IBM, acquiring SPSS. Oracle pioneered the concept of using a relational database not only to store data, but to analyze it too. Moving forward, I think that we are close to the tipping point where in-database analytics is accepted as the winning IT architecture.

This trend towards moving the analytics to where the data are stored makes a lot of sense for many reasons. First, you don’t have to move the data. You don’t have to keep copies of the data in external analytical sandboxes, where they are open to security risks and, over time, become more aged and irrelevant.

I know of one major e-tailer who constantly experiments by randomly showing web visitors either offer “A” or a new experimental offer “B”. They would export massive amounts of data to SAS afterwards to perform simple statistical analyses. First, they would calculate the median purchase amounts for the duration of the experiment for customers who were shown both offers. Then, they would perform a t-test to determine whether a statistically valid monetary advantage could be gained. If offer “B” were outperforming offer “A”, the e-tailer would …

Continue reading “Interview Charlie Berger Oracle Data Mining”
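The offer “A” versus offer “B” comparison described in the last answer can be sketched in a few lines. This is an illustration under stated assumptions, not the e-tailer's actual procedure: the purchase amounts are made up, the function name `welch_t` is hypothetical, and Welch's unequal-variance form of the t-test is assumed because the interview does not say which variant was used. Note that a t-test compares means; the medians are computed only as the descriptive first step mentioned above.

```python
from math import sqrt
from statistics import mean, median, variance


def welch_t(a, b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    # Squared standard error of each sample mean.
    se_a = variance(a) / len(a)
    se_b = variance(b) / len(b)
    t = (mean(b) - mean(a)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1)
    )
    return t, df


# Made-up purchase amounts, for illustration only.
offer_a = [20.0, 22.0, 19.0, 24.0, 21.0, 23.0]
offer_b = [25.0, 27.0, 24.0, 29.0, 26.0, 28.0]

median_a, median_b = median(offer_a), median(offer_b)
t_stat, df = welch_t(offer_a, offer_b)
print(f"median A={median_a}, median B={median_b}, t={t_stat:.3f}, df={df:.1f}")
```

To decide whether the difference is significant, the t statistic would be compared against the critical value of the t distribution for the computed degrees of freedom. That lookup is omitted here to keep the sketch within the standard library.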