charlie berger – DECISION STATS

R Oracle Data Mining

R ODM is a new package that provides an interface for doing data mining on Oracle tables from R. You can read more at http://www.oracle.com/technetwork/database/options/odm/odm-r-integration-089013.html and at http://cran.fhcrc.org/web/packages/RODM/RODM.pdf . There is also a contest for creative use of R and ODM.

R Interface to Oracle Data Mining

The R Interface to Oracle Data Mining (R-ODM) allows R users to access the power of Oracle Data Mining’s in-database functions using the familiar R syntax. R-ODM provides a powerful environment for prototyping data analysis and data mining methodologies.

R-ODM is especially useful for:

Quick prototyping of vertical or domain-based applications where the Oracle Database supports the application

Scripting of “production” data mining methodologies

Customizing graphics of ODM data mining results (examples: classification, regression, anomaly detection)

The R-ODM interface allows R users to mine data using Oracle Data Mining from the R programming environment. It consists of a set of function wrappers, written in R, that pass data and parameters from the R environment to the Oracle RDBMS Enterprise Edition as standard user PL/SQL queries via an ODBC interface. The R-ODM interface code is a thin layer of logic and SQL that calls through an ODBC interface. R-ODM does not use or expose any Oracle product code, as it is completely an external interface and not part of any Oracle product. R-ODM is similar to the example scripts (e.g., the PL/SQL demo code) that illustrate the use of Oracle Data Mining, for example, how to create data mining models, pass arguments, retrieve results, etc.

R-ODM is packaged as a standard R source package and is distributed freely as part of the R environment’s Comprehensive R Archive Network (CRAN). For information about the R environment, R packages and CRAN, see www.r-project.org.

and

Present and win an Apple iPod Touch!
The BI, Warehousing and Analytics (BIWA) SIG is giving an Apple iPod Touch to the best new presenter. Be part of the TechCast series and get a chance to win!

Consider highlighting a creative use of R and ODM.

BIWA invites all Oracle professionals (experts, end users, managers, DBAs, developers, data analysts, ISVs, partners, etc.) to submit abstracts for 45 minute technical webcasts to our Oracle BIWA (IOUG SIG) Community in our Wednesday TechCast series. Note that the contest is limited to new presenters to encourage fresh participation by the BIWA community.

Also, here is an interview with the head of Oracle Data Mining, Charlie Berger: https://decisionstats.wordpress.com/2009/09/02/oracle/

SAS Data Mining 2009 Las Vegas

I am going to Las Vegas as a guest of SAS Institute for the Data Mining 2009 Conference. (Note: FTC regulations on bloggers come into effect in December, but my current policies are on the ADVERTISE page, unchanged for some months now.)

As the big heavyweight of analytics, SAS Institute showcases events at both the SAS Global Forum and the Data Mining 2009 conference, which has a virtual who’s who of partners. These include my friends at Aster Data and Shawn Rogers of Beye Network, in addition to Anne Milley, Senior Product Director. Anne is a frequent speaker for SAS Institute and has shrugged off the beginning-of-the-year NY Times spat with R/open source. True to their word, they did go ahead and launch SAS/IML with the interface to R, mindful of the GPL as well as open source sentiments.

While SPSS does have a data mining product, there is considerable discussion on that help list today about the direction in which IBM will allow the data mining product to evolve.

Charlie Berger, from Oracle Data Mining, also announced at Oracle World that he is going to launch a GUI-based data mining product for free (or probably under a Software as a Service model). Thanks to Karl Rexer from Rexer Analytics for this tip.

While this is my first trip to Las Vegas (a change from cold TN weather), I hope to read new stuff on data mining, including sessions on blog and text mining and the statistical usage of the same. Data mining continues to be an enduring passion for me, even though I may need a Divine Miracle to get my PhD funded on that topic.

Also, I may have some tweets at #M2009 for you, and some video interviews and photos. OK, watch this space.

PS: We lost to Alabama, #2 in the country, by two points because 2 punts were blocked by hand, which is as close as it gets.

Next week I hope to watch the South Carolina match in Orange Country.

How to use Oracle for Data Mining

Oracle for Data Mining!!!! That’s right, I am talking about the same database company that made waves by acquiring Sun (and the beloved Java) and has been stealing market share left and right.

Here is some techie-specific help: if you know SQL (or even PROC SQL), you can learn Oracle Data Mining in less than an hour, good enough to clear that job shortlist.

Check out the attached sample code examples. They are designed to run on the ODM demo data, but you could change that easily. They are posted on OTN here

Sample Code Demonstrating Oracle 11.1 Data Mining (230KB)
These files include sample programs in PL/SQL and Java illustrating each of the algorithms supported by Oracle Data Mining 11.1. There are examples of automatic data preparation and data transformations appropriate for each algorithm. Several programs illustrate the text transformation and text mining process.

Oracle Data Mining PL/SQL Sample Programs

The PL/SQL sample programs illustrate each algorithm supported by Oracle Data Mining as well as text transformation and text mining using NMF and SVM classification. Transformations that prepare the data for mining are included in the programs. Execute the PL/SQL sample programs.

Mining Function	Algorithm	Sample Program
Anomaly Detection	One-Class Support Vector Machine	`dmsvodem.sql`
Association Rules	Apriori	`dmardemo.sql`
Attribute Importance	Minimum Description Length	`dmaidemo.sql`
Classification	Adaptive Bayes Network (deprecated)	`dmabdemo.sql`
Classification	Decision Tree	`dmdtdemo.sql`
Classification	Decision Tree (cross validation)	`dmdtxvlddemo.sql`
Classification	Logistic Regression	`dmglcdem.sql`
Classification	Naive Bayes	`dmnbdemo.sql`
Classification	Support Vector Machine	`dmsvcdem.sql`
Clustering	k-Means	`dmkmdemo.sql`
Clustering	O-Cluster	`dmocdemo.sql`
Feature Extraction	Non-Negative Matrix Factorization	`dmnmdemo.sql`
Regression	Linear Regression	`dmglrdem.sql`
Regression	Support Vector Machine	`dmsvrdem.sql`
Text Mining	Text transformation using Oracle Text	`dmtxtfe.sql`
Text Mining	Non-Negative Matrix Factorization	`dmtxtnmf.sql`
Text Mining	Support Vector Machine (Classification)	`dmtxtsvm.sql`

And here is a particularly cute and nifty example of fraud (as in fraud detection 😉):

drop table CLAIMS_SET;
exec dbms_data_mining.drop_model('CLAIMSMODEL');

-- settings table: use an SVM and let ODM prepare the data automatically
create table CLAIMS_SET (setting_name varchar2(30), setting_value varchar2(4000));
insert into CLAIMS_SET values ('ALGO_NAME','ALGO_SUPPORT_VECTOR_MACHINES');
insert into CLAIMS_SET values ('PREP_AUTO','ON');
commit;

-- build the model on table CLAIMS, with POLICYNUMBER as the case id and no target column
begin
  dbms_data_mining.create_model('CLAIMSMODEL', 'CLASSIFICATION',
    'CLAIMS', 'POLICYNUMBER', null, 'CLAIMS_SET');
end;
/

-- accuracy (per-class and overall)
col actual format a6
select actual, round(corr*100/total,2) percent, corr, total-corr incorr, total from
  (select actual, sum(decode(actual,predicted,1,0)) corr, count(*) total from
    (select CLAIMS actual, prediction(CLAIMSMODEL using *) predicted
     from CLAIMS_APPLY)
   group by rollup(actual));

-- top 5 most suspicious claims where the number of previous claims is 2 or more:
select * from
  (select POLICYNUMBER, round(prob_fraud*100,2) percent_fraud,
          rank() over (order by prob_fraud desc) rnk from
    (select POLICYNUMBER, prediction_probability(CLAIMSMODEL, '0' using *) prob_fraud
     from CLAIMS_APPLY
     where PASTNUMBEROFCLAIMS in ('2 to 4', 'more than 4')))
where rnk <= 5
order by percent_fraud desc;
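If the accuracy query looks dense, here is a small sketch of what it computes, runnable without an Oracle database. It uses Python's built-in sqlite3 module, so the SQL is not the Oracle SQL above: `decode` becomes a `CASE` expression, and the grand-total row that `rollup` produces is imitated with a `UNION ALL`. The table name `scored` and the six labelled rows are invented for illustration only.

```python
import sqlite3

# Hypothetical scored rows: (actual label, predicted label).
SCORED = [
    ("No", "No"), ("No", "No"), ("No", "Yes"), ("No", "No"),
    ("Yes", "Yes"), ("Yes", "No"),
]

ACCURACY_SQL = """
SELECT actual,
       ROUND(corr * 100.0 / total, 2) AS percent,
       corr,
       total - corr AS incorr,
       total
FROM (
    SELECT actual,
           SUM(CASE WHEN actual = predicted THEN 1 ELSE 0 END) AS corr,
           COUNT(*) AS total
    FROM scored
    GROUP BY actual
    UNION ALL
    -- stands in for the overall row that ROLLUP adds (actual is NULL)
    SELECT NULL,
           SUM(CASE WHEN actual = predicted THEN 1 ELSE 0 END),
           COUNT(*)
    FROM scored
)
"""


def accuracy_report(rows):
    """Return {actual label: (percent, corr, incorr, total)}; key None is the overall row."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE scored (actual TEXT, predicted TEXT)")
        conn.executemany("INSERT INTO scored VALUES (?, ?)", rows)
        return {row[0]: tuple(row[1:]) for row in conn.execute(ACCURACY_SQL)}
    finally:
        conn.close()


report = accuracy_report(SCORED)
for label, stats in report.items():
    print(label, stats)
```

Each class gets one row with its share of correct predictions, and the row whose label is empty is the overall accuracy, which is the same shape of result the Oracle query returns.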

Coming up: a series of tutorials on learning these skills by just sitting at home.

Hat tip: Karl Rexer, Rexer Analytics, and Charlie Berger, Oracle.

Interview Charlie Berger Oracle Data Mining

Here is an interview with Charlie Berger, Oracle Data Mining Product Management. Oracle is a company much respected for its ability to handle and manage data, and with its recent acquisition of Sun, it now has considerable software and financial muscle to take the world of data mining to the next generation.

Ajay- Describe your career in data mining so far from college, jobs, assignments and projects. How would you convince high school students to take up science careers?

Charlie- In my family, we were all encouraged to pursue science and technical fields. My Dad was a Mechanical Engineer and all my siblings are in scientific and medical fields. Early on, I had narrowed my career choices to engineering or medicine; the question when I left for college was which kind. My Freshman Engineering exposed students to 6 weeks of the curriculum for each of the engineering disciplines. I found myself drawn to the field of Operations Research and Industrial Engineering. I liked the applied math and problem solving aspects. While not everyone has an aptitude or an interest in Math or the Sciences, if you do, it can be a fascinating field.

Ajay- Please tell us some technical stuff about Oracle Data Mining and Oracle Data Miner products. How do they compare with other products notably from SAS and SPSS? What is unique in Oracle’s suite of data mining products- and some market share numbers to back these please?

Charlie- Oracle doesn’t share product level revenue numbers. I can say that Oracle is changing the analytics industry. Ten years ago, when Oracle acquired the assets of Thinking Machines, we shared a vision that over time, as the volumes of data expand, at some point, you reach a point where you have to ask whether it makes more sense to “move the data to the algorithms” or to “move the algorithms to the data”. Obviously, you can see the direction that Oracle pursued. Now after 10 years of investing in in-database analytics, we have 50+ statistical techniques and 12 machine learning algorithms running natively inside the kernel of the Oracle Database. Essentially, we have transformed the database to become an analytical database. Today, you now see the traditional statistical software vendors announcing partnering initiatives for in-database processing or in the case of IBM, acquiring SPSS. Oracle pioneered the concept of using a relational database to not only store data, but to analyze it too. Moving forward, I think that we are close to the tipping point where in-database analytics are accepted as the winning IT architecture.

This trend towards moving the analytics to where the data are stored makes a lot of sense for many reasons. First, you don’t have to move the data. You don’t have to have copies of the data in external analytical sandboxes, where they are open to security risks and, over time, become more aged and irrelevant.

I know of one major e-tailer who constantly experiments by randomly showing web visitors either offer “A” or a new experimental offer “B”. They would export massive amounts of data to SAS afterwards to perform simple statistical analyses. First, they would calculate the median purchase amounts for the duration of the experiment for customers who were shown both offers. Then, they would perform a t-test to determine whether a statistically valid monetary advantage could be gained. If offer “B” were outperforming offer “A”, the e-tailer would …
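For readers who have not run this kind of A/B comparison, here is a minimal sketch of the two steps described in the anecdote above: the median purchase per offer, then a t statistic for the difference in means. The purchase amounts are made up, and the choice of Welch's unequal-variance t-test is my assumption, since the interview does not say which variant was used. Only the Python standard library is used, so the sketch stops at the t statistic and degrees of freedom; turning them into a p-value needs a t-distribution table or a statistics package.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for mean(B) - mean(A)."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Sample variance divided by sample size: squared standard error of each mean.
    se2_a = statistics.variance(sample_a) / len(sample_a)
    se2_b = statistics.variance(sample_b) / len(sample_b)
    t = (mean_b - mean_a) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Hypothetical purchase amounts for visitors shown each offer.
offer_a = [20.0, 22.0, 19.0, 24.0, 21.0]
offer_b = [25.0, 27.0, 23.0, 26.0, 29.0]

median_a = statistics.median(offer_a)
median_b = statistics.median(offer_b)
t_stat, dof = welch_t(offer_a, offer_b)
print(median_a, median_b, round(t_stat, 3), round(dof, 2))
```

A large positive t statistic suggests offer “B” brings in more per purchase than offer “A”. With real data you would want far more than five purchases per group.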
