Not just a Cloud

While browsing Oracle's rather content-heavy site, I came across this interesting white paper on cloud computing:

Platform-as-a-Service Private Cloud with Oracle Fusion Middleware

at http://www.oracle.com/us/technologies/036500.pdf

It basically says that Oracle has the following offerings for PaaS:

  • Application grid
  • Oracle SOA Suite and Oracle Business Process Management Suite
  • Oracle WebCenter Suite
  • Oracle Identity Management

Here is why the traditional software licensing model can be threatened by cloud computing. These are very basic and conservative costs; if you have a software budget, you can run the numbers yourself.

Suppose you pay $10,000 for an annual license and, say, an extra $5,000 for hardware costs. Assume you are using in-house resources (employees) that cost you another $50,000/year.

The per-hour cost of this very basic resource is Total Cost / Number of Hours Utilized.

Assume 100% utilization during work hours (which is not realistic, but bear with me).

That's a 40-hour week * 48 weeks (allowing for holidays), or 1,920 hours a year.

That works out to $33.85 per hour.

That's the cut-off point for deciding whether to offshore the work to contractors or outsource it.

Assuming a more realistic 80% utilization (1,536 hours), the per-hour cost is about $42.32.

Now assume we can't outsource, because of data hygiene or some other reason, so we exclude the people costs and calculate only the total cost of ownership (software and hardware).

That's $15,000 / (0.8 * 1,920 hours).

That's still an astonishing $9.77 per hour.

Compare this cost with the cost of running a virtual instance of R on Amazon EC2.

e.g. http://biocep-distrib.r-forge.r-project.org/

or using http://www.zementis.com (which is now introducing an Excel add-in as well, at http://www.zementis.com/Excel-Ai.htm).

The per-hour costs are not going to be more than $3.50 per hour. That's much, much better than ANY stats software licensed today on ANY desktop/server configuration.
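If you want to rerun this back-of-the-envelope arithmetic yourself, here it is as a single query (against Oracle's DUAL table, since we are on an Oracle theme anyway). The dollar figures and the 40-hour, 48-week working year are just the assumptions made above, so substitute your own budget numbers.

-- per-hour cost under the assumptions above:
-- $10,000 license + $5,000 hardware + $50,000 staff; 40 hrs/week * 48 weeks = 1,920 hours/year
select round((10000 + 5000 + 50000) / (40 * 48), 2)       all_in_at_100pct_util,   -- 33.85
       round((10000 + 5000 + 50000) / (40 * 48 * 0.8), 2) all_in_at_80pct_util,    -- 42.32
       round((10000 + 5000) / (40 * 48 * 0.8), 2)         sw_hw_only_at_80pct_util -- 9.77
from dual;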

See the math. That's why the cloud is much more than time-sharing, Dr G 😉

First of all, I don’t see anything greatly new and wonderful and different about cloud computing. It was timesharing way back in ’60. It’s not a whole lot different. I certainly have issues asking a bank to send us all their data and we’re going to put it up on a cloud. They’re going to say, ‘What about security? How will I know who else is up there in that cloud?’ I don’t know, it’s just a cloud.

Dr Jim Goodnight, SAS Institute.

 

Oracle Open World and Techie Events

An innovative way to showcase collateral, thanks to Oracle Open World.

The Post Event has an easily searchable archive with downloadable files and partner collateral.

See this-

http://myexpospace.com/oracle2009/vcr2009/vcr.cfm?search=advance


An even better site is their post-event streaming site (better than a plain HTML website, isn't it?):

http://ondemandpreview.vportal.net/


However, the website http://myexpospace.com/, which has enabled Oracle to do this post-event content management, seems to be in a closed beta (as they update their websites).

Post-event content management helps get the word out to people who were unable to attend, and the analytics from site-visitor behavior can help you gauge viewer interest. I personally also like the concept of virtual conferences, as they can be done much more frequently than they currently are.

The Cloud Slam event, for example, which I was invited to speak at earlier (and missed because of time zone differences!):

https://decisionstats.wordpress.com/2009/04/10/cloud-nine/

and https://decisionstats.wordpress.com/2009/03/12/here-comes-the-cloud-slam/

SAS Data Mining 2009 Las Vegas

I am going to Las Vegas as a guest of SAS Institute for the Data Mining 2009 Conference. (Note: the FTC regulations on bloggers come into effect in December, but my current policies are on the ADVERTISE page, unchanged for some months now.)

The big heavyweight of analytics, SAS Institute showcases events at both the SAS Global Forum and the Data Mining 2009 conference, which has a virtual who's who of partners. This includes my friends at Aster Data and Shawn Rogers of the Beye Network, in addition to Anne Milley, Senior Product Director. Anne is a frequent speaker for SAS Institute and has shrugged off the NY Times spat with R/open source from the beginning of the year. True to their word, they did go ahead and launch SAS/IML with the interface to R, mindful of the GPL as well as open-source sentiments.

While SPSS does have a data mining product, there is considerable discussion on its help list today about what direction IBM will allow the data mining product to evolve in.

Charlie Berger of Oracle Data Mining also announced at Oracle Open World that he is going to launch a GUI-based data mining product for free (or probably on a Software-as-a-Service model). Thanks to Karl Rexer of Rexer Analytics for this tip.

While this is my first trip to Las Vegas (a change from cold TN weather), I hope to pick up new stuff on data mining, including sessions on blog and text mining and statistical usage of the same. Data mining continues to be an enduring passion for me, even though I may need a divine miracle for my PhD on that topic to get funded.

Also, I may have some tweets at #M2009 for you, and some video interviews/photos. OK, watch this space.

PS: We lost to Alabama, #2 in the country, by two points because two punts were blocked by hand; it was as close as it gets.

Next week I hope to watch the South Carolina match in Orange Country.


Oracle BIWA Presentation: Analytics Talk

Deadline EXTENDED to Friday, Oct. 23 to Submit Your BIWA Presentation at COLLABORATE 2010

Submit a presentation for BI/DW & Analytics – Brain-Powered by the BIWA SIG. The IOUG is excited to partner with the BIWA SIG to present a special “conference within a conference” – Get Analytical with BIWA Training Days – on Business Intelligence and Data Warehousing, held in conjunction with COLLABORATE 10 – IOUG Forum. Those interested in Analytics, BI, Data Warehousing, Enterprise Performance Management (EPM) and OBIEE are encouraged to participate in this special forum.

Don’t miss your chance to attend COLLABORATE 10, April 18-22 in Las Vegas, Nevada, for free!* Submit a presentation** through the IOUG Forum by this FRIDAY, October 23, for the chance to:

  • SHARE your Oracle Business Intelligence, Warehousing and Analytics knowledge with your peers.
  • ENHANCE your own knowledge through the teaching of others.
  • ENJOY the recognition that comes with being a Business Intelligence, Warehousing and Analytics speaker.

So what are you waiting for? Submit your Business Intelligence, Warehousing and Analytics presentation before the deadline — FRIDAY, October 23.

*Technical session and Deep Dive speakers receive complimentary registration to the full conference; and Quick Tip speakers receive 50% off the early bird registration rate.

**All Oracle employees interested in speaking at COLLABORATE 10 should contact Lisa Stuart at lisa.stuart@oracle.com. Please do not submit papers through the official COLLABORATE 10 call for speakers until approval is received.

***Register with code BIWA2010 for discounts and BI-nefits


Interview: Shawn Kung, Sr. Director, Aster Data

Here is an interview with Shawn Kung, Senior Director of Product Management at Aster Data. Shawn explains the differences between the various database technologies, Aster's rising appeal thanks to its unique technological approach, and various other topics of interest to people in the BI and technology space.


Ajay- Describe your career journey from being a high school student of science till today. Do you think science is a more lucrative career?

Shawn: My career journey has spanned over a decade in several Silicon Valley technology companies.  In both high school and my college studies at Princeton, I had a fervent interest in math and quantitative economics.  Silicon Valley drew me to companies like upstart procurement software maker Ariba and database giant Oracle.  I continued my studies by returning to get a Master’s in Management Science at Stanford before going on to lead core storage systems for nearly 5 years at NetApp and subsequently Aster.

Science (whether it is math, physics, economics, or the hard engineering sciences) provides a solid foundation.  It teaches you to think and test your assumptions – those are valuable skills that can lead to both a financially lucrative and personally inspiring career.

Ajay- How would you describe the differences between MapReduce and Hadoop, Oracle and SAS, DBMS and Teradata, and Aster Data products to a class of undergraduate engineers?

Shawn: Let’s start with the database guys – Oracle and Teradata.  They focus on structured data – data that has a logical schema and is manipulated via a standards-based structured query language (SQL).  Oracle tries to be everything to everyone – it does OLTP (low-latency transactions like credit card or stock trade execution apps) and some data warehousing (typically summary reporting).  Oracle’s data warehouse is not known for large-scale data warehousing and is more often used for back-office reporting.

Teradata is focused on data warehousing and scales very well, but is extremely expensive – it runs on high-end custom hardware and takes a mainframe approach to data processing.  This approach makes less sense as commodity hardware becomes more compute-rich and better software comes along to support large-scale MPP data warehousing.

SAS is very different – it’s not a relational database. It really offers an application platform for data analysis, specifically data mining.  Unlike Oracle and Teradata, which are used by SQL developers and managed by DBAs, SAS is typically run in business units by data analysts – for example a quantitative marketing analyst, a statistician/mathematician, or a savvy engineer with a data mining/math background.  SAS is used to try to find patterns, understand behaviors, and offer predictive analytics that enable businesses to identify trends and make smarter decisions than their competitors.

Hadoop offers an open-source framework for large-scale data processing.  MapReduce is a component of Hadoop, which also contains multiple other modules including a distributed filesystem (HDFS).  MapReduce offers a programming paradigm for distributed computing (a parallel data flow processing framework).

Both Hadoop and MapReduce are catered toward the application developer or programmer.  It’s not catered for enterprise data centers or IT.  If you have a finite project in a line of business and want to get it done, Hadoop offers a low-cost way to do this.  For example, if you want to do large-scale data munging like aggregations, transformations, manipulations of unstructured data – Hadoop offers a solution for this without compromising on the performance of your main data warehouse.  Once the data munging is finished, the post-processed data set can be loaded into a database for interactive analysis or analytics. It is a great combination of big data technologies for certain use-cases.

Aster takes a very unique approach.  Our Aster nCluster software offers the best of all worlds – we offer the potential for deep analytics of SAS, the low-cost scalability and parallel processing of Hadoop/MapReduce, and the structured data advantages (schema, SQL, ACID compliance and transactional integrity, indexes, etc) of a relational database like Teradata and Oracle.  Often, we find complementary approaches and therefore view SAS and Hadoop/MapReduce as synergistic to a complete solution.  Data warehouses like Teradata and Oracle tend to be more competitive.

Ajay- What exciting products have you launched so far, and what makes them unique from both a technical developer perspective and a business owner perspective?

Shawn: Aster was the first to market to offer In-Database MapReduce, which provides the standards and familiarity of SQL and databases with the analytic power of MapReduce.  This is very unique, as it allows technical developers and application programmers to write embedded procedural algorithms once, upload them, and let business analysts or IT folks (SQL developers, DBAs, etc.) invoke these SQL-MapReduce functions forever.

It is highly polymorphic (re-usable), highly fault-tolerant, highly flexible (any language – Java, Python, Ruby, Perl, R statistical language, C# in the .NET world, etc) and natively massively parallel – all of which differentiate these SQL extensions from traditional dumb user-defined functions (UDFs).

Ajay- “I am happy with my databases and I don’t need too much diversity or experimentation in my systems”, says a CEO to you.

How do you convince him using quantitative numbers and not marketing adjectives?

Shawn: Aster has dozens of production customers including big-names like MySpace, LinkedIn, Akamai, Full Tilt Poker, comScore, and several yet-to-be-named retail and financial service accounts.  We have quantified proof points that show orders of magnitude improvements in scalability, performance, and analytic insights compared to incumbent or competitor solutions.  Our highly referenceable customers would be happy to discuss their positive experiences with the CEO.

But taking a step back, there’s a fundamental concept that this CEO needs to first understand.  The world is changing – data growth is proliferating due to the digitization of so many applications and the emergence of unstructured data and new data types.  As the book “Competing on Analytics” argues, the world is shifting to a paradigm where companies that don’t take risks and push the limits on analytics will die like the dinosaurs.

IDC is projecting 10x+ growth in data over the next few years, to zettabytes of aggregate data, driven by digitization (Internet, digital television, RFID, etc.).  The data is there, and in order to compete effectively and understand your customers more intimately, you need a large-scale analytics solution like the one Aster nCluster offers.  If you hold off on experimentation and innovation, it will be too late by the time you realize you have a problem at hand.

Ajay- How important is work life balance for you?

Shawn: Very important.  I hang out with my wife most weekends – we do a lot of outdoors activities like hiking and gardening.  In Silicon Valley, it’s all too easy to get caught up in the rush of things.  Taking breaks, especially during the weekend, is important to recharge and re-energize to be as productive as possible.

Ajay- Are you looking for college interns and new hires? What makes Aster exciting for you, so that you are pumped up every day to go to work?

Shawn: We’re always looking for smart, innovative, and entrepreneurial new college grads and interns, especially on the technical side.  So if you are a computer science major or recent grad or graduate student, feel free to contact us for opportunities.

What makes Aster exciting is two things:

First, the people.  Everyone is very smart and innovative, so you learn a tremendous amount, which is personally gratifying and professionally useful long-term.

Second, Aster is changing the world!

Distributed systems computing focused on big data processing and analytics – these are massive game-changers that will fundamentally change the landscape in data warehousing and analytics.  Traditional databases have been an oligopoly for over a generation – they haven’t been challenged, and so the 1970s-based technology has stuck around.  The emergence of big data and low-cost commodity hardware has created a unique opportunity to carve out a brand new market…

What gets me pumped up every day is that I have the ability to contribute to a pioneer that is quickly becoming Silicon Valley’s next great success story!

Biography-

Over the past decade, Shawn has led product management for some of Silicon Valley’s most successful and innovative technology companies.  Most recently, he spent nearly 5 years at Network Appliance leading Core Systems storage product management, where he oversaw the development of high availability software and Storage Systems hardware products that grew in annual revenue from $200M to nearly $800M.  Prior to NetApp, Shawn held senior product management and corporate strategy roles at Oracle Corporation and Ariba Inc.

Shawn holds an M.S. in Management Science and Engineering from Stanford University, where he was awarded the Valentine Fellowship (endowed by Don Valentine of Sequoia Capital).  He also received a B.A. with high honors from Princeton University.

About Aster

Aster Data Systems is a proven leader in high-performance database systems for data warehousing and analytics – the first DBMS to tightly integrate SQL with MapReduce – providing deep insights on data analyzed on clusters of low-cost commodity hardware. The Aster nCluster database cost-effectively powers frontline analytic applications for companies such as MySpace, aCerno (an Akamai company), and ShareThis.

Running on low-cost off-the-shelf hardware, and providing ‘hands-free’ administration, Aster enables enterprises to meet their data warehousing needs within their budget. Aster is headquartered in San Carlos, California and is backed by Sequoia Capital, JAFCO Ventures, IVP, Cambrian Ventures, and First-Round Capital, as well as industry visionaries including David Cheriton and Ron Conway.

Oracle Open World

Interesting to see Oracle creating easy technology for .NET techies to switch over. Cross-platform interfaces are the flavor of the season.

.NET at Oracle Develop
Join the Oracle Develop conference at Oracle OpenWorld (October 11-15, 2009, San Francisco). It will again feature a .NET developer track that includes comprehensive coverage of Oracle’s .NET technologies and will be presented by the Oracle team that develops many of the key features. Oracle Develop is perfect for all levels of .NET developers, from beginner to advanced. It covers introductory Oracle .NET material, new .NET features for Oracle database, deep dive content, and a hands-on lab.
To register, go to Oracle Develop registration site. Seats are filling up fast for Oracle Develop .NET sessions, so use Schedule Builder and reserve a seat for yourself today!

Develop Sessions
Getting Started with Oracle and .NET
ASP.NET Web Development with Oracle Databases
PL/SQL Programming for .NET Developers: Tips, Tricks, and Debugging
Database Development Lifecycle Management with Visual Studio: SQL, PL/SQL and .NET Stored Procedure Development, Source Control, and Deployment
Optimize Oracle Data Access Performance with Microsoft Visual Studio and .NET
Messaging and Event-Driven .NET Applications with Oracle Database
Main OpenWorld Session
Allstate Insurance’s Mission-Critical Use of Oracle .NET Technologies
Hands-on Lab
Building .NET Applications with Oracle: Part 1
Building .NET Applications with Oracle: Part 2
Building .NET Applications with Oracle: Part 3

Premier Developer Conference for Oracle Technologists
Don’t miss Oracle Develop, the premier developer program, at Oracle OpenWorld 2009! World-leading experts will be on hand leading sessions about the latest development trends and technologies for service-oriented architecture (SOA), Extreme Transaction Processing (XTP), virtualization, and Web 2.0. Advance your skills and expand your knowledge in scores of expert-led, in-depth technical sessions and advanced how-tos on Java, .NET, XML, SCA, PL/SQL, Ajax, PHP, Groovy on Rails, and more. And roll up your sleeves for in-depth, hands-on labs covering the very latest development technologies including database, SOA, Complex Event Processing (CEP), Java, and .NET.

When: Sunday, October 11, 9:00 a.m. to 4:45 p.m.
Monday, October 12, 10:15 a.m. to 6:30 p.m.
Tuesday, October 13, 11:30 a.m. to 6:30 p.m.
Where: Hilton San Francisco
Oracle Develop Schedule at a Glance
Oracle Develop Keynotes
Oracle Develop Tracks

 

How to use Oracle for Data Mining

Oracle for Data Mining!!!! That's right, I am talking about the same database company that made waves by acquiring Sun (and the beloved Java) and has been stealing market share left and right.

Here is some techie-specific help: if you know SQL (or even PROC SQL), you can learn Oracle Data Mining in less than an hour, good enough to clear that job shortlist.

Check out the attached sample code examples.  They are designed to run on the ODM demo data, but you could change that easily.  They are posted on OTN here:

Sample Code Demonstrating Oracle 11.1 Data Mining (230KB)
These files include sample programs in PL/SQL and Java illustrating each of the algorithms supported by Oracle Data Mining 11.1. There are examples of automatic data preparation and data transformations appropriate for each algorithm. Several programs illustrate the text transformation and text mining process.

Oracle Data Mining PL/SQL Sample Programs

The PL/SQL sample programs illustrate each algorithm supported by Oracle Data Mining, as well as text transformation and text mining using NMF and SVM classification. Transformations that prepare the data for mining are included in the programs. Execute the PL/SQL sample programs; a minimal sketch of the typical build-and-score flow follows the table below.

Mining Function | Algorithm | Sample Program
Anomaly Detection | One-Class Support Vector Machine | dmsvodem.sql
Association Rules | Apriori | dmardemo.sql
Attribute Importance | Minimum Description Length | dmaidemo.sql
Classification | Adaptive Bayes Network (deprecated) | dmabdemo.sql
Classification | Decision Tree | dmdtdemo.sql
Classification | Decision Tree (cross validation) | dmdtxvlddemo.sql
Classification | Logistic Regression | dmglcdem.sql
Classification | Naive Bayes | dmnbdemo.sql
Classification | Support Vector Machine | dmsvcdem.sql
Clustering | k-Means | dmkmdemo.sql
Clustering | O-Cluster | dmocdemo.sql
Feature Extraction | Non-Negative Matrix Factorization | dmnmdemo.sql
Regression | Linear Regression | dmglrdem.sql
Regression | Support Vector Machine | dmsvrdem.sql
Text Mining | Text transformation using Oracle Text | dmtxtfe.sql
Text Mining | Non-Negative Matrix Factorization | dmtxtnmf.sql
Text Mining | Support Vector Machine (Classification) | dmtxtsvm.sql
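Before you open the individual scripts, here is a minimal sketch of the build-and-score flow most of them follow, using the Naive Bayes case (dmnbdemo.sql covers it in full). This is only an outline and assumes the ODM demo data is installed; MINING_DATA_BUILD_V, MINING_DATA_APPLY_V, CUST_ID and AFFINITY_CARD are the demo views and columns shipped with the samples, so substitute your own table, case id and target if yours differ.

-- 1. settings table: pick the algorithm, let ODM auto-prepare the data
create table NB_SET (setting_name varchar2(30), setting_value varchar2(4000));
insert into NB_SET values ('ALGO_NAME','ALGO_NAIVE_BAYES');
insert into NB_SET values ('PREP_AUTO','ON');
commit;
-- 2. build a classification model on the demo build view
begin
dbms_data_mining.create_model('NB_DEMO_MODEL', 'CLASSIFICATION',
'MINING_DATA_BUILD_V', 'CUST_ID', 'AFFINITY_CARD', 'NB_SET');
end;
/
-- 3. score new cases with the PREDICTION operators, straight from SQL
select CUST_ID,
prediction(NB_DEMO_MODEL using *) predicted_class,
round(prediction_probability(NB_DEMO_MODEL using *)*100,2) confidence_pct
from MINING_DATA_APPLY_V
where rownum <= 10;

The same pattern (a settings table, a call to dbms_data_mining.create_model, and then PREDICTION in plain SQL) is exactly what the fraud example below uses.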

And

a particularly cute and nifty example of fraud (as in fraud detection 😉):

drop table CLAIMS_SET;
exec dbms_data_mining.drop_model('CLAIMSMODEL');
create table CLAIMS_SET (setting_name varchar2(30), setting_value varchar2(4000));
insert into CLAIMS_SET values ('ALGO_NAME','ALGO_SUPPORT_VECTOR_MACHINES');
insert into CLAIMS_SET values ('PREP_AUTO','ON');
commit;
begin
dbms_data_mining.create_model('CLAIMSMODEL', 'CLASSIFICATION',
'CLAIMS', 'POLICYNUMBER', null, 'CLAIMS_SET');
end;
/
-- accuracy (per-class and overall)
col actual format a6
select actual, round(corr*100/total,2) percent, corr, total-corr incorr, total from
(select actual, sum(decode(actual,predicted,1,0)) corr, count(*) total from
(select CLAIMS actual, prediction(CLAIMSMODEL using *) predicted
from CLAIMS_APPLY)
group by rollup(actual));
-- top 5 most suspicious claims where the number of previous claims is 2 or more:
select * from
(select POLICYNUMBER, round(prob_fraud*100,2) percent_fraud,
rank() over (order by prob_fraud desc) rnk from
(select POLICYNUMBER, prediction_probability(CLAIMSMODEL, '0' using *) prob_fraud
from CLAIMS_APPLY
where PASTNUMBEROFCLAIMS in ('2 to 4', 'more than 4')))
where rnk <= 5
order by percent_fraud desc;

Coming up: a series of tutorials on learning these skills just by sitting at home.

Hat tip: Karl Rexer, Rexer Analytics, and Charlie Berger, Oracle.