SAS Data Mining 2009 Las Vegas

I am going to Las Vegas as a guest of SAS Institute for the Data Mining 2009 conference. (Note: the FTC regulations on bloggers come into effect in December, but my current policies are on the ADVERTISE page, unchanged for some months now.)

The big heavyweight of analytics, SAS Institute showcases events at both SAS Global Forum and the Data Mining 2009 conference, which has a virtual who's-who of partners attending. This includes my friends at Aster Data and Shawn Rogers of the Beye Network, in addition to Anne Milley, Senior Product Director. Anne is a frequent speaker for SAS Institute and has shrugged off the beginning-of-the-year New York Times spat over R and open source. True to their word, SAS did go ahead and launch SAS/IML with an interface to R, mindful of the GPL as well as open source sentiments.

While SPSS does have a data mining product, there is considerable discussion on its help list today about what direction IBM will allow that product to evolve in.

Charlie Berger of Oracle Data Mining also announced at Oracle OpenWorld that Oracle is going to launch a GUI-based data mining product for free (or probably on a Software as a Service model). Thanks to Karl Rexer of Rexer Analytics for this tip.

While this is my first trip to Las Vegas (a change from cold Tennessee weather), I hope to pick up new material on data mining, including sessions on blog and text mining and the statistical use of both. Data mining continues to be an enduring passion for me, even though I may need a divine miracle to get my PhD funded on that topic.

I may also have some tweets at #M2009 for you, plus some video interviews and photos. Watch this space.

P.S. We lost to Alabama, #2 in the country, by two points because two punts were blocked by hand; it was as close as it gets.

Next week I hope to watch the South Carolina game in Orange Country.


Buying SAS Institute

At the risk of annoying a lot of friendly people, I am going to ask an old question and try to answer it quantitatively.

Who can buy SAS Institute?

Graph from http://www.sas.com/news/preleases/2008Financials.html


As you can see from the graph (note the 2001-2004 period), the revenue line is a nice smoothed curve, almost a textbook normal distribution on its left side, and SAS Institute grew even during the tough economic year of 2008, showing slowed but firm revenue growth. However, if you apply the same price/revenue multiple as the SPSS acquisition (about 1.2 billion USD against roughly 300 million USD of 2008 revenue, a multiple of 4), that would put a price of about 9.2 billion USD on SAS Institute.
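As a rough sketch of that arithmetic in R (the figures are the approximations quoted above; SAS's 2008 revenue is not stated here, so the ~2.3 billion USD value is an assumption back-solved from the 9.2 billion result):

```r
# Implied valuation of SAS Institute using the SPSS deal multiple.
# All figures are rough approximations taken from this post.
spss_price   <- 1.2e9   # reported price for SPSS, USD
spss_revenue <- 3.0e8   # SPSS 2008 revenue, USD
sas_revenue  <- 2.3e9   # assumed SAS 2008 revenue, USD

multiple  <- spss_price / spss_revenue  # price-to-revenue multiple: 4
sas_value <- multiple * sas_revenue     # ~9.2e9, i.e. ~9.2 billion USD
sas_value
```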

Who has that kind of money? Well, it seems the usual suspects are:

1) HP – from http://h30261.www3.hp.com/phoenix.zhtml?c=71087&p=irol-IRHome and the annual report (HewlettPackard_2008_AR.pdf):

Cash and cash equivalents of 12.851 billion USD as of April 30, 2009.

2) Oracle – Oracle would be hard-pressed to integrate both Sun and SAS in the same year, but it may have the financial leverage to do both.

from http://www.oracle.com/corporate/investor_relations/earnings/4q09-pressrelease-june.pdf

Fiscal year 2009: GAAP revenues were up 4% to $23.3 billion, while annual GAAP net income was up 1% to $5.6 billion. Total GAAP new software license revenues for the year were down 5% to $7.1 billion. GAAP software license updates and product support revenues were up 14% to $11.8 billion. GAAP operating income was up 6% to $8.3 billion, and GAAP operating margins were up 80 basis points to 36% in fiscal year 2009.

3) IBM – from ftp://ftp.software.ibm.com/annualreport/2008/2008_ibm_financials.pdf

Cash on hand was 12.7 billion USD as of 31 December 2008, and the company repurchased its own stock in 2008.

In the current economic environment, growth can come through acquiring new clients (not much of that around) or acquiring new companies. IBM has the capability to acquire BOTH SPSS and SAS Institute and merge their strong R&D facilities.


4) SAP – from http://www.sap.com/germany/about/investor/reports/gb2008/en/our-results/finances.html

SAP had various sources of loan capital:

Although profit after income taxes for 2008 was slightly lower than for the previous year, we increased cash flows from operating activities 12% to € 2,158 million (2007: € 1,932 million) through efficient management of working capital.

  • To finance the acquisition of Business Objects, we entered into an agreement for a credit facility that was originally for € 5 billion and is repayable by December 31, 2009 (amount outstanding on December 31, 2008: € 2.3 billion). We did not draw the full € 5 billion available under the facility because we paid part of the purchase price from available cash.
  • To increase financial flexibility, in November 2004 we obtained a € 1 billion syndicated credit facility through an international group of banks. We already had other lines of credit in place; the new line was arranged to provide additional financial flexibility. As in the previous year, we did not draw on this facility during the year.
  • At the end of 2008, the other, bilateral lines of credit available to SAP AG totaled approximately € 597 million (2007: € 599 million). We did not draw on these facilities during 2008 or 2007. Several subsidiaries in the SAP Group had credit lines in their local currency. These totaled € 52 million (2007: € 44 million), for which SAP AG was guarantor. At the end of the year, the subsidiaries had drawn € 21 million under these facilities (2007: € 27 million).

Given these cash positions, it seems almost everyone can buy SAS Institute if, and this is a big IF, someone sells it. Microsoft, which some years ago allegedly tried and failed to acquire both Yahoo (only to realize huge savings!) and SAS, would be another suitor. Google, which has the financial and operating synergies along with the best text mining capabilities, could also act as a white knight by merging its Google Apps and enterprise solutions (especially the cloud-based OS and cloud-based productivity suite) with SAS Institute. I personally would favor a Google-SAS Institute joint venture on enterprise software, based solely on their common history and shared values. (Note that Google has a dual-class ownership structure with class A and class B shares.)

Who is John Galt?

Another option could be for SAS Institute to follow the Google way and go for a dual-class IPO, with class A shares for the public and class B shares for the founders and executives. A substantial endowment to colleges and universities can also be expected in the future, given the philanthropic tradition of SAS Institute's owners and executives. SAS could also try to buy SPSS: it would bring synergies in both software (with the SPSS GUI) and new clients. At the very minimum, it would boost the valuation of other stocks in this sector and give SPSS a more realistic valuation.

So who will buy SAS Institute?

I don’ know 🙂 and I am just brushing off my half a decade old financial valuation skills here

What is the true value of SPSS?

A brief study of the charts at http://tr.im/vDA4 (courtesy Google Finance) would suggest IBM is getting a bargain for SPSS Inc.

Oracle, Microsoft, and other companies (even the privately held SAS Institute) could do well to step in and take it away, or at the very minimum make the valuation even steeper for IBM to hold on to.

SPSS reported total revenue of $291 million in 2007 with net income of $33.73 million, and total revenue of $302.91 million in 2008 with net income of $36.05 million. Shares of SPSS Inc. (NASDAQ:SPSS) increased from about $35 per share before the announcement to $49.50 per share after it.

Citation: http://shareholdersfoundation.com/caseinvestigation/spss-inc-takeover-subject-investor-investigations
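Plugging the figures above into R gives a rough sense of the deal multiples (a sketch only; the $1.2 billion deal size is the number quoted earlier in this post, and the earnings base analysts use for a P/E multiple may differ from 2008 net income):

```r
# Rough IBM-SPSS deal multiples from the figures quoted above.
deal_value   <- 1.2e9     # deal size quoted earlier in this post, USD
revenue_2008 <- 302.91e6  # SPSS 2008 total revenue, USD
income_2008  <- 36.05e6   # SPSS 2008 net income, USD
pre_price    <- 35.00     # share price before the announcement, USD
post_price   <- 49.50     # share price after the announcement, USD

deal_value / revenue_2008             # ~4x revenues
deal_value / income_2008              # ~33x 2008 net income
(post_price - pre_price) / pre_price  # ~41% price jump
```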


Chart at http://tr.im/vDA4

Reactions to the IBM-SPSS takeover

The business intelligence / business analytics / data mining industry (or, as James Taylor would say, the Decision Management industry) has some reactions to IBM-SPSS (which was NOT a surprise to many, including me). Really.

From SAS Institute, Anne Milley

http://blogs.sas.com/sascom/index.php?/archives/557-Analytics-is-still-our-middle-name.html

Besides SAS, SPSS was one of the last independent analytic software companies. A colleague says, “It’s the end of the analytics cold war.”

I’ve been saying all along that analytics is required for success. Yes, data integration, data quality, and query & reporting are important too but, as W. Edwards Deming says, “The object of taking data is to provide a basis for action.”

The end of the analytics cold war? Hmm. We all know what the end of the real Cold War brought us: Google, cloud computing, and other non-technical changes.

From KXEN, Roger Haddad

“The price paid for SPSS of four times revenues and 25 times earnings shows just how valuable this sector really is,” says Haddad. “But the deal has also created a tremendous opportunity for the sector’s remaining independent vendors that KXEN is well placed to capitalize on.”

“There is no For Sale sign hanging in our window,” continues Haddad. “We launched KXEN in 1998 to democratize the benefits of data mining and predictive analytics, making them practical and affordable across the whole enterprise and not just the exclusive preserve of a few specialists. It’s going to take up to two years for the dust to settle following the IBM acquisition. Former SPSS partners, systems integrators and distributors will face uncertainty.”

I think the multiple was still low: SPSS was worth more if you count the client base, the active community, and the brand itself in the valuation. There are tremendous cross-sell opportunities, and IBM, with its fine research and development arm, is a good supporter of pure science. Yes, the next two years will bring increasing consolidation and more “surprising” news. At 4 times revenues, anyone can be bought in the present market if it is a publicly listed company. 😉

From the rather subdued voices on the SPSS mailing list, some subjective and non-quantitative “strategic” forecasts:

http://www.listserv.uga.edu/cgi-bin/wa?A2=ind0907&L=spssx-l&F=&S=&P=36324

I think the ancient Chinese said it best: “May you live in interesting times.”

Having worked with some flavors of Cognos and SPSS, I think there could be areas for technical integration in querying and GUI-based forecasting as well, apart from the financial merger and administrative readjustments. I mean, people pull data not just to report it, but to estimate what comes next.

This could also spell the end of the single-platform analyst. You now need to learn at least two different platforms, such as SAS and SPSS, KXEN and R, or Cognos and Business Objects, to hedge against getting offshored. (Note: I worked in offshored data analytics in India for almost 4 years.)

As for what IBM will do with SPSS, its open source commitment to R, and the consequences for employees, customers, vendors, and partners, who have more choices now than ever…

…well, it depends. Who is John Galt?

Interview: Richard Schultz, CEO, REvolution Computing

Here is an interview with the CEO of REvolution Computing, Richard Schultz. Mr. Schultz offers his perspectives on open source, predictive analytics, and cloud computing, as well as his vision for commercial R.

Note from Ajay: As I blogged previously, commercial establishments now have the option to use R with a full service contract and all the guarantees they expect and get from existing analytics software vendors.

Ajay: Linux has not really succeeded in capturing the Windows desktop operating system market. What are the technical and business reasons you think R will succeed in the desktop analytics software market?

Richard: To start, Linux was never really targeted at the Windows desktop market, but rather at unseating proprietary Unix deployments (particularly in finance), which it did quite successfully. This is a similar trend to what we're seeing in the R world: it's not that R is generally replacing Excel, for instance. In addition, with the large and growing base of both users and contributors, the vibrancy of the R community has taken on a life of its own.

As to R and Windows, two things are worth noting:

1. Microsoft has moved rapidly to embrace R, and REvolution for that matter.

2. Windows is still the predominant operating system in large commercial enterprises. Because we deploy R on multiprocessors, which are now common on all computers including those pre-loaded with Windows, REvolution R is very much at home in Windows, Mac, and Linux environments alike.

Ajay: What are the biggest challenges for REvolution Computing in explaining R Pro to users of traditional statistics software? What are the biggest advantages?

Richard: The biggest challenge is getting the word out that there now exist validated and supported R products designed for commercial use. But that's changing rapidly, as your own interest in REvolution Computing demonstrates. Our biggest advantages are several:

1. we are focused on building a close and collegial relationship with the open source R community;

2. our company has a deep history in supercomputing and parallelization;

3. with, by Intel's estimate, over 1 million R users and growing, there is a large community eager to adopt our products as its members advance their careers in the business and research worlds.

Ajay: Which software do you think will be affected the most by R's spread across colleges and companies? What do you believe their strategies to compete will be?


Richard: I want to be politic here. Let me say that the programming software most likely to be affected by the rise of R is proprietary.

We see many opportunities to partner and leverage the strengths of REvolution’s products specifically – high performance, handling of large data, validation, IDE / user interface.

Ajay: How do you intend to incorporate cloud computing and the Software as a Service model for R Pro? When, if at all, do you think it will be possible for a person to simply upload a zipped CSV file, work on a remote cloud computer for analytics and forecasting, and just pay for the hired software, hardware, and bandwidth?

Richard: We were thinking of something based on the Ohri framework. ;-) (Ajay: Touché!)

In fact, we have deployed, and are deploying, cloud-based REvolution R for clients, and it's something we expect to evolve as those technologies evolve.


Ajay: Asian countries have a huge demand for analytics and are more price-conscious about software. What would your strategy be to sell in Asia, China, and India?

Richard: Open source can be a tremendous win for users in Asia, China, and India. The upfront costs are low, the technology is leading-edge, and there is a distribution network for support. REvolution has partners, and is continuing to build its partner network to be able to reach these markets. We expect to accelerate our efforts in these regions toward the end of 2009.

Ajay: What has the story of your career been so far? What prompted you to start REvolution Computing? What advice would you give to young science graduates in today's recession?

Richard: My own background is in computer science, business… and music. Through school I held various positions at IBM, and after graduate school I worked at Dun & Bradstreet in a product management role and developed a taste for entrepreneurship. I've started two companies so far: MetaServer, a business intelligence middleware company that catered to the insurance industry, and REvolution Computing. Today, MetaServer is part of Oracle. And I continue to play music – guitar and piano. One of these days we'll get a REvolution Computing band together.

My advice to young science graduates is the same, recession or no: follow your enthusiasms; find a passion outside of work, like playing music; and master open source programming languages, because that is the future, and the future is here.

About Richard Schultz, Chief Executive Officer, REvolution Computing

Richard guides REvolution's long-range business strategy and leads the company's teams on a daily basis. His experience developing and growing business intelligence software companies includes founding and leading MetaServer, Inc., now a part of Oracle, from inception to sale. Richard has been named Innovator of the Year by Business New Haven, served on the board of the Connecticut Venture Group, and been the keynote speaker for CIO Forum and other technology industry events. A graduate of Washington University with degrees in Computer Science, Business, and Music, Richard also holds a Master's degree in Computer Science from the State University of New York at Stony Brook and has held senior positions at Dun & Bradstreet and IBM.

Ajay: REvolution Computing has been a leader in this field, and going by the latest product launch, you can try it yourself and see at http://www.revolution-computing.com

Interview: Dr Graham Williams

(Updated with comments from Dr Graham in the comments section.)


I have often talked about how Rattle, the graphical user interface for the R language, makes learning R and building models quite simple. Rattle's latest version has been released and received extensive publicity, including on KDnuggets. I wrote to its creator, Dr Graham Williams, and he agreed to an extensive interview explaining data mining, its evolution, and the philosophy and logic behind open source software like R and Rattle.

Dr Graham Williams is the author of the Rattle data mining software and an Adjunct Professor at the University of Canberra and the Australian National University. Rattle is available from rattle.togaware.com.

Ajay: Could you describe your career journey? What made you enter this field, and what experiences helped shape your perspectives? What would your advice be to young professionals entering this field today?

Graham: With a PhD in Artificial Intelligence (topic: combining multiple decision trees to build ensembles) and a strong interest in practical applications, I started out in the late 1980s developing expert systems for business and government, including bank loan assessment systems and bushfire prediction.

When data mining emerged as a discipline in the early 1990s, I was involved in setting up the first data mining team in Australia within the government research organization (CSIRO). In 2004 I joined the Australian Taxation Office to provide the technical lead for the deployment of its Analytics team, overseeing the development of a data mining capability. I have been teaching data mining at the Australian National University (and elsewhere) since 1995 and continue to do so.

The business need for Data Mining and Analytics continues to grow, although courses in Data Mining are still not so common. A data miner combines good backgrounds in Computer Science and Statistics. The Computer Science is too little emphasized, but it is crucial for skills in developing repeatable procedures and good software engineering practices, which I believe to be important in Data Mining.

Data Mining is more than just using a point-and-click graphical user interface (GUI). It is an experimental endeavor where we really need to be able to follow our noses as we explore our data, and then capture the whole process in an automatically repeatable manner that can be readily communicated to others. A programming language offers this sophisticated level of communication.

Too often, I see analysts, when given a new dataset that updates last year's data, essentially start from scratch with the data pre-processing, cleaning, and then mining, rather than beginning with last year's captured processes and tuning them to this year's data. The GUI generation of software often does not encourage repeatability.

Ajay: What made you get involved with R? What is the advantage of using Rattle versus plain R?

Graham: I have used Clementine and SAS Enterprise Miner over many years (and IBM's original Intelligent Miner, Thinking Machines' Darwin, and many other tools that emerged early on with Data Mining). Commercial vendors come and go (even large ones like IBM, in terms of the products they support).

Lock-in is one problem with commercial tools. Another is that many vendors, understandably, won't put resources into new algorithms until they are well accepted.

Because it is open source, R is robust, reliable, and provides access to the most advanced statistics. Many research statisticians publish their new algorithms in R. But what is most important is that the source code is always going to be available. Not everyone has the skill to delve into that source code, but at least we have a chance to do so. We also know that there is a team of highly qualified developers whose work is openly peer reviewed. I can monitor their coding changes, if I so want. This helps ensure quality and integrity.

Rolling out R to a community of data analysts, though, does present challenges. R being primarily a language for statistics, we need to learn to speak that language. That is, we need to communicate with language rather than pictures (or a GUI). It is, of course, easier to draw pictures, but pictures can be limiting. I believe a written language allows us to express and communicate ideas better and more formally. But it needs to be with the philosophy that we are communicating those ideas to our fellow humans, not just writing code to be executed by the computer.

Nonetheless, GUIs are great as memory aids, for doing simple tasks, and for learning how to perform particular tasks. Rattle aims to do the standard data mining steps, but also to expose everything that is done as R commands in the log. In fact, the log is designed so that it can be run as an R script, and so that it teaches the user the R commands.
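(Note from Ajay: To give a flavor of this, here is a sketch of the kind of R commands a Rattle-style log captures for a simple classification task. This is illustrative, not verbatim Rattle output; rpart is one of the decision tree packages Rattle drives, and the built-in iris dataset stands in for your own data.)

```r
# Load data, partition it, fit a decision tree, and evaluate it:
# the same steps Rattle performs behind its tabs.
library(rpart)

data(iris)                                    # any data frame will do
set.seed(42)                                  # make the partition repeatable
train <- sample(nrow(iris), 0.7 * nrow(iris)) # 70/30 train/test split

model <- rpart(Species ~ ., data = iris[train, ], method = "class")
pred  <- predict(model, iris[-train, ], type = "class")
table(observed = iris$Species[-train], predicted = pred)  # confusion matrix
```

Saved as a script, this is exactly the kind of artifact that makes next year's re-run a tuning exercise rather than a restart.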

Ajay: What are the advantages of using Rattle instead of SAS or SPSS? And what are the disadvantages?

Graham: Because it is free and open source, Rattle (and R) can be readily used in teaching data mining. In business it is, initially, useful for people who want to experiment with data mining without the sometimes quite significant up-front costs of the commercial offerings. For serious data mining, Rattle and R offer all of the data mining algorithms offered by the commercial vendors, and many more besides. Rattle provides a simple, tab-based user interface which is not as graphically sophisticated as SPSS's Clementine or SAS Enterprise Miner.

But with just 4 button clicks you will have built your first data mining model.

The usual disadvantage quoted for R (and so Rattle) is in the handling of large datasets: SAS and SPSS can handle datasets out of memory, although they slow down when doing so. R is memory-based, so going to a 64-bit platform is often necessary for larger datasets. A very rough rule of thumb has been that the 2-3 GB limit of the common 32-bit processors can handle a dataset of up to about 50,000 rows with 100 columns (or 100,000 rows and 10 columns, etc.), depending on the algorithms you deploy. As quite a powerful yet inexpensive data mining machine, I generally recommend one running on an AMD64 processor with the Debian GNU/Linux operating system and as much memory as you can afford (e.g., 4 GB to 32 GB, although some machines today can go up to 128 GB, but memory gets expensive at that end of the scale).
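(Note from Ajay: A quick back-of-envelope check of that rule of thumb in R. This is a sketch: the raw data is far smaller than 2-3 GB, and the gap is the transient copies R makes while fitting models, so any headroom multiplier you apply on top is an assumption.)

```r
# Raw storage for a numeric data frame is about 8 bytes per cell.
rows <- 50000
cols <- 100
raw_mb <- rows * cols * 8 / 2^20   # ~38 MB of raw numeric data
raw_mb

# Verify against an actual object of that size.
df <- as.data.frame(matrix(rnorm(rows * cols), rows, cols))
print(object.size(df), units = "Mb")  # close to the estimate above
```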

Ajay: Rattle is free to download and use, yet it must have taken you some time to build. What are your revenue streams to support your time and effort?

Graham: Yes, Rattle is free software: free for anyone to use, free to review the code, free to extend the code, free to use for whatever purpose. I have been developing Rattle for a few years now, with a number of contributions from other users. Rattle, of course, gets its full power from R. The R community works together to help each other, and others, for the benefit of all. Rattle and R can be the basic toolkit for knowledge workers providing analyses. I know of a number of data mining consultants around the world who are using Rattle to support their day-to-day consultancy work.

As a company, Togaware provides user support, installs R and Rattle, and runs training in using Rattle and in doing data mining. It also delivers data mining projects to clients, and provides support for incorporating Rattle (and R) into other products (e.g., as RStat for Information Builders).

Ajay: What is your vision of analytics for the future? How do you think the recession of 2008 and the slowdown of 2009 will affect the choice of software?

Graham: Watching the growth of data mining and analytics over the past 18 years, it does seem that there has been, and continues to be, monotonically increasing interest in and demand for analytics. Analytics continues to demonstrate benefit.

The global financial crisis, as others have suggested, should lead organizations to consider alternatives to expensive software. Good quality free and open source software has been available for a while now, but the typical CTO is still more comfortable purchasing expensive software. A purchase gives some sense of (false?) security but formally provides no warranty. My philosophy has been that we should invest in our people within an organization, and treat software as a commodity that we openly contribute back to.

Imagine a world where we only use free open source software. The savings made by all will be substantial (consider OpenOffice versus MS Office license fees paid by governments worldwide, or Rattle versus SAS Enterprise Miner annual license fees). A small part of that saving might be expended on ensuring we have staff who are capable of understanding and extending that software to suit our needs, rather than vice versa (i.e., changing our needs to suit the software). We feed our extensions back into the grid of open source software, whilst also benefiting from contributions others are making. Some commercial vendors like to call this “communism” as part of their attempt to discredit open source, but we had better learn to share, for the good of the planet, before we lose it.

(Note from Ajay: If you are curious about R and have just 15 minutes to try it, download Rattle from rattle.togaware.com. It has a point-and-click interface and auto-generates R code in its log. Trust me, it would be time well spent.)
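If you already have R installed, the quickest route is from the R console (package and function names as documented at rattle.togaware.com; on Linux the GTK libraries Rattle's GUI depends on may need a system-level install first):

```r
# Install and launch the Rattle GUI from an R session.
install.packages("rattle")   # one-time install from CRAN
library(rattle)
rattle()                     # opens the Rattle window; watch the Log tab
```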