The Popularity of Data Analysis Software

Here is a nice page by Bob Muenchen (author of the books “R for SAS and SPSS Users” and “R for Stata Users”).

It is available at http://r4stats.com/popularity and uses a variety of methods, including Google Insights, PageRank, and link analysis, as well as information from Rexer Analytics and KDnuggets.

I believe the following two graphs sum it all up:

1 Number of Jobs at Monster.com using keywords

2 Google Scholar’s analysis of academic papers

R’s rapid growth is clearly evident in both jobs and publications, but it still lags behind SAS and SPSS. So whether you are a corporate user or an academic user, it makes sense to have more than one skill just to be sure. What do you think? Is learning R mutually exclusive and completely exhaustive with respect to learning SAS or SPSS? See http://r4stats.com/popularity for the complete analysis by Bob Muenchen.

It also shows the tremendous opportunity for companies like Revolution Analytics and XL Solutions (http://www.experience-rplus.com/), since the potential for growth is clear.

Rexer Analytics Annual Data Miner Survey

HIGHLIGHTS from the 3rd Annual Data Miner Survey:

  • 40-item survey of data miners, conducted on-line in early 2009.
  • 710 participants from 58 countries.
  • Data miners’ most commonly used algorithms are regression, decision trees, and cluster analysis.
  • Data mining is playing an important role in organizations.
    • Half of data miners say their results are helping to drive strategic decisions and operational processes.
    • 58% say they are adding to the knowledge base in the field.
    • 60% of respondents say the results of their modeling are deployed always or most of the time.
  • Most data miners feel that the economy will not negatively impact them.
  • Almost half of industry data miners rate the analytic capabilities of their company as above average or excellent.  But 19% feel their company has minimal or no analytic capabilities.
  • The top challenges facing data miners are dirty data, explaining data mining to others, and difficult access to data.  However, in 2009 fewer data miners listed data quality and data access as challenges than in the previous year.
  • IBM SPSS Modeler (SPSS Clementine), Statistica, and IBM SPSS Statistics (SPSS Statistics) are identified as the “primary tools” used by the most data miners.
    • Open-source tools Weka and R made substantial movement up data miners’ tool rankings this year, and are now used by large numbers of both academic and for-profit data miners.
    • SAS Enterprise Miner dropped in data miners’ tool rankings this year.
  • Users of IBM SPSS Modeler, Statistica, and Rapid Miner are the most satisfied with their software.
  • Fields & Industries:  Data mining is everywhere.  The most cited areas are CRM / Marketing, Academic, Financial Services, & IT / Telecom.  And in the for-profit sector, the departments data miners most frequently work in are Marketing & Sales and Research & Development.


Additional information is available on the Rexer Analytics website. I find their annual survey one of the most useful summaries of the entire data mining and analytics landscape.


SPSS Directions : Rexer Survey Results

Here are some results shared by Dr. Karl Rexer of Rexer Analytics. They were presented at SPSS Directions.

When asked to select all of the software packages they use for data mining, each person selected an average of 5 tools.  More data miners reported using SPSS Statistics than any other tool.  And when we asked people to indicate their primary data mining tool, the tool selected by the most data miners was SPSS Modeler (Clementine).  The SPSS people were also thrilled to see that Clementine was #1 in customer satisfaction: everyone (N=78) who identified it as their primary tool was satisfied or very satisfied.  It’s pretty amazing that not even one person was neutral (it was a 5-point scale).

For a detailed poster on the results, contact Rexer Analytics at www.RexerAnalytics.com. More than 710 data mining professionals completed the survey.