Using Red R – R with a Visual Interface

For people complaining about the lack of a GUI in R, here is an enterprise version of R called Red R.

It is available at http://www.red-r.org/


You can read more there, or just go through the short video they have created.

Basically it is a point-and-click interface to R, with the ability to store workflow schemas, and thus very good for repeatable operations as well.


Not bad for epic software, huh?

R is an epic fail – or is it just overhyped?

I came across this nice post from someone who is both knowledgeable and experienced in data. I totally agree that data visualization, user interfaces and unstructured data mining are the trends of the future.

What caught my attention were the words from http://www.thejuliagroup.com/blog/?p=433

However, for me personally and for most users, both individual and organizational, the much greater cost of software is the time it takes to install it, maintain it, learn it and document it. On that, R is an epic fail. It does NOT fit with the way the vast majority of people in the world use computers. The vast majority of people are NOT programmers. They are used to looking at things and clicking on things.

Let me analyze this scientifically and dispassionately.

R Documentation

I believe that the SAS Online Doc and the SPSS documentation are both good examples of structured documentation. I do believe that, despite the many corporate R products floating around, R documentation is very extensive but perhaps too big to be put in one neat document. Something like a “Little R Book” or an “R Online Doc” would really help.

Entering ? or ?? to search for documentation apparently seems like too much difficult, complex work for corporate users. However, the point that R's documentation is not really of enterprise-software quality is valid enough.
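For readers who have not used R's built-in help, the operators the quoted post refers to look like this (a minimal sketch; lm is just an example function name):

```r
?lm                        # open the help page for the lm() (linear model) function
??regression               # full-text search across all installed documentation
help.search("regression")  # the long-hand equivalent of ??regression
```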

Maintaining R

It takes a single line of code, or even a single click, to update and maintain R.
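For illustration, the single line in question could be something like the following, run from any R console (a sketch using standard base-R functions, not anything specific to the quoted post):

```r
# Upgrade every installed package to the latest CRAN version,
# without prompting for confirmation on each one
update.packages(ask = FALSE)
```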

Apparently the author of the aforementioned post thinks that existing corporate users are too STUPID OR LAZY to do this.

I like to think most corporate users of statistical software are actually way smarter (one hint: they earn money doing that stuff).

Installing R

Anyone who cites installation costs as a reason for higher software costs and then mentions R is either biased against R or has not worked with R. Or both.

Learn R

I think no one can learn all R packages, just as no one can learn all the modules of SAS (like ETS, STAT, etc.).

R does take more time to learn than Base SAS, and this is a valid enough point.

However, two R GUIs, Rattle and R Commander, can help reduce this learning time.
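Getting either GUI running is itself only a couple of lines; a minimal sketch (assuming the CRAN package names Rcmdr and rattle, as published at the time):

```r
# R Commander: a menu-driven GUI for common statistical tasks
install.packages("Rcmdr")
library(Rcmdr)    # loading the package launches the GUI

# Rattle: a GUI focused on data mining workflows
install.packages("rattle")
library(rattle)
rattle()          # launch the Rattle window
```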

And increasingly R is taught in universities, which is where the battle for future developers and users of platforms like SAS, SPSS, Stata or R will ultimately be decided. While the short-term monetization of other software dazzles people, R has too many passionate developers and users to allow it to fail.

However,

R is not perfect. It does need a better corporate version than is currently offered, especially for people who are simply users rather than developers, and it could also do well to improve the marketability and visibility of R.

Regarding software costs, it is ironically easier to estimate how much SAS will cost you in terms of licenses and training time. A similar comparative document between R and SAS, covering license costs and estimated training costs, would settle this debate more rationally and dispassionately than is currently the norm when comparing software.

Aster Analytics and MapReduce.org

From the Press Release,

Aster Data Announces New Analytics Center and Launches http://www.mapreduce.org to Ease and Accelerate Adoption of MapReduce-Based Analytics

All-Star Team of Analytics Experts and MapReduce.org to Help Companies Build Next Generation Analytic Applications Using SQL-MapReduce and MapReduce Breakthroughs

Las Vegas, NV – April 12, 2010 – Gartner Business Intelligence Summit – Aster Data, a proven leader dedicated to providing the best data management and processing platform for big data volumes and analytics-intensive applications, today unveiled the Aster Analytics Center to help customers accelerate development of advanced analytic applications. Simultaneously, Aster Data also launched the first multi-author destination site for enterprise and government organizations, systems integrators, ISVs, and developers who want to build competency on the MapReduce analytics processing framework and related MapReduce frameworks. http://www.mapreduce.org offers research, education, analysis, customer use cases, key learnings, and tips for anyone interested in understanding the analytical value of MapReduce and related frameworks such as SQL-MapReduce. The new Aster Analytics Center provides product offerings, services, a world-class team, and an elite ecosystem of partners to develop and deliver data-driven applications that use SQL and MapReduce.

www.mapreduce.org is designed to be a key destination for companies who want to understand and build skills around MapReduce, SQL-MapReduce, and related MapReduce technologies. It includes content from those developing data-intensive applications with MapReduce and related MapReduce frameworks such as SQL-MapReduce, as well as insights from industry analysts, customers, and vendors who are leveraging this technology popularized by Google to build next-generation analytic applications. Any industry, enterprise organization, government agency, or expert can contribute content to this site.

Wayne Eckerson, director for TDWI Research and author of the recent article titled Launching an Analytics Practice: 10 Steps to Success, said, “Companies today need experts who can help them accelerate delivery of next-generation, data-driven applications. To run deep analytics on big data requires understanding the analytical capabilities of new database technology, including knowledge of MapReduce and parallel processing requirements.”

Today’s news includes key additions to the Aster Data team. Jonathan Goldman, director of analytics for Aster Data, is responsible for the new Aster Analytics Center, which includes product offerings such as the recently announced Aster Analytics Foundation—a suite of ready-to-use analytics functions and best practices for building advanced analytic applications that involve large data volumes and many diverse data sources. Prior to joining Aster Data he was a principal scientist at LinkedIn, where he led a team of analytics researchers to build cutting-edge products with the rich data sets LinkedIn collected. He created the popular “People You May Know” product for LinkedIn, and developed and supported computationally-intensive and targeted content throughout the site including “Who Viewed My Profile,” the “Similar Jobs” function, and “Similar Members” function, among others. Goldman earned a PhD in physics from Stanford University and a bachelors of science in physics from MIT.

These are interesting developments, given the increasing focus on handling the complex, unstructured and larger datasets involved in predictive as well as descriptive analytics and data-driven strategies.

Rexer Analytics Annual Data Miner Survey

HIGHLIGHTS from the 3rd Annual Data Miner Survey:

  • 40-item survey of data miners, conducted on-line in early 2009.
  • 710 participants from 58 countries.
  • Data miners’ most commonly used algorithms are regression, decision trees, and cluster analysis.
  • Data mining is playing an important role in organizations.
    • Half of data miners say their results are helping to drive strategic decisions and operational processes.
    • 58% say they are adding to the knowledge base in the field.
    • 60% of respondents say the results of their modeling are deployed always or most of the time.
  • Most data miners feel that the economy will not negatively impact them.
  • Almost half of industry data miners rate the analytic capabilities of their company as above average or excellent.  But 19% feel their company has minimal or no analytic capabilities.
  • The top challenges facing data miners are dirty data, explaining data mining to others, and difficult access to data.  However, in 2009 fewer data miners listed data quality and data access as challenges than in the previous year.
  • IBM SPSS Modeler (SPSS Clementine), Statistica, and IBM SPSS Statistics (SPSS Statistics) are identified as the “primary tools” used by the most data miners.
    • Open-source tools Weka and R made substantial movement up data miners’ tool rankings this year, and are now used by large numbers of both academic and for-profit data miners.
    • SAS Enterprise Miner dropped in data miners’ tool rankings this year.
  • Users of IBM SPSS Modeler, Statistica, and Rapid Miner are the most satisfied with their software.
  • Fields & Industries:  Data mining is everywhere.  The most cited areas are CRM / Marketing, Academic, Financial Services, & IT / Telecom.  And in the for-profit sector, the departments data miners most frequently work in are Marketing & Sales and Research & Development.


Additional information is available on the Rexer Analytics website. I find their annual survey one of the most useful in summarizing the entire data mining and analytics landscape.


Oracle for possible takeover of REvolution Computing

Updated – Mr. Smith gave an update in the comments section confirming the post.

From the press release –

Palo Alto, California – April 1, 2010 – REvolution Computing, the leading commercial provider of software and support for the open source “R” statistical computing language, announced that its CEO, Norman Nie, and Vice President of Community and Product Marketing, David Smith, will join Larry Ellison and other senior executives of Oracle at the 2010 Oracle Business Conference at the Palace Hotel in San Francisco on April 17-18.

This meeting is to discuss embedded analytical opportunities and will closely relate to an exciting announcement of recent breakthroughs by their product teams on in-database analytics.

Nie, Smith and Ellison will be available to meet with analysts, reporters and prospective business partners and clients interested in learning more about REvolution’s enterprise software and solutions for predictive analytics based on open source “R,” including new developments in REvolution’s products and recent deployments at leading pharmaceutical and financial services companies.

REvolution Computing is a featured portfolio company of North Bridge Venture Partners, a leading investor in open source companies.