Trrrouble in land of R…and Open Source Suggestions

Recently some comments by Ross Ihake , founder of R Statistical Software on Revolution Analytics, leading commercial vendor of R….. came to my attention-

http://www.stat.auckland.ac.nz/mail/archive/r-downunder/2010-May/000529.html

[R-downunder] Article on Revolution Analytics

Ross Ihaka ihaka at stat.auckland.ac.nz
Mon May 10 14:27:42 NZST 2010


On 09/05/10 09:52, Murray Jorgensen wrote:
> Perhaps of interest:
>
> http://www.theregister.co.uk/2010/05/06/revolution_commercial_r/

Please note that R is "free software" not "open source".  These guys
are selling a GPLed work without disclosing the source to their part
of the work. I have complained to them and so far they have given me
the brush off. I am now considering my options.

Don't support these guys by buying their product. The are not feeding
back to the rights holders (the University of Auckland and I are rights
holders and they didn't even have the courtesy to contact us).

--
Ross Ihaka                         Email:  ihaka at stat.auckland.ac.nz
Department of Statistics           Phone:  (64-9) 373-7599 x 85054
University of Auckland             Fax:    (64-9) 373-7018
Private Bag 92019, Auckland
New Zealand
and from http://www.theregister.co.uk/2010/05/06/revolution_commercial_r/
Open source purists probably won't be all too happy to learn that Revolution is going to be employing an "open core" strategy, which means the core R programs will remain open source and be given tech support under a license model, but the key add-ons that make R more scalable will be closed source and sold under a separate license fee. Because most of those 2,500 add-ons for R were built by academics and Revolution wants to supplant SPSS and SAS as the tools used by students, Revolution will be giving the full single-user version of the R Enterprise stack away for free to academics. 
Conclusion-
So one co-founder of R is advocating not to buy from Revolution Analytics , which has the other co-founder of R, Gentleman on its board. 
Source- http://www.revolutionanalytics.com/aboutus/leadership.php

2) If Revolution Analytics is using 2500 packages for free but insisting on getting paid AND closing source of it’s packages (which is a technical point- how exactly can you prevent source code of a R package from being seen)

Maybe there can be a PACKAGE marketplace just like Android Apps, Facebook Apps, and Salesforce.com Apps – so atleast some of the thousands of R package developers can earn – sorry but email lists do not pay mortgages and no one is disputing the NEED for commercializing R or rewarding developers.

Though Barr created SAS, he gave up control to Goodnight and Sall https://decisionstats.wordpress.com/2010/06/02/sas-early-days/

and Goodnight and Sall do pay their developers well- to the envy of not so well paid counterparts.

3) I really liked the innovation of Revolution Analytics RevoScalar, and I wish that the default R dataset be converted to XDF dataset so that it basically kills

off the R criticism of being slow on bigger datasets. But I also realize the need for creating an analytics marketplace for R developers and R students- so academic version of R being free and Revolution R being paid seems like a trade off.

Note- You can still get a job faster as a stats student if you mention SAS and not R as a statistical skill- not all stats students go into academics.

4) There can be more elegant ways of handling this than calling for ignoring each other as REVOLUTION and Ihake seem to be doing to each other.

I can almost hear people in Cary, NC chuckling at Norman Nie, long time SPSS opponent and now REVOLUTION CEO, and his antagonizing R’s academicians within 1 year of taking over- so I hope this ends well for all. The road to hell is paved with good intentions- so if REVOLUTION can share some source code with say R Core members (even Microsoft shares source code with partners)- and R Core and Revolution agree on a licensing royalty from each other, they can actually speed up R package creation rather than allow this 2 decade effort to end up like S and S plus and TIBCO did.

Maybe Richard Stallman can help-or maybe Ihaka has a better sense of where things will go down in a couple of years-he must know something-he invented it, didnt he

On 09/05/10 09:52, Murray Jorgensen wrote:
> Perhaps of interest:
>
> http://www.theregister.co.uk/2010/05/06/revolution_commercial_r/

Please note that R is "free software" not "open source".  These guys
are selling a GPLed work without disclosing the source to their part
of the work. I have complained to them and so far they have given me
the brush off. I am now considering my options.

Don't support these guys by buying their product. The are not feeding
back to the rights holders (the University of Auckland and I are rights
holders and they didn't even have the courtesy to contact us).

--
Ross Ihaka                         Email:  ihaka at stat.auckland.ac.nz
Department of Statistics           Phone:  (64-9) 373-7599 x 85054
University of Auckland             Fax:    (64-9) 373-7018
Private Bag 92019, Auckland
New Zealand

Google AppInventor -Android and Business Intelligence

Here is a great new tool for techies to start creating Android Apps right away- even if you have no knowledge of the platform. Of course there are existing great number of apps- including my favorite Android Data Mining App in R – called AnalyticDroid http://analyticdroid.togaware.com/

Basically it calls the Rattle (R Analytical Tool To Learn Easily) Data Mining GUI -enabling data mining from an Android Mobile using remote computing.

I dont know if any other statistical application is available on Android Mobiles- though SAS did have a presentation on using SAS on IPhone

http://www.wuss.org/proceedings09/09WUSSProceedings/papers/dpr/DPR-Truong.pdf



SAS Mobile -Iphone App

All you need to do is go to http://appinventor.googlelabs.com/about/index.html and request access (yes there is a 2 week approval waiting line)

Because App Inventor provides access to a GPS-location sensor, you can build apps that know where you are. You can build an app to help you remember where you parked your car, an app that shows the location of your friends or colleagues at a concert or conference, or your own custom tour app of your school, workplace, or a museum.
You can write apps that use the phone features of an Android phone. You can write an app that periodically texts “missing you” to your loved ones, or an app “No Text While Driving” that responds to all texts automatically with “sorry, I’m driving and will contact you later”. You can even have the app read the incoming texts aloud to you (though this might lure you into responding).
App Inventor provides a way for you to communicate with the web. If you know how to write web apps, you can use App Inventor to write Android apps that talk to your favorite web sites, such as Amazon and Twitter.

Here is a not so statistical Android App I am trying to create called Hang-Out

using the current GPS location of your phone to find nearest Pub, Movie or Diner and catch Bus- Train based on your location city, the GPS and time of request and schedule of those cities public transport- very much WIP

Open Source Business Intelligence: Pentaho and Jaspersoft

Here are two products that are used widely for Business Intelligence_ They are open source and both have free preview.

Jaspersoft-For the Enterprise version click on the screenshot while for the free community version you can go to

http://jasperforge.org/projects/jasperserver

Interestingly (and not surprisingly) Revolution Analytics is teaming up with Jaspersoft to use R for reporting along with the Jaspersoft BI stack.

ADVANCED ANALYTICS ON DEMAND IN APPLICATIONS, IN DASHBOARDS, AND ON THE WEB

FREE WEBINAR WEDNESDAY, SEPTEMBER 22ND @9AM PACIFIC

DEPLOYING R: ADVANCED ANALYTICS ON DEMAND IN APPLICATIONS, IN DASHBOARDS, AND ON THE WEB

A JOINT WEBINAR FROM REVOLUTION ANALYTICS AND JASPERSOFT

Date: Wednesday, September 22, 2010
Time: 9:00am PDT (12:00pm EDT; 4:00pm GMT)
Presenters: David Smith, Vice President of Marketing, Revolution Analytics
Andrew Lampitt, Senior Director of Technology Alliances, Jaspersoft
Matthew Dahlman, Business Development Engineer, Jaspersoft
Registration: Click here to register now!

R is a popular and powerful system for creating custom data analysis, statistical models, and data visualizations. But how can you make the results of these R-based computations easily accessible to others? A PhD statistician could use R directly to run the forecasting model on the latest sales data, and email a report on request, but then the process is just going to have to be repeated again next month, even if the model hasn’t changed. Wouldn’t it be better to empower the Sales manager to run the model on demand from within the BI application she already uses—daily, even!—and free up the statistician to build newer, better models for others?

In this webinar, David Smith (VP of Marketing, Revolution Analytics) will introduce the new “RevoDeployR” Web Services framework for Revolution R Enterprise, which is designed to make it easy to integrate dynamic R-based computations into applications for business users. RevoDeployR empowers data analysts working in R to publish R scripts to a server-based installation of Revolution R Enterprise. Application developers can then use the RevoDeployR Web Services API to securely and scalably integrate the results of these scripts into any application, without needing to learn the R language. With RevoDeployR, authorized users of hosted or cloud-based interactive Web applications, desktop applications such as Microsoft Excel, and BI applications like Jaspersoft can all benefit from on-demand analytics and visualizations developed by expert R users.

To demonstrate the power of deploying R-based computations to business users, Andrew Lampitt will introduce Jaspersoft commercial open source business intelligence, the world’s most widely used BI software. In a live demonstration, Matt Dahlman will show how to supercharge the BI process by combining Jaspersoft and Revolution R Enterprise, giving business users on-demand access to advanced forecasts and visualizations developed by expert analysts.

Click here to register for the webinar.

Speaker Biographies:

David Smith is the Vice President of Marketing at Revolution Analytics, the leading commercial provider of software and support for the open source “R” statistical computing language. David is the co-author (with Bill Venables) of the official R manual An Introduction to R. He is also the editor of Revolutions (http://blog.revolutionanalytics.com), the leading blog focused on “R” language, and one of the originating developers of ESS: Emacs Speaks Statistics. You can follow David on Twitter as @revodavid.

Andrew Lampitt is Senior Director of Technology Alliances at Jaspersoft. Andrew is responsible for strategic initiatives and partnerships including cloud business intelligence, advanced analytics, and analytic databases. Prior to Jaspersoft, Andrew held other business positions with Sunopsis (Oracle), Business Objects (SAP), and Sybase (SAP). Andrew earned a BS in engineering from the University of Illinois at Urbana Champaign.

Matthew Dahlman is Jaspersoft’s Business Development Engineer, responsible for technical aspects of technology alliances and regional business development. Matt has held a wide range of technical positions including quality assurance, pre-sales, and technical evangelism with enterprise software companies including Sybase, Netonomy (Comverse), and Sunopsis (Oracle). Matt earned a BA in mathematics from Carleton College in Northfield, Minnesota.


The second widely used BI stack in open source is Pentaho.

You can download it here to evaluate it or click on screenshot to read more at

http://community.pentaho.com/

http://sourceforge.net/projects/pentaho/files/Business%20Intelligence%20Server/

Rapid Miner- R Extension

Here is a new video which shows exactly how you can use Rapid Miner and R together. Advantages of using both together is using Rapid Miner’s GUI (including the flowchart style for data mning) and adding R statistical functionality to it.

From http://rapid-i.com/content/view/219/1/

The web site features a video showing how easy R models and scripts can be integrated into the RapidMiner analysis processes. RapidMiner offers a new R perspective consisting of the known R console together with the great plotting facilities of R. All variables as well as R scripts can be stored in the RapidMiner Repository and used from there which helps to organize the usually large number of scripts. Furthermore, widely used modeling methods are directly integrated as RapidMiner operators as usual.

“This is a huge step for open source data analysis. RapidMiner offers a great user interface, a clear process structure and lots of ETL and analysis capabilities necessary for real-world problems. R adds a lot of flexibility and many analysis and data manipulation methods. The result is the by far most powerful data transformation and analysis solution worldwide. And this analysis power is now combined with the ease-of-use already known from RapidMiner.” states Dr. Ingo Mierswa, CEO of Rapid-I.

Visit the RCOMM 2010 and learn more about how to integrate analysis and preprocessing methods offered by R as well as how to use the new R perspective offering a full R console and access to all R plotters.

Thus Rapid Miner is one more mainstream software (after SPSS, SAS etc) to add R functionality to it.

Interview Stephanie McReynolds Director Product Marketing, AsterData

Here is an interview with Stephanie McReynolds who works as as Director of Product Marketing with AsterData. I asked her a couple of questions about the new product releases from AsterData in analytics and MapReduce.

Ajay – How does the new Eclipse Plugin help people who are already working with huge datasets but are new to AsterData’s platform?

Stephanie- Aster Data Developer Express, our new SQL-MapReduce development plug-in for Eclipse, makes MapReduce applications easy to develop. With Aster Data Developer Express, developers can develop, test and deploy a complete SQL-MapReduce application in under an hour. This is a significant increase in productivity over the traditional analytic application development process for Big Data applications, which requires significant time coding applications in low-level code and testing applications on sample data.

Ajay – What are the various analytical functions that are introduced by you recently- list say the top 10.

Stephanie- At Aster Data, we have an intense focus on making the development process easier for SQL-MapReduce applications. Aster Developer Express is a part of this initiative, as is the release of pre-defined analytic functions. We recently launched both a suite of analytic modules and a partnership program dedicated to delivering pre-defined analytic functions for the Aster Data nCluster platform. Pre-defined analytic functions delivered by Aster Data’s engineering team are delivered as modules within the Aster Data Analytic Foundation offering and include analytics in the areas of pattern matching, clustering, statistics, and text analysis– just to name a few areas. Partners like Fuzzy Logix and Cobi Systems are extending this library by delivering industry-focused analytics like Monte Carlo Simulations for Financial Services and geospatial analytics for Public Sector– to give you a few examples.

Ajay – So okay I want to do a K Means Cluster on say a million rows (and say 200 columns) using the Aster method. How do I go about it using the new plug-in as well as your product.

Stephanie- The power of the Aster Data environment for analytic application development is in SQL-MapReduce. SQL is a powerful analytic query standard because it is a declarative language. MapReduce is a powerful programming framework because it can support high performance parallel processing of Big Data and extreme expressiveness, by supporting a wide variety of programming languages, including Java, C/C#/C++, .Net, Python, etc. Aster Data has taken the performance and expressiveness of MapReduce and combined it with the familiar declarativeness of SQL. This unique combination ensures that anyone who knows standard SQL can access advanced analytic functions programmed for Big Data analysis using MapReduce techniques.

kMeans is a good example of an analytic function that we pre-package for developers as part of the Aster Data Analytic Foundation. What does that mean? It means that the MapReduce portion of the development cycle has been completed for you. Each pre-packaged Aster Data function can be called using standard SQL, and executes the defined analytic in a fully parallelized manner in the Aster Data database using MapReduce techniques. The result? High performance analytics with the expressiveness of low-level languages accessed through declarative SQL.

Ajay – I see an an increasing focus on Analytics. Is this part of your product strategy and how do you see yourself competing with pure analytics vendors.

Stephanie – Aster Data is an infrastructure provider. Our core product is a massively parallel processing database called nCluster that performs at or beyond the capabilities of any other analytic database in the market today. We developed our analytics strategy as a response to demand from our customers who were looking beyond the price/performance wars being fought today and wanted support for richer analytics from their database provider. Aster Data analytics are delivered in nCluster to enable analytic applications that are not possible in more traditional database architectures.

Ajay – Name some recent case studies in Analytics of implementation of MR-SQL with Analytical functions

Stephanie – There are three new classes of applications that Aster Data Express and Aster Analytic Foundation support: iterative analytics, prediction and optimization, and ad hoc analysis.

Aster Data customers are uncovering critical business patterns in Big Data by performing hypothesis-driven, iterative analytics. They are exploring interactively massive volumes of data—terabytes to petabytes—in a top-down deductive manner. ComScore, an Aster Data customer that performs website experience analysis is a good example of an Aster Data customer performing this type of analysis.

Other Aster Data customers are building applications for prediction and optimization that discover trends, patterns, and outliers in data sets. Examples of these types of applications are propensity to churn in telecommunications, proactive product and service recommendations in retail, and pricing and retention strategies in financial services. Full Tilt Poker, who is using Aster Data for fraud prevention is a good example of a customer in this space.

The final class of application that I would like to highlight is ad hoc analysis. Examples of ad hoc analysis that can be performed includes social network analysis, advanced click stream analysis, graph analysis, cluster analysis and a wide variety of mathematical, trigonometry, and statistical functions. LinkedIn, whose analysts and data scientists have access to all of their customer data in Aster Data are a good example of a customer using the system in this manner.

While Aster Data customers are using nCluster in a number of other ways, these three new classes of applications are areas in which we are seeing particularly innovative application development.

Biography-

Stephanie McReynolds is Director of Product Marketing at Aster Data, where she is an evangelist for Aster Data’s massively parallel data-analytics server product. Stephanie has over a decade of experience in product management and marketing for business intelligence, data warehouse, and complex event processing products at companies such as Oracle, Peoplesoft, and Business Objects. She holds both a master’s and undergraduate degree from Stanford University.

MapReduce Analytics Apps- AsterData's Developer Express Plugin

AsterData continues to wow with it’s efforts on bridging MapReduce and Analytics, with it’s new Developer Express plug-in for Eclipse. As any Eclipse user knows, that greatly improves ability to write code or develop ( similar to creating Android apps if you have tried to). I did my winter internship at AsterData last December last year in San Carlos, and its an amazing place with giga-level bright people.

Here are some details ( Note I plan to play a bit more on the plugin on my currently downUbuntu on this and let you know)

http://marketplace.eclipse.org/content/aster-data-developer-express-plug-eclipse

Aster Data Developer Express provides an integrated set of tools for development of SQL and MapReduce analytics for Aster Data nCluster, a massively parallel database with an integrated analytics engine.

The Aster Data Developer Express plug-in for Eclipse enables developers to easily create new analytic application projects with the help of an intuitive set of wizards, immediately test their applications on their desktop, and push down their applications into the nCluster database with a single click.

Using Developer Express, analysts can significantly reduce the complexity and time needed to create advanced analytic applications so that they can more rapidly deliver deeper and richer analytic insights from their data.

and from the Press Release

Now, any developer or analyst that is familiar with the Java programming language can complete a rich analytic application in under an hour using the simple yet powerful Aster Data Developer Express environment in Eclipse. Aster Data Developer Express delivers both rapid development and local testing of advanced analytic applications for any project, regardless of size.

The free, downloadable Aster Data Developer Express IDE now brings the power of SQL-MapReduce to any organization that is looking to build richer analytic applications that can leverage massive data volumes. Much of the MapReduce coding, including programming concepts like parallelization and distributed data analysis, is addressed by the IDE without the developer or analyst needing to have expertise in these areas. This simplification makes it much easier for developers to be successful quickly and eliminates the need for them to have any deep knowledge of the MapReduce parallel processing framework. Google first published MapReduce in 2004 for parallel processing of big data sets. Aster Data has coupled SQL with MapReduce and brought SQL-MapReduce to market, making it significantly easier for any organization to leverage the power of MapReduce. The Aster Developer Express IDE simplifies application development even further with an intuitive point-and-click development environment that speeds development of rich analytic applications. Applications can be validated locally on the desktop or ultimately within Aster Data nCluster, a massive parallel processing (MPP) database with a fully integrated analytics engine that is powered by MapReduce—known as a data-analytics server.

Rich analytic applications that can be easily built with Aster Data’s downloadable IDE include:

Iterative Analytics: Uncovering critical business patterns in your data requires hypothesis-driven, iterative analysis.  This class of applications is defined by the exploratory navigation of massive volumes of data in a top-down, deductive manner.  Aster Data’s IDE makes this easy to develop and to validate the algorithms and functions required to deliver these advanced analytic applications.

Prediction and Optimization: For this class of applications, the process is inductive. Rather than starting with a hypothesis, developers and analysts can easily build analytic applications that discover the trends, patterns, and outliers in data sets.  Examples include propensity to churn in telecommunications, proactive product and service recommendations in retail, and pricing and retention strategies in financial services.

Ad Hoc Analysis: Examples of ad hoc analysis that can be performed includes social network analysis, advanced click stream analysis, graph analysis, cluster analysis, and a wide variety of mathematical, trigonometry, and statistical functions.

“Aster Data’s IDE and SQL-MapReduce significantly eases development of advanced analytic applications on big data. We have now built over 350 analytic functions in SQL-MapReduce on Aster Data nCluster that are available for customers to purchase,” said Partha Sen, CEO and Founder of Fuzzy Logix. “Aster Data’s implementation of MapReduce with SQL-MapReduce goes beyond the capabilities of general analytic development APIs and provides us with the excellent control and flexibility needed to implement even the most complex analytic algorithms.”

Richer analytics on big data volumes is the new competitive frontier. Organizations have always generated reports to guide their decision-making. Although reports are important, they are historical sets of information generally arranged around predefined metrics and generated on a periodic basis.

Advanced analytics begins where reporting leaves off. Reporting often answers historical questions such as “what happened?” However, analytics addresses “why it happened” and, increasingly, “what will happen next?” To that end, solutions like Aster Data Developer Express ease the development of powerful ad hoc, predictive analytics and enables analysts to quickly and deeply explore terabytes to petabytes of data.
“We are in the midst of a new age in analytics. Organizations today can harness the power of big data regardless of scale or complexity”, said Don Watters, Chief Data Architect for MySpace. “Solutions like the Aster Data Developer Express visual development environment make it even easier by enabling us to automate aspects of development that currently take days, allowing us to build rich analytic applications significantly faster. Making Developer Express openly available for download opens the power of MapReduce to a broader audience, making big data analytics much faster and easier than ever before.”

“Our delivery of SQL coupled with MapReduce has clearly made it easier for customers to build highly advanced analytic applications that leverage the power of MapReduce. The visual IDE, Aster Data Developer Express, introduced earlier this year, made application development even easier and the great response we have had to it has driven us to make this open and freely available to any organization looking to build rich analytic applications,” said Tasso Argyros, Founder and CTO, Aster Data. “We are excited about today’s announcement as it allows companies of all sizes who need richer analytics to easily build powerful analytic applications and experience the power of MapReduce without having to learn any new skills.”

You can have a look here at http://www.asterdata.com/download_developer_express/

IPSUR – A Free R Textbook

Here is a free R textbook called IPSUR-

http://ipsur.r-forge.r-project.org/book/index.php

IPSUR stands for Introduction to Probability and Statistics Using R, ISBN: 978-0-557-24979-4, which is a textbook written for an undergraduate course in probability and statistics. The approximate prerequisites are two or three semesters of calculus and some linear algebra in a few places. Attendees of the class include mathematics, engineering, and computer science majors.

IPSUR is FREE, in the GNU sense of the word. Hard copies are available for purchase here from Lulu and will be available (coming soon) from the other standard online retailers worldwide. The price of the book is exactly the manufacturing cost plus the retailers’ markup. You may be able to get it even cheaper by downloading an electronic copy and printing it yourself, but if you elect this route then be sure to get the publisher-quality PDF from theDownloads page. And double check the price. It was cheaper for my students to buy a perfect-bound paperback from Lulu and have it shipped to their door than it was to upload the PDF to Fed-Ex Kinkos and Xerox a coil-bound copy (and on top of that go pick it up at the store).

If you are going to buy from anywhere other than Lulu then be sure to check the time-stamp on the copyright page. There is a 6 to 8 week delay from Lulu to Amazon and you may not be getting the absolute latest version available.

Refer to the Installation page for instructions to install an electronic copy of IPSUR on your personal computer. See the Feedback page for guidance about questions or comments you may have about IPSUR.

Also see http://ipsur.r-forge.r-project.org/rcmdrplugin/index.php for the R Cmdr Plugin

This plugin for the R Commander accompanies the text Introduction to Probability and Statistics Using R by G. Jay Kerns. The plugin contributes functions unique to the book as well as specific configuration and functionality to R Commander, the pioneering work by John Fox of McMaster University.

RcmdrPlugin.IPSUR’s primary goal is to provide a user-friendly graphical user interface (GUI) to the open-source and freely available R statistical computing environment. RcmdrPlugin.IPSUR is equipped to handle many of the statistical analyses and graphical displays usually encountered by upper division undergraduate mathematics, statistics, and engineering majors. Available features are comparable to many expensive commercial packages such as Minitab, SPSS, and JMP-IN.

Since the audience of RcmdrPlugin.IPSUR is slightly different than Rcmdr’s, certain functionality has been added and selected error-checks have been disabled to permit the student to explore alternative regions of the statistical landscape. The resulting benefit of increased flexibility is balanced by somewhat increased vulnerability to syntax errors and misuse; the instructor should keep this and the academic audience in mind when usingRcmdrPlugin.IPSUR in the classroom