Kill R? Wait a sec

1) Is R efficient? (scripting wise, and performance wise) Depends on how you code it. Some packages like foreach can help, but basic efficiency comes from the programmer. The XDF format from RevoScaleR, the non-open R package, further improves programming efficiency.

2) Should R be written from scratch?

You've got to be kidding. It depends on how you define scratch after 2 million users.

This has been done with S, then S Plus and now R.

3) What should be the license of R (if it was made a new)?

The GPL license is fine. You need to do a better job of executing the license. Currently, interfaces to R exist from SPSS, SAS, KXEN, and other companies as well. To my knowledge, royalty payments as well as formal code sharing do not exist.

R core needs to do a better job of protecting the work of 2500 package creators rather than settling for a few snacks at events, sponsorships, Corporate Board Membership for Prof Gentleman, and 4-5 packages donated to it. The only way R developers can currently support their research is to write a book (by Springer mostly).

E.g., ggplot and Hmisc are likely to be used more by the average corporate user. Do their creators deserve royalties if the creators of RevoScaleR are getting them?

If some of the 2 million users gave $1 to R core (compared to 9 million in the last round of funding in Revolution Analytics), you would have enough money to create a 64-bit optimized R for Linux (missing in Enterprise R), Amazon R APIs (like Karim Chine’s efforts), R GUIs (like Rattle’s commercial version), etc.

The developments are not surprising given that Microsoft and Intel are funding Revolution Analytics http://www.dudeofdata.com/?p=1967

R controversies come and go (this has happened before including the NYT article and shakeup at Revo)

An interesting debate on whether R should be killed to make an upgrade to a more efficient language.

From Tal (creator of R Bloggers), as posted on the R help list:

There is currently a (very !) lively discussions happening around the web, surrounding the following topics:
1) Is R efficient? (scripting wise, and performance wise)
2) Should R be written from scratch?
3) What should be the license of R (if it was made a new)?

Very serious people have taken part in the debates so far.  I hope to let you know of the places I came by, so you might be able to follow/participate
in these (IMHO) important discussions.

The discussions started in the response for the following blog post on
Xi’An’s blog:
http://xianblog.wordpress.com/2010/09/06/insane/


Followed by the (short) response post by Ross Ihaka:
http://xianblog.wordpress.com/2010/09/13/simply-start-over-and-build-something-better/


Other discussions started to appear on Andrew Gelman’s blog:
http://www.stat.columbia.edu/~cook/movabletype/archives/2010/09/ross_ihaka_to_r.html

And (many) more responses started to appear in the hackers news website:
http://news.ycombinator.com/item?id=1687054

I hope these discussions will have fruitful results for our community,
Tal


My 0 cents (see, it would be 2 cents, but it's free)

Open Source Business Intelligence: Pentaho and Jaspersoft

Here are two products that are widely used for Business Intelligence. They are open source and both offer a free preview.

Jaspersoft: for the Enterprise version, click on the screenshot, while for the free community version you can go to

http://jasperforge.org/projects/jasperserver

Interestingly (and not surprisingly) Revolution Analytics is teaming up with Jaspersoft to use R for reporting along with the Jaspersoft BI stack.

FREE WEBINAR WEDNESDAY, SEPTEMBER 22ND @9AM PACIFIC

DEPLOYING R: ADVANCED ANALYTICS ON DEMAND IN APPLICATIONS, IN DASHBOARDS, AND ON THE WEB

A JOINT WEBINAR FROM REVOLUTION ANALYTICS AND JASPERSOFT

Date: Wednesday, September 22, 2010
Time: 9:00am PDT (12:00pm EDT; 4:00pm GMT)
Presenters: David Smith, Vice President of Marketing, Revolution Analytics
Andrew Lampitt, Senior Director of Technology Alliances, Jaspersoft
Matthew Dahlman, Business Development Engineer, Jaspersoft
Registration: Click here to register now!

R is a popular and powerful system for creating custom data analysis, statistical models, and data visualizations. But how can you make the results of these R-based computations easily accessible to others? A PhD statistician could use R directly to run the forecasting model on the latest sales data, and email a report on request, but then the process is just going to have to be repeated again next month, even if the model hasn’t changed. Wouldn’t it be better to empower the Sales manager to run the model on demand from within the BI application she already uses—daily, even!—and free up the statistician to build newer, better models for others?

In this webinar, David Smith (VP of Marketing, Revolution Analytics) will introduce the new “RevoDeployR” Web Services framework for Revolution R Enterprise, which is designed to make it easy to integrate dynamic R-based computations into applications for business users. RevoDeployR empowers data analysts working in R to publish R scripts to a server-based installation of Revolution R Enterprise. Application developers can then use the RevoDeployR Web Services API to securely and scalably integrate the results of these scripts into any application, without needing to learn the R language. With RevoDeployR, authorized users of hosted or cloud-based interactive Web applications, desktop applications such as Microsoft Excel, and BI applications like Jaspersoft can all benefit from on-demand analytics and visualizations developed by expert R users.

To demonstrate the power of deploying R-based computations to business users, Andrew Lampitt will introduce Jaspersoft commercial open source business intelligence, the world’s most widely used BI software. In a live demonstration, Matt Dahlman will show how to supercharge the BI process by combining Jaspersoft and Revolution R Enterprise, giving business users on-demand access to advanced forecasts and visualizations developed by expert analysts.

Speaker Biographies:

David Smith is the Vice President of Marketing at Revolution Analytics, the leading commercial provider of software and support for the open source “R” statistical computing language. David is the co-author (with Bill Venables) of the official R manual An Introduction to R. He is also the editor of Revolutions (http://blog.revolutionanalytics.com), the leading blog focused on “R” language, and one of the originating developers of ESS: Emacs Speaks Statistics. You can follow David on Twitter as @revodavid.

Andrew Lampitt is Senior Director of Technology Alliances at Jaspersoft. Andrew is responsible for strategic initiatives and partnerships including cloud business intelligence, advanced analytics, and analytic databases. Prior to Jaspersoft, Andrew held other business positions with Sunopsis (Oracle), Business Objects (SAP), and Sybase (SAP). Andrew earned a BS in engineering from the University of Illinois at Urbana Champaign.

Matthew Dahlman is Jaspersoft’s Business Development Engineer, responsible for technical aspects of technology alliances and regional business development. Matt has held a wide range of technical positions including quality assurance, pre-sales, and technical evangelism with enterprise software companies including Sybase, Netonomy (Comverse), and Sunopsis (Oracle). Matt earned a BA in mathematics from Carleton College in Northfield, Minnesota.


The second widely used BI stack in open source is Pentaho.

You can download it here to evaluate it or click on screenshot to read more at

http://community.pentaho.com/

http://sourceforge.net/projects/pentaho/files/Business%20Intelligence%20Server/

WordPress.com Tweeting

Just a single click on a check mark enables tweeting from every blog post (similar to a Tweetmeme button).

Q&A with David Smith, Revolution Analytics.

Here’s a group of questions that David Smith of Revolution Analytics was kind enough to answer after the launch of RevoScaleR, the new R package which integrates Hadoop and R.

Ajay- How does RevoScaleR work from a technical viewpoint in terms of Hadoop integration?

David- The point isn’t that there’s a deep technical integration between Revolution R and Hadoop, rather that we see them as complementary (not competing) technologies. Hadoop is amazing at reliably (if slowly) processing huge volumes of distributed data; the RevoScaleR package complements Hadoop by providing statistical algorithms to analyze the data processed by Hadoop. The analogy I use is to compare a freight train with a race car: use Hadoop to slog through a distributed data set and use Map/Reduce to output an aggregated, rectangular data file; then use RevoScaleR to perform statistical analysis on the processed data (and use the speed of RevoScaleR to iterate through many model options to find the best one).

Ajay- How is it different from MapReduce and Rhipe, the existing R Hadoop packages?
David- They’re complementary. In fact, we’ll be publishing a white paper soon by Saptarshi Guha, author of the Rhipe R/Hadoop integration, showing how he uses Hadoop to process vast volumes of packet-level VOIP data to identify call time/duration from the packets, and then do a regression on the table of calls using RevoScaleR. There’s a little more detail in this blog post: http://blog.revolutionanalytics.com/2010/08/announcing-big-data-for-revolution-r.html
Ajay- Is it going to be proprietary, free or licensable (open source)?
David- RevoScaleR is a proprietary package, available to paid subscribers (or free to academics) with Revolution R Enterprise. (If you haven’t seen it, you might be interested in this Q&A I did with Matt Shotwell: http://biostatmatt.com/archives/533 )
Ajay- Are there any existing client case studies for terabyte-level analysis using R?
David- The VOIP example above gets close, but most of the case studies we’ve seen in beta testing have been in the 10’s to 100’s of Gb range. We’ve tested RevoScaleR on larger data sets internally, but we’re eager to hear about real-life use cases in the terabyte range.
Ajay- How can I use RevoScaleR on my dual-chip Windows Intel laptop for, say, 5 GB of data?
David- One of the great things about RevoScaleR is that it’s designed to work on commodity hardware like a dual-core laptop. You won’t be constrained by the limited RAM available, and the parallel processing algorithms will make use of all cores available to speed up the analysis even further. There’s an example in this white paper (http://info.revolutionanalytics.com/bigdata.html) of doing linear regression on 13Gb of data on a simple dual-core laptop in less than 5 seconds.
AJ- Thanks to David Smith for this fast response, and congratulations to him, Saptarshi Guha, Dr Norman Nie, and the rest of the guys at Revolution Analytics on this new product launch.
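
A note on the last answer: the claim that you are not constrained by RAM usually rests on processing the data in chunks and keeping only small running totals in memory. The sketch below illustrates that general idea for a one-predictor linear regression in plain Python. It is not RevoScaleR's actual algorithm or API; the function names (`chunks`, `fit_line_out_of_core`, `synthetic_rows`) and the chunk size are hypothetical, chosen only for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[float, float]  # one observation: (x, y)


def chunks(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield successive blocks of at most `size` rows, as if read from disk."""
    block: List[Row] = []
    for row in rows:
        block.append(row)
        if len(block) == size:
            yield block
            block = []
    if block:  # final, possibly shorter block
        yield block


def fit_line_out_of_core(rows: Iterable[Row], chunk_size: int = 1000) -> Tuple[float, float]:
    """Fit y = intercept + slope * x while holding only one chunk in memory.

    Only five running totals survive between chunks, so memory use does
    not grow with the number of rows.
    """
    n = 0
    sum_x = sum_y = sum_xx = sum_xy = 0.0
    for block in chunks(rows, chunk_size):
        n += len(block)
        for x, y in block:
            sum_x += x
            sum_y += y
            sum_xx += x * x
            sum_xy += x * y
    denom = n * sum_xx - sum_x * sum_x
    if n < 2 or denom == 0:
        raise ValueError("need at least two rows with distinct x values")
    slope = (n * sum_xy - sum_x * sum_y) / denom
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope


def synthetic_rows(n: int) -> Iterator[Row]:
    """Hypothetical data source: generates y = 3 + 2x lazily, row by row."""
    for i in range(n):
        yield float(i), 3.0 + 2.0 * i


if __name__ == "__main__":
    intercept, slope = fit_line_out_of_core(synthetic_rows(1000), chunk_size=64)
    print(intercept, slope)
```

On the synthetic data the fit should recover an intercept near 3 and a slope near 2. A real out-of-core tool would read blocks from a file format built for chunked access and could hand separate chunks to separate cores, since the running totals from each chunk simply add up.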

Business Analytics Analyst Relations /Ethics/White Papers

Curt Monash, whom I respect and have tried to interview (unsuccessfully) points out suitable ethical dilemmas and gray areas in Analyst Relations in Business Intelligence here at http://www.dbms2.com/2010/07/30/advice-for-some-non-clients/

If you don’t know what Analyst Relations is, well, it’s like credit rating agencies for BI software. Read Curt’s survey of the field (I am quoting a summary) at http://www.strategicmessaging.com/the-ethics-of-white-papers/2010/08/01/

Vendors typically pay for white papers for these reasons:

  1. They want to connect with sales prospects.
  2. They want general endorsement from the analyst.
  3. They specifically want endorsement from the analyst for their marketing claims.
  4. They want the analyst to do a better job of explaining something than they think they could do themselves.
  5. They want to give the analyst some money to enhance the relationship.

Merv Adrian (I interviewed Merv here at http://www.dudeofdata.com/?p=2505) has responded well here at http://www.enterpriseirregulars.com/23040/white-paper-sponsorship-and-labeling/

None of the sites I checked clearly identify the work as having been sponsored in any way I found obvious in my (admittedly) quick scan. So this is an issue, but it’s not confined to Oracle.

My 2 cents (not being so well paid 😉) are:

I think Curt was calling out Oracle (which didn’t respond) and not Merv (whose subsequent blog post does much to clarify).

As a comparatively new and younger blogger in this field, I applaud both Curt for trying to bell the cat (or point out what everyone in AR winks at) and Merv for standing by him.

In the long run, it would strengthen analyst relations as a channel if payment for content were separated from bias. Credit rating agencies forgot to do so in BFSI, and see what happened.

Customers invest millions of dollars in BI systems trusting marketing collateral/white papers/webinars/tests etc. Perhaps it’s time for an industry association for analysts so that individual analysts don’t knuckle down under vendor pressure.

It is easier for someone of Curt’s or Merv’s stature to declare editing policy and disclosures before they write a white paper. It is much harder for everyone else who is not so well established.

White papers can cost as much as $25,000 to produce, and I know people in Business Analytics (as opposed to Business Intelligence) who slog for cents per hour cranking out books on R and SAS, webinars, and trainings, but there are almost no white papers in BA. Are there any independent analytics analysts who are not biased toward R or SAS or SPSS, etc.? I am not sure, but this looks like a good line to pursue 😉, provided ethical checks and balances are established.

Personally, I know of many so-called analytics communities that go all out to please their sponsors, so bias in writing does exist (you can’t praise SAS on an R blogging forum or R users meet, and you can’t write on WPS at SAS Community.org).

At the same time, someone once told me that it is tough to make a living as a writer, and that the choice between easy money and credible writing needs to be respected.

Most sponsored white papers I read are pure advertisements, directed at CEOs rather than the techie community at large.

Almost every BI vendor claims to have the fastest database with 5X speed, and technical benchmarking is something they could do too.

Unlike gadget sites, which benchmark products, you cannot benchmark BI or even BA products, because many licensing terms forbid it.

That is probably the reason billions are spent on BI while the positive claims remain doubtful (except to the sellers). Similarly, in analytics, many vendors would have difficulty justifying their claims or prices if they were subjected to a side-by-side comparison. Unfortunately, the resulting confusion lets shoddy technology come out stronger through more aggressive marketing.

My latest creation

I have just teamed up to create my latest venture called Kush Cognitives (Kush is my son). The firm is gonna make websites, build statistical analysis and offer social media offerings. It’s my latest venture and it merges all my previous ones and skills. After almost 3 years of working on and off with multiple people, this one is with a friend in the US.

Over the years (since 2007) I have made http://virtua-analytics.com (defunct), Swarajya Analytics Private Limited (www.swanplc.com – now sold) and now Kush Cognitives. I have gone through the models of proprietorship and corporation and now partnership.

Kush Cognitives is hosted at Decisionstats.com (as our flagship website) and we have shifted the blog to Decisionstats.Wordpress.com

We are aiming at the startups and small and medium segments first, but we retain capabilities for bigger clients as well. Lesser Bullshit and More Bang for your Buck.

So wish us luck, and if you need any social media advice, statistical analysis, or technical help creating websites, give us a buzz at http://decisionstats.com. This also includes customized training in R, SAS, and other statistical software, from a practical, user-oriented point of view. We are able to cater to both US and Indian clients.

regards

Ajay Ohri
