Home » Posts tagged 'david smith'
Tag Archives: david smith
Tal G in his excellent blog piece talks of “Why R Developers should not be paid” http://www.r-statistics.com/2010/09/open-source-and-money-why-r-developers-shouldnt-be-paid/
His argument of love is not very original though it was first made by these four guys
I am going to argue that “some” R developers should be paid, while the main focus should be volunteers code. These R developers should be paid as per usage of their packages.
Let me expand.
Imagine the following conversation between Ross Ihaka, Norman Nie and Peter Dalgaard.
Norman- Hey Guys, Can you give me some code- I got this new startup.
Ross Ihaka and Peter Dalgaard- Sure dude. Here is 100,000 lines of code, 2000 packages and 2 decades of effort.
Norman- Thanks guys.
Ross Ihaka- Hey, What you gonna do with this code.
Norman- I will better it. Sell it. Finally beat Jim Goodnight and his **** Proc GLM and **** Proc Reg.
Ross- Okay, but what will you give us? Will you give us some code back of what you improve?
Norman – Uh, let me explain this open core …
Peter D- Well how about some royalty?
Norman- Sure, we will throw parties at all conferences, snacks you know at user groups.
Ross – Hmm. That does not sound fair. (walks away in a huff muttering)-He takes our code, sells it and wont share the code
Peter D- Doesnt sound fair. I am back to reading Hamlet, the great Dane, and writing the next edition of my book. I am glad I wrote a book- Ross didnt even write that.
Norman-Uh Oh. (picks his phone)- Hey David Smith, We need to write some blog articles pronto – these open source guys ,man…
———–I think that sums what has been going on in the dynamics of R recently. If Ross Ihaka and R Gentleman had adopted an open core strategy- meaning you can create packages to R but not share the original where would we all be?
At this point if he is reading this, David Smith , long suffering veteran of open source flameouts is rolling his eyes while Tal G is wondering if he will publish this on R Bloggers and if so when or something.
Lets bring in another R veteran- Hadley Wickham who wrote a book on R and also created ggplot. Thats the best quality, most often used graphics package.
In terms of economic utilty to end user- the ggplot package may be as useful if not more as the foreach package developed by Revolution Computing/Analytics.
Now http://cran.r-project.org/web/packages/foreach/index.html says that foreach is licensed under http://www.apache.org/licenses/LICENSE-2.0
However lets come to open core licensing ( read it here http://alampitt.typepad.com/lampitt_or_leave_it/2008/08/open-core-licen.html ) which is where the debate is- Revolution takes code- enhances it (in my opinion) substantially with new formats XDF for better efficieny, web services API, and soon coming next year a GUI (thanks in advance , Dr Nie and guys)
and sells this advanced R code to businesses happy to pay ( they are currently paying much more to DR Goodnight and HIS guys)
Why would any sane customer buy it from Revolution- if he could download exactly the same thing from http://r-project.org
Hence the business need for Revolution Analytics to have an enhanced R- as they are using a product based software model not software as a service model.
If Revolution gives away source code of these new enhanced codes to R core team- how will R core team protect the above mentioned intelectual property- given they have 2 decades experience of giving away free code , and back and forth on just code.
Now Revolution also has a marketing budget- and thats how they sponsor some R Core events, conferences, after conference snacks.
How would people decide if they are being too generous or too stingy in their contribution (compared to the formidable generosity of SAS Institute to its employees, stakeholders and even third party analysts).
Would it not be better- IF Revolution can shift that aspect of relationship to its Research and Development budget than it’s marketing budget- come with some sort of incentive for “SOME” developers – even researchers need grants and assistantships, scholarships, make a transparent royalty formula say 17.5 % of the NEW R sales goes to R PACKAGE Developers pool, which in turn examines usage rate of packages and need/merit before allocation- that would require Revolution to evolve from a startup to a more sophisticated corporate and R Core can use this the same way as John M Chambers software award/scholarship
Dont pay all developers- it would be an insult to many of them – say Prof Harrell creator of HMisc to accept – but can Revolution expand its dev base (and prospect for future employees) by even sponsoring some R Scholarships.
And I am sure that if Revolution opens up some more code to the community- they would the rest of the world and it’s help useful. If it cant trust people like R Gentleman with some source code – well he is a board member.
Now to sum up some technical discussions on NeW R
1) An accepted way of benchmarking efficiencies.
2) Code review and incorporation of efficiencies.
3) Multi threading- Multi core usage are trends to be incorporated.
4) GUIs like R Commander E Plugins for other packages, and Rattle for Data Mining to have focussed (or Deducer). This may involve hiring User Interface Designers (like from Apple who will work for love AND money ( Even the Beatles charge royalty for that song)
5) More support to cloud computing initiatives like Biocep and Elastic R – or Amazon AMI for using cloud computers- note efficiency arguements dont matter if you just use a Chrome Browser and pay 2 cents a hour for an Amazon Instance. Probably R core needs more direct involvement of Google (Cloud OS makers) and Amazon as well as even Salesforce.com (for creating Force.com Apps). Note even more corporates here need to be involved as cloud computing doesnot have any free and open source infrastructure (YET)
Debates will come and go. This is an interesting intellectual debate and someday the liitle guys will win the Revolution-
From Hugh M of Gaping Void-
HOW DOES A SOFTWARE COMPANY MAKE MONEY, IF ALL
SOFTWARE IS FREE?
“If something goes wrong with Microsoft, I can phone Microsoft up and have it fixed. With Open Source, I have to rely on the community.”
And the community, as much as we may love it, is unpredictable. It might care about your problem and want to fix it, then again, it may not. Anyone who has ever witnessed something online go “viral”, good or bad, will know what I’m talking about.
Kind of sums up why the open core licensing is all about.
Here are two products that are used widely for Business Intelligence_ They are open source and both have free preview.
Jaspersoft-For the Enterprise version click on the screenshot while for the free community version you can go to
Interestingly (and not surprisingly) Revolution Analytics is teaming up with Jaspersoft to use R for reporting along with the Jaspersoft BI stack.
ADVANCED ANALYTICS ON DEMAND IN APPLICATIONS, IN DASHBOARDS, AND ON THE WEB
FREE WEBINAR WEDNESDAY, SEPTEMBER 22ND @9AM PACIFIC
DEPLOYING R: ADVANCED ANALYTICS ON DEMAND IN APPLICATIONS, IN DASHBOARDS, AND ON THE WEB
A JOINT WEBINAR FROM REVOLUTION ANALYTICS AND JASPERSOFT
Date: Wednesday, September 22, 2010 Time: 9:00am PDT (12:00pm EDT; 4:00pm GMT) Presenters: David Smith, Vice President of Marketing, Revolution Analytics
Andrew Lampitt, Senior Director of Technology Alliances, Jaspersoft
Matthew Dahlman, Business Development Engineer, Jaspersoft
Registration: Click here to register now!
R is a popular and powerful system for creating custom data analysis, statistical models, and data visualizations. But how can you make the results of these R-based computations easily accessible to others? A PhD statistician could use R directly to run the forecasting model on the latest sales data, and email a report on request, but then the process is just going to have to be repeated again next month, even if the model hasn’t changed. Wouldn’t it be better to empower the Sales manager to run the model on demand from within the BI application she already uses—daily, even!—and free up the statistician to build newer, better models for others?
In this webinar, David Smith (VP of Marketing, Revolution Analytics) will introduce the new “RevoDeployR” Web Services framework for Revolution R Enterprise, which is designed to make it easy to integrate dynamic R-based computations into applications for business users. RevoDeployR empowers data analysts working in R to publish R scripts to a server-based installation of Revolution R Enterprise. Application developers can then use the RevoDeployR Web Services API to securely and scalably integrate the results of these scripts into any application, without needing to learn the R language. With RevoDeployR, authorized users of hosted or cloud-based interactive Web applications, desktop applications such as Microsoft Excel, and BI applications like Jaspersoft can all benefit from on-demand analytics and visualizations developed by expert R users.
To demonstrate the power of deploying R-based computations to business users, Andrew Lampitt will introduce Jaspersoft commercial open source business intelligence, the world’s most widely used BI software. In a live demonstration, Matt Dahlman will show how to supercharge the BI process by combining Jaspersoft and Revolution R Enterprise, giving business users on-demand access to advanced forecasts and visualizations developed by expert analysts.
David Smith is the Vice President of Marketing at Revolution Analytics, the leading commercial provider of software and support for the open source “R” statistical computing language. David is the co-author (with Bill Venables) of the official R manual An Introduction to R. He is also the editor of Revolutions (http://blog.revolutionanalytics.com), the leading blog focused on “R” language, and one of the originating developers of ESS: Emacs Speaks Statistics. You can follow David on Twitter as @revodavid.
Andrew Lampitt is Senior Director of Technology Alliances at Jaspersoft. Andrew is responsible for strategic initiatives and partnerships including cloud business intelligence, advanced analytics, and analytic databases. Prior to Jaspersoft, Andrew held other business positions with Sunopsis (Oracle), Business Objects (SAP), and Sybase (SAP). Andrew earned a BS in engineering from the University of Illinois at Urbana Champaign.
Matthew Dahlman is Jaspersoft’s Business Development Engineer, responsible for technical aspects of technology alliances and regional business development. Matt has held a wide range of technical positions including quality assurance, pre-sales, and technical evangelism with enterprise software companies including Sybase, Netonomy (Comverse), and Sunopsis (Oracle). Matt earned a BA in mathematics from Carleton College in Northfield, Minnesota.
The second widely used BI stack in open source is Pentaho.
You can download it here to evaluate it or click on screenshot to read more at
R RANT- while the European R Core leadership led by the Great Dane, Pierre Dalgaard focuses on the small picture and virtually handing the whole commercial side to Prof Nie and David Smith at Revo Computing other smaller package developers have refused to be treated as cheap R and D developers for enterprise software. How’s the book sales coming along, Prof Peter? Any plans to write another R Book or are you done with writing your version of Mathematica (Ref-Newton). Running the R Core project team must be so hard I recommend the Tarantino movie “Inglorious B…” for Herr Doktors. -END
I believe that individual R Package creators like Prof Harell (Hmisc) , or Hadley Wickham (plyr) deserve a share of the royalties or REVENUE that Revolution Computing, or ANY software company that uses R.
On this note-Some updated news on Rattle the Data Mining Tool created by Dr Graham Williams. Once again R development taken ahead by Down Under chaps while the Big Guys thrash out the road map across the Pond.
Data Mining Resources
Rattle is a free and open source data mining toolkit written in the statistical language R using the Gnome graphical interface. It runs under GNU/Linux, Macintosh OS X, and MS/Windows. Rattle is being used in business, government, research and for teaching data mining in Australia and internationally. Rattle can be purchased on DVD (or made available as a downloadable CD image) as a standalone installation for $450USD ($560AUD), using one of the following payment buttons.
The free and open source book, The Data Mining Desktop Survival Guide (ISBN 0-9757109-2-3) simply explains the otherwise complex algorithms and concepts of data mining, with examples to illustrate each algorithm using the statistical language R. The book is being written by Dr Graham Williams, based on his 20 years research and consulting experience in machine learning and data mining. An electronic PDF version is available for a small fee from Togaware ($40AUD/$35USD to cover costs and ongoing development);
- The Data Mining Software Repository makes available a collection of free (as in libre) open source software tools for data mining
- The Data Mining Catalogue lists many of the free and commercial data mining tools that are available on the market.
- The Australasian Data Mining Conferences are supported by Togaware, which also hosts the web site.
- Information about the Pacific Asia Knowledge Discovery and Data Mining series of conferences is also available.
- A Data Mining course is taught at the Australian National University.
- See also the Canberra Analytics Practise Group.
- A Data Mining Course was held at the Harbin Institute of Technology Shenzhen Graduate School, China, 6 December – 13 December 2006. This course introduced the basic concepts and algorithms of data mining from an applications point of view and introduced the use of R and Rattle for data mining in practise.
- A Data Mining Workshop was held over two days at the University of Canberra, 27-28 November, 2006. This course introduced the basic concepts and algorithms for data mining and the use of R and Rattle.
Using R for Data Mining
The open source statistical programming language R (based on S) is in daily use in academia and in business and government. We use R for data mining within the Australian Taxation Office. Rattle is used by those wishing to interact with R through a GUI.
R is memory based so that on 32bit CPUs you are limited to smaller datasets (perhaps 50,000 up to 100,000, depending on what you are doing). Deploying R on 64bit multiple CPU (AMD64) servers running GNU/Linux with 32GB of main memory provides a powerful platform for data mining.
R is open source, thus providing assurance that there will always be the opportunity to fix and tune things that suit our specific needs, rather than rely on having to convince a vendor to fix or tune their product to suit our needs.
Also, by being open source, we can be sure that the code will always be available, unlike some of the data mining products that have disappearded (e.g., IBM’s Intelligent Miner).
See earlier interview-