CloudNumbers.com – #Rstats gets real in the cloud

I came across Cloudnumbers.com . Awesome name , I didnt know groovy domain names existed anymore.

What is cloudnumbers.com – The website which looks like the salesforce.com website in style and design-

says-

Things are still very raw here- but its an awesome concept. With 68 GB of Memory, I am sure R can blow away everything out of the water.

Probably the competition needs to ahem launch that private cloud soon, before they lose the momentum.

and you Get 2GB of storage, 2GB of traffic and 10h computation cost per month for free! I think this German startup has hit the nail on the head and it would be interesting to see what the future holds.

 

Check out http://cloudnumbers.com/product yourself and/or see the video

https://www.youtube.com/v/0ZNEpR_ElV0?version=3&hl=en_US&hd=1

 

Lovely forecasting blog

Eight different random walks.
Image via Wikipedia

I really loved this simple, smart and yet elegant explanation of forecasting. even a high school quarterback could understand it, and maybe get a internship job building and running and re running code for Mars shot.

Despite my plea that you remain svelte in real life, I implore you to be naïve in business forecasting – and use a naïve forecasting model early and often. A naïve forecasting model is the most important model you will ever use in business forecasting.

and now the killer line

Purists may argue that the only true naïve forecast is the “no-change” forecast, meaning either a random walk (forecast = last known actual) or a seasonal random walk (e.g. forecast = actual from corresponding period last year). These are referred to as NF1 and NF2 in the Makridakis text (where NF = Naïve Forecast). In our 2006 SAS webseries Finding Flaws in Forecasting, an attendee asked “What about using a simple time series forecast with no intervention as the naïve forecast?” Is that allowed?

i did write a blog article on forecasting some time back, but back then I was a little blogger, with the website name being http://iwannacrib.com

great work in helping make forecasting easier to understand for people who have flower shops and dont have a bee, to help them with the forecasts, nor an geeky email list, not 4000$.

make it easier for the little guy to forecast his sales, so he cuts down on his supply chain inventory, lowering his carbon footprint.

Blog.sas.com take a bow, on labour day, helping workers with easy to understand models.

http://blogs.sas.com/forecasting/index.php?/archives/68-Which-Naive-Model-to-Use.html

Cloud Computing with R

Illusion of Depth and Space (4/22) - Rotating ...
Image by Dominic's pics via Flickr

Here is a short list of resources and material I put together as starting points for R and Cloud Computing It’s a bit messy but overall should serve quite comprehensively.

Cloud computing is a commonly used expression to imply a generational change in computing from desktop-servers to remote and massive computing connections,shared computers, enabled by high bandwidth across the internet.

As per the National Institute of Standards and Technology Definition,
Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.

(Citation: The NIST Definition of Cloud Computing

Authors: Peter Mell and Tim Grance
Version 15, 10-7-09
National Institute of Standards and Technology, Information Technology Laboratory
http://csrc.nist.gov/groups/SNS/cloud-computing/cloud-def-v15.doc)

R is an integrated suite of software facilities for data manipulation, calculation and graphical display.

From http://cran.r-project.org/doc/FAQ/R-FAQ.html#R-Web-Interfaces

R Web Interfaces

Rweb is developed and maintained by Jeff Banfield. The Rweb Home Page provides access to all three versions of Rweb—a simple text entry form that returns output and graphs, a more sophisticated JavaScript version that provides a multiple window environment, and a set of point and click modules that are useful for introductory statistics courses and require no knowledge of the R language. All of the Rweb versions can analyze Web accessible datasets if a URL is provided.
The paper “Rweb: Web-based Statistical Analysis”, providing a detailed explanation of the different versions of Rweb and an overview of how Rweb works, was published in the Journal of Statistical Software (http://www.jstatsoft.org/v04/i01/).

Ulf Bartel has developed R-Online, a simple on-line programming environment for R which intends to make the first steps in statistical programming with R (especially with time series) as easy as possible. There is no need for a local installation since the only requirement for the user is a JavaScript capable browser. See http://osvisions.com/r-online/ for more information.

Rcgi is a CGI WWW interface to R by MJ Ray. It had the ability to use “embedded code”: you could mix user input and code, allowing the HTMLauthor to do anything from load in data sets to enter most of the commands for users without writing CGI scripts. Graphical output was possible in PostScript or GIF formats and the executed code was presented to the user for revision. However, it is not clear if the project is still active.

Currently, a modified version of Rcgi by Mai Zhou (actually, two versions: one with (bitmap) graphics and one without) as well as the original code are available from http://www.ms.uky.edu/~statweb/.

CGI-based web access to R is also provided at http://hermes.sdu.dk/cgi-bin/go/. There are many additional examples of web interfaces to R which basically allow to submit R code to a remote server, see for example the collection of links available from http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/StatCompCourse.

David Firth has written CGIwithR, an R add-on package available from CRAN. It provides some simple extensions to R to facilitate running R scripts through the CGI interface to a web server, and allows submission of data using both GET and POST methods. It is easily installed using Apache under Linux and in principle should run on any platform that supports R and a web server provided that the installer has the necessary security permissions. David’s paper “CGIwithR: Facilities for Processing Web Forms Using R” was published in the Journal of Statistical Software (http://www.jstatsoft.org/v08/i10/). The package is now maintained by Duncan Temple Lang and has a web page athttp://www.omegahat.org/CGIwithR/.

Rpad, developed and actively maintained by Tom Short, provides a sophisticated environment which combines some of the features of the previous approaches with quite a bit of JavaScript, allowing for a GUI-like behavior (with sortable tables, clickable graphics, editable output), etc.
Jeff Horner is working on the R/Apache Integration Project which embeds the R interpreter inside Apache 2 (and beyond). A tutorial and presentation are available from the project web page at http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/RApacheProject.

Rserve is a project actively developed by Simon Urbanek. It implements a TCP/IP server which allows other programs to use facilities of R. Clients are available from the web site for Java and C++ (and could be written for other languages that support TCP/IP sockets).

OpenStatServer is being developed by a team lead by Greg Warnes; it aims “to provide clean access to computational modules defined in a variety of computational environments (R, SAS, Matlab, etc) via a single well-defined client interface” and to turn computational services into web services.

Two projects use PHP to provide a web interface to R. R_PHP_Online by Steve Chen (though it is unclear if this project is still active) is somewhat similar to the above Rcgi and Rweb. R-php is actively developed by Alfredo Pontillo and Angelo Mineo and provides both a web interface to R and a set of pre-specified analyses that need no R code input.

webbioc is “an integrated web interface for doing microarray analysis using several of the Bioconductor packages” and is designed to be installed at local sites as a shared computing resource.

Rwui is a web application to create user-friendly web interfaces for R scripts. All code for the web interface is created automatically. There is no need for the user to do any extra scripting or learn any new scripting techniques. Rwui can also be found at http://rwui.cryst.bbk.ac.uk.

Finally, the R.rsp package by Henrik Bengtsson introduces “R Server Pages”. Analogous to Java Server Pages, an R server page is typically HTMLwith embedded R code that gets evaluated when the page is requested. The package includes an internal cross-platform HTTP server implemented in Tcl, so provides a good framework for including web-based user interfaces in packages. The approach is similar to the use of the brew package withRapache with the advantage of cross-platform support and easy installation.

Also additional R Cloud Computing Use Cases
http://wwwdev.ebi.ac.uk/Tools/rcloud/

ArrayExpress R/Bioconductor Workbench

Remote access to R/Bioconductor on EBI’s 64-bit Linux Cluster

Start the workbench by downloading the package for your operating system (Macintosh or Windows), or via Java Web Start, and you will get access to an instance of R running on one of EBI’s powerful machines. You can install additional packages, upload your own data, work with graphics and collaborate with colleagues, all as if you are running R locally, but unlimited by your machine’s memory, processor or data storage capacity.

  • Most up-to-date R version built for multicore CPUs
  • Access to all Bioconductor packages
  • Access to our computing infrastructure
  • Fast access to data stored in EBI’s repositories (e.g., public microarray data in ArrayExpress)

Using R Google Docs
http://www.omegahat.org/RGoogleDocs/run.pdf
It uses the XML and RCurl packages and illustrates that it is relatively quick and easy
to use their primitives to interact with Web services.

Using R with Amazon
Citation
http://rgrossman.com/2009/05/17/running-r-on-amazons-ec2/

Amazon’s EC2 is a type of cloud that provides on demand computing infrastructures called an Amazon Machine Images or AMIs. In general, these types of cloud provide several benefits:

  • Simple and convenient to use. An AMI contains your applications, libraries, data and all associated configuration settings. You simply access it. You don’t need to configure it. This applies not only to applications like R, but also can include any third-party data that you require.
  • On-demand availability. AMIs are available over the Internet whenever you need them. You can configure the AMIs yourself without involving the service provider. You don’t need to order any hardware and set it up.
  • Elastic access. With elastic access, you can rapidly provision and access the additional resources you need. Again, no human intervention from the service provider is required. This type of elastic capacity can be used to handle surge requirements when you might need many machines for a short time in order to complete a computation.
  • Pay per use. The cost of 1 AMI for 100 hours and 100 AMI for 1 hour is the same. With pay per use pricing, which is sometimes called utility pricing, you simply pay for the resources that you use.

Connecting to R on Amazon EC2- Detailed tutorials
Ubuntu Linux version
https://decisionstats.com/2010/09/25/running-r-on-amazon-ec2/
and Windows R version
https://decisionstats.com/2010/10/02/running-r-on-amazon-ec2-windows/

Connecting R to Data on Google Storage and Computing on Google Prediction API
https://github.com/onertipaday/predictionapirwrapper
R wrapper for working with Google Prediction API

This package consists in a bunch of functions allowing the user to test Google Prediction API from R.
It requires the user to have access to both Google Storage for Developers and Google Prediction API:
see
http://code.google.com/apis/storage/ and http://code.google.com/apis/predict/ for details.

Example usage:

#This example requires you had previously created a bucket named data_language on your Google Storage and you had uploaded a CSV file named language_id.txt (your data) into this bucket – see for details
library(predictionapirwrapper)

and Elastic R for Cloud Computing
http://user2010.org/tutorials/Chine.html

Abstract

Elastic-R is a new portal built using the Biocep-R platform. It enables statisticians, computational scientists, financial analysts, educators and students to use cloud resources seamlessly; to work with R engines and use their full capabilities from within simple browsers; to collaborate, share and reuse functions, algorithms, user interfaces, R sessions, servers; and to perform elastic distributed computing with any number of virtual machines to solve computationally intensive problems.
Also see Karim Chine’s http://biocep-distrib.r-forge.r-project.org/

R for Salesforce.com

At the point of writing this, there seem to be zero R based apps on Salesforce.com This could be a big opportunity for developers as both Apex and R have similar structures Developers could write free code in R and charge for their translated version in Apex on Salesforce.com

Force.com and Salesforce have many (1009) apps at
http://sites.force.com/appexchange/home for cloud computing for
businesses, but very few forecasting and statistical simulation apps.

Example of Monte Carlo based app is here
http://sites.force.com/appexchange/listingDetail?listingId=a0N300000016cT9EAI#

These are like iPhone apps except meant for business purposes (I am
unaware if any university is offering salesforce.com integration
though google apps and amazon related research seems to be on)

Force.com uses a language called Apex  and you can see
http://wiki.developerforce.com/index.php/App_Logic and
http://wiki.developerforce.com/index.php/An_Introduction_to_Formulas
Apex is similar to R in that is OOPs

SAS Institute has an existing product for taking in Salesforce.com data.

A new SAS data surveyor is
available to access data from the Customer Relationship Management
(CRM) software vendor Salesforce.com. at
http://support.sas.com/documentation/cdl/en/whatsnew/62580/HTML/default/viewer.htm#datasurveyorwhatsnew902.htm)

Personal Note-Mentioning SAS in an email to a R list is a big no-no in terms of getting a response and love. Same for being careless about which R help list to email (like R devel or R packages or R help)

For python based cloud see http://pi-cloud.com

Making NeW R

Tal G in his excellent blog piece talks of “Why R Developers  should not be paid” http://www.r-statistics.com/2010/09/open-source-and-money-why-r-developers-shouldnt-be-paid/

His argument of love is not very original though it was first made by these four guys

I am going to argue that “some” R developers should be paid, while the main focus should be volunteers code. These R developers should be paid as per usage of their packages.

Let me expand.

Imagine the following conversation between Ross Ihaka, Norman Nie and Peter Dalgaard.

Norman- Hey Guys, Can you give me some code- I got this new startup.

Ross Ihaka and Peter Dalgaard- Sure dude. Here is 100,000 lines of code, 2000 packages and 2 decades of effort.

Norman- Thanks guys.

Ross Ihaka- Hey, What you gonna do with this code.

Norman- I will better it. Sell it. Finally beat Jim Goodnight and his **** Proc GLM and **** Proc Reg.

Ross- Okay, but what will you give us? Will you give us some code back of what you improve?

Norman – Uh, let me explain this open core …

Peter D- Well how about some royalty?

Norman- Sure, we will throw parties at all conferences, snacks you know at user groups.

Ross – Hmm. That does not sound fair. (walks away in a huff muttering)-He takes our code, sells it and wont share the code

Peter D- Doesnt sound fair. I am back to reading Hamlet, the great Dane, and writing the next edition of my book. I am glad I wrote a book- Ross didnt even write that.

Norman-Uh Oh. (picks his phone)- Hey David Smith, We need to write some blog articles pronto – these open source guys ,man…

———–I think that sums what has been going on in the dynamics of R recently. If Ross Ihaka and R Gentleman had adopted an open core strategy- meaning you can create packages to R but not share the original where would we all be?

At this point if he is reading this, David Smith , long suffering veteran of open source  flameouts is rolling his eyes while Tal G is wondering if he will publish this on R Bloggers and if so when or something.

Lets bring in another R veteran-  Hadley Wickham who wrote a book on R and also created ggplot. Thats the best quality, most often used graphics package.

In terms of economic utilty to end user- the ggplot package may be as useful if not more as the foreach package developed by Revolution Computing/Analytics.

Now http://cran.r-project.org/web/packages/foreach/index.html says that foreach is licensed under http://www.apache.org/licenses/LICENSE-2.0

However lets come to open core licensing ( read it here http://alampitt.typepad.com/lampitt_or_leave_it/2008/08/open-core-licen.html ) which is where the debate is- Revolution takes code- enhances it (in my opinion) substantially with new formats XDF for better efficieny, web services API, and soon coming next year a GUI (thanks in advance , Dr Nie and guys)

and sells this advanced R code to businesses happy to pay ( they are currently paying much more to DR Goodnight and HIS guys)

Why would any sane customer buy it from Revolution- if he could download exactly the same thing from http://r-project.org

Hence the business need for Revolution Analytics to have an enhanced R- as they are using a product based software model not software as a service model.

If Revolution gives away source code of these new enhanced codes to R core team- how will R core team protect the above mentioned intelectual property- given they have 2 decades experience of giving away free code , and back and forth on just code.

Now Revolution also has a marketing budget- and thats how they sponsor some R Core events, conferences, after conference snacks.

How would people decide if they are being too generous or too stingy in their contribution (compared to the formidable generosity of SAS Institute to its employees, stakeholders and even third party analysts).

Would it not be better- IF Revolution can shift that aspect of relationship to its Research and Development budget than it’s marketing budget- come with some sort of incentive for “SOME” developers – even researchers need grants and assistantships, scholarships, make a transparent royalty formula say 17.5 % of the NEW R sales goes to R PACKAGE Developers pool, which in turn examines usage rate of packages and need/merit before allocation- that would require Revolution to evolve from a startup to a more sophisticated corporate and R Core can use this the same way as John M Chambers software award/scholarship

Dont pay all developers- it would be an insult to many of them – say Prof Harrell creator of HMisc to accept – but can Revolution expand its dev base (and prospect for future employees) by even sponsoring some R Scholarships.

And I am sure that if Revolution opens up some more code to the community- they would the rest of the world and it’s help useful. If it cant trust people like R Gentleman with some source code – well he is a board member.

——————————————————————————————–

Now to sum up some technical discussions on NeW R

1)  An accepted way of benchmarking efficiencies.

2) Code review and incorporation of efficiencies.

3) Multi threading- Multi core usage are trends to be incorporated.

4) GUIs like R Commander E Plugins for other packages, and Rattle for Data Mining to have focussed (or Deducer). This may involve hiring User Interface Designers (like from Apple 😉  who will work for love AND money ( Even the Beatles charge royalty for that song)

5) More support to cloud computing initiatives like Biocep and Elastic R – or Amazon AMI for using cloud computers- note efficiency arguements dont matter if you just use a Chrome Browser and pay 2 cents a hour for an Amazon Instance. Probably R core needs more direct involvement of Google (Cloud OS makers) and Amazon as well as even Salesforce.com (for creating Force.com Apps). Note even more corporates here need to be involved as cloud computing doesnot have any free and open source infrastructure (YET)

_______________________________________________________

Debates will come and go. This is an interesting intellectual debate and someday the liitle guys will win the Revolution-

From Hugh M of Gaping Void-

http://www.gapingvoid.com/Moveable_Type/archives/cat_microsoft_blue_monster_series.html

HOW DOES A SOFTWARE COMPANY MAKE MONEY, IF ALL

SOFTWARE IS FREE?

“If something goes wrong with Microsoft, I can phone Microsoft up and have it fixed. With Open Source, I have to rely on the community.”

And the community, as much as we may love it, is unpredictable. It might care about your problem and want to fix it, then again, it may not. Anyone who has ever witnessed something online go “viral”, good or bad, will know what I’m talking about.

and especially-

http://gapingvoid.com/2007/04/16/how-well-does-open-source-currently-meet-the-needs-of-shareholders-and-ceos/

Source-http://gapingvoidgallery.com/

Kind of sums up why the open core licensing is all about.

Trrrouble in land of R…and Open Source Suggestions

Recently some comments by Ross Ihake , founder of R Statistical Software on Revolution Analytics, leading commercial vendor of R….. came to my attention-

http://www.stat.auckland.ac.nz/mail/archive/r-downunder/2010-May/000529.html

[R-downunder] Article on Revolution Analytics

Ross Ihaka ihaka at stat.auckland.ac.nz
Mon May 10 14:27:42 NZST 2010


On 09/05/10 09:52, Murray Jorgensen wrote:
> Perhaps of interest:
>
> http://www.theregister.co.uk/2010/05/06/revolution_commercial_r/

Please note that R is "free software" not "open source".  These guys
are selling a GPLed work without disclosing the source to their part
of the work. I have complained to them and so far they have given me
the brush off. I am now considering my options.

Don't support these guys by buying their product. The are not feeding
back to the rights holders (the University of Auckland and I are rights
holders and they didn't even have the courtesy to contact us).

--
Ross Ihaka                         Email:  ihaka at stat.auckland.ac.nz
Department of Statistics           Phone:  (64-9) 373-7599 x 85054
University of Auckland             Fax:    (64-9) 373-7018
Private Bag 92019, Auckland
New Zealand
and from http://www.theregister.co.uk/2010/05/06/revolution_commercial_r/
Open source purists probably won't be all too happy to learn that Revolution is going to be employing an "open core" strategy, which means the core R programs will remain open source and be given tech support under a license model, but the key add-ons that make R more scalable will be closed source and sold under a separate license fee. Because most of those 2,500 add-ons for R were built by academics and Revolution wants to supplant SPSS and SAS as the tools used by students, Revolution will be giving the full single-user version of the R Enterprise stack away for free to academics. 
Conclusion-
So one co-founder of R is advocating not to buy from Revolution Analytics , which has the other co-founder of R, Gentleman on its board. 
Source- http://www.revolutionanalytics.com/aboutus/leadership.php

2) If Revolution Analytics is using 2500 packages for free but insisting on getting paid AND closing source of it’s packages (which is a technical point- how exactly can you prevent source code of a R package from being seen)

Maybe there can be a PACKAGE marketplace just like Android Apps, Facebook Apps, and Salesforce.com Apps – so atleast some of the thousands of R package developers can earn – sorry but email lists do not pay mortgages and no one is disputing the NEED for commercializing R or rewarding developers.

Though Barr created SAS, he gave up control to Goodnight and Sall https://decisionstats.wordpress.com/2010/06/02/sas-early-days/

and Goodnight and Sall do pay their developers well- to the envy of not so well paid counterparts.

3) I really liked the innovation of Revolution Analytics RevoScalar, and I wish that the default R dataset be converted to XDF dataset so that it basically kills

off the R criticism of being slow on bigger datasets. But I also realize the need for creating an analytics marketplace for R developers and R students- so academic version of R being free and Revolution R being paid seems like a trade off.

Note- You can still get a job faster as a stats student if you mention SAS and not R as a statistical skill- not all stats students go into academics.

4) There can be more elegant ways of handling this than calling for ignoring each other as REVOLUTION and Ihake seem to be doing to each other.

I can almost hear people in Cary, NC chuckling at Norman Nie, long time SPSS opponent and now REVOLUTION CEO, and his antagonizing R’s academicians within 1 year of taking over- so I hope this ends well for all. The road to hell is paved with good intentions- so if REVOLUTION can share some source code with say R Core members (even Microsoft shares source code with partners)- and R Core and Revolution agree on a licensing royalty from each other, they can actually speed up R package creation rather than allow this 2 decade effort to end up like S and S plus and TIBCO did.

Maybe Richard Stallman can help-or maybe Ihaka has a better sense of where things will go down in a couple of years-he must know something-he invented it, didnt he

On 09/05/10 09:52, Murray Jorgensen wrote:
> Perhaps of interest:
>
> http://www.theregister.co.uk/2010/05/06/revolution_commercial_r/

Please note that R is "free software" not "open source".  These guys
are selling a GPLed work without disclosing the source to their part
of the work. I have complained to them and so far they have given me
the brush off. I am now considering my options.

Don't support these guys by buying their product. The are not feeding
back to the rights holders (the University of Auckland and I are rights
holders and they didn't even have the courtesy to contact us).

--
Ross Ihaka                         Email:  ihaka at stat.auckland.ac.nz
Department of Statistics           Phone:  (64-9) 373-7599 x 85054
University of Auckland             Fax:    (64-9) 373-7018
Private Bag 92019, Auckland
New Zealand

Analytics and BI for small biz

I saw a story on Warren B and Goldman S creating a 500$ million pool for small business owners.

  • The program will contribute $200 million to community colleges, universities and other institutions to provide small- business owners with practical business education.

  • Goldman Sachs repaid the $10 billion it was given last year under the taxpayer-funded Troubled Asset Relief Program, plus dividends. The firm continues to benefit from federal guarantees on about $21 billion of long-term debt.

  • Buffett, known as the “Oracle of Omaha” for his investing prowess, is the second-richest American. Berkshire, which invests in companies ranging from retailers to insurers, paid $5 billion in September 2008 to acquire preferred stock in Goldman Sachs that pays a 10 percent dividend. Berkshire, based in Omaha, Nebraska, also gained five-year warrants to buy $5 billion of common stock at $115 per share.

  • ( NOTE Curent Price of GS shares is 172$ – thats a 50% profit on 5 Billion~ 2.5 Billion for Mr Buffett but he is probably waiting for long term capital gains ax rates to kick in before encashing his patriotic  “Buy American. I am” warrants (see NYT op ed by him  http://www.nytimes.com/2008/10/17/opinion/17buffett.html )
  • A better analysis of the above Bloomberg story was given on Bloomberg itself at http://www.bloomberg.com/apps/news?pid=20601039&sid=asjp51YPDwJU
  • A small thought- could smaller businesses gain from efficiencies of programs like SPSS, SAS and R. Or would they be better off with customized GUI’s linked to their POS data.

Anyways a need for analytics for small businesses in inventory management, and sales planning could help. Joe the Plumber could do with some ETS and Regression Models as well.

However apart for Salesforce.com applications this field seems to be totally vacant for analytics. What are IBM SPSS, SAS, or even other stats packages doing for small businesses. or even developing Salesforce.com applications for their own equivalent software

The market could be an interesting one to atleast do a test in. Unless you don’t believe in test and control.

See below the IBM Cognos by IBM itself and the third party app by Pervasive for SAP Integration-

Citation-

http://sites.force.com/appexchange/listingDetail?listingId=a0N300000016YGYEA2

and

http://sites.force.com/appexchange/listingDetail?listingId=a0N300000016am1EAA