R and Cloud Computing

Here is a good site for using R for cloud computing. It is called Biocep.

http://biocep-distrib.r-forge.r-project.org/

    Biocep is a general unified open source Java solution for integrating and virtualizing the access to R engines/servers. It aims to become a federative user-friendly computational e-platform for research, finance and education. The Biocep virtual workbench provides a framework enabling the connection of all the elements of a computational environment:

    1. The computational resource (whether it is a local machine, a cluster, a grid or a cloud server) via a simple URL.
    2. The computational components via the import of R packages.
    3. The GUIs via the import of plugins from repositories or the design of new views with a drag-and-drop GUI editor.

An example of its use:

    A Biocep based R virtualization infrastructure has been successfully deployed on the British National Grid Service, demonstrating its usability and usefulness for researchers.

Another useful package is RWebServices.

http://www.bioconductor.org/packages/2.3/bioc/html/RWebServices.html

    Expose R functions as web services through Java/Axis/Apache

    This package provides mechanisms for automatic function prototyping and exposure of R functionality in a web services environment.

Using R on a cloud computer effectively cuts hardware AND software license costs to less than $1 per hour, even for extremely intensive analytics work. A separate and generic framework for this is the conceptual idea self-deprecatingly called the Ohri Framework (read here). Since I lack both the money and the time, I have been trying to evangelize R to the cloud community, and the cloud to the analytics community, since last year. Watch this space: the action is heating up.

India: Bungalow Dogs bark back at Slumdogs

The recent Oscar nominations for, and subsequent debate over, “Slumdog Millionaire” are both astonishing and disturbing.

While it is astonishing

as the first major Oscar-nominated English movie about India in the nearly thirty years since “Gandhi” (made by another British director),

it is disturbing

as it reflects the inherent tensions in a nation of 1.147 billion people, racing to the moon with an unmanned lunar orbiter (2008) and thus proud of its recent economic, sporting and political achievements,

while at the same time

a nation-state struggling to provide the basics of food, housing and employment (conservatively, 300 million people in India live on less than 1 dollar a day) and security from terror attacks.

Most of the critics decrying the exhibitionism of “poverty porn” (a unique term) are themselves safely far removed from the slums.

An average Indian middle-class family earns 1,000-2,000 USD a month (depending on how good the economy is).

I find the rows of people sleeping on pavements and defecating in the open both embarrassing and humbling.

There has to be some shame, some morality, in an economic system where the urban middle class earns 60 times or more what the urban poor earn (60 times 30 dollars a month).

Yet

I find the lack of conscience (we can't help them, so let's help ourselves)

in my peers,

fellow middle-class chaps and

especially the intellectual classes of academia and corporates,

and their

hubris and pride in the inevitable rise to power of a glorious Mother India,

an amusing and sometimes puzzling drama, as entertaining as any fictional movie created by a global Holly- or a local Bolly-wood.

http://en.wikipedia.org/wiki/Poverty_in_India

Vote for the SAS-L Rookie of the Year

If you are on the SAS-L list, you can vote for one of the following nominees:

SAS-L Rookie of the Year (SASLROY)


Scott Bucher
Joe Matise
Akshaya Nathilvar
Ajay Ohri (this is me, by the way)
Karma Tarap

You can vote (one vote per person please) at:
http://ires.ku.edu/~ipsr/SGF2009/saslbof.htm
Voting will end February 12th.

And, as usual, the winners will be announced at the annual SAS-L BOF, at SAS Global Forum:
When: Monday, March 23
Where: TBA
Time: 7-8 pm

PS: I wonder if the R-help list has something like this.

Twitter Mayhem

I was trying to send a message to all members of the Decision Stats group on LinkedIn, inviting them to join me on Twitter. Unfortunately, it showed me a message: too many tweets, try later. I did try later. 6 times.

Same message TRY LATER.

And Twitter sent 6 emails to all 570 people. Many apologies for this; I was not spamming, but it ended up looking that way.

REvolution Computing Releases Commercial R: The Analytics Market Just Grew Better

I just downloaded REvolution Computing’s latest release of REvolution R. The individual Win32 version is free, while the Enterprise version, which adds 64-bit support, is available by annual subscription. Tech support is included in the service contract for the software, which should help any corporate willing to take R on a trial basis.

From the press release:

REvolution Computing Makes High Performance ‘REvolution R’ Available For Download

New Haven, CT – January 28, 2009 – REvolution Computing, a leading provider of open source predictive analytics solutions, today announced that it has made a public version of its commercial grade REvolution R program available for download from its website. REvolution R is REvolution Computing’s distribution of the popular R statistical software, optimized for use in commercial environments.

With the latest release of REvolution R, REvolution Computing has added significant performance enhancements to the base system, which can prove to be of great value in both commercial and research settings. A key feature includes the use of powerful optimized libraries capable of boosting performance by a factor of 5 or 10 for commonly used operations. In addition, REvolution R has been put through a quality process designed to meet regulatory agency audit standards, making the subscription version reliable for use in mission critical research and production.

“In making our latest release of REvolution R available for download, REvolution Computing is providing all R users the ability to take advantage of optimized and validated software previously available only to commercial users,” said REvolution Computing CEO, Richard Schultz. “In a true commercial open source way, we have reached the point in our development that we are able to offer significant value to both sets of our community users – REvolution R for all users, and REvolution R Enterprise, with additional commercial-grade capabilities and support, available by annual subscription.”

REvolution’s commercial distribution, REvolution R Enterprise, features advanced functionality, including ParallelR, which speeds deployment across both multiprocessor workstations and clusters to enable the same codes to be used for prototyping and production. REvolution R Enterprise is functional with 64-bit platforms and Linux enterprise platforms and provides for telephone support and response guarantees.

Some background on the company, from the company itself:

About REvolution Computing

New Haven, Connecticut-based REvolution Computing is the leading commercial provider of software and support for the statistical computing language known as “R.” 

Our products, including REvolution R and REvolution R Enterprise, enable statisticians, scientists and others to create superior predictive models and derive meaning from large sets of mission-critical data in record time. REvolution Computing works closely with the R community to incorporate the latest developments in open source R, and with our clients to support their efforts to produce groundbreaking innovations in life sciences, financial services, defense technology and other industries where high-level analytics are crucial to success. At REvolution Computing, “We do the math.”

The product names “RPro,” “ParallelR,” “REvolution R,” and “REvolution R Enterprise,” are trademarks of REvolution Computing.

This basically gives the company first-mover advantage in commercial R. The timing is also fortunate, as companies across the world look to cut costs (unfortunately, labor costs are being cut faster than software costs) and to move beyond traditional analytics software that performed, ah, so well in predicting the subprime crisis.

REvolution R is available for download for Windows and Intel Mac OS X, both in 32-bit mode only, at http://www.revolution-computing.com/downloads/revolution-r.php

Using Google Docs for Web Scraping

While trying to scrape some data from a website, I chanced upon the importXML function in Google Spreadsheets, which is pretty neat: it lets you import the XML or HTML of a webpage and then parse out the data you need with an XPath query.

Here is an example:

Using the importXML function, I parsed all the links in the Google search results for “analytics consultant in India”.

The importXML function works as follows (from the support page here):

Functions:

=importXML("URL","query")

  • URL – the URL of the XML or HTML file
  • query – the XPath query to run on the data given at the URL. For example, "//a/@href" returns a list of the href attributes of all <a> tags in the document (i.e. all of the URLs the document links to). For more information about XPath, please visit http://www.w3schools.com/xpath/
  • Example: =importXML("www.google.com", "//a/@href"). This returns all of the href attributes (the link URLs) in all the <a> tags on the www.google.com home page.
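The query "//a/@href" simply means "the href attribute of every <a> element, anywhere in the page". For readers who want the same extraction outside a spreadsheet, here is a minimal Python sketch using only the standard library. It is not how importXML works internally; it hard-codes the //a/@href case with html.parser instead of running a general XPath engine, because the standard library has no full XPath implementation that copes with messy real-world HTML. The names HrefCollector, extract_hrefs and hrefs_from_url, and the sample HTML, are hypothetical and exist only for illustration.

```python
import urllib.request
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href of every <a> tag, roughly what //a/@href returns."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> matches too.
        if tag == "a":
            for name, value in attrs:
                # value is None for a bare attribute such as <a href>; skip those.
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_hrefs(html_text):
    """Hypothetical helper: return all link URLs in document order."""
    parser = HrefCollector()
    parser.feed(html_text)
    parser.close()
    return parser.hrefs


def hrefs_from_url(url):
    """Hypothetical analogue of =importXML(url, "//a/@href"); needs network access."""
    with urllib.request.urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return extract_hrefs(resp.read().decode(charset, errors="replace"))


if __name__ == "__main__":
    sample = """
    <html><body>
      <a href="http://www.r-project.org/">R</a>
      <p>No link here</p>
      <A HREF="http://www.w3schools.com/xpath/">XPath tutorial</A>
      <a name="anchor-without-href">anchor</a>
    </body></html>
    """
    for link in extract_hrefs(sample):
        print(link)
```

Run on the sample, this prints the two link URLs and ignores the anchor that has no href, which mirrors what the spreadsheet formula returns for a page. Pointing hrefs_from_url at a live page is left to the reader; a search engine's results page may refuse or change automated requests.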

You can see it here:

http://spreadsheets.google.com/pub?key=pS9vSxWuwOllXHdueY0TDdg

or Using the Embed Function