Parallel Programming using R in Windows

Ashamed at my lack of parallel programming, I decided to learn some R Parallel Programming (after all parallel blogging is not really respect worthy in tech-geek-ninja circles).

So I did the usual Google- CRAN- search like a dog thing only to find some obstacles.

Obstacles-

Some Parallel Programming Packages like doMC are not available in Windows

http://cran.r-project.org/web/packages/doMC/index.html

Some Parallel Programming Packages like doSMP depend on Revolution’s Enterprise R (like –

http://blog.revolutionanalytics.com/2009/07/simple-scalable-parallel-computing-in-r.html

and http://www.r-statistics.com/2010/04/parallel-multicore-processing-with-r-on-windows/ (No the latest hack didnt work)

or are in testing like multicore (for Windows) so not available on CRAN

http://cran.r-project.org/web/packages/multicore/index.html

fortunately available on RForge

http://www.rforge.net/multicore/files/

Revolution did make DoSnow AND foreach available on CRAN

see http://blog.revolutionanalytics.com/2009/08/parallel-programming-with-foreach-and-snow.html

but the documentation in SNOW is overwhelming (hint- I use Windows , what does that tell you about my tech acumen)

http://sekhon.berkeley.edu/snow/html/makeCluster.html and

http://www.stat.uiowa.edu/~luke/R/cluster/cluster.html

what is a PVM or MPI? and SOCKS are for wearing or getting lost in washers till I encountered them in SNOW


Finally I did the following-and made the parallel programming work in Windows using R

require(doSNOW)
cl<-makeCluster(2) # I have two cores
registerDoSNOW(cl)
# create a function to run in each itteration of the loop

check <-function(n) {

+ for(i in 1:1000)

+ {

+ sme <- matrix(rnorm(100), 10,10)

+ solve(sme)

+ }

+ }
times <- 100     # times to run the loop
system.time(x <- foreach(j=1:times ) %dopar% check(j))
user  system elapsed
0.16    0.02   19.17
system.time(for(j in 1:times ) x <- check(j))
user  system elapsed</pre>
39.66    0.00   40.46

stopCluster(cl)

And it works!

When China overtook India- using DEDUCER

I was just reading about the new release of World Bank Data at http://www.r-chart.com/2010/09/new-world-bank-data-available.html Now World Bank Data is something I worked with in the past, but the RWDI package is a great package. (see http://www.r-chart.com/2010/09/new-world-bank-data-available.html)

The whole dataset is a 29 mb in zipped CSV though and is available for terrific macroeconomic analysis _ I downloaded it and loaded it instead.

http://data.worldbank.org/sites/default/files/data/wdiandgdf_csv.zip

I took a small subset of the data –


WDI_GDF_Data <- read.table("C:/Documents and Settings/abc/My Documents/Downloads/WDI_GDF_Data.csv",header=T,sep=",",quote="\"")
 WDI_GDF_Data.sub<-subset(WDI_GDF_Data,Country.Code == "CHN" | Country.Code == "IND" | Country.Code == "USA")
WDI_GDF_Data.sub.sub<-subset(WDI_GDF_Data.sub,Series.Code == "NY.GDP.PCAP.KD")
WDI_GDF_Data.sub.sub<-as.data.frame(t(WDI_GDF_Data.sub.sub))
write.csv(WDI_GDF_Data.sub.sub,'C:/Documents and Settings/abc/Desktop/gdp3.csv')

Note- WordPress.com now supports source code in R via http://en.support.wordpress.com/code/posting-source-code/

Now this is basic data manipulation- and I used Deducer for it.

The best thing is the ability to use GGPlot using a GUI.
I am now trying to create more complicated plots for example with more than one Y variable but it is still a work in progress. Overall Deducer has made impressive improvements and with the JGR GUI seems very very promising. The look and feel also shows a combination of features (from SPSS ‘s variable and data view)

And yes China overtook India in 1985. In GDP per capita. Sigh

GGPLot though overtook Excel graphics as well.


Here is a video which is much better than my screenshots

R on Windows HPC Server

From HPC Wire, the newsletter/site for all HPC news-

Source- Link

PALO ALTO, Calif., Sept. 20 — Revolution Analytics, the leading commercial provider of software and support for the popular open source R statistics language, today announced it will deliver Revolution R Enterprise for Microsoft Windows HPC Server 2008 R2, released today, enabling users to analyze very large data sets in high-performance computing environments.

R is a powerful open source statistics language and the modern system for predictive analytics. Revolution Analytics recently introduced RevoScaleR, new “Big Data” analysis capabilities, to its R distribution, Revolution R Enterprise. RevoScaleR solves the performance and capacity limitations of the R language by with parallelized algorithms that stream data across multiple cores on a laptop, workstation or server. Users can now process, visualize and model terabyte-class data sets at top speeds — without the need for specialized hardware.

“Revolution Analytics is pleased to support Microsoft’s Technical Computing initiative, whose efforts will benefit scientists, engineers and data analysts,” said David Champagne, CTO at Revolution. “We believe the engineering we have done for Revolution R Enterprise, in particular our work on big-data statistics and multicore computing, along with Microsoft’s HPC platform for technical computing, makes an ideal combination for high-performance large scale statistical computing.”

“Processing and analyzing this ‘big data’ is essential to better prediction and decision making,” said Bill Hamilton, director of technical computing at Microsoft Corp. “Revolution R Enterprise for Windows HPC Server 2008 R2 gives customers an extremely powerful tool that handles analysis of very large data and high workloads.”

To learn more about Revolution R Enterprise and its Big Data capabilities, download thewhite paper. Revolution Analytics also has an on-demand webcast, “High-performance analytics with Revolution R and Windows HPC Server,” available online.

AND from Microsoft’s website

http://www.microsoft.com/hpc/en/us/solutions/hpc-for-life-sciences.aspx

REvolution R Enterprise »

REvolution Computing

REvolution R Enterprise is designed for both novice and experienced R users looking for a production-grade R distribution to perform mission critical predictive analytics tasks right from the desktop and scale across multiprocessor environments. Featuring RPE™ REvolution’s R Productivity Environment for Windows.

Of course R Enterprise is available on Linux but on Red Hat Enterprise Linux- it would be nice to see Amazom Machine Images as well as Ubuntu versions as well.

An Amazon Machine Image (AMI) is a special type of virtual appliance which is used to instantiate (create) a virtual machine within the Amazon Elastic Compute Cloud. It serves as the basic unit of deployment for services delivered using EC2.[1]

Like all virtual appliances, the main component of an AMI is a read-only filesystem image which includes an operating system (e.g., Linux, UNIX, or Windows) and any additional software required to deliver a service or a portion of it.[2]

The AMI filesystem is compressed, encrypted, signed, split into a series of 10MB chunks and uploaded into Amazon S3 for storage. An XML manifest file stores information about the AMI, including name, version, architecture, default kernel id, decryption key and digests for all of the filesystem chunks.

An AMI does not include a kernel image, only a pointer to the default kernel id, which can be chosen from an approved list of safe kernels maintained by Amazon and its partners (e.g., RedHat, Canonical, Microsoft). Users may choose kernels other than the default when booting an AMI.[3]

[edit]Types of images

  • Public: an AMI image that can be used by any one.
  • Paid: a for-pay AMI image that is registered with Amazon DevPay and can be used by any one who subscribes for it. DevPay allows developers to mark-up Amazon’s usage fees and optionally add monthly subscription fees.

Using Facebook Analytics (Updated)

People sceptical of any analytical value of Facebook should see the nice embedded analytics, which is a close rival and even more to Google Analytics for websites. It has recently been updated as well.

It is right there on the button called Insights on left margin of your Facebook Page

Like for the Facebook Page

http://facebook.com/Decisionstats

You can also use Export Data function to run customized analytical and statistical testing on your Corporate Page.

Older View———————————————————————————-

see screenshot of Demographics of 213 Decisionstats fans on Facebook ( FB doesnot allow individual views but only aggregate views for Privacy Reasons)

fb

AsterData gets $30 mill in funding

From the press release, the maker of Map Reduce based BI software gets 30 mill $ as Series C funding. Given the valuation recently by IBM to Netezza, AsterData seems set to cross the Billion Dollar valuation within the next 18-24 months IMO

Aster Data Closes $30 Million Series C Financing

Explosive Growth and Market Leadership Attracts New and Existing Investors

San Carlos, CA – September 22, 2010 – Aster Data, a market leader in big data management and advanced analytics, today announced that it has closed a $30 million Series C round of financing led by both new and existing investors. The company will use the new funding to accelerate growth, scale operations, and expand its global market share in the $20 billion database market – a market that is experiencing rapid growth as a result of both the explosion in data volumes across organizations and the urgent need to deliver a new class of analytics and data-driven applications. The Series C round of funding includes previous investors Sequoia Capital, JAFCO Ventures, Institutional Venture Partners, Cambrian Ventures, as well as an additional new strategic investor.  Also investing in this round is early investor David Cheriton, who previously backed high-growth companies including Google and VMware, and co-founded several successful technology companies.

Today’s Series C funding announcement underscores a year of strong innovation, execution, and overall momentum for the analytic database company. Key milestones include:

Strong sales growth: Since 2008, Aster Data has doubled revenue year-over-year and secured key customers that leverage Aster Data’s platform to address the big data management problem including MySpace, comScore, Barnes & Noble, and Akamai. Like so many organizations today,
Aster Data’s customers are experiencing explosive data growth across their organizations and recognize the need for rich, advanced analytics that give them deeper insights from their data.

Key executive hires: Quentin Gallivan, former CEO of both PivotLink and Postini and EVP of worldwide sales at Verisign, recently joined the company as Chief Executive Officer. In addition, earlier this year, John Calonico, previously at Interwoven, BEA, and Autodesk, joined as Chief Financial Officer; and Nitin Donde, formerly an executive at EMC and 3PAR, joined as Executive Vice President Engineering.  The strength and experience of Aster Data’s management team helps further establish a strong operational foundation for growth in 2010 and beyond.

Industry recognition: Aster Data was positioned in the “Visionaries” Quadrant of Gartner, Inc.’s

Data Warehouse Database Management Systems Magic Quadrant, published 2010 *; was recently named 2011 Tech Pioneer by the World Economic Forum; was named “Company to Watch” in the Information Management category of TechWeb’s Intelligent Enterprise 2010 Editors’ Choice Awards; and was awarded the 2010 San Francisco Business Times Technology and Innovation Award in the Best Product and Services Category.

Product Innovation: Aster Data continues to deliver ground-breaking capabilities to address the big data management and advanced analytics market need. Its recent announcement of
Aster Data nCluster 4.6 includes a column data store, making it the first hybrid row and column MPP DBMS with a unified SQL and MapReduce analytic framework for advanced analytics on large data sets. This year, Aster Data also delivered the most extensive library of pre-packaged MapReduce analytics totaling over 1000 functions, to ease and accelerate delivery of highly advanced analytic applications.

Aster Data’s analytic database, also called a ‘Data-Analytics Server’ is specifically designed to enable organizations to cost effectively store and analyze massive volumes of data. Aster Data leverages the power of commodity, general-purpose hardware, to reduce the cost to scale to support large data volumes and uniquely allows analysis of all data ‘in-database’ enabling richer and faster processing of large data sets. Aster Data’s in-database analytics engine uses the power of MapReduce, a parallel processing framework created by Google.

”The funding we received in our Series C round is a strong endorsement of Aster Data’s market leadership position and the high growth potential of the big data market,” said Quentin Gallivan, Chief Executive Officer, Aster Data. “The Aster Data team has executed exceptionally well to-date and I am excited to have the resources to accelerate the growth of the company as we expand our operations and execute aggressively across all fronts.”

KXEN EMEA User Conference 2010-Success in Business Analytics

KXEN User Conference-Prelim Agenda is out

Source-

http://www.kxen.com/index.php?option=com_content&task=view&id=647&Itemid=1109

THURSDAY, OCTOBER 28, 2010
09:30-10:00 AM Registration & Breakfast

10:00-10:45 AM Welcome & Opening Remarks,
by John Ball, CEO KXEN
10:45-11:30 AM Keynote Session by James Kobielus,
Senior Analyst at Forrester Research, Inc. and author
of “The Forrester WaveTM: Predictive Analytics & Data Mining Solutions, Q1 2010” report 

11:30-12:05 AM Customer Case Study:
The European Commission (Government)
12:05-12:50 PM General Session:
Teradata Advanced Analytics
12:50-02:00 PM Lunch Break & Exhibition
02:00-02:35 PM Customer Case Study: 
Virgin Media
(Communications)
02:35-03:05 PM General Session:
Sponsor Presentation
03:05-03:40 PM
Coffee Break & Exhibition

03:40-04:40 PM General Session:
The Factory Approach to Compete on Analytics
04:40-05:25 PM Customer Case Study: 
Insurance
05:30-06:30 PM Cocktail & Exhibition
07:30-00:00 PM Gala Dinner
FRIDAY, OCTOBER 29, 2010
08:30-09:00 AM
Registration & Breakfast

09:00-10:00 AM Keynote Presentation:
The CTO Talk
10:00-10:30 AM Customer Case Study: 
MonotaRO
(Japan – Retail)
10:30-10:55 AM
Coffee Break & Exhibition

10:55-11:30 AM General Session: 
Sponsor Presentation
11:30-12:05 PM Customer Case Study: 
Aviva
(Poland – Insurance)
12:05-01:00 PM Lunch Break & Exhibition
01:00-01:45 PM General Session: 
How Social Network Analysis Can Boost your Marketing Performance
01:45-02:20 PM Customer Case Study:
Financial Services
02:20-02:45 PM Closing Remarks,
by John Ball, CEO KXEN
02:45-03:00 PM
Coffee Break & Exhibition

Optional: Technical Training (Complimentary to all Attendees)
02:45-04:00 PM Hands-On Training #1: Getting Started with KXEN Analytical Data Management (ADM)
04:00-04:15 PM
Coffee Break

04:15-05:30 PM Hands-On Training #2: Getting Started with KXEN Modeling Factory (KMF)