work – Page 20 – DECISION STATS

Clustering Business Analysts and Industry Analysts

In my interactions with the world at large (mostly online) in the ways of data, statistics and analytics- I come across people who like to call themselves analysts.

As per me, there are 4 kinds of analysts principally,

1) Corporate Analysts- They work for a particular software company. As per them their product is great and infallible, their code has no bugs, and last zillion customer case studies all got a big benefit by buying their software.

They are very good at writing software code themselves, unfortunately this expertise is restricted to Microsoft Outlook (emails) and MS Powerpoint ( presentations). No they are more like salesmen than analysts, but as Arthur Miller said ” All salesmen (person) are dreamers. When the dream dies, the salesman (person) dies (read transfers to bigger job at a rival company)

2) Third -Party Independent Analsyst- The main reason they are third party is they can not be tolerated in a normal corporate culture, their spouse can barely stand them for more than 2 hours a day, and their Intelligence is not matched by their emotional maturity. Alas, after turning independent analysts, they realize they are actually more dependent to people than before, and they quickly polish their behaviour to praise who ever is sponsoring their webinar, white paper , newsletter, or flying them to junkets. They are more of boutique consultants, but they used to be quite nifty at writing code, when younger, so they call themselves independent and “Noted Industry Analyst”

3) Researcher Analysts- They mostly scrape info from press releases which are mostly written by a hapless overworked communications team thrown at a task at last moment. They get into one hour call with who ever is the press or industry/analyst relations honcho is- turn the press release into bullet points, and publish on the blog. They call this as research Analysts and give it away for free (but actually couldnt get anyone to pay for it for last 4 years). Couldnt write code if their life depended on it, but usually will find transformation and expert somehwere in their resume/about me web page. May have co -authored a book, which would have gotten them a F for plagiarism had they submitted it as a thesis.

4) Analytical Analysts- They are mostly buried deep within organizational bureaucracies if corporate, or within partnerships if they are independent. Understand coding, innovation (or creativity). Not very aggressive at networking unless provoked by an absolute idiot belonging to first three classes of industry analyst. Prefer to read Atlas Shrugged than argue on business semantics.

Next time you see an industry expert- you know which cluster to classify them 😉

Image Citation-

http://gapingvoidgallery.com/

Running R on Amazon EC2 :Windows

Running R on an Amazon EC2 has following benefits-

1) Elastic Memory and Number of Processors for heavy computation
2) Affordable micro instances for smaller datasets (2 cents per hour for Unix to 3 cents per hour).
3) An easy to use interface console for managing datasets as well as processes

Running R on an Amazon EC2 on Windows Instance has following additional benefits-

1) Remote Desktop makes operation of R very easy
2) 64 Bit R can be used
3) You can also use your evaluation of Revolution R Enterprise (which is free to academics) and quite inexpensive for enterprise software for corporates.

You can thus combine R GUIs (like Rattle , R Cmdr or Deducer based upon your need for statistical analysis, data mining or graphical analysis) , with 64 Bit OS, and Revolution’s REvoScaler Package to manage huge huge datasets at a very easy to use analytics solution.

Pricing-for Computation on EC2

Standard On-Demand Instances	Linux/UNIX Usage	Windows Usage
Small (Default)	$0.085 per hour	$0.12 per hour
Large	$0.34 per hour	$0.48 per hour
Extra Large	$0.68 per hour	$0.96 per hour
Micro On-Demand Instances	Linux/UNIX Usage	Windows Usage
Micro	$0.02 per hour	$0.03 per hour
High-Memory On-Demand Instances
Extra Large	$0.50 per hour	$0.62 per hour
Double Extra Large	$1.00 per hour	$1.24 per hour
Quadruple Extra Large	$2.00 per hour	$2.48 per hour
High-CPU On-Demand Instances
Medium	$0.17 per hour	$0.29 per hour
Extra Large	$0.68 per hour	$1.16 per hour
Cluster Compute Instances
Quadruple Extra Large	$1.60 per hour	N/A*
`*` Windows is not currently available for Cluster Compute Instances.

Internet Data Transfer

The pricing below is based on data transferred “in” and “out” of Amazon EC2.

Data Transfer In	US & EU Regions	APAC Region
All Data Transfer	Free until Nov 1, 2010 `*`	Free until Nov 1, 2010 `*`

Data Transfer Out `**`	US & EU Regions	APAC Region
First 1 GB per Month	$0.00 per GB	$0.00 per GB
Up to 10 TB per Month	$0.15 per GB	$0.19 per GB

Amazon EBS Volumes- To store data

$0.10 per GB-month of provisioned storage
$0.10 per 1 million I/O requests

Amazon EBS Snapshots to Amazon S3 (priced the same as Amazon S3)

$0.15 per GB-month of data stored
$0.01 per 1,000 PUT requests (when saving a snapshot)
$0.01 per 10,000 GET requests (when loading a snapshot)

http://aws.amazon.com/ec2/#pricing Other costs are optional to your needs

Based on the above- I set out to try and create a how-to DIY for running R (and RevolutionR on 64bit Windows on EC2)

1) Logon to https://console.aws.amazon.com/ec2/home

2) Launch Windows Instance

Choose AMI

Left Margin-AMI-

Top Windows – Select Windows 64 AMI

(note if you select SQL Server it will cost you extra)

Then go through the following steps and launch instance

Selecting EC2 compute depending on number of cores, memory needs and budget

Create a key pair (a .pem file which is basically an encrypted password) and download it.
For tags, etc just click on and pass (or read and create some tags to help you remember, and organize multiple instances)
In configure firewall, remember to Enable Access to RDP (Remote Desktop) and HTTP. You can choose to enable whole internet or your own ip address/es for logging in
Review and launch instance

Go to instance (leftmost margin)

and see status (yellow for pending)
Click on Instance Actions-Connect on Top Bar to see following
Download the .RDP shortcut file and
Click on Instance Actions-Request Admin Password

Wait 15 minutes while burning few cents for free as Microsoft creates a password for you
Have coffee (or tea is you are health minded)
Click Again on Instance Actions-Request Admin Password

Open the key pair file (or .pem file created earlier) using

notepad, and copy and paste the Private Key (looks like gibberish)- and click Decrypt.

Retrieve Password for logging on.

Note the new password generated- this is your Remote Desktop Password.

Click on the .rdp file (or Shortcut file created earlier)- It will connect to your Windows instance.

Enter the new generated password in Remote Desktop

This looks like a new clean machine with just Windows OS installed on it.

Install Chrome (or any other browser) if you do not use Internet Explorer
Install Acrobat Reader (for documentation), Revolution R Enterprise~ 490 mb (it will automatically ask to install the .NET framework-4 files) and /or R

Install packages (I recommend installing R Commander, Rattle and Deducer). Apart from the fact that these GUIs are quite complimentary- they also will install almost all main packages that you need for analysis (as their dependencies) Revolution R installs parallel programming packages by default.

If you want to save your files for working later, you can make a snapshot (go to amazon console-ec2- left margin- ABS -Snapshot- you will see an attached memory (green light)- click on create snapshot to save your files for working later
If you want to use my Windows snapshot you can work on it , just when you start your Amazon Ec2 you can click on snapshots and enter details (see snapshot name below) for making a copy or working on it for exploring either 64 bit R, or multi core cloud computing or just trying out Revolution R’s new packages for academic purposes.

Interview Dean Abbott Abbott Analytics

Here is an interview with noted Analytics Consultant and trainer Dean Abbott. Dean is scheduled to take a workshop on Predictive Analytics at PAW (Predictive Analytics World Conference) Oct 18 , 2010 in Washington D.C

Ajay- Describe your upcoming hands on workshop at Predictive Analytics World and how it can help people learn more predictive modeling.

Refer- http://www.predictiveanalyticsworld.com/dc/2010/handson_predictive_analytics.php

Dean- The hands-on workshop is geared toward individuals who know something about predictive analytics but would like to experience the process. It will help people in two regards. First, by going through the data assessment, preparation, modeling and model assessment stages in one day, the attendees will see how predictive analytics works in reality, including some of the pain associated with false starts and mistakes. At the same time, they will experience success with building reasonable models to solve a problem in a single day. I have found that for many, having to actually build the predictive analytics solution if an eye-opener. Seeing demonstrations show the capabilities of a tool, but greater value for an end-user is the development of intuition of what to do at each each stage of the process that makes the theory of predictive analytics real.

Second, they will gain experience using a top-tier predictive analytics software tool, Enterprise Miner (EM). This is especially helpful for those who are considering purchasing EM, but also for those who have used open source tools and have never experienced the additional power and efficiencies that come with a tool that is well thought out from a business solutions standpoint (as opposed to an algorithm workbench).

Ajay- You are an instructor with software ranging from SPSS, S Plus, SAS Enterprise Miner, Statistica and CART. What features of each software do you like best and are more suited for application in data cases.

Dean- I’ll add Tibco Spotfire Miner, Polyanalyst and Unica’s Predictive Insight to the list of tools I’ve taught “hands-on” courses around, and there are at least a half dozen more I demonstrate in lecture courses (JMP, Matlab, Wizwhy, R, Ggobi, RapidMiner, Orange, Weka, RandomForests and TreeNet to name a few). The development of software is a fascinating undertaking, and each tools has its own strengths and weaknesses.

I personally gravitate toward tools with data flow / icon interface because I think more that way, and I’ve tired of learning more programming languages.

Since the predictive analytics algorithms are roughly the same (backdrop is backdrop no matter which tool you use), the key differentiators are

(1) how data can be loaded in and how tightly integrated can the tool be with the database,

(2) how well big data can be handled,

(3) how extensive are the data manipulation options,

(4) how flexible are the model reporting options, and

(5) how can you get the models and/or predictions out.

There are vast differences in the tools on these matters, so when I recommend tools for customers, I usually interview them quite extensively to understand better how they use data and how the models will be integrated into their business practice.

A final consideration is related to the efficiency of using the tool: how much automation can one introduce so that user-interaction is minimized once the analytics process has been defined. While I don’t like new programming languages, scripting and programming often helps here, though some tools have a way to run the visual programming data diagram itself without converting it to code.

Ajay- What are your views on the increasing trend of consolidation and mergers and acquisitions in the predictive analytics space. Does this increase the need for vendor neutral analysts and consultants as well as conferences.

Dean- When companies buy a predictive analytics software package, it’s a mixed bag. SPSS purchasing of Clementine was ultimately good for the predictive analytics, though it took several years for SPSS to figure out what they wanted to do with it. Darwin ultimately disappeared after being purchased by Oracle, but the newer Oracle data mining tool, ODM, integrates better with the database than Darwin did or even would have been able to.

The biggest trend and pressure for the commercial vendors is the improvements in the Open Source and GNU tools. These are becoming more viable for enterprise-level customers with big data, though from what I’ve seen, they haven’t caught up with the big commercial players yet. There is great value in bringing both commercial and open source tools to the attention of end-users in the context of solutions (rather than sales) in a conference setting, which is I think an advantage that Predictive Analytics World has.

As a vendor-neutral consultant, flux is always a good thing because I have to be proficient in a variety of tools, and it is the breadth that brings value for customers entering into the predictive analytics space. But it is very difficult to keep up with the rapidly-changing market and that is something I am weighing myself: how many tools should I keep in my active toolbox.

Ajay- Describe your career and how you came into the Predictive Analytics space. What are your views on various MS Analytics offered by Universities.

Dean- After getting a masters degree in Applied Mathematics, my first job was at a small aerospace engineering company in Charlottesville, VA called Barron Associates, Inc. (BAI); it is still in existence and doing quite well! I was working on optimal guidance algorithms for some developmental missile systems, and statistical learning was a key part of the process, so I but my teeth on pattern recognition techniques there, and frankly, that was the most interesting part of the job. In fact, most of us agreed that this was the most interesting part: John Elder (Elder Research) was the first employee at BAI, and was there at that time. Gerry Montgomery and Paul Hess were there as well and left to form a data mining company called AbTech and are still in analytics space.

After working at BAI, I had short stints at Martin Marietta Corp. and PAR Government Systems were I worked on analytics solutions in DoD, primarily radar and sonar applications. It was while at Elder Research in the 90s that began working in the commercial space more in financial and risk modeling, and then in 1999 I began working as an independent consultant.

One thing I love about this field is that the same techniques can be applied broadly, and therefore I can work on CRM, web analytics, tax and financial risk, credit scoring, survey analysis, and many more application, and cross-fertilize ideas from one domain into other domains.

Regarding MS degrees, let me first write that I am very encouraged that data mining and predictive analytics are being taught in specific class and programs rather than as just an add-on to an advanced statistics or business class. That stated, I have mixed feelings about analytics offerings at Universities.

I find that most provide a good theoretical foundation in the algorithms, but are weak in describing the entire process in a business context. For those building predictive models, the model-building stage nearly always takes much less time than getting the data ready for modeling and reporting results. These are cross-discipline tasks, requiring some understanding of the database world and the business world for us to define the target variable(s) properly and clean up the data so that the predictive analytics algorithms to work well.

The programs that have a practicum of some kind are the most useful, in my opinion. There are some certificate programs out there that have more of a business-oriented framework, and the NC State program builds an internship into the degree itself. These are positive steps in the field that I’m sure will continue as predictive analytics graduates become more in demand.

Biography-

DEAN ABBOTT is President of Abbott Analytics in San Diego, California. Mr. Abbott has over 21 years of experience applying advanced data mining, data preparation, and data visualization methods in real-world data intensive problems, including fraud detection, response modeling, survey analysis, planned giving, predictive toxicology, signal process, and missile guidance. In addition, he has developed and evaluated algorithms for use in commercial data mining and pattern recognition products, including polynomial networks, neural networks, radial basis functions, and clustering algorithms, and has consulted with data mining software companies to provide critiques and assessments of their current features and future enhancements.

Mr. Abbott is a seasoned instructor, having taught a wide range of data mining tutorials and seminars for a decade to audiences of up to 400, including DAMA, KDD, AAAI, and IEEE conferences. He is the instructor of well-regarded data mining courses, explaining concepts in language readily understood by a wide range of audiences, including analytics novices, data analysts, statisticians, and business professionals. Mr. Abbott also has taught both applied and hands-on data mining courses for major software vendors, including Clementine (SPSS, an IBM Company), Affinium Model (Unica Corporation), Statistica (StatSoft, Inc.), S-Plus and Insightful Miner (Insightful Corporation), Enterprise Miner (SAS), Tibco Spitfire Miner (Tibco), and CART (Salford Systems).

Using JMP 9 and R together

An interesting blog post at http://blogs.sas.com/jmp/index.php?/archives/298-JMP-Into-R!.html on using the new JMP 9 with R, and quite possibly using SAS as well.

Example Code-

Here’s the R integration JSL code used to run the bootstrap

rconn = R Connect();
rconn << Submit(“\[
library(boot)

# Load Boot package
library(boot)

RStatFctn <- function(x,d) {return(mean(x[d]))}

b.basic = matrix(data=NA, nrow=1000, ncol=2)
b.normal = matrix(data=NA, nrow=1000, ncol=2)
b.percent =matrix(data=NA, nrow=1000, ncol=2)
b.bca =matrix(data=NA, nrow=1000, ncol=2)

for(i in 1:1000){
rnormdat = rnorm(30,0,1)
b <- boot(rnormdat, RStatFctn, R = 1000)
b.ci=boot.ci(b, conf =095,type=c(“basic”,”norm”,”perc”,”bca”)) b.basic[i,] = b.ci$basic[,4:5]
b.normal[i,] = b.ci$normal[,2:3]
b.percent[i,] = b.ci$percent[,4:5]
b.bca[i,] = b.ci$bca[,4:5]
}
]\”));
b_basic= rconn << Get(b.basic);
b_normal = rconn << Get(b.normal);
b_percent= rconn << Get(b.percent);
b_bca = rconn << Get(b.bca);
rconn << Disconnect();

Using the R Connect() JSL command and assigning it to the object “rconn”, the code sends messages to the JSL scriptable object “rconn” to submit R code via the Submit() command and to retrieve R matrices containing the bootstrap confidence intervals back via the Get() commands.

and I also found interesting what the write has to say about using JMP (for visual analysis) and SAS (bigger datasets handling) and R (for advanced statistics) together

Other standard JMP tools such as the Data Filter can help to explore these results in ways that cannot easily and quickly be done in R

and

With a little JSL and the statistical and graphics platforms of JMP coupled with the breadth and variety of packages and functions in R, one can build complete easy-to-use applications for statistical analysis.

JMP can also integrate with SAS, which adds the ability to work with large-scale data through the file-based system as well as the depth and advanced capabilities of SAS procedures. With these seamless integrations, JMP can become a hub that enables you to connect with both SAS and R, as well as provide unique statistical features such as the JMP Profiler and interactive graphic features such as Graph Builder

and in the meanwhile here is a data visualization of a frequency analysis of various words bundled together from xkcd.com

Running a R GUI,and parallel programming on Amazon EC2

Ok here is an update to the post on running R on an Amazon EC2.

https://decisionstats.wordpress.com/2010/09/25/running-r-on-amazon-ec2/

1) Login to Amazon Console using instructions in earlier post

2) Select AMI-Platform Ubuntu-i-5575773f

Basically select the latest 64 bit instance from Ubuntu

3) Proceed as in post before to launch AMI and instance- here I chose large with 4cores

3.1) Before connecting to your session

search Synaptic Package Manager for x11-

I installed the X11 VNC server package –

and now interactive sessions will work (read GUIs)

3.2) Modify the line

ssh -i decisionstats2.pem root@ec2-75-101-182-203.compute-1.amazonaws.com

ssh -i decisionstats2.pem -X ubuntu@ec2-75-101-182-203.compute-1.amazonaws.com

This will connect you.

4) INSTALL R – Cran R is a standard Ubuntu Package

using

sudo apt-get install r-base

then type R

and install.packages(“Rcmdr”)

Note – you should be able to see the grey colored Tcl/Tpk script showing cran locations

in a seperate window if X11 is working

5) doSNOW package works on the Ubuntu 64- The results are below for

check <-function(n) {check <-function(n) {

+ for(i in 1:1000)

+ {

+ sme <- matrix(rnorm(100), 10,10)

+ solve(sme)

+ }

> times <- 100

> system.time(x <- foreach(j=1:times ) %dopar% check(j))
user system elapsed
0.150 0.080 7.303
> system.time(for(j in 1:times ) x <- check(j))
user system elapsed
27.460 2.300 29.757

The time of 7.3 is almost 5.5 times faster than running it locally on a dual core, and still 3 times faster than running foreach locally. Note I used 4 cores this time in snow.

5) The Tcl/Tk interface of R Cmdr takes a long time to load on EC2 than locally. It may be due to the fact I was running Ubuntu using a VM Player (http://www.vmware.com/go/downloadplayer/ ). However there seems to be a general slowing down when viewing graphics.

or simply

sudo apt-get install r-cran-rcmdr

Running R on Amazon EC2

On my second day of bludering about high technology, I came across http://rgrossman.com/2009/05/17/running-r-on-amazons-ec2/ which describes how to run R on Amazon EC2.

I tried it out and have subsequently added some screenshots to this tutorial so as to help you run R. My intention of course was to run a R GUI preferable Revolution Enterprise on the Amazon EC 2- and crunch uhm a lot of data.

Now go through the steps as follows-

0) Logging onto Amazon Console

http://aws.amazon.com/ec2/

Note you need your Amazon Id (even the same id which you use for buying books).

Note we are into Amazon EC2 as shown by the upper tab. Click upper tab to get into the Amazon EC2

2) Choosing the right AMI-

On the left margin, you can click AMI -Images.

Now you can search for the image-

I chose Ubuntu images (linux images are cheaper) and opendata in the search as belows- I get two images.

You can choose whether you want 32 bit or 64 bit image. Thumb rule- 64 bit images are preferable for data intensive tasks.

Click on launch instance in the upper tab ( near the search feature)

2) A pop up comes up, which shows the 5 step process to launch your computing.

Choose the right compute instance- As the screenshot shows- there are various compute instances and they all are at different multiples of prices or compute units.

After choosing the compute instance of your choice (extra large is highlighted)- click on continue-

3) Instance Details-

I did not choose cloudburst monitoring as it has a extra charge- and I am just trying out things.So I simply clicked continue.

4) Add Tag Details- If you are running a lot of instances you need to create your own tags to help you manage them. Advisable if you are running many instances.

Since I am going to run just one- I clicked continue with adding just two things OS and Stats Package.

5) Create a key pair- A key pair is an added layer of encryption. Click on create new pair and name it (note the name will be handy in coming steps)

After clicking and downloading the key pair- you come into security groups. Security groups is just a set of instructions to help keep your data transfer secure. So I created a new security group.

And I added some ways in security group to connect (like SSH using Port 22)

7) Last step- Review Details and Click Launch

8) On the Left margin click on instances ( you were in Images.>AMI earlier)

It will take some 3-5 minutes to launch an instance. You can see status as pending till then.

9) Pending instance as shown by yellow light-

10) Once the instance is running -it is shown by a green light.

Click on the check box, and on upper tab go to instance actions. Click on connect-

you see a popup with instructions like these-

Open the SSH client of your choice (e.g., PuTTY, terminal).

Locate your private key file, decisionstats2.pem

Use chmod to make sure your key file isn’t publicly viewable, ssh won’t work otherwise:
chmod 400 decisionstats.pem

Connect to your instance using instance’s public DNS [ec2-75-101-182-203.compute-1.amazonaws.com].

Example

Enter the following command line:

ssh -i decisionstats2.pem root@ec2-75-101-182-203.compute-1.amazonaws.com

IMPORTANT-

If you are choosing Ubuntu Terminal to connect- you need to change the word root from above to Ubuntu above.

12) To launch R, just type R at the terminal

If all goes well you should be able to see this-

choose to install any custom packages (like

install.packages(‘doSNOW’)

work on R using command line

13) IMPORTANT- After doing your R work, please CLOSE your instance (

Go to LEFT Margin-Instances-Check the check box of instance you are running- on upper tab- Instance Actions- Click Terminate.

Submitted By:	Eduardo L Leoni
US East AMI ID:	ami-1b9b7c72
AMI Manifest:	PolMethImages/imageR64.manifest.xml
License:	Public
Operating System:	Linux/Uni

Note there are other Amazon Machine Images as well which have R- I found this as well-

Amazon EC2 Ubuntu 8.10 intrepid AMI built by Eric Hammond; Eduardo Leoni added R, many R packages, JAGS, mysql-client and subversion.

Submitted By: Eduardo L Leoni

US East AMI ID: ami-1b9b7c72

AMI Manifest: PolMethImages/imageR64.manifest.xml

License: Public

Operating System: Linux/Uni

2) You can install Revolution R on 32 bit Ubuntu using sudo apt install revolution-r

Various versions of Revolution R are supported on different versions of Linux

see http://www.revolutionanalytics.com/why-revolution-r/which-r-is-right-for-me.php

I may be willing to try Red Hat Enterprise Linux on the ec2 with 64 bit AND Revolution Enterprise to see the maximum juice I can get. or you can try it using an image below- It would be interesting if there could be an Amazon Machine Image for that (paid-public and private-academic)

AMI ID: ami-8ba347e2
Name:–
Description:–
Source:redhat-cloud/RHEL-5-Server/5.2/x86_64/Beta-2.6.18-92.1.1/RHEL5.2-Server-x86_64-Beta-2.6.18-92.1.1.manifest.xml
Owner:432018295444	Visibility:Public	Product Code:54DBF944
State:available	Kernel ID:aki-89a347e0	RAM Disk ID:ari-88a347e1
Image Type:machine	Architecture:x86_64	Platform:Red Hat
Root Device Type:instance-store	Root Device:–	Image Size:0 bytes
Block Devices:N/A – Instance Store
Virtualization:paravirtual

3) My ultimate goal is to run a parallel session using all cores on an EC2 instance and a R GUI (like R Commander or Rattle)

4) For sake of running a test- I re did the parallel test I did on my 2 core laptop

but using 68.4 gb Memory and 8 cores (26 Compute Units brrr)

> check <-function(n) {

+ for(i in 1:1000)

+ {

+ sme <- matrix(rnorm(100), 10,10)

+ solve(sme)

+ }

> times <- 100

> system.time(for(j in 1:times ) x <- check(j))

user system elapsed

20.51 0.00 20.66

Note I still got a faster response than using parallel processors (on a 2 core 3 gb memory) BUT the gain was not as much had I tried to use foreach for running each of the 8 cores parallely. Or running multicore package as in http://www.cerebralmastication.com/2010/02/using-the-r-multicore-package-in-linux-with-wild-and-passionate-abandon/

5) For some reason on Ubuntu 64 bit Amazon image -I cant get Revolution R (even after using sudo apt) , and I am still learning how I can try and get Enterprise Edition fired up on a 64 bit Red Hat Enterprise Linux Amazon AMI (and maybe create an all new Machine Image 😉

6) I was also unable to use the -X command desite having x 11 as in http://rgrossman.com/2009/05/17/running-r-on-amazons-ec2/ so was not able to see graphics,

Also it prevents me from loogin onto as root, and asks me to login as ubuntu@amazon…. I also wanted to try logging into using my Windows session but kept shuttling between my VM Player Ubuntu session and Windows

7) Hope this was useful. I am thankful to tips from Revolution Blog, R Grossman’s Blog, Creators of the Open Data AMI, Tal’s R Statistics Blog and Cerebral Mastication blog on this

Parallel Programming using R in Windows

Ashamed at my lack of parallel programming, I decided to learn some R Parallel Programming (after all parallel blogging is not really respect worthy in tech-geek-ninja circles).

So I did the usual Google- CRAN- search like a dog thing only to find some obstacles.

Obstacles-

Some Parallel Programming Packages like doMC are not available in Windows

http://cran.r-project.org/web/packages/doMC/index.html

Some Parallel Programming Packages like doSMP depend on Revolution’s Enterprise R (like –

http://blog.revolutionanalytics.com/2009/07/simple-scalable-parallel-computing-in-r.html

and http://www.r-statistics.com/2010/04/parallel-multicore-processing-with-r-on-windows/ (No the latest hack didnt work)

or are in testing like multicore (for Windows) so not available on CRAN

http://cran.r-project.org/web/packages/multicore/index.html

fortunately available on RForge

http://www.rforge.net/multicore/files/

Revolution did make DoSnow AND foreach available on CRAN

see http://blog.revolutionanalytics.com/2009/08/parallel-programming-with-foreach-and-snow.html

but the documentation in SNOW is overwhelming (hint- I use Windows , what does that tell you about my tech acumen)

http://sekhon.berkeley.edu/snow/html/makeCluster.html and

http://www.stat.uiowa.edu/~luke/R/cluster/cluster.html

what is a PVM or MPI? and SOCKS are for wearing or getting lost in washers till I encountered them in SNOW

Finally I did the following-and made the parallel programming work in Windows using R

require(doSNOW)
cl<-makeCluster(2) # I have two cores
registerDoSNOW(cl)
# create a function to run in each itteration of the loop

check <-function(n) {

+ for(i in 1:1000)

+ {

+ sme <- matrix(rnorm(100), 10,10)

+ solve(sme)

+ }

+ }
times <- 100     # times to run the loop
system.time(x <- foreach(j=1:times ) %dopar% check(j))
user  system elapsed
0.16    0.02   19.17
system.time(for(j in 1:times ) x <- check(j))
user  system elapsed</pre>
39.66    0.00   40.46

stopCluster(cl)

And it works!

Please share:

Please share:

Please share:

Please share:

Please share:

Please share:

Please share: