Ashamed at my lack of parallel programming skills, I decided to learn some parallel programming in R (after all, parallel blogging is not really respect-worthy in tech-geek-ninja circles).
So I did the usual Google-CRAN-search-like-a-dog thing, only to find some obstacles.
Obstacles:
- Some parallel programming packages, like doMC, are not available on Windows:
  http://cran.r-project.org/web/packages/doMC/index.html
- Some, like doSMP, depend on Revolution's Enterprise R (see
  http://blog.revolutionanalytics.com/2009/07/simple-scalable-parallel-computing-in-r.html
  and http://www.r-statistics.com/2010/04/parallel-multicore-processing-with-r-on-windows/ — no, the latest hack didn't work).
- Others, like multicore (for Windows), are still in testing and so not available on CRAN:
  http://cran.r-project.org/web/packages/multicore/index.html
  though it is fortunately available on RForge:
  http://www.rforge.net/multicore/files/
Revolution did make doSNOW and foreach available on CRAN
(see http://blog.revolutionanalytics.com/2009/08/parallel-programming-with-foreach-and-snow.html),
but the documentation for SNOW is overwhelming (hint: I use Windows; what does that tell you about my tech acumen?):
http://sekhon.berkeley.edu/snow/html/makeCluster.html and
http://www.stat.uiowa.edu/~luke/R/cluster/cluster.html
What is PVM or MPI? And SOCKs were things you wear, or lose in the washing machine, until I encountered them in SNOW.
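(For anyone else puzzled by those names: in SNOW the cluster "type" just names the transport. A SOCK cluster uses plain sockets and works out of the box on Windows; the PVM and MPI types need those message-passing systems installed separately. A minimal sketch, with the worker count of 2 purely illustrative:)

```r
library(snow)

# A socket (SOCK) cluster needs no extra software on Windows.
# type = "PVM" or type = "MPI" would require installing PVM or an
# MPI implementation first.
cl <- makeCluster(2, type = "SOCK")

# Ask each worker to identify itself, just to confirm they respond
clusterCall(cl, function() Sys.info()["nodename"])

stopCluster(cl)
```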
Finally, I did the following and made parallel programming work in Windows using R:

require(doSNOW)
cl <- makeCluster(2)  # I have two cores
registerDoSNOW(cl)
# create a function to run in each iteration of the loop
check <- function(n) {
  for (i in 1:1000) {
    sme <- matrix(rnorm(100), 10, 10)
    solve(sme)
  }
}
times <- 100  # times to run the loop
system.time(x <- foreach(j = 1:times) %dopar% check(j))
#   user  system elapsed
#   0.16    0.02   19.17
system.time(for (j in 1:times) x <- check(j))
#   user  system elapsed
#  39.66    0.00   40.46
stopCluster(cl)
And it works!
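(As an aside, the same work can also be farmed out with SNOW's own apply-style helpers instead of foreach. A sketch, assuming the check() function and the two-worker setup from the example above:)

```r
library(snow)

cl <- makeCluster(2, type = "SOCK")

# Ship check() to each worker; SNOW workers start with empty
# environments, so anything they need must be exported first
clusterExport(cl, "check")

# parLapply splits 1:100 across the workers, like the foreach loop
system.time(x <- parLapply(cl, 1:100, check))

stopCluster(cl)
```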
That time I'm quite sure I pasted it; maybe I've hit a WordPress bug..
system.time(x <- foreach(j=1:times ) %dopar% check(j))
user system elapsed
0.10 0.01 22.01
========
system.time(for(j in 1:times ) x <- check(j))
user system elapsed
21.98 0.02 22.03
========
It is interesting to see the 64-bit OS's effect on processing time, as there is hardly any improvement. I redid the same test in the Amazon EC2 environment as well.
In fact, in my next blog post I am using the same example on an Amazon large instance (dual-core, 64-bit, 7.5 GB RAM), and I find it is still faster than running the foreach loop locally on a 32-bit machine with 3 GB RAM.
If it were a four- or eight-core machine, would it make a difference? Or does the 64-bit Windows R build just completely squelch any possibility of a time improvement from parallel processing?
thanks!
I think it does. I need to test it out, though.
I'm sorry, I somehow didn't paste the entire syntax:
> system.time(x <- foreach(j = 1:times) %dopar% check(j))
> system.time(for (j in 1:times) x <- check(j))
i have two cores also, but when i run it, i get
> system.time(for (j in 1:times) x <- check(j))
user system elapsed
21.89 0.01 22.00
I am running it in Windows R 2.11.1 x64..
And what do you get when you use %dopar%?
Hi Ajay,
I didn’t know about “doSNOW”, that’s cool to know.
I've been able to make doSMP work fine (R 2.11.1 with Win 7, and also on 2.10 with Win XP), but I am glad there are more solutions out there for Windows.
Cheers,
Tal
I am using Windows XP SP2 with R 2.11.1. I did the same: copied folders from R Enterprise and downloaded the RI.. from the site, but it still didn't recognize doSMP as a package. Yes, doSNOW is a cool package, and REVO is to be thanked.