hpc – Page 3 – DECISION STATS

The World of Data as I think

Post discussions on my performance at grad school and WHAT exactly DO I want to work in- I drew the following curves.

Feel free to draw better circles- and I will include your reference here

Caution- Based upon a very ordinary understanding of extra ordinary technical things.

THE WORLD OF DATA

AND WHAT I WANT TO DO IN IT

ps- What do you think? Add a comment

“Build a better mousetrap, and the world will beat a path to your door.”- Emerson

High Performance Computing and R

From http://cran.r-project.org/web/views/HighPerformanceComputing.html

The following is an excellent list of High Performance Computing using R.

CRAN Task View: High Performance and Parallel Computing

Maintainer: Dirk Eddelbuettel

Contact: Dirk.Eddelbuettel at R-project.org

Version: 2009-06-12

This CRAN task view contains a list of packages, grouped by topic, that are useful for high-performance computing (HPC) with R. In this context, we are defining ‘high-performance computing’ rather loosely as just about anything related to pushing R a littler further: using compiled code, parallel computing (in both explicit and implicit modes), working with large objects as well as profiling.

Unless otherwise mentioned, all packages presented with hyperlinks are available from CRAN, the Comprehensive R Archive Network.

Several of the areas discussed in this Task View are undergoing rapid change. Please send suggestions for additions and extensions for this task view to the task view maintainer .

Suggestions and corrections by Achim Zeileis, Markus Schmidberger, Martin Morgan, Max Kuhn, Tomas Radivoyevitch, Jochen Knaus, Tobias Verbeke, Hao Yu, and David Roseberg are gratefully acknowledged.

Parallel computing: Explicit parallelism

Several packages provide the communications layer required for parallel computing. The first package in this area was rpvm by Li and Rossini which uses the PVM (Parallel Virtual Machine) standard and libraries. rpvm is no longer actively maintained.

In recent years, the alternative MPI (Message Passing Interface) standard has become the de facto standard in parallel computing. It is supported in R via the Rmpi by Yu. Rmpi package is mature yet actively maintained and offers access to numerous functions from the MPI API, as well as a number of R-specific extensions. Rmpi can be used with the LAM/MPI, MPICH / MPICH2, Open MPI, and Deino MPI implementations. It should be noted that LAM/MPI is now in maintenance mode, and new development is focussed on Open MPI.

An alternative is provided by the nws (NetWorkSpaces) packages from REvolution Computing. It is the successor to the earlier LindaSpaces approach to parallel computing, and is implemented on top of the Twisted networking toolkit for Python.

The snow (Simple Network of Workstations) package by Tierney et al. can use PVM, MPI, NWS as well as direct networking sockets. It provides an abstraction layer by hiding the communications details. The snowFT package provides fault-tolerance extensions to snow.

The snowfall package by Knaus provides a more recent alternative to snow. Functions can be used in sequential or parallel mode.

The papply package by Currie provided a subset of the Rmpi functionality, but is no longer actively maintained either.

The biopara package by Lazar and Schoenfeld offers socket-based parallel execution with some support for load-balancing and fault-tolerance.

The taskPR package by Samatova et al. builds on top of LAM/MPI and offers parallel execution of tasks.

The Simple Parallel R INTerface (SPRINT) package by Hill et al. ( link , paper ) provides a prototype framework that allows the addition of parallelised functions to R for easy exploitation of HPC systems. Currently only a parallised correlation calculation is provided.

Parallel computing: Implicit parallelism

The pnmath package by Tierney ( link ) uses the Open MP parallel processing directives of recent compilers (such gcc 4.2 or later) for implicit parallelism by replacing a number of internal R functions with replacements that can make use of multiple cores — without any explicit requests from the user. The alternate pnmath0 package offers the same functionality using Pthreads for environments in which the newer compilers are not available. Similar functionality is expected to become integrated into R ‘eventually’.

The romp package by Jamitzky was presented at useR! 2008 ( slides ) and offers another interface to Open MP using Fortran. The code is still pre-alpha and available from the Google Code project romp. An R-Forge project romp was initiated but there is no package, yet.

The fork package by Warnes provides R-equivalents to low-level Unix system functions like fork, signal, wait, kill and exit in order to spawn sub-processes for parallel execution.

The multicore package by Urbanek provides a way of running parallel computations in R on machines with multiple cores or CPUs.

The R/parallel package by Vera, Jansen and Suppi offers a C++-based master-slave dispatch mechanism for parallel execution ( link )

The RScaLAPACK package by Samatova et al. provides an interface to the ScaLAPACK libraries which can replace the standard BLAS libraries and offer parallel execution of the same BLAS functions.

The SPRINT package by Hill adds another parallel framework to R ( link ).

The mapReduce package by Brown provides a simple framework for parallel computations following the Google mapReduce approach. It provides a pure R implementation, a syntax following the mapReduce paper and a flexible and parallelizable back end.

Parallel computing: Grid computing

The GridR package by Wegener et al. can be used in a grid computing environment via a web service, via ssh or via Condor or Globus.

The multiR package by Grose was presented at useR! 2008 but has not been released. It may offer a snow-style framework on a grid computing platform.

The biocep-distrib project by Chine offers a Java-based framework for local, Grid, or Cloud computing. It is under active development.

The RHIPE package by Guha profides an interface between R and Hadoop for a Map/Reduce programming framework. ( link )

Parallel computing: Random numbers

Random-number generators for parallel computing are available via the rsprng package by Li, and the rlecuyer package by Sevcikova and Rossini.

Parallel computing: Resource managers and batch schedulers

Job-scheduling toolkits permit management of parallel computing resources and tasks. The slurm (Simple Linux Utility for Resource Management) set of programs (written by a consortium led by Lawrence Livermore Labs) works well with MPI. ( link )

The Condor toolkit ( link ) from the University of Wisconsin-Madison has been used with R as described in this R News article .

The sfCluster package by Knaus can be used with snowfall. ( link ) but is currently limited to LAM/MPI.

The Rsge package by Bode offers an interface to the Sun Grid Engine batch-queuing system.

The Rlsf package by Smith et al. offers an interface to the LSF cluster/grid system.

Parallel computing: Applications

The caret package by Kuhn can use can use various frameworks (MPI, NWS etc) to parallelized cross-validation and bootstrap characterizations of predictive models.

The multtest package by Pollard et al. can use snow, Rmpi or rpvm for resampling-based testing of multiple hypothesis.

The maanova package by Wu can use snow and Rmpi for the analysis of micro-array experiments.

The pvclust package by Suzuki and Shimodaira can use snow and Rmpi for hierarchical clustering via multiscale bootstraps; and the scaleboot package by Shimodaira can use pvclust, snow and Rmpi for computing approximately unbiased p-values via multiscale bootstraps.

The tm package by Feinerer can use snow and Rmpi for parallelized text mining.

The varSelRF package by Diaz-Uriarte can use snow and Rmpi for parallelized use of variable selection via random forests; and the ADaCGH package by Diaz-Uriarte and Rueda can use Rmpi and papply for parallelized analysis of array CGH data.

The bcp package by Erdman and Emerson for the bayesian analysis of change points, and the bigmemory package by Kane and Emerson can use nws for parallelized operations.

The networksis package by Admiraal and Handcock can use rpvm and snow for parallelized simulation of bipartite graphs via sequential importance smapling.

The BARD package by Altman for better automated redistring, the GAMBoost package by Binder for glm and gam model fitting via boosting using b-splines, the Geneland package by Estoup, Guillot and Santos for structure detection from multilocus genetic data, the Matching package by Sekhon for multivariate and propensity score matching, the STAR package by Pouzat for spike train analysis, the bnlearn package by Scutari for bayesian network structure learning, the latentnet package by Krivitsky and Handcock for latent position and cluster models, the lga package by Harrington for linear grouping analysis, the peperr package by Porelius and Binder for parallised estimation of prediction error, the orloca package by Fernandez-Palacin and Munoz-Marquez for operations research locational analysis, the rgenoud package by Mebane and Sekhon for genetic optimization using derivatives the affyPara package by Schmidberger, Vicedo and Mansmann for parallel normalization of Affymetrix microarrays, the puma package by Pearson et al. which propagates uncertainty into standard microarray analyses such as differential expression and the ccems package for combinatorically complex equilibrium model selection all can use snow for parallelized operations using either one of the MPI, PVM, NWS or socket protocols supported by snow.

The bugsparallel package uses Rmpi for distributed computing of multiple MCMC chains using WinBUGS.

The partDSA package uses nws for generating a piecewise constant estimation list of increasingly complex predictors based on an intensive and comprehensive search over the entire covariate space.

Parallel computing: GPUs

The gputools package by Buckner provides several common data-mining algorithms which are implemented using a mixture of nVidia’s CUDA langauge and cublas library. Given a computer with an nVidia GPU these functions may be substantially more efficient than native R routines.

Large memory and out-of-memory data

The biglm package by Lumley uses incremental computations to offers lm() and glm() functionality to data sets stored outside of R’s main memory.

The ff package by Adler et al. offers file-based access to data sets that are too large to be loaded into memory, along with a number of higher-level functions.

The bigmemory package by Kane and Emerson permits storing large objects such as matrices in memory and uses external pointer objects to refer to them. This permits transparent access from R without bumping against R’s internal memory limits. Several R processes on the same computer can also shared big memory objects.

A large number of database packages, and database-alike packages (such as sqldf by Grothendieck and data.table by Dowle) are also of potential interest but not reviewed here.

The HadoopStreaming package provides a framework for writing map/reduce scripts for use in Hadoop Streaming; it also facilitates operating on data in a streaming fashion which does not require Hadoop.

Easier interfaces for Compiled code

The inline package by Sklyar, Murdoch and Smith eases adding code in C, C++ or Fortran to R. It takes care of the compilation, linking and loading of embeded code segments that are stored as R strings.

The Rcpp package by Eddelbuettel offers a number of C++ clases that makes transferring R objects to C++ functions (and back) easier, and the RInside package by Eddelbuettel allows easy embedding of R itself into C++ applications for faster and more direct data transfer..

The rJava package by Urbanek provides a low-level interface to Java similar to the .Call() interface for C and C++.

Profiling tools

The profr package by Wickham can visualize output from the Rprof interface for profiling.

The proftools package by Tierney can also be used to analyse profiling output.

CRAN packages:

ADaCGH

BARD

bcp

biglm

bigmemory

biopara

bnlearn

caret

ccems

data.table

ff

fork

GAMBoost

Geneland

gputools

GridR

HadoopStreaming

inline

latentnet

lga

maanova

mapReduce

Matching

multicore

multtest

networksis

nws

orloca

papply

partDSA

peperr

profr

proftools

pvclust

Rcpp

rgenoud

rJava

rlecuyer

Rlsf

Rmpi (core)

rpvm

RScaLAPACK

Rsge

rsprng

scaleboot

snow (core)

snowfall

snowFT

sqldf

STAR

taskPR

tm

varSelRF

Related links:

HPC computing notes by Luke Tierney for HPC class at University of Iowa

Mailing List: R Special Interest Group High Performance Computing

Schmidberger, Morgan, Eddelbuettel, Yu, Tierney and Mansmann (2009) paper on ‘State-of-the-Art in Parallel Computing with R

Luke Tierney’s code directory for pnmath and pnmath0

R-Forge Project: biocep-distrib

R-Forge Project: RInside

Bioconductor Package: affyPara

Bioconductor Package: puma

Google Code Project: romp

Google Code Project: bugsparallel

Slurm project at Lawrence Livermore National Laboratory

Condor project at University of Wisconsin-Madison

Parallel Computing in R with sfCluster/snowfall

Wikipedia: Message Passing Interface (MPI)

Wikipedia: Parallel Virtual Machine (PVM)

Slides from Introduction to High-Performance Computing with R tutorial / workshop presentation

Interview Karim Chine BIOCEP (Cloud Computing with R)

Here is an interview with Karim Chine of http://www.biocep.net/

Working with an R or Scilab on clusters/grids/clouds becomes as simple as working with them locally-

Karim Chine, Biocep.

Ajay- Please describe your career in the field of science. What advice would you give to young science graduates in this recession.

Karim- My original background is in theoretical Physics, I did my Master’s thesis at the Ecole Normale’s Statistical Physics Laboratory where I worked on phase separation in two-dimensional additive mixtures with Dr Werner Krauth. I came to computer science after graduating from the Ecole Polytechnique and I spent two years at TELECOM ParisTech studying software architecture and distributed systems design. I worked then for the IBM Paris Laboratory (VisualAge Pacbase applications’ generator), Schlumberger (Over the Air Platform and Web platform for smartcards personalization services), Air France (SSO deployment) and ILOG (OPL-CPLEX-ODM Development System). This gave me the intense exposure to real world large-scale software design. I crossed the borders of cultural, technical and organizational domains several times and I worked with a broad palette of technologies with some of the best and most innovative engineers. I moved to Cambridge in 2006 and I worked for the European Bioinformatics Institute. It’s where I started dealing with the integration of R into various types of applications. I left the EBI in November 2007. I was looking for an institutional support to help me in bringing into reality a vision that was becoming clearer and clearer about a universal platform for scientific and statistical computing. I failed in getting that support and I have been working on BIOCEP full time for most of the last 18 months without being funded. Few days of consultancy given here and there allowed me to keep going. I spent several weeks at Imperial College, at the National Center for e-Social Sciences and at Berkeley’s department of statistics during that period. Those visits were extremely useful in refining the use cases of my platform. I am still looking for a partner to back the project. You asked me to give advice. The unique advice I would give is to be creative and to try again and again to do what you really want to do. Crisises come and go, they will always do and extreme situations are part of life. I believe hard work and sincerity can prevail anything.

Ajay- Describe BIOCEP’s scope and ambition.

What are the current operational analytics that can be done by users having data.

Karim- My first ambition with BIOCEP is to deliver a universal platform for scientific and statistical computing and to create an open, federative and collaborative environment for the production, sharing and reuse of all the artifacts of computing. My second ambition is to enhance dramatically the accessibility of mathematical and statistical computing, to make HPC a commonplace and to put new analytical, numerical and processing capabilities in the hands of everyone (open science).

The Open source software Conquest has gone very far. Environments like R or Scilab, technologies like Java, Operating Systems like Linux-Ubuntu, and tools like OpenOffice are being used by millions of people. Very little doubt remains about the OSS’s final victory in some domains. The cloud is already a reality and it will take computing to a whole new realm. What is currently missing is the software that, by making the Cloud’s usage seamless, will create new ecosystems and will provide rooms for creativity, innovation and knowledge discovery of an unprecedented scale.

BIOCEP is one more building block into this. BIOCEP is built on top of R and Scilab and anything that you can do within those environments is accessible through BIOCEP. Here is what you have uniquely with this new R/Scilab-based e-platform:

– High productivity via the most advanced cross-platform workbench available for the R environment.

– Advanced Graphics: with BIOCEP, a graphic transducer allows the rendering on client side of graphics produced on server side and enables advanced capabilities like zooming/unzooming/scrolling for R graphics. a client side mouse tracker allows to display dynamically information related to the graphics and depending on coordinates. Several virtual R Devices showing different data can be coupled in zooming/scrolling and this helps comparing visually complex graphics.

– Extensibility with plug-ins: new views (IDE-like views, analytical interfaces…) can be created very easily either programmatically or via drag-and-drop GUI designers.

– Extensibility with server-side extensions: any java code can be packaged and used on server side. The code can interact seamlessly with R and Scilab or provide generic bridges to other software. For example, I provide an extension that allows you to use openoffice as a universal converter between various files formats on server side.

– Seamless High Performance Computing: working with an R or Scilab on clusters/grids/clouds becomes as simple as working with them locally. Distributed computing becomes seamless, creating a large number R and Scilab remote engines and using them to solve large scale problems becomes easier than ever. From the R console the user can create logical links to existing R engines or to newly created ones and use those logical links to pilot the remote workers from within his R session. R functions enable using the logical links to import/export variables from the R session to the different workers and vice versa. R commands/scripts can be executed by the R workers synchronously or asynchronously. Many logical R links can be aggregated into one logical cluster variable that can be used to pilot the R workers in a coordinated way. A cluster.apply function allows the usage of the logical cluster to apply a function to a big data structure by slicing it and sending elementary execution commands to the workers. The workers apply the user’s function to the slices in parallel. The elementary results are aggregated to compose the final result that becomes available within the R session.

– Collaboration: your R/scilab server running in the cloud can be accessed simultaneously by you and your collaborators. Everything gets broadcasted including Graphics. A spreadsheet enables to view and edit data collaboratively. Anyone can write plug-ins to take advantage of the collaborative capabilities of the frameworks. If your IP address is public, you can provide a URL to anyone and get him connect to your locally running R.

– Powerful frameworks for Java developers: BIOCEP provides Frameworks and tools to use R as if it was an Object Oriented Java Toolkit or a Web Toolkit for R-based dynamic application.

– Webservices for C#, Perl, Python users/developers: Most of the capabilities of BIOCEP including piloting of R/Scilab engines on the cloud for distributed computing or for building scalable analytical web application are accessible from most of the programming languages thanks to the SOAP front-end.

– RESTful API: simple URLs can perform computing using R/Scilab engines and return the result as an XML or as graphics in any format. This works like google charts and has all the power of R since the graphic is described with an R script provided as a parameter of the URL. The same API can be exposed on demand by the workbench. This allow for example to integrate a Cloud-R with Excel or OpenOffice. The workbench works as a bridge between the cloud and those applications.

– Advanced Pooling framework for distributed resources: useful for deploying pools of R/scilab engines on multi nodes systems and get them used simultaneously by several distributed client processes in a scalable/optimal way. A supervision GUI is provided for a user friendly management of the pools/nodes/engines.

– Simultaneous use of R and Scilab: Using java scripting, data can be transferred from R to Scilab and vice versa.

Ajay- Could you tell us about a successful BIOCEP installation and what it led to? Can BIOCEP be used by the rest of the R community for other packages? What would be an ideal BIOCEP user /customer for whom cloud based analytics makes more sense ?

Karim- BIOCEP is still in pre-beta stage. However it is a robust and polished pre-Beta that several organizations are already using. Janssen Pharmaceutica is using it to create and deliver statistical applications for drug discovery that use R engines running on their backend servers. The platform is foreseen there as the way to go for an ultimate optimization of some of their data analysis pipelines. Janssen’s head of statistics said to be very much interested in the capabilities given by BIOCEP to statisticians to create their own analytical User Interfaces and deliver them with their models without needing specific software development skills. Shell is creating BIOCEP-based applications prototypes to explore the feasibility and advantages of migrating some of Shell’s applications to the Cloud. One group from Shell Global Solutions is planning to use BIOCEP for running scilab in the cloud for Corrosion simulation modeling. Dr Ivo Dinov’s team at UCLA is studying the migration of some the SOCR applications to the BIOCEP platform as plug-ins and extensions. Dr Ivo Dinov also applied for an important grant for building DISCb (Distributed Infrastructure for Statistical Computing in Biomedicine). If the grant application is successful, BIOCEP will be the backbone at software architecture level of that new infrastructure. In cooperation with the Institute of Biostatistics, Leibniz University of Hannover, Bernd Bischl and Kornelius Rohmeyer have developed a framework to user friendly R-GUIs of different complexity. The toolkit uses BIOCEP as an R-backend since release 2.0. Several small projects have been implemented using this framework and some are in production such as an application for education in biostatistics at the University of Hannover. Also the ESNATS project is planning to use the BIOCEP frameworks. Some development is being done at the EBI to customize the workbench and use it to give to the end user the possibility to run R and Bioconductor on the EBI’s LSF cluster.

I’ve been in touch with Phil Butcher, Sanger’s head of IT and he is considering the deployment of BIOCEP on Sanger’s systems simultaneously with Eucalyptus. The same type of deployment has been discussed with the director of OMII-UK, Neil Chue Hong. BIOCEP’s deployment is probably going to follow the deployment of the Eucalyptus System on NGS. Tena Sakai deployed BIOCEP at the Ernest Gallo Clinic and Research Centre and he is currently exploring the usage of the R on the Cloud via BIOCEP (Eucalyptus / AWS). The platform has been deployed by a small consultancy company specializing in R on several London-based investment banks’ systems. I have had a go ahead form Nancy Wilkins Diher (Director for Science Gateways, SDSC) for deploying on TeraGrid, a deployment on EGEE has been discussed with Dr Steven Newhouse (EGEE Technical Director). Both deployments are in standby at the moment.

Quest Diagnostics is planning to use BIOCEP extensively. Sudeep Talati (University of Manchester) is doing his Master’s project on BIOCEP. He is supervised by Professor Andy Brass and he is exploring the use of a BIOCEP-based infrastructure to deliver microarray analysis workflows in a simple and intuitive way to biologists with and without the Cloud. In Manchester, Robin Pinning (e-Science team leader, Research Computing Services) has the deployment of BIOCEP on Manchester’s research cluster on his agenda…

As I have said, anything that you can do with R including installing, loading and using any R package is accessible through BIOCEP. The platform aims to be universal and to become a tool for productivity and collaboration used by everyone dealing with computing/analytics with or without the cloud.

The Cloud whether it is public or private will be generalized and everyone will become a cloud user in one way or another

Ajay- What motivated you to build BIOCEP and mash cloud computing and R. What scope do you see for cloud computing in developing countries in Asia and Africa?

Karim– When I was at the EBI, I worked on the integration of R within scalable web applications. I explored and tested the available frameworks and tools and all of them were too low level or too simple to answer the problem. I decided to build new frameworks. I had the opportunity to be able to stand on the shoulders of giants.

Simon Urbanek’s packages already bridged the C-API of R with Java reliably. Martin Morgan’s RWebsevices package defined class mappings between R types, including S4 classes, and java.

Progressively R became usable as a Java object oriented toolkit, then as a Java Server. Then I built a pooling framework for distributed resources that made it possible for multiple clients to use multiple R engines optimally.

I started building a GUI to validate the server’s increasingly sophisticated API. That GUI became progressively the workbench.

When I was at Imperial, I worked with the National Grid Service team at the Oxford e-Research Centre to deploy my platform on Oxford’s core cluster. That deployment led to many changes in the architecture to meet all the security requirements.

It was obvious that the next step was to make BIOCEP available on Amazon’s Cloud. Academic Grids are for researchers and the cloud is for everyone. Making the platform work seamlessly on EC2 took few months. With the cloud came the focus on collaborative features (collaborative views, graphics, spreadsheets…).

I can only talk about the example of a country I know, Tunisia, and I guess some of this applies to Asian Countries. Even if the broadband is everywhere today and is becoming accessible and affordable by a majority of Tunisians, I am not sure that the adoption of the cloud would happen soon.

Simple considerations like the obligation to pay for the compute cycles in dollars (and not in dinars) are a barrier for adoption. Spending foreign currencies is subject to several restrictions in general for companies and for individuals; few Tunisians have credit cards that can be used to pay Amazon. Companies would prefer to buy and administer their own machines because the cost of operation and maintenance is lower in Tunisia than it is in Europe/US.

Even if the cloud would help in giving Tunisian researchers access to affordable Computing cycles on demand, it seems that most of them have learned to live without HPC resources and that their research is more theoretical and less computational than it could be. Others are collaborating with research groups in Europe (France) and they are using those European groups’ infrastructures.

Ajay- How would BIOCEP address the problem of data hygiene, data security and privacy. Is encrypted and compressed data transfers supported or planned?

Karim- With BIOCEP, a computational engine is exposed as a distributed component via a single mono-directional HTTP port. When you run such an engine on an EC2 instance you have two options:

1/ totally sandbox the machine (via the security group) and leave only the SSH port open.
Private Key authentication is required to access the machine. In this case you use an SSH Tunnel (created with a tool like Putty for example) which allows you to see the engine as if it was running on your local machine on a port of your choice, the one specified for creating the Tunnel.
When you start the Virtual Workbench and connect in Http mode to your local host via the specified port, you are effectively connecting to the EC2-R engine. 100% of the information exchanged between your workbench and the engine, including your data, is ciphered thanks to the SSH tunnel.
The virtual workbench embeds JSCH and can create the Tunnel for you automatically. This mode doesn’t allow collaboration since it requires the private key to let the workbench talk to the EC2 R/Scilab engine.

2/ tell the EC2 machine at startup (via the “user data”) to require specific credentials from the user. When the machine starts running, the user needs to provide those credentials to get a session ID and to be able to pilot a virtual EC2 R/Scilab engine. This mode enables collaboration. The client (workbench/scripts) connects to the EC2 machine instance via HTTP (will be HTTPS in a near future).

Ajay- Suppose I have 20 gb per month of data and my organization decided to cut back on the number of annual expensive software. How can the current version of BIOCEP help me do the following?

Karim– Ways BIOCEP can help you right now.

1) Data aggregation and Reporting in terms of spreadsheet, presentation and graphs

BIOCEP provides a highly programmable server side spreadsheet.

It can be used interactively as a view of the workbench and simple clicks allow the transfer of data form cells to R variables and vice versa. It can be created and populated from R (console / scripts).

Any R function can be used within dynamically computed cells. The evaluation of those dynamic cells is done on server side and can use high performance computing functions. Macros allow adding reactivity to the spreadsheets.

A macro allows the user to execute any R code in response to a value change of an R variable or of the content of a range within a spreadsheet. Variables docking macros allow the mirroring of R variables of any type (vectors, matrixes, data frames..) with ranges within the spreadsheet in Read/Write mode

. Several ready-to-use User Interface components can be created and docked anywhere within the spreadsheet. Those components include

an R Graphics viewer (PDF viewer) showing Graphics produced by a user-defined R script and reactive on user-defined variables and cell ranges changes,
customizable sliders mirroring R variables,
Buttons executing user-defined R code when pressed,
Combo boxes mirroring factor variables ..

The spreadsheet-based analytical user interface can pilot an R running at any location (local R, Grid R, Cloud R…). It can be created in minutes just by pointing, clicking and copy/pasting.

Cells content+macros+reactive docked components can be saved in a zip file and become a Workbench plug-ins. Like all BIOCEP plug-ins, the spreadsheet-based GUI can be delivered to the end user via a simple URL. It can use a cloud-R or a local R created transparently on the user’s machine.

2) Build time series models, regression models

BIOCEP’s workbench is extensible and I am hoping that contributors will soon start writing plug-ins or converting available GUIs to BIOCEP plug-ins in order to make the creation of those models as easy as possible.

Biography-

Karim Chine
Karim chine graduated from the French Ecole Polytechnique and TELECOM ParisTech. He worked at Ecole Normale Supérieure-LPS (phase separation in two-dimensional additive mixture), IBM (VisualAge Pacbase), Schlumberger (Over the Air Platform and Web platform for smartcards personalization services), Air France (SSO deployment), ILOG (OPL-CPLEX-ODM Development System), European Bioinformatics Institute (Expression Profiler, Biocep) and Imperial College London-Internet Center (Biocep). He contributed to open source software (AdaBroker) and he is the author of the Biocep platform. He currently works on the seamless integration of the new platform within utility computing infrastructures (Amazon EC2), its deployment on Grids (NGS) and its usage as a tool for education and he tries to build collaborations with academic and industrial partners.

You can view his resume here http://www.biocep.net/scan/CV_Karim_Chine_June_2009.pdf

Maintainer:	Dirk Eddelbuettel
Contact:	Dirk.Eddelbuettel at R-project.org
Version:	2009-06-12

Please share:

CRAN Task View: High Performance and Parallel Computing

CRAN packages:

Related links:

Please share:

Working with an R or Scilab on clusters/grids/clouds becomes as simple as working with them locally-

Karim Chine, Biocep.

Please share: