Software HIStory: BASS Institute Part 1

or how SAS Institute needs to take more seriously the competition from WPS (a SAS language compiler, in an alliance with IBM) and from R (open source predictive analytics with tremendous academic support), as well as the financial pressure from Microsoft and SAP.

Over the weekend, I ran into Jeff Bass, owner of BASS Institute. BASS Institute provided a SAS-like compiler in the 1980s; it was very light compared to the then clunky SAS (which shipped on multiple floppies) and sold many copies. It ran out of money when the shift to PCs happened and SAS Institute managed to get there first.

Today the shift is happening to cloud computing, and though SAS has invested $70 million in it, it still continues to SUPPORT Microsoft by NOT supporting, or even offering financial incentives for customers to use, Ubuntu Linux server and Ubuntu Linux desktop. For academic students it charges $25 per Windows license, thus helping to sell many more copies of Windows Vista. Why does it not give the Ubuntu Linux version free to students? Why does SAS Institute continue to give the online documentation free to people who use its language and undercut it? More importantly, why does SAS charge LESS money for excellent software in the BI space? Its BI offering is one of the best and cheapest, while its desktop software is among the most expensive. Why does SAS Institute not support Hadoop and Map/Reduce database systems, instead of focusing on Oracle and Teradata relationships and feelings?

Anyways, back to Jeff Bass. This is Part 1 of the interview.

Ajay- Jeff, tell us all about the BASS Institute?

Jeff-

The BASS system has been off the market for about 20 years and is an example of old, command line, DOS based software that has been far surpassed by modern products – including SAS for the PC platform.  It was fun providing a “SAS like” language for people on PCs – running MS DOS – but I scrapped the product when PC SAS became a reasonably usable product and PCs got enough memory and hard disk space.
 
BASS was a SAS “work alike”…it would run many (but certainly not all) SAS programs with few modifications.  It required a DOS PC with 640K of RAM and a hard disk with 1MB of available space.  We used to demo it on a Toshiba laptop with NO hard disk and only a floppy drive.  It was a true compiler that parsed the data / proc step input code and generated 8086 assembly language that went through mild optimization, and then executed.
 
I no longer have the source code…it was saved to an ancient Irwin RS-232 tape drive onto tapes that no longer exist…it is fun how technology has moved on in 20 years!  The BASS system was written in Microsoft Pascal and the code for the compiler was similar to the code that would be generated by the Unix YACC “compiler compiler” when fed the syntax of the SAS data step language.  BASS included the “DATA Step” and the most basic PROCS, like MEANS, FREQ, REG, TTEST, PRINT, SORT and others.  Parts of the system were written in 8086 assembler (I have to smile when I remember that).  If I was to recreate it today, I would probably use YACC and have it produce R source code…but that is an idea I am never likely to spend any time on.
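
(Note from Ajay - purely as an illustration of the kind of translation Jeff describes, and not actual BASS or SAS-generated code, here is a toy SAS data step and one way it might be hand-translated into R; the data set and variable names are made up:

    ## the toy SAS data step:
    ##   data heavy; set patients; bmi = weight / (height/100)**2; if bmi > 30; run;
    ## one possible hand translation to R, assuming a data frame 'patients':
    heavy <- transform(patients, bmi = weight / (height / 100)^2)
    heavy <- subset(heavy, bmi > 30)

)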
 
We sold quite a few copies of the software and BASS Institute, Incorporated was a going concern until PC SAS became debugged and reliable.  Then there was no point in continuing it.  But I think it would be fun for someone to write a modern open source version of a SAS compiler (the data step and basic procs were developed in the public domain at NC State University before Sall and Goodnight took the company private, so as long as no copyrighted code was used in any way, an open source compiler would probably be legal).
 
I still use SAS (my company has an enterprise license), but only very rarely.  I use R more often and am a big fan of free software (sometimes called open source software, but I like the Free Software Foundation’s distinction at fsf.org).  I appreciated your recommendation of the book “R for SAS and SPSS Users” on your website.  I bought it for my Kindle immediately upon reading about it.  I no longer work in the software world; I’m a reimbursement and health policy director for the biotech firm Amgen, where I have worked since 1990 or so…  I also serve on the boards of a couple of non-profit organizations in the health care field.

Ajay- Any comments on WPS?

Jeff- I’m glad WPS is out there.  I think alternatives help keep the SAS folks aware that they have to care about competition, at least a little 😉

( Note from Ajay-

You can see more on WPS at http://www.teamwpc.co.uk/home


and on SAS at http://www.sas.com/ )


R releases new version 2.9.2

What is new in 2.9.2 (technical details, not marketing spit and shine), and what didn’t work in 2.9.1 (shockingly, bugs are fixed openly!)

NEW FEATURES

    o   install.packages(NULL) now lists packages only once even if they
        occur in more than one repository (as the latest compatible
        version of those available will always be downloaded).

    o   approxfun() and approx() now accept a 'rule' of length two, for
        easy specification of different interpolation rules on left and
        right.

        They no longer segfault for invalid zero-length specification
        of 'yleft', 'yright', or 'f'.
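
        For example (my own illustration of the new two-sided 'rule',
        not part of the NEWS file):

            approx(1:5, (1:5)^2, xout = c(0, 6), rule = c(1, 2))
            # rule[1] = 1 applies to the left of the data: NA at x = 0
            # rule[2] = 2 applies to the right: 25 (the nearest data value) at x = 6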

    o   seq_along(x) is now equivalent to seq_len(length(x)) even where
        length() has an S3/S4 method; previously it (intentionally)
        always used the default method for length().
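
        For example (my own illustration):

            x <- letters[1:4]
            seq_along(x)           # 1 2 3 4
            seq_len(length(x))     # the same; the two now agree even when a
                                   # class defines its own length() method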

    o   PCRE has been updated to version 7.9 (for bug fixes).

    o   agrep() uses 64-bit ints where available on 32-bit platforms
        and so may do a better job with complex matches.
        (E.g. PR#13789, which failed only on 32-bit systems.)

DEPRECATED & DEFUNCT

    o   R CMD Rd2txt is deprecated, and will be removed in 2.10.0.
        (It is just a wrapper for R CMD Rdconv -t txt.)

    o   tools::Rd_parse() is deprecated and will be removed in 2.10.0
        (which will use only Rd version 2).

BUG FIXES

    o   parse_Rd() still did not handle source reference encodings
        properly.

    o   The C utility function PrintValue no longer attempts to print
        attributes for CHARSXPs as those attributes are used
        internally for the CHARSXP cache.  This fixes a segfault when
        calling it on a CHARSXP from C code.

    o   PDF graphics output was producing two instances of anything
        drawn with the symbol font face. (Report from Baptiste Auguie.)

    o   length(x) <- newval and grep() could cause memory corruption.
        (PR#13837)

    o   If model.matrix() was given too large a model, it could crash
        R. (PR#13838, fix found by Olaf Mersmann.)

    o   gzcon() (used by load()) would re-open an open connection,
        leaking a file descriptor each time. (PR#13841)

    o   The checks for inconsistent inheritance reported by setClass()
        now detect inconsistent superclasses and give better warning
        messages.

    o   print.anova() failed to recognize the column labelled
        P(>|Chi|) from a Poisson/binomial GLM anova as a p-value
        column in order to format it appropriately (and as a
        consequence it gave no significance stars).

    o   A missing PROTECT caused rare segfaults during calls to
        load().  (PR#13880, fix found by Bill Dunlap.)

    o   gsub() in a non-UTF-8 locale with a marked UTF-8 input
        could in rare circumstances overrun a buffer and so segfault.

    o   R CMD Rdconv --version was not working correctly.

    o   Missing PROTECTs in nlm() caused "random" errors. (PR#13381 by
        Adam D.I. Kramer, analysis and suggested fix by Bill Dunlap.)

    o   Some extreme cases of pbeta(log.p = TRUE) are more accurate
        (finite values < -700 rather than -Inf).  (PR#13786)

        pbeta() now reports on more cases where the asymptotic
        expansions lose accuracy (the underlying TOMS708 C code was
        ignoring some of these, including the PR#13786 example).

    o   new.env(hash = TRUE, size = NA) now works the way it has been
        documented to for a long time.

    o   tcltk::tk_choose.files(multi = TRUE) produces better-formatted
        output with filenames containing spaces.  (PR#13875)

    o   R CMD check --use-valgrind did not run valgrind on the package
        tests.

    o   The tclvalue() and the print() and as.xxx methods for class
        "tclObj" crashed R with an invalid object -- seen with an
        object saved from an earlier session.

    o   R CMD BATCH garbled options -d <debugger> (useful for
        valgrind, although --debugger=valgrind always worked)

    o   INSTALL with LazyData and Encoding declared in DESCRIPTION
        might have left options("encoding") set for the rest of the
        package installation.

And from www.r-project.org, the rest of the updated news:
  • R version 2.9.2 has been released on 2009-08-24. The source code will first become available in this directory, and eventually via all of CRAN. Binaries will arrive in due course (see download instructions above).
  • The first issue of The R Journal is now available
  • The R Foundation has been awarded four slots for R projects in the Google Summer of Code 2009.
  • DSC 2009, The 6th workshop on Directions in Statistical Computing, has been held at the Center for Health and Society, University of Copenhagen, Denmark, July 13-14, 2009.
  • useR! 2009, the R user conference, has been held at Agrocampus Rennes, France, July 8-10, 2009.
  • useR! 2010, the R user conference, will be held at NIST, Gaithersburg, Maryland, USA, July 21-23, 2010.
  • We have started to collect information about local UseR Groups in the R Wiki.

Citation – http://www.r-project.org

Decisionstats| Miscellaneous Part 5

If you think that adding a separate category for poetry and humorous articles is too much, well, it seems the most popular articles came from this section. The poem on Michael Jackson continues to be the all-time number 1 in terms of page views (I had hoped one of the interviews would be number 1), and the breakthrough article on not using R is even quoted in Australia in a university course on data mining. lol!

1) http://www.decisionstats.com/2009/06/26/tribute-to-michael-jackson/

Poem on MJ. Tribute. May he R.I.P.

2) Top Ten Reasons R language is bad for you. Satire, with tongue firmly in cheek.

Making Government Transparent Using R

Here is a terrific interview on O’Reilly Radar at http://radar.oreilly.com/2009/07/making-government-transparent.html

It actually talks about using open source statistics tools like R to make government more transparent - for example, by analyzing waste.

Some interesting extracts - like, I didn’t know S is being maintained by SAS (I thought Tibco had S-Plus).

Citation - http://radar.oreilly.com/2009/07/making-government-transparent.html

James Turner: So switching gears, the other thing you’re talking about and a big part of your professional life is the R language. Now I will confess that like Erlang, R is something that is on my radar and I see and I look at it and I say, “Okay. When am I ever going to use it?” I mean Erlang is used some places, but R I guess has a very nichey type of audience, doesn’t it?

Danese Cooper: You know, interestingly enough that’s changing. I think that’s been true. R has been in production or in development, let’s say, for the last 20 years. It is patterned after the S language, which was developed in the ’60s at Bell Labs around the same time that UNIX and C were being developed. And it was S for statistics, right? R is sort of a, “If we had known then what we know now” version of S. They’ve been working on it for 20 years in an academic setting. So it has been very slow to grow. But just in the last couple of years, it’s really gotten to a place where it’s ready for enterprise use. And just this year, the people that maintain S, a company called SAS, S-a-s, in South America, south of this country, have announced that they’re going to have to support R, like it’s that widely used now, particularly in schools.

Danese Cooper works for Revolution Computing, which creates a wonderful and professional version of R called Revolution R - some of the work on parallelization and on enabling 64-bit Windows R is great. Danese also has solid open source credentials, having worked with the Board and also with Apache. O’Reilly Media’s work on open source conferences is terrific as well.

That apart, the great stuff is in the rest of this must-read interview, which is available at http://radar.oreilly.com/2009/07/making-government-transparent.html

So what happened to S Plus

S-Plus, the corporate version of S (the predecessor of R), is still being marketed by Tibco Corporation, which is again rumoured to be an acquisition target of (???)

  • SAS (who want R-like capabilities, especially in their IML product to be released soon)
  • SAP (who lost out to IBM in the SPSS acquisition)
  • Oracle
  • Microsoft
  • Rogue Wave (acquirer of Visual Numerics)
  • etc etc.

Anyways S Plus is still alive and kicking-

“The S language and the S+ application have been critical to our ability to manage big data objects intrinsic to wind analytics and wind energy development,” said Brad Horn, Director of Wind Analytics at NextEra Energy.  “We credit our long-term interface and Spotfire consulting with unlocking new ideas and sources of value.  Joint dialogue on configuration alternatives and our recent efforts to restructure legacy code is allowing us to transition from simple interactive use of S+ to a customized S+ configuration with integrated batch processing, server load balancing, and parallel processing.  S+ has a central role in supporting internal decisions and our group emphasis on scale, speed, and quality.”

http://spotfire.tibco.com/news/press-releases/2009/2_17_2009.aspx

  • Wavelets, Spatial Stats, EnvironmentalStats: Apply statistics for advanced analysis of signal and image data, spatially correlated data, and environmental data.
  • Resampling: Apply resampling techniques, such as bootstrap and permutation tests, to enable the use of standard statistics on smaller data sets.
  • Association Rules: Uncover relationships between variables in large data sets, most commonly to detect purchase patterns (Market Basket Analysis), or in many other areas like web site usage analysis.
  • Recode Values: Easily handle and prepare data from multiple sources by changing the values in a column to a new value.
  • Deployment and Integration:

    • Spotfire Integration: Read and write Spotfire Text Data files, and leverage examples of using Spotfire Professional to visualize, explore and share model results.
    • Custom Java & C++ nodes: Extend Spotfire Miner by writing custom nodes in Java and C++.
    • Remote Script Execution: Execute S+ scripts remotely on S+ Server to offload and distribute intensive jobs.
    • Global Worksheet Parameters: Make workflows more flexible and reusable to interactive and batch applications.
    • FlexBayes: Create more realistic models, provide a natural way to address missing data, and take advantage of prior analysis.

  • Data Access and Preparation:

    • New Data File Types: Unlock more data sources by reading new formats including Spotfire Text Data, Microsoft Excel 2007, Microsoft Access 2007, and Matlab 7.
    • JDBC Access: Access new data sources for analysis with data import and export via the sjdbc library in Spotfire S+ 8.1.

    Citation:

    http://spotfire.tibco.com/Products/S-Plus-Overview.aspx

    http://spotfire.tibco.com/Products/Whatsnew-Splus.aspx


    Social Network Analysis: Using R

    Here is a great video and slides on doing statistical network analysis using R. It is by Drew Conway from NYU.

    Social Network Analysis in R from Drew Conway on Vimeo.

    High Performance Computing and R

    From http://cran.r-project.org/web/views/HighPerformanceComputing.html

    The following is an excellent list of resources for high performance computing using R.

    CRAN Task View: High Performance and Parallel Computing

    Maintainer: Dirk Eddelbuettel
    Contact: Dirk.Eddelbuettel at R-project.org
    Version: 2009-06-12

    This CRAN task view contains a list of packages, grouped by topic, that are useful for high-performance computing (HPC) with R. In this context, we are defining ‘high-performance computing’ rather loosely as just about anything related to pushing R a little further: using compiled code, parallel computing (in both explicit and implicit modes), working with large objects as well as profiling.

    Unless otherwise mentioned, all packages presented with hyperlinks are available from CRAN, the Comprehensive R Archive Network.

    Several of the areas discussed in this Task View are undergoing rapid change. Please send suggestions for additions and extensions for this task view to the task view maintainer .

    Suggestions and corrections by Achim Zeileis, Markus Schmidberger, Martin Morgan, Max Kuhn, Tomas Radivoyevitch, Jochen Knaus, Tobias Verbeke, Hao Yu, and David Roseberg are gratefully acknowledged.

    Parallel computing: Explicit parallelism

    • Several packages provide the communications layer required for parallel computing. The first package in this area was rpvm by Li and Rossini which uses the PVM (Parallel Virtual Machine) standard and libraries. rpvm is no longer actively maintained.
    • In recent years, the alternative MPI (Message Passing Interface) standard has become the de facto standard in parallel computing. It is supported in R via the Rmpi package by Yu. Rmpi is mature yet actively maintained and offers access to numerous functions from the MPI API, as well as a number of R-specific extensions. Rmpi can be used with the LAM/MPI, MPICH / MPICH2, Open MPI, and Deino MPI implementations. It should be noted that LAM/MPI is now in maintenance mode, and new development is focussed on Open MPI.
    • An alternative is provided by the nws (NetWorkSpaces) packages from REvolution Computing. It is the successor to the earlier LindaSpaces approach to parallel computing, and is implemented on top of the Twisted networking toolkit for Python.
    • The snow (Simple Network of Workstations) package by Tierney et al. can use PVM, MPI, NWS as well as direct networking sockets. It provides an abstraction layer by hiding the communications details. The snowFT package provides fault-tolerance extensions to snow.
    • The snowfall package by Knaus provides a more recent alternative to snow. Functions can be used in sequential or parallel mode.
    • The papply package by Currie provided a subset of the Rmpi functionality, but is no longer actively maintained either.
    • The biopara package by Lazar and Schoenfeld offers socket-based parallel execution with some support for load-balancing and fault-tolerance.
    • The taskPR package by Samatova et al. builds on top of LAM/MPI and offers parallel execution of tasks.
    • The Simple Parallel R INTerface (SPRINT) package by Hill et al. ( link , paper ) provides a prototype framework that allows the addition of parallelised functions to R for easy exploitation of HPC systems. Currently only a parallelised correlation calculation is provided.
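
    As a minimal sketch of the explicit approach, here is how the snow package listed above might be used on a single multi-core machine with a socket cluster (my own example, assuming snow is installed):

        library(snow)
        cl <- makeCluster(4, type = "SOCK")              # start 4 R worker processes
        res <- parSapply(cl, 1:1000, function(i) i^2)    # split the work across workers
        stopCluster(cl)                                  # shut the workers down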

    Parallel computing: Implicit parallelism

    • The pnmath package by Tierney ( link ) uses the Open MP parallel processing directives of recent compilers (such as gcc 4.2 or later) for implicit parallelism by replacing a number of internal R functions with replacements that can make use of multiple cores — without any explicit requests from the user. The alternate pnmath0 package offers the same functionality using Pthreads for environments in which the newer compilers are not available. Similar functionality is expected to become integrated into R ‘eventually’.
    • The romp package by Jamitzky was presented at useR! 2008 ( slides ) and offers another interface to Open MP using Fortran. The code is still pre-alpha and available from the Google Code project romp. An R-Forge project romp was initiated but there is no package, yet.
    • The fork package by Warnes provides R-equivalents to low-level Unix system functions like fork, signal, wait, kill and exit in order to spawn sub-processes for parallel execution.
    • The multicore package by Urbanek provides a way of running parallel computations in R on machines with multiple cores or CPUs.
    • The R/parallel package by Vera, Jansen and Suppi offers a C++-based master-slave dispatch mechanism for parallel execution ( link )
    • The RScaLAPACK package by Samatova et al. provides an interface to the ScaLAPACK libraries which can replace the standard BLAS libraries and offer parallel execution of the same BLAS functions.
    • The SPRINT package by Hill adds another parallel framework to R ( link ).
    • The mapReduce package by Brown provides a simple framework for parallel computations following the Google mapReduce approach. It provides a pure R implementation, a syntax following the mapReduce paper and a flexible and parallelizable back end.
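
    A minimal sketch of the multicore package mentioned above (my own example; multicore runs only on Unix-alike systems such as Linux or Mac OS X):

        library(multicore)
        # fork the current R session and run the function across 2 cores
        res <- mclapply(1:8, function(i) sum(rnorm(1e6)), mc.cores = 2)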

    Parallel computing: Grid computing

    • The GridR package by Wegener et al. can be used in a grid computing environment via a web service, via ssh or via Condor or Globus.
    • The multiR package by Grose was presented at useR! 2008 but has not been released. It may offer a snow-style framework on a grid computing platform.
    • The biocep-distrib project by Chine offers a Java-based framework for local, Grid, or Cloud computing. It is under active development.
    • The RHIPE package by Guha provides an interface between R and Hadoop for a Map/Reduce programming framework. ( link )

    Parallel computing: Random numbers

    • Random-number generators for parallel computing are available via the rsprng package by Li, and the rlecuyer package by Sevcikova and Rossini.
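
    A minimal sketch of setting up independent random-number streams on a snow cluster (my own example; clusterSetupRNG() relies on the rlecuyer package for the "RNGstream" type):

        library(snow)
        cl <- makeCluster(2, type = "SOCK")
        clusterSetupRNG(cl, type = "RNGstream")          # L'Ecuyer streams via rlecuyer
        clusterApply(cl, 1:2, function(i) rnorm(3))      # each worker draws from its own stream
        stopCluster(cl)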

    Parallel computing: Resource managers and batch schedulers

    • Job-scheduling toolkits permit management of parallel computing resources and tasks. The slurm (Simple Linux Utility for Resource Management) set of programs (written by a consortium led by Lawrence Livermore Labs) works well with MPI. ( link )
    • The Condor toolkit ( link ) from the University of Wisconsin-Madison has been used with R as described in this R News article .
    • The sfCluster package by Knaus can be used with snowfall ( link ) but is currently limited to LAM/MPI.
    • The Rsge package by Bode offers an interface to the Sun Grid Engine batch-queuing system.
    • The Rlsf package by Smith et al. offers an interface to the LSF cluster/grid system.

    Parallel computing: Applications

    • The caret package by Kuhn can use various frameworks (MPI, NWS etc) to parallelize cross-validation and bootstrap characterizations of predictive models.
    • The multtest package by Pollard et al. can use snow, Rmpi or rpvm for resampling-based testing of multiple hypotheses.
    • The maanova package by Wu can use snow and Rmpi for the analysis of micro-array experiments.
    • The pvclust package by Suzuki and Shimodaira can use snow and Rmpi for hierarchical clustering via multiscale bootstraps; and the scaleboot package by Shimodaira can use pvclust, snow and Rmpi for computing approximately unbiased p-values via multiscale bootstraps.
    • The tm package by Feinerer can use snow and Rmpi for parallelized text mining.
    • The varSelRF package by Diaz-Uriarte can use snow and Rmpi for parallelized use of variable selection via random forests; and the ADaCGH package by Diaz-Uriarte and Rueda can use Rmpi and papply for parallelized analysis of array CGH data.
    • The bcp package by Erdman and Emerson for the bayesian analysis of change points, and the bigmemory package by Kane and Emerson can use nws for parallelized operations.
    • The networksis package by Admiraal and Handcock can use rpvm and snow for parallelized simulation of bipartite graphs via sequential importance sampling.
    • The BARD package by Altman for better automated redistricting, the GAMBoost package by Binder for glm and gam model fitting via boosting using b-splines, the Geneland package by Estoup, Guillot and Santos for structure detection from multilocus genetic data, the Matching package by Sekhon for multivariate and propensity score matching, the STAR package by Pouzat for spike train analysis, the bnlearn package by Scutari for bayesian network structure learning, the latentnet package by Krivitsky and Handcock for latent position and cluster models, the lga package by Harrington for linear grouping analysis, the peperr package by Porelius and Binder for parallelised estimation of prediction error, the orloca package by Fernandez-Palacin and Munoz-Marquez for operations research locational analysis, the rgenoud package by Mebane and Sekhon for genetic optimization using derivatives, the affyPara package by Schmidberger, Vicedo and Mansmann for parallel normalization of Affymetrix microarrays, the puma package by Pearson et al. which propagates uncertainty into standard microarray analyses such as differential expression and the ccems package for combinatorically complex equilibrium model selection all can use snow for parallelized operations using either one of the MPI, PVM, NWS or socket protocols supported by snow.
    • The bugsparallel package uses Rmpi for distributed computing of multiple MCMC chains using WinBUGS.
    • The partDSA package uses nws for generating a piecewise constant estimation list of increasingly complex predictors based on an intensive and comprehensive search over the entire covariate space.

    Parallel computing: GPUs

    • The gputools package by Buckner provides several common data-mining algorithms which are implemented using a mixture of nVidia’s CUDA language and cublas library. Given a computer with an nVidia GPU these functions may be substantially more efficient than native R routines.
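
    For example, a sketch of GPU-backed matrix multiplication with gputools (my own example; it assumes a CUDA-capable nVidia card and the gpuMatMult() function from that package):

        library(gputools)
        a <- matrix(rnorm(1000 * 1000), 1000, 1000)
        b <- matrix(rnorm(1000 * 1000), 1000, 1000)
        ab <- gpuMatMult(a, b)          # multiply on the GPU
        all.equal(ab, a %*% b)          # should agree with the CPU result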

    Large memory and out-of-memory data

    • The biglm package by Lumley uses incremental computations to offer lm() and glm() functionality to data sets stored outside of R’s main memory (see the sketch after this list).
    • The ff package by Adler et al. offers file-based access to data sets that are too large to be loaded into memory, along with a number of higher-level functions.
    • The bigmemory package by Kane and Emerson permits storing large objects such as matrices in memory and uses external pointer objects to refer to them. This permits transparent access from R without bumping against R’s internal memory limits. Several R processes on the same computer can also share big memory objects.
    • A large number of database packages, and database-alike packages (such as sqldf by Grothendieck and data.table by Dowle) are also of potential interest but not reviewed here.
    • The HadoopStreaming package provides a framework for writing map/reduce scripts for use in Hadoop Streaming; it also facilitates operating on data in a streaming fashion which does not require Hadoop.
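
    A minimal sketch of the incremental biglm approach referred to above (my own example with made-up data; the key point is that only one chunk needs to be in memory at a time):

        library(biglm)
        chunk1 <- data.frame(y = rnorm(1000), x = rnorm(1000))
        fit <- biglm(y ~ x, data = chunk1)       # fit on the first chunk
        chunk2 <- data.frame(y = rnorm(1000), x = rnorm(1000))
        fit <- update(fit, chunk2)               # fold in the next chunk
        summary(fit)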

    Easier interfaces for Compiled code

    • The inline package by Sklyar, Murdoch and Smith eases adding code in C, C++ or Fortran to R. It takes care of the compilation, linking and loading of embedded code segments that are stored as R strings (a minimal sketch follows this list).
    • The Rcpp package by Eddelbuettel offers a number of C++ classes that make transferring R objects to C++ functions (and back) easier, and the RInside package by Eddelbuettel allows easy embedding of R itself into C++ applications for faster and more direct data transfer.
    • The rJava package by Urbanek provides a low-level interface to Java similar to the .Call() interface for C and C++.
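
    A minimal sketch of the inline package mentioned above (my own example; the C body doubles each element of a numeric vector and is compiled and loaded on the fly):

        library(inline)
        src <- "
          SEXP out = PROTECT(duplicate(x));
          for (int i = 0; i < length(out); i++) REAL(out)[i] *= 2;
          UNPROTECT(1);
          return out;"
        doubler <- cfunction(signature(x = "numeric"), src, language = "C")
        doubler(as.numeric(1:5))    # 2 4 6 8 10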

    Profiling tools

    • The profr package by Wickham can visualize output from the Rprof interface for profiling.
    • The proftools package by Tierney can also be used to analyse profiling output.
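
    A minimal sketch of base R profiling (my own example; the profr and proftools packages above can then visualize or further analyse the same Rprof output):

        Rprof("profile.out")                                   # start the sampling profiler
        invisible(lapply(1:50, function(i) solve(matrix(rnorm(1e4), 100, 100))))
        Rprof(NULL)                                            # stop profiling
        summaryRprof("profile.out")$by.self                    # time spent per function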

    Related links:

  • Slides from Introduction to High-Performance Computing with R tutorial / workshop presentation