Lyx releases new version- now if only there was a SIMPLE way to put R code in a Lyx existing text class (having tried Sweave and sweaved myself into knots ! 😦
and I hope Ubuntu Linux 10.10 netbook fixes the curious case of disappearing menu bar in Lyx
We are pleased to announce the release of LyX 1.6.9
Beta Release: LyX 2.0.0 beta 4 released.
February 6, 2011
We are pleased to announce the fourth public pre-release of LyX 2.0.0.
Except usual bugfixing we fixed random crashes connected with the new background export and compilation feature.
As far as new features is considered it is now possible
to set the table width,
customize the language package per document,
export LyX files as a single archive containing linked material (e.g. images) directly via export menu.
Since this is most probably the last beta release we also added convertor for old (1.6) preference files which are automatically checked on the startup now.
Some common analytical tasks from the diary of the glamorous life of a business analyst-
1) removing duplicates from a dataset based on certain key values/variables
2) merging two datasets based on a common key/variable/s
3) creating a subset based on a conditional value of a variable
4) creating a subset based on a conditional value of a time-date variable
5) changing format from one date time variable to another
6) doing a means grouped or classified at a level of aggregation
7) creating a new variable based on if then condition
8) creating a macro to run same program with different parameters
9) creating a logistic regression model, scoring dataset,
10) transforming variables
11) checking roc curves of model
12) splitting a dataset for a random sample (repeatable with random seed)
13) creating a cross tab of all variables in a dataset with one response variable
14) creating bins or ranks from a certain variable value
15) graphically examine cross tabs
16) histograms
17) plot(density())
18)creating a pie chart
19) creating a line graph, creating a bar graph
20) creating a bubbles chart
21) running a goal seek kind of simulation/optimization
22) creating a tabular report for multiple metrics grouped for one time/variable
23) creating a basic time series forecast
and some case studies I could think of-
As the Director, Analytics you have to examine current marketing efficiency as well as help optimize sales force efficiency across various channels. In addition you have to examine multiple sales channels including inbound telephone, outgoing direct mail, internet email campaigns. The datawarehouse is an RDBMS but it has multiple data quality issues to be checked for. In addition you need to submit your budget estimates for next year’s annual marketing budget to maximize sales return on investment.
As the Director, Risk you have to examine the overdue mortgages book that your predecessor left you. You need to optimize collections and minimize fraud and write-offs, and your efforts would be measured in maximizing profits from your department.
As a social media consultant you have been asked to maximize social media analytics and social media exposure to your client. You need to create a mechanism to report particular brand keywords, as well as automated triggers between unusual web activity, and statistical analysis of the website analytics metrics. Above all it needs to be set up in an automated reporting dashboard .
As a consultant to a telecommunication company you are asked to monitor churn and review the existing churn models. Also you need to maximize advertising spend on various channels. The problem is there are a large number of promotions always going on, some of the data is either incorrectly coded or there are interaction effects between the various promotions.
As a modeller you need to do the following-
1) Check ROC and H-L curves for existing model
2) Divide dataset in random splits of 40:60
3) Create multiple aggregated variables from the basic variables
4) run regression again and again
5) evaluate statistical robustness and fit of model
6) display results graphically
All these steps can be broken down in little little pieces of code- something which i am putting down a list of.
Are there any common data analysis tasks that you think I am missing out- any common case studies ? let me know.
and as per http://cran.r-project.org/src/base/NEWS
the answer is plenty is new in the newR.
While you and me, were busy writing and reading blogs, or generally writing code for earning more money, or our own research- Uncle Peter D and his band of merry men have been really busy in a much more upgraded R.
————————————–
CHANGES————————-
NEW FEATURES:
• Reading a packages's CITATION file now defaults to ASCII rather
than Latin-1: a package with a non-ASCII CITATION file should
declare an encoding in its DESCRIPTION file and use that encoding
for the CITATION file.
• difftime() now defaults to the "tzone" attribute of "POSIXlt"
objects rather than to the current timezone as set by the default
for the tz argument. (Wish of PR#14182.)
• pretty() is now generic, with new methods for "Date" and "POSIXt"
classes (based on code contributed by Felix Andrews).
• unique() and match() are now faster on character vectors where
all elements are in the global CHARSXP cache and have unmarked
encoding (ASCII). Thanks to Matthew Dowle for suggesting
improvements to the way the hash code is generated in unique.c.
• The enquote() utility, in use internally, is exported now.
• .C() and .Fortran() now map non-zero return values (other than
NA_LOGICAL) for logical vectors to TRUE: it has been an implicit
assumption that they are treated as true.
• The print() methods for "glm" and "lm" objects now insert
linebreaks in long calls in the same way that the print() methods
for "summary.[g]lm" objects have long done. This does change the
layout of the examples for a number of packages, e.g. MASS.
(PR#14250)
• constrOptim() can now be used with method "SANN". (PR#14245)
It gains an argument hessian to be passed to optim(), which
allows all the ... arguments to be intended for f() and grad().
(PR#14071)
• curve() now allows expr to be an object of mode "expression" as
well as "call" and "function".
• The "POSIX[cl]t" methods for Axis() have been replaced by a
single method for "POSIXt".
There are no longer separate plot() methods for "POSIX[cl]t" and
"Date": the default method has been able to handle those classes
for a long time. This _inter alia_ allows a single date-time
object to be supplied, the wish of PR#14016.
The methods had a different default ("") for xlab.
• Classes "POSIXct", "POSIXlt" and "difftime" have generators
.POSIXct(), .POSIXlt() and .difftime(). Package authors are
advised to make use of them (they are available from R 2.11.0) to
proof against planned future changes to the classes.
The ordering of the classes has been changed, so "POSIXt" is now
the second class. See the document ‘Updating packages for
changes in R 2.12.x’ on for
the consequences for a handful of CRAN packages.
• The "POSIXct" method of as.Date() allows a timezone to be
specified (but still defaults to UTC).
• New list2env() utility function as an inverse of
as.list() and for fast multi-assign() to existing
environment. as.environment() is now generic and uses list2env()
as list method.
• There are several small changes to output which ‘zap’ small
numbers, e.g. in printing quantiles of residuals in summaries
from "lm" and "glm" fits, and in test statisics in print.anova().
• Special names such as "dim", "names", etc, are now allowed as
slot names of S4 classes, with "class" the only remaining
exception.
• File .Renviron can have architecture-specific versions such as
.Renviron.i386 on systems with sub-architectures.
• installed.packages() has a new argument subarch to filter on
sub-architecture.
• The summary() method for packageStatus() now has a separate
print() method.
• The default summary() method returns an object inheriting from
class "summaryDefault" which has a separate print() method that
calls zapsmall() for numeric/complex values.
• The startup message now includes the platform and if used,
sub-architecture: this is useful where different
(sub-)architectures run on the same OS.
• The getGraphicsEvent() mechanism now allows multiple windows to
return graphics events, through the new functions
setGraphicsEventHandlers(), setGraphicsEventEnv(), and
getGraphicsEventEnv(). (Currently implemented in the windows()
and X11() devices.)
• tools::texi2dvi() gains an index argument, mainly for use by R
CMD Rd2pdf.
It avoids the use of texindy by texinfo's texi2dvi >= 1.157,
since that does not emulate 'makeindex' well enough to avoid
problems with special characters (such as (, {, !) in indices.
• The ability of readLines() and scan() to re-encode inputs to
marked UTF-8 strings on Windows since R 2.7.0 is extended to
non-UTF-8 locales on other OSes.
• scan() gains a fileEncoding argument to match read.table().
• points() and lines() gain "table" methods to match plot(). (Wish
of PR#10472.)
• Sys.chmod() allows argument mode to be a vector, recycled along
paths.
• There are |, & and xor() methods for classes "octmode" and
"hexmode", which work bitwise.
• Environment variables R_DVIPSCMD, R_LATEXCMD, R_MAKEINDEXCMD,
R_PDFLATEXCMD are no longer used nor set in an R session. (With
the move to tools::texi2dvi(), the conventional environment
variables LATEX, MAKEINDEX and PDFLATEX will be used.
options("dvipscmd") defaults to the value of DVIPS, then to
"dvips".)
• New function isatty() to see if terminal connections are
redirected.
• summaryRprof() returns the sampling interval in component
sample.interval and only returns in by.self data for functions
with non-zero self times.
• print(x) and str(x) now indicate if an empty list x is named.
• install.packages() and remove.packages() with lib unspecified and
multiple libraries in .libPaths() inform the user of the library
location used with a message rather than a warning.
• There is limited support for multiple compressed streams on a
file: all of [bgx]zfile() allow streams to be appended to an
existing file, but bzfile() reads only the first stream.
• Function person() in package utils now uses a given/family scheme
in preference to first/middle/last, is vectorized to handle an
arbitrary number of persons, and gains a role argument to specify
person roles using a controlled vocabulary (the MARC relator
terms).
• Package utils adds a new "bibentry" class for representing and
manipulating bibliographic information in enhanced BibTeX style,
unifying and enhancing the previously existing mechanisms.
• A bibstyle() function has been added to the tools package with
default JSS style for rendering "bibentry" objects, and a
mechanism for registering other rendering styles.
• Several aspects of the display of text help are now customizable
using the new Rd2txt_options() function.
options("help_text_width") is no longer used.
• Added \href tag to the Rd format, to allow hyperlinks to URLs
without displaying the full URL.
• Added \newcommand and \renewcommand tags to the Rd format, to
allow user-defined macros.
• New toRd() generic in the tools package to convert objects to
fragments of Rd code, and added "fragment" argument to Rd2txt(),
Rd2HTML(), and Rd2latex() to support it.
• Directory R_HOME/share/texmf now follows the TDS conventions, so
can be set as a texmf tree (‘root directory’ in MiKTeX parlance).
• S3 generic functions now use correct S4 inheritance when
dispatching on an S4 object. See ?Methods, section on “Methods
for S3 Generic Functions†for recommendations and details.
• format.pval() gains a ... argument to pass arguments such as
nsmall to format(). (Wish of PR#9574)
• legend() supports title.adj. (Wish of PR#13415)
• Added support for subsetting "raster" objects, plus assigning to
a subset, conversion to a matrix (of colour strings), and
comparisons (== and !=).
• Added a new parseLatex() function (and related functions
deparseLatex() and latexToUtf8()) to support conversion of
bibliographic entries for display in R.
• Text rendering of \itemize in help uses a Unicode bullet in UTF-8
and most single-byte Windows locales.
• Added support for polygons with holes to the graphics engine.
This is implemented for the pdf(), postscript(),
x11(type="cairo"), windows(), and quartz() devices (and
associated raster formats), but not for x11(type="Xlib") or
xfig() or pictex(). The user-level interface is the polypath()
function in graphics and grid.path() in grid.
• File NEWS is now generated at installation with a slightly
different format: it will be in UTF-8 on platforms using UTF-8,
and otherwise in ASCII. There is also a PDF version, NEWS.pdf,
installed at the top-level of the R distribution.
• kmeans(x, 1) now works. Further, kmeans now returns between and
total sum of squares.
• arrayInd() and which() gain an argument useNames. For arrayInd,
the default is now false, for speed reasons.
• As is done for closures, the default print method for the formula
class now displays the associated environment if it is not the
global environment.
• A new facility has been added for inserting code into a package
without re-installing it, to facilitate testing changes which can
be selectively added and backed out. See ?insertSource.
• New function readRenviron to (re-)read files in the format of
~/.Renviron and Renviron.site.
• require() will now return FALSE (and not fail) if loading the
package or one of its dependencies fails.
• aperm() now allows argument perm to be a character vector when
the array has named dimnames (as the results of table() calls
do). Similarly, array() allows MARGIN to be a character vector.
(Based on suggestions of Michael Lachmann.)
• Package utils now exports and documents functions
aspell_package_Rd_files() and aspell_package_vignettes() for
spell checking package Rd files and vignettes using Aspell,
Ispell or Hunspell.
• Package news can now be given in Rd format, and news() prefers
these inst/NEWS.Rd files to old-style plain text NEWS or
inst/NEWS files.
• New simple function packageVersion().
• The PCRE library has been updated to version 8.10.
• The standard Unix-alike terminal interface declares its name to
readline as 'R', so that can be used for conditional sections in
~/.inputrc files.
• ‘Writing R Extensions’ now stresses that the standard sections in
.Rd files (other than \alias, \keyword and \note) are intended to
be unique, and the conversion tools now drop duplicates with a
warning.
The .Rd conversion tools also warn about an unrecognized type in
a \docType section.
• ecdf() objects now have a quantile() method.
• format() methods for date-time objects now attempt to make use of
a "tzone" attribute with "%Z" and "%z" formats, but it is not
always possible. (Wish of PR#14358.)
• tools::texi2dvi(file, clean = TRUE) now works in more cases (e.g.
where emulation is used and when file is not in the current
directory).
• New function droplevels() to remove unused factor levels.
• system(command, intern = TRUE) now gives an error on a Unix-alike
(as well as on Windows) if command cannot be run. It reports a
non-success exit status from running command as a warning.
On a Unix-alike an attempt is made to return the actual exit
status of the command in system(intern = FALSE): previously this
had been system-dependent but on POSIX-compliant systems the
value return was 256 times the status.
• system() has a new argument ignore.stdout which can be used to
(portably) ignore standard output.
• system(intern = TRUE) and pipe() connections are guaranteed to be
avaliable on all builds of R.
• Sys.which() has been altered to return "" if the command is not
found (even on Solaris).
• A facility for defining reference-based S4 classes (in the OOP
style of Java, C++, etc.) has been added experimentally to
package methods; see ?ReferenceClasses.
• The predict method for "loess" fits gains an na.action argument
which defaults to na.pass rather than the previous default of
na.omit.
Predictions from "loess" fits are now named from the row names of
newdata.
• Parsing errors detected during Sweave() processing will now be
reported referencing their original location in the source file.
• New adjustcolor() utility, e.g., for simple translucent color
schemes.
• qr() now has a trivial lm method with a simple (fast) validity
check.
• An experimental new programming model has been added to package
methods for reference (OOP-style) classes and methods. See
?ReferenceClasses.
• bzip2 has been updated to version 1.0.6 (bug-fix release).
--with-system-bzlib now requires at least version 1.0.6.
• R now provides jss.cls and jss.bst (the class and bib style file
for the Journal of Statistical Software) as well as RJournal.bib
and Rnews.bib, and R CMD ensures that the .bst and .bib files are
found by BibTeX.
• Functions using the TAR environment variable no longer quote the
value when making system calls. This allows values such as tar
--force-local, but does require additional quotes in, e.g., TAR =
"'/path with spaces/mytar'".
DEPRECATED & DEFUNCT:
• Supplying the parser with a character string containing both
octal/hex and Unicode escapes is now an error.
• File extension .C for C++ code files in packages is now defunct.
• R CMD check no longer supports configuration files containing
Perl configuration variables: use the environment variables
documented in ‘R Internals’ instead.
• The save argument of require() now defaults to FALSE and save =
TRUE is now deprecated. (This facility is very rarely actually
used, and was superseded by the Depends field of the DESCRIPTION
file long ago.)
• R CMD check --no-latex is deprecated in favour of --no-manual.
• R CMD Sd2Rd is formally deprecated and will be removed in R
2.13.0.
PACKAGE INSTALLATION:
• install.packages() has a new argument libs_only to optionally
pass --libs-only to R CMD INSTALL and works analogously for
Windows binary installs (to add support for 64- or 32-bit
Windows).
• When sub-architectures are in use, the installed architectures
are recorded in the Archs field of the DESCRIPTION file. There
is a new default filter, "subarch", in available.packages() to
make use of this.
Code is compiled in a copy of the src directory when a package is
installed for more than one sub-architecture: this avoid problems
with cleaning the sources between building sub-architectures.
• R CMD INSTALL --libs-only no longer overrides the setting of
locking, so a previous version of the package will be restored
unless --no-lock is specified.
UTILITIES:
• R CMD Rprof|build|check are now based on R rather than Perl
scripts. The only remaining Perl scripts are the deprecated R
CMD Sd2Rd and install-info.pl (used only if install-info is not
found) as well as some maintainer-mode-only scripts.
*NB:* because these have been completely rewritten, users should
not expect undocumented details of previous implementations to
have been duplicated.
R CMD no longer manipulates the environment variables PERL5LIB
and PERLLIB.
• R CMD check has a new argument --extra-arch to confine tests to
those needed to check an additional sub-architecture.
Its check for “Subdirectory 'inst' contains no files†is more
thorough: it looks for files, and warns if there are only empty
directories.
Environment variables such as R_LIBS and those used for
customization can be set for the duration of checking _via_ a
file ~/.R/check.Renviron (in the format used by .Renviron, and
with sub-architecture specific versions such as
~/.R/check.Renviron.i386 taking precedence).
There are new options --multiarch to check the package under all
of the installed sub-architectures and --no-multiarch to confine
checking to the sub-architecture under which check is invoked.
If neither option is supplied, a test is done of installed
sub-architectures and all those which can be run on the current
OS are used.
Unless multiple sub-architectures are selected, the install done
by check for testing purposes is only of the current
sub-architecture (_via_ R CMD INSTALL --no-multiarch).
It will skip the check for non-ascii characters in code or data
if the environment variables _R_CHECK_ASCII_CODE_ or
_R_CHECK_ASCII_DATA_ are respectively set to FALSE. (Suggestion
of Vince Carey.)
• R CMD build no longer creates an INDEX file (R CMD INSTALL does
so), and --force removes (rather than overwrites) an existing
INDEX file.
It supports a file ~/.R/build.Renviron analogously to check.
It now runs build-time \Sexpr expressions in help files.
• R CMD Rd2dvi makes use of tools::texi2dvi() to process the
package manual. It is now implemented entirely in R (rather than
partially as a shell script).
• R CMD Rprof now uses utils::summaryRprof() rather than Perl. It
has new arguments to select one of the tables and to limit the
number of entries printed.
• R CMD Sweave now runs R with --vanilla so the environment setting
of R_LIBS will always be used.
C-LEVEL FACILITIES:
• lang5() and lang6() (in addition to pre-existing lang[1-4]())
convenience functions for easier construction of eval() calls.
If you have your own definition, do wrap it inside #ifndef lang5
.... #endif to keep it working with old and new R.
• Header R.h now includes only the C headers it itself needs, hence
no longer includes errno.h. (This helps avoid problems when it
is included from C++ source files.)
• Headers Rinternals.h and R_ext/Print.h include the C++ versions
of stdio.h and stdarg.h respectively if included from a C++
source file.
INSTALLATION:
• A C99 compiler is now required, and more C99 language features
will be used in the R sources.
• Tcl/Tk >= 8.4 is now required (increased from 8.3).
• System functions access, chdir and getcwd are now essential to
configure R. (In practice they have been required for some
time.)
• make check compares the output of the examples from several of
the base packages to reference output rather than the previous
output (if any). Expect some differences due to differences in
floating-point computations between platforms.
• File NEWS is no longer in the sources, but generated as part of
the installation. The primary source for changes is now
doc/NEWS.Rd.
• The popen system call is now required to build R. This ensures
the availability of system(intern = TRUE), pipe() connections and
printing from postscript().
• The pkg-config file libR.pc now also works when R is installed
using a sub-architecture.
• R has always required a BLAS that conforms to IE60559 arithmetic,
but after discovery of more real-world problems caused by a BLAS
that did not, this is tested more thoroughly in this version.
BUG FIXES:
• Calls to selectMethod() by default no longer cache inherited
methods. This could previously corrupt methods used by as().
• The densities of non-central chi-squared are now more accurate in
some cases in the extreme tails, e.g. dchisq(2000, 2, 1000), as a
series expansion was truncated too early. (PR#14105)
• pt() is more accurate in the left tail for ncp large, e.g.
pt(-1000, 3, 200). (PR#14069)
• The default C function (R_binary) for binary ops now sets the S4
bit in the result if either argument is an S4 object. (PR#13209)
• source(echo=TRUE) failed to echo comments that followed the last
statement in a file.
• S4 classes that contained one of "matrix", "array" or "ts" and
also another class now accept superclass objects in new(). Also
fixes failure to call validObject() for these classes.
• Conditional inheritance defined by argument test in
methods::setIs() will no longer be used in S4 method selection
(caching these methods could give incorrect results). See
?setIs.
• The signature of an implicit generic is now used by setGeneric()
when that does not use a definition nor explicitly set a
signature.
• A bug in callNextMethod() for some examples with "..." in the
arguments has been fixed. See file
src/library/methods/tests/nextWithDots.R in the sources.
• match(x, table) (and hence %in%) now treat "POSIXlt" consistently
with, e.g., "POSIXct".
• Built-in code dealing with environments (get(), assign(),
parent.env(), is.environment() and others) now behave
consistently to recognize S4 subclasses; is.name() also
recognizes subclasses.
• The abs.tol control parameter to nlminb() now defaults to 0.0 to
avoid false declarations of convergence in objective functions
that may go negative.
• The standard Unix-alike termination dialog to ask whether to save
the workspace takes a EOF response as n to avoid problems with a
damaged terminal connection. (PR#14332)
• Added warn.unused argument to hist.default() to allow suppression
of spurious warnings about graphical parameters used with
plot=FALSE. (PR#14341)
• predict.lm(), summary.lm(), and indeed lm() itself had issues
with residual DF in zero-weighted cases (the latter two only in
connection with empty models). (Thanks to Bill Dunlap for
spotting the predict() case.)
• aperm() treated resize = NA as resize = TRUE.
• constrOptim() now has an improved convergence criterion, notably
for cases where the minimum was (very close to) zero; further,
other tweaks inspired from code proposals by Ravi Varadhan.
• Rendering of S3 and S4 methods in man pages has been corrected
and made consistent across output formats.
• Simple markup is now allowed in \title sections in .Rd files.
• The behaviour of as.logical() on factors (to use the levels) was
lost in R 2.6.0 and has been restored.
• prompt() did not backquote some default arguments in the \usage
section. (Reported by Claudia Beleites.)
• writeBin() disallows attempts to write 2GB or more in a single
call. (PR#14362)
• new() and getClass() will now work if Class is a subclass of
"classRepresentation" and should also be faster in typical calls.
• The summary() method for data frames makes a better job of names
containing characters invalid in the current locale.
• [[ sub-assignment for factors could create an invalid factor
(reported by Bill Dunlap).
• Negate(f) would not evaluate argument f until first use of
returned function (reported by Olaf Mersmann).
• quietly=FALSE is now also an optional argument of library(), and
consequently, quietly is now propagated also for loading
dependent packages, e.g., in require(*, quietly=TRUE).
• If the loop variable in a for loop was deleted, it would be
recreated as a global variable. (Reported by Radford Neal; the
fix includes his optimizations as well.)
• Task callbacks could report the wrong expression when the task
involved parsing new code. (PR#14368)
• getNamespaceVersion() failed; this was an accidental change in
2.11.0. (PR#14374)
• identical() returned FALSE for external pointer objects even when
the pointer addresses were the same.
• L$a@x[] <- val did not duplicate in a case it should have.
• tempfile() now always gives a random file name (even if the
directory is specified) when called directly after startup and
before the R RNG had been used. (PR#14381)
• quantile(type=6) behaved inconsistently. (PR#14383)
• backSpline(.) behaved incorrectly when the knot sequence was
decreasing. (PR#14386)
• The reference BLAS included in R was assuming that 0*x and x*0
were always zero (whereas they could be NA or NaN in IEC 60559
arithmetic). This was seen in results from tcrossprod, and for
example that log(0) %*% 0 gave 0.
• The calculation of whether text was completely outside the device
region (in which case, you draw nothing) was wrong for screen
devices (which have [0, 0] at top-left). The symptom was (long)
text disappearing when resizing a screen window (to make it
smaller). (PR#14391)
• model.frame(drop.unused.levels = TRUE) did not take into account
NA values of factors when deciding to drop levels. (PR#14393)
• library.dynam.unload required an absolute path for libpath.
(PR#14385)
Both library() and loadNamespace() now record absolute paths for
use by searchpaths() and getNamespaceInfo(ns, "path").
• The self-starting model NLSstClosestX failed if some deviation
was exactly zero. (PR#14384)
• X11(type = "cairo") (and other devices such as png using
cairographics) and which use Pango font selection now work around
a bug in Pango when very small fonts (those with sizes between 0
and 1 in Pango's internal units) are requested. (PR#14369)
• Added workaround for the font problem with X11(type = "cairo")
and similar on Mac OS X whereby italic and bold styles were
interchanged. (PR#13463 amongst many other reports.)
• source(chdir = TRUE) failed to reset the working directory if it
could not be determined - that is now an error.
• Fix for crash of example(rasterImage) on x11(type="Xlib").
• Force Quartz to bring the on-screen display up-to-date
immediately before the snapshot is taken by grid.cap() in the
Cocoa implementation. (PR#14260)
• model.frame had an unstated 500 byte limit on variable names.
(Example reported by Terry Therneau.)
• The 256-byte limit on names is now documented. • Subassignment by [, [[ or $ on an expression object with value
NULL coerced the object to a list.
New software just released from the guys in California (@RevolutionR) so if you are a Linux user and have academic credentials you can download it for free (@Cmastication doesnt), you can test it to see what the big fuss is all about (also see http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php) –
Revolution Analytics has just released Revolution R Enterprise 4.0.1 for Red Hat Enterprise Linux, a significant step forward in enterprise data analytics. Revolution R Enterprise 4.0.1 is built on R 2.11.1, the latest release of the open-source environment for data analysis and graphics. Also available is the initial release of our deployment server solution, RevoDeployR 1.0, designed to help you deliver R analytics via the Web. And coming soon to Linux: RevoScaleR, a new package for fast and efficient multi-core processing of large data sets.
As a registered user of the Academic version of Revolution R Enterprise for Linux, you can take advantage of these improvements by downloading and installing Revolution R Enterprise 4.0.1 today. You can install Revolution R Enterprise 4.0.1 side-by-side with your existing Revolution R Enterprise installations; there is no need to uninstall previous versions.
Download Information
The following information is all you will need to download and install the Academic Edition.
Supported Platforms:
Revolution R Enterprise Academic edition and RevoDeployR are supported on Red Hat® Enterprise Linux® 5.4 or greater (64-bit processors).
Approximately 300MB free disk space is required for a full install of Revolution R Enterprise. We recommend at least 1GB of RAM to use Revolution R Enterprise.
For the full list of system requirements for RevoDeployR, refer to the RevoDeployR™ Installation Guide for Red Hat® Enterprise Linux®.
Download Links:
You will first need to download the Revolution R Enterprise installer.
Installation Instructions for Revolution R Enterprise Academic Edition
After downloading the installer, do the following to install the software:
Log in as root if you have not already.
Change directory to the directory containing the downloaded installer.
Unpack the installer using the following command:
tar -xzf Revo-Ent-4.0.1-RHEL5-desktop.tar.gz
Change directory to the RevolutionR_4.0.1 directory created.
Run the installer by typing ./install.py and following the on-screen prompts.
Getting Started with the Revolution R Enterprise
After you have installed the software, launch Revolution R Enterprise by typing Revo64 at the shell prompt.
Documentation is available in the form of PDF documents installed as part of the Revolution R Enterprise distribution. Type Revo.home(“doc”) at the R prompt to locate the directory containing the manuals Getting Started with Revolution R (RevoMan.pdf) and the ParallelR User’s Guide(parRman.pdf).
Installation Instructions for RevoDeployR (and RServe)
After downloading the RevoDeployR distribution, use the following steps to install the software:
Note: These instructions are for an automatic install. For more details or for manual install instructions, refer to RevoDeployR_Installation_Instructions_for_RedHat.pdf.
Log into the operating system as root.
su –
Change directory to the directory containing the downloaded distribution for RevoDeployR and RServe.
Unzip the contents of the RevoDeployR tar file. At prompt, type:
tar -xzf deployrRedHat.tar.gz
Change directories. At the prompt, type:
cd installFiles
Launch the automated installation script and follow the on-screen prompts. At the prompt, type:
./installRedHat.sh Note:Red Hat installs MySQL without a password.
Getting Started with RevoDeployR
After installing RevoDeployR, you will be directed to the RevoDeployR landing page. The landing page has links to documentation, the RevoDeployR management console, the API Explorer development tool, and sample code.
The simple R-benchmark-25.R test script is a quick-running survey of general R performance. The Community-developed test consists of three sets of small benchmarks, referred to in the script as Matrix Calculation, Matrix Functions, and Program Control.
Revolution Analytics has created its own tests to simulate common real-world computations. Their descriptions are explained below.
Linear Algebra Computation
Base R 2.9.2
Revolution R (1-core)
Revolution R (4-core)
Speedup (4 core)
Matrix Multiply
243 sec
22 sec
5.9 sec
41x
Cholesky Factorization
23 sec
3.8 sec
1.1 sec
21x
Singular Value Decomposition
62 sec
13 sec
4.9 sec
12.6x
Principal Components Analysis
237 sec
41 sec
15.6 sec
15.2x
Linear Discriminant Analysis
142 sec
49 sec
32.0 sec
4.4x
Speedup = Slower time / Faster Time – 1
Matrix Multiply
This routine creates a random uniform 10,000 x 5,000 matrix A, and then times the computation of the matrix product transpose(A) * A.
set.seed (1)
m <- 10000
n <- 5000
A <- matrix (runif (m*n),m,n)
system.time (B <- crossprod(A))
The system will respond with a message in this format:
User system elapsed
37.22 0.40 9.68
The “elapsed” times indicate total wall-clock time to run the timed code.
The table above reflects the elapsed time for this and the other benchmark tests. The test system was an INTEL® Xeon® 8-core CPU (model X55600) at 2.5 GHz with 18 GB system RAM running Windows Server 2008 operating system. For the Revolution R benchmarks, the computations were limited to 1 core and 4 cores by calling setMKLthreads(1) and setMKLthreads(4) respectively. Note that Revolution R performs very well even in single-threaded tests: this is a result of the optimized algorithms in the Intel MKL library linked to Revolution R. The slight greater than linear speedup may be due to the greater total cache available to all CPU cores, or simply better OS CPU scheduling–no attempt was made to pin execution threads to physical cores. Consult Revolution R’s documentation to learn how to run benchmarks that use less cores than your hardware offers.
Cholesky Factorization
The Cholesky matrix factorization may be used to compute the solution of linear systems of equations with a symmetric positive definite coefficient matrix, to compute correlated sets of pseudo-random numbers, and other tasks. We re-use the matrix B computed in the example above:
system.time (C <- chol(B))
Singular Value Decomposition with Applications
The Singular Value Decomposition (SVD) is a numerically-stable and very useful matrix decompisition. The SVD is often used to compute Principal Components and Linear Discriminant Analysis.
# Singular Value Deomposition
m <- 10000
n <- 2000
A <- matrix (runif (m*n),m,n)
system.time (S <- svd (A,nu=0,nv=0))
# Principal Components Analysis
m <- 10000
n <- 2000
A <- matrix (runif (m*n),m,n)
system.time (P <- prcomp(A))
# Linear Discriminant Analysis require (‘MASS’)
g <- 5
k <- round (m/2)
A <- data.frame (A, fac=sample (LETTERS[1:g],m,replace=TRUE))
train <- sample(1:m, k)
system.time (L <- lda(fac ~., data=A, prior=rep(1,g)/g, subset=train))