Tag Archives: Gains
Interview Alvaro Tejada Galindo, SAP Labs Montreal, Using SAP Hana with #Rstats
Here is a brief interview with Alvaro Tejada Galindo, aka Blag, a developer working with SAP HANA and R at SAP Labs, Montreal. SAP HANA is SAP’s latest offering in BI; it is also a database and a computing environment, and preliminary use cases suggest that using R and HANA together on the cloud can give major productivity gains in both speed and analytical ability.
Ajay- What made the R language a fit for SAP HANA? Did you consider other languages? What is your view on the Julia, Python, SPSS, SAS and Matlab languages?
Blag- I think “R” is a must for SAP HANA. As the fastest database in the market, we needed a language that could help us shape the data in the best possible way. “R” filled that purpose very well. Right now, “R” is not the only language as “L” can be used as well (http://wiki.tcl.tk/17068) …not forgetting “SQLScript” which is our own version of SQL (http://goo.gl/x3bwh) . I have to admit that I tried Julia, but couldn’t manage to make it work. Regarding Python, it’s an interesting question as I’m going to blog about Python and SAP HANA soon. About Matlab, SPSS and SAS I haven’t used them, so I got nothing to say there.
Ajay- What is your view on some of the limitations of R that can be overcome by using it with SAP HANA?
Blag- I think mostly the ability of SAP HANA to work with big data. Again, SAP HANA and “R” can work very nicely together and achieve things that weren’t possible before.
Ajay- Have you considered other vendors of R, including working with RStudio, Revolution Analytics, and even Oracle R Enterprise?
Blag- I’m not really part of the SAP HANA or the R groups inside SAP, so I can’t really comment on that. I can only say that I use RStudio every time I need to do something with R. Regarding Oracle…I don’t think so…but they can use any of our products whenever they want.
Ajay- Do you have a case study on an actual usage of R with SAP HANA that led to great results?
Blag- Right now the use of “R” and SAP HANA is very preliminary; I don’t think many people have started working on it…but as an example that it works, you can check this awesome blog entry from my friend Jitender Aswani, “Big Data, R and HANA: Analyze 200 Million Data Points and Later Visualize Using Google Maps” (http://allthingsr.blogspot.com/#!/2012/04/big-data-r-and-hana-analyze-200-million.html)
Ajay- Does your group in SAP plan to give to the R ecosystem by attending conferences like UseR 2012, sponsoring meets, package development, etc.?
Blag- My group is in charge of everything developers, so sure, we’re planning to get more in touch with R developers and their ecosystem. Not sure how we’re going to deal with it, but at least I’m going to get myself involved in the Montreal R Group.
About-
http://scn.sap.com/people/alvaro.tejadagalindo3
| Name: | Alvaro Tejada Galindo |
| Email: | a.tejada.galindo@sap.com |
| Profession: | Development |
| Company: | SAP Canada Labs-Montreal |
| Town/City: | Montreal |
| Country: | Canada |
| Instant Messaging Type: | |
| Instant Messaging ID: | Blag |
| Personal URL: | http://blagrants.blogspot.com |
| Professional Blog URL: | http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/u/252210910 |
| My Relation to SAP: | employee |
| Short Bio: | Development Expert for the Technology Innovation and Developer Experience team. Used to be an ABAP Consultant for the last 11 years. Addicted to programming since 1997. |
http://www.sap.com/solutions/technology/in-memory-computing-platform/hana/overview/index.epx
and from
http://en.wikipedia.org/wiki/SAP_HANA
SAP HANA is SAP AG’s implementation of in-memory database technology. There are four components within the software group:[1]
- SAP HANA DB (or HANA DB) refers to the database technology itself,
- SAP HANA Studio refers to the suite of tools provided by SAP for modeling,
- SAP HANA Appliance refers to HANA DB as delivered on partner certified hardware (see below) as an appliance. It also includes the modeling tools from HANA Studio as well as replication and data transformation tools to move data into HANA DB,[2]
- SAP HANA Application Cloud refers to the cloud based infrastructure for delivery of applications (typically existing SAP applications rewritten to run on HANA).
R is integrated in HANA DB via TCP/IP. HANA uses SQL-SHM, a shared memory-based data exchange to incorporate R’s vertical data structure. HANA also introduces R scripts equivalent to native database operations like join or aggregation.[20] HANA developers can write R scripts in SQL and the types are automatically converted in HANA. R scripts can be invoked with HANA tables as both input and output in the SQLScript. R environments need to be deployed to use R within SQLScript.
More blog posts on using SAP and R together
Dealing with R and HANA
http://scn.sap.com/community/in-memory-business-data-management/blog/2011/11/28/dealing-with-r-and-hana
R meets HANA
http://scn.sap.com/community/in-memory-business-data-management/blog/2012/01/29/r-meets-hana
HANA meets R
http://scn.sap.com/community/in-memory-business-data-management/blog/2012/01/26/hana-meets-r
When SAP HANA met R – First kiss
http://scn.sap.com/community/developer-center/hana/blog/2012/05/21/when-sap-hana-met-r–first-kiss
Using RODBC with SAP HANA DB-
SAP HANA: My experiences on using SAP HANA with R
and of course the blog that started it all-
Jitender Aswani’s http://allthingsr.blogspot.in/
Sanskrit for Human Resource Management
So I picked up more Sanskrit on my stay at Goa at the Tantra http://www.decisionstats.com/tantra-anjuna/-
Things to do- or Aims of Human Life
Dharam- Planning, Duty and Responsibilities
Karam- Executing Actions
Artha-Monetary Gains through Planning and Executing
Kama-Desires and Pleasure Seeking
Moksha- Achieving Self Actualization
Things to Control-
http://en.wikipedia.org/wiki/Five_Evils
Instead of the 7 sins of Western thought, there are only 5 evils in Sanskrit. These evils are also correlated: if you control one too much, the other evils will rise.
Kam – Your Lusts or Desires
Krodha-Your Anger
Madh-Your Pride
Lobh-Your Greed for Monetary Satisfaction
Moh-Your affection and love and attachments
Also related-
Sanskrit for Motivation
http://www.decisionstats.com/strategic-tactics-in-sanskrit/
Indian Societal Hierarchy
http://www.decisionstats.com/economic-indian-caste-system-simplification/
Analytics for Cyber Conflict
The emerging use of Analytics and Knowledge Discovery in Databases for Cyber Conflict and Trade Negotiations
This blog post is the first in a series of articles on cyber conflict and the use of analytics for targeting in both offense and defense in conflict situations.
It covers knowledge discovery in four kinds of databases (chosen because of their perceived importance, sensitivity and criticality to the functioning of the geopolitical economic system)-
- Databases on Unique Identity Identifiers- including next generation biometric databases connected to Government Initiatives and Banking, and current generation databases of identifiers like government issued documents made online
- Databases on financial details- this includes not only traditional financial service providers but also online databases with payment details collected by retail product selling corporates like Sony’s Playstation Network, Microsoft’s XBox and
- Databases on contact details- including those by offline businesses collecting marketing databases and contact details
- Databases on social behavior- primarily collected by online businesses like Facebook, and other social media platforms.
It examines the role of
- voluntary privacy safeguards and government regulations,
- weak cryptographic security of databases,
- weakness in balancing marketing (maximized data) with privacy (minimized data),
- and lastly the role of ownership patterns in database owning corporates
A small distinction between cyber crime and cyber conflict is that while cyber crime focuses on stealing data, intellectual property and information primarily to maximize economic gains, cyber conflict focuses on stealing information and also on disrupting the effective working of database backed systems in order to gain notional competitive advantages in economics as well as geo-politics. Cyber terrorism is basically cyber conflict by non-state agents or by designated terrorist states as defined by the regulations of the “target” entity. A cyber attack is an offensive action related to cyber-infrastructure (like the Stuxnet worm that disabled uranium enrichment centrifuges of Iran). Cyber attacks and cyber terrorism are out of scope of this paper; we will concentrate on cyber conflicts involving databases.
Some examples are given here-
Types of Knowledge Discovery in -
1) Databases on Unique Identifiers- including biometric databases.
Unique Identifiers or primary keys for identifying people are critical for any intensive knowledge discovery program. The unique identifier generated must be extremely secure , and not liable to reverse engineering of the cryptographic hash function.
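One way to read the requirement above is that the stored identifier should be derived with a keyed hash, so that someone who obtains the database cannot simply hash candidate ID numbers and compare. The Python sketch below illustrates that idea with HMAC-SHA256 from the standard library. The function name pseudonymous_id, the sample ID strings and the key handling are all assumptions made for illustration; the post does not say how any real identity database derives its identifiers.

```python
import hashlib
import hmac
import secrets


def pseudonymous_id(raw_identifier: str, secret_key: bytes) -> str:
    """Derive a stable pseudonymous identifier from a raw ID (hypothetical helper).

    A keyed hash (HMAC-SHA256) is used instead of a plain hash so that the
    mapping cannot be rebuilt by hashing guessed IDs without the secret key.
    """
    digest = hmac.new(secret_key, raw_identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Assumption: the key is generated once and kept outside the database.
key = secrets.token_bytes(32)
other_key = secrets.token_bytes(32)

token_a = pseudonymous_id("ID-0001", key)        # made-up sample identifier
token_b = pseudonymous_id("ID-0001", key)        # same input, same key
token_c = pseudonymous_id("ID-0002", key)        # different input
token_d = pseudonymous_id("ID-0001", other_key)  # same input, different key

print(token_a == token_b)  # same person always maps to the same token
print(token_a == token_c)  # different people map to different tokens
print(token_a == token_d)  # tokens are useless without the original key
```

The same identifier always yields the same token, so records can still be joined for knowledge discovery, while the raw ID never needs to be stored next to the analytical data.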
For biometric databases, an interesting possibility could be determining ethnic identity from biometric information, and also mapping relatives. The biometric information currently collected is fingerprint data, iris data and facial data. A further feature could be adding voice data to biometric databases.
This is subject to obvious privacy safeguards.
For example, Google recently unveiled facial recognition to unlock Android 4.0 mobiles, only to find out that the security feature could easily be bypassed by using a photo of the owner.
Example of Biometric Databases
In Afghanistan more than 2 million Afghans have contributed iris, fingerprint and facial data to a biometric database. In India, 121 million people have already been enrolled in the largest biometric database in the world. More than half a million customers of the Tokyo Mitsubishi Bank are already using biometric verification at ATMs.
Examples of Breached Online Databases
In 2011, Playstation Network by Sony (PSN) lost data of 77 million customers, including personal information and credit card information. Additionally, data of 24 million customers was lost by Sony’s Sony Online Entertainment. The websites of open source platforms like SourceForge, WineHQ and Kernel.org were also broken into in 2011. Even retailers like McDonald’s and Walgreens reported database breaches.
The role of cyber conflict arises in the following cases-
- Databases are online for access and authentication by proper users. Databases can be breached remotely by non-owners (or “perpetrators”) with a much lower chance of intruder identification, detection and penalization by regulators or law enforcers (or “protectors”) than offline modes of intellectual property theft.
- Databases are valuable to external agents (or “sponsors”) subsidizing (with finance, technology, information, motivation) the perpetrators for intellectual property theft. Databases contain information that can be used to disrupt the functioning of a particular economy or corporation (or “primary targets”) or for further chain or domino effects in accessing other data (or “secondary targets”)
- Loss of data is more expensive than enhanced cost of security to database owners
- Loss of data is more disruptive to people whose data is contained within the database (or “customers”)
So the role play for different people for these kinds of databases consists of-
1) Customers- who are in the database
2) Owners -who own the database. They together form the primary and secondary targets.
3) Protectors- who help customers and owners secure the databases.
and
1) Sponsors- who benefit from the theft or disruption of the database
2) Perpetrators- who execute the actual theft and disruption in the database
The use of topic models and LDA for data reduction on text is well known, as is the use of data visualization (including visualization tied to GPS based location data) for investigative purposes, but the increasing complexity of data generation and the growing sophistication of machine learning driven data processing make this an interesting area to watch.
The next article in this series will cover-
the kinds of algorithms that are currently used or being proposed for cyber conflict, the role of non-state agents, and what precautions practitioners of knowledge discovery in databases can employ to avoid breaches of security, ethics, and regulation.
Citations-
- Michael A. Vatis, Cyber Attacks During the War on Terrorism: A Predictive Analysis, Dartmouth College (Institute for Security Technology Studies).
- Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth, From Data Mining to Knowledge Discovery in Databases.
Interview JJ Allaire Founder, RStudio
Here is an interview with JJ Allaire, founder of RStudio. RStudio is the IDE that has overtaken other IDEs within the R community in terms of ease of use. On the eve of their latest product launch, JJ talks to DecisionStats about RStudio and more.
Ajay- So what is new in the latest version of RStudio and how exactly is it useful for people?
JJ- The initial release of RStudio as well as the two follow-up releases we did last year were focused on the core elements of using R: editing and running code, getting help, and managing files, history, workspaces, plots, and packages. In the meantime users have also been asking for some bigger features that would improve the overall work-flow of doing analysis with R. In this release (v0.95) we focused on three of these features:
Projects. R developers tend to have several (and often dozens) of working contexts associated with different clients, analyses, data sets, etc. RStudio projects make it easy to keep these contexts well separated (with distinct R sessions, working directories, environments, command histories, and active source documents), switch quickly between project contexts, and even work with multiple projects at once (using multiple running versions of RStudio).
Version Control. The benefits of using version control for collaboration are well known, but we also believe that solo data analysis can achieve significant productivity gains by using version control (this discussion on Stack Overflow talks about why). In this release we introduced integrated support for the two most popular open-source version control systems: Git and Subversion. This includes changelist management, file diffing, and browsing of project history, all right from within RStudio.
Code Navigation. When you look at how programmers work a surprisingly large amount of time is spent simply navigating from one context to another. Modern programming environments for general purpose languages like C++ and Java solve this problem using various forms of code navigation, and in this release we’ve brought these capabilities to R. The two main features here are the ability to type the name of any file or function in your project and go immediately to it; and the ability to navigate to the definition of any function under your cursor (including the definition of functions within packages) using a keystroke (F2) or mouse gesture (Ctrl+Click).
Ajay- What’s the product road map for RStudio? When can we expect the IDE to turn into a full fledged GUI?
JJ- Linus Torvalds has said that “Linux is evolution, not intelligent design.” RStudio tries to operate on a similar principle—the world of statistical computing is too deep, diverse, and ever-changing for any one person or vendor to map out in advance what is most important. So, our internal process is to ship a new release every few months, listen to what people are doing with the product (and hope to do with it), and then start from scratch again making the improvements that are considered most important.
Right now some of the things which seem to be top of mind for users are improved support for authoring and reproducible research, various editor enhancements including code folding, and debugging tools.
What you’ll see us do in a given release is work on a combination of frequently requested features, smaller improvements to usability and work-flow, bug fixes, and finally architectural changes required to support current or future feature requirements.
While we do try to base what we work on as closely as possible on direct user-feedback, we also adhere to some core principles concerning the overall philosophy and direction of the product. So for example the answer to the question about the IDE turning into a full-fledged GUI is: never. We believe that textual representations of computations provide fundamental advantages in transparency, reproducibility, collaboration, and re-usability. We believe that writing code is simply the right way to do complex technical work, so we’ll always look for ways to make coding better, faster, and easier rather than try to eliminate coding altogether.
Ajay -Describe your journey in science from a high school student to your present work in R. I noticed you have been very successful in making software products that have been mostly proprietary products or sold to companies.
Why did you get into open source products with RStudio? What are your plans for monetizing RStudio further down the line?
JJ- In high school and college my principal areas of study were Political Science and Economics. I also had a very strong parallel interest in both computing and quantitative analysis. My first job out of college was as a financial analyst at a government agency. The tools I used in that job were SAS and Excel. I had a dim notion that there must be a better way to marry computation and data analysis than those tools, but of course no concept of what this would look like.
From there I went more in the direction of general purpose computing, starting a couple of companies where I worked principally on programming languages and authoring tools for the Web. These companies produced proprietary software, which at the time (between 1995 and 2005) was a workable model because it allowed us to build the revenue required to fund development and to promote and distribute the software to a wider audience.
By 2005 it was however becoming clear that proprietary software would ultimately be overtaken by open source software in nearly all domains. The cost of development had shrunken dramatically thanks to both the availability of high-quality open source languages and tools as well as the scale of global collaboration possible on open source projects. The cost of promoting and distributing software had also collapsed thanks to efficiency of both distribution and information diffusion on the Web.
When I heard about R and learned more about it, I became very excited and inspired by what the project had accomplished. A group of extremely talented and dedicated users had created the software they needed for their work and then shared the fruits of that work with everyone. R was a platform that everyone could rally around because it worked so well, was extensible in all the right ways, and most importantly was free (as in speech) so users could depend upon it as a long-term foundation for their work.
So I started RStudio with the aim of making useful contributions to the R community. We started with building an IDE because it seemed like a first-rate development environment for R that was both powerful and easy to use was an unmet need. Being aware that many other companies had built successful businesses around open-source software, we were also convinced that we could make RStudio available under a free and open-source license (the AGPLv3) while still creating a viable business. At this point RStudio is exclusively focused on creating the best IDE for R that we can. As the core product gets where it needs to be over the next couple of years we’ll then also begin to sell other products and services related to R and RStudio.
About-

JJ Allaire
JJ Allaire is a software engineer and entrepreneur who has created a wide variety of products including ColdFusion, Windows Live Writer, Lose It!, and RStudio.
From http://en.wikipedia.org/wiki/Joseph_J._Allaire
In 1995 Joseph J. (JJ) Allaire co-founded Allaire Corporation with his brother Jeremy Allaire, creating the web development tool ColdFusion.[1] In March 2001, Allaire was sold to Macromedia where ColdFusion was integrated into the Macromedia MX product line. Macromedia was subsequently acquired by Adobe Systems, which continues to develop and market ColdFusion.
After the sale of his company, Allaire became frustrated at the difficulty of keeping track of research he was doing using Google. To address this problem, he co-founded Onfolio in 2004 with Adam Berrey, former Allaire co-founder and VP of Marketing at Macromedia.
On March 8, 2006, Onfolio was acquired by Microsoft where many of the features of the original product are being incorporated into the Windows Live Toolbar. On August 13, 2006, Microsoft released the public beta of a new desktop blogging client called Windows Live Writer that was created by Allaire’s team at Microsoft.
Starting in 2009, Allaire has been developing a web-based interface to the widely used R technical computing environment. A beta version of RStudio was publicly released on February 28, 2011.
JJ Allaire received his B.A. from Macalester College (St. Paul, MN) in 1991.
RStudio-
RStudio is an integrated development environment (IDE) for R which works with the standard version of R available from CRAN. Like R, RStudio is available under a free software license. RStudio is designed to be as straightforward and intuitive as possible to provide a friendly environment for new and experienced R users alike. RStudio is also a company, and they plan to sell services (support, training, consulting, hosting) related to the open-source software they distribute.
SAS Institute Financials 2011
SAS Institute has released its financials for 2011 at http://www.sas.com/news/preleases/2011financials.html,
Revenue surged across all solution and industry categories. Software to detect fraud saw a triple-digit jump. Revenue from on-demand solutions grew almost 50 percent. Growth from analytics and information management solutions were double digit, as were gains from customer intelligence, retail, risk and supply chain solutions
AJAY- and as a private company it is quite nice that they are willing to share so much information every year.
The graphics are nice (and the colors much better than in 2010), but pie charts? Seriously dude, there is no way to compare how SAS revenue is shifting across geographies or even across industries. So my two cents is: lose the pie charts, and stick to line graphs please for the share of revenue by country/industry.
In 2011, SAS grew staff 9.2 percent and reinvested 24 percent of revenue into research and development
AJAY- So that means 654 million dollars spent on Research and Development. I wonder if SAS has considered investing in much smaller startups (rather than its traditional strategy of doing all research in-house and completely acquiring a smaller company).
Even a small investment of, say, 5-10 million USD in open source, or even PhD level research projects, could greatly increase the ROI on that.
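To put those numbers in proportion, here is a small back-of-the-envelope calculation in Python. It uses only the two figures quoted above (24 percent of revenue and 654 million USD); the implied revenue is derived from them and is not quoted in the press release excerpt, and the variable and function names are just illustrative.

```python
# Figures quoted in the post for SAS in 2011.
RD_SHARE_OF_REVENUE = 0.24   # 24 percent of revenue reinvested in R&D
RD_SPEND_MUSD = 654.0        # R&D spend in millions of USD, as estimated above

# Revenue implied by the two figures (derived here, not quoted in the post).
implied_revenue_musd = RD_SPEND_MUSD / RD_SHARE_OF_REVENUE


def share_of_rd_budget(investment_musd: float) -> float:
    """Return an investment as a percentage of the R&D budget."""
    return 100.0 * investment_musd / RD_SPEND_MUSD


# The 5-10 million USD range suggested for outside investments.
low_pct = share_of_rd_budget(5.0)
high_pct = share_of_rd_budget(10.0)

print(f"Implied 2011 revenue: about {implied_revenue_musd:,.0f} million USD")
print(f"A 5-10 million USD investment is {low_pct:.2f}%-{high_pct:.2f}% of R&D spend")
```

In other words, the suggested outside investment would be on the order of one percent of the annual R&D budget.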
That means
Analyzing a private company’s financials is much more fun than analyzing a public company’s, and, remembering the words of my finance professor (“dig, dig”), I compared the 2011 results with the 2010 results.
http://www.sas.com/news/preleases/2010financials.html
The percentage invested in R&D is exactly the same (24%), and the percentages of revenue earned from each geography are exactly the same. So even though revenue growth increased from 5.2% to 9% in 2011, both the geographic spread of revenues and the share of R&D costs remained EXACTLY the same.
The Americas accounted for 46 percent of total revenue; Europe, Middle East and Africa (EMEA) 42 percent; and Asia Pacific 12 percent.
Overall, I think SAS retains a 35% market share (despite all that noise from IBM, SAS clones, open source) because it is good at providing solutions customized for industries (instead of just software products), because the market for analytics is not saturated (it seems to be growing faster than 12%, or is it?), and because of its ability to attract and retain the best analytical talent (which, in a non-American tradition for a software company, means no stock options, job security, and great benefits; SAS remains almost Japanese in HR practices).
In 2010, SAS grew staff by 2.4 percent, in 2011 SAS grew staff by 9 percent.
But I liked the directional statement made here-and I think that design interfaces, algorithmic and computational efficiencies should increase analytical time, time to think on business and reduce data management time further!
“What would you do with the extra time if your code ran in two minutes instead of five hours?” Goodnight challenged.
Changes in R software
The newest version of R is now available for download. R 2.13 is ready !!

http://cran.at.r-project.org/bin/windows/base/CHANGES.R-2.13.0.html
Windows-specific changes to R
CHANGES IN R VERSION 2.13.0
WINDOWS VERSION
- Windows 2000 is no longer supported. (It went end-of-life in July 2010.)
NEW FEATURES
- win_iconv has been updated: this version has a change in the behaviour with BOMs on UTF-16 and UTF-32 files – it removes BOMs when reading and adds them when writing. (This is consistent with Microsoft applications, but Unix versions of iconv usually ignore them.)
- Support for repository type win64.binary (used for 64-bit Windows binaries for R 2.11.x only) has been removed.
- The installers no longer put an ‘Uninstall’ item on the start menu (to conform to current Microsoft UI guidelines).
- Running R always sets the environment variable R_ARCH (as it does on a Unix-alike from the shell-script front-end).
- The defaults for options("browser") and options("pdfviewer") are now set from environment variables R_BROWSER and R_PDFVIEWER respectively (as on a Unix-alike). A value of "false" suppresses display (even if there is no false.exe present on the path).
- If options("install.lock") is set to TRUE, binary package installs are protected against failure similar to the way source package installs are protected.
- file.exists() and unlink() have more support for files > 2GB.
- The versions of R.exe in ‘R_HOME/bin/i386,x64/bin’ now support options such as R --vanilla CMD: there is no comparable interface for ‘Rcmd.exe’.
- A few more file operations will now work with >2GB files.
- The environment variable R_HOME in an R session now uses slash as the path separator (as it always has when set by Rcmd.exe).
- Rgui has a new menu item for the PDF ‘Sweave User Manual’.
DEPRECATED
- zip.unpack() is deprecated: use unzip().
INSTALLATION
- There is support for libjpeg-turbo via setting JPEGDIR to that value in ‘MkRules.local’. Support for jpeg-6b has been removed.
- The sources now work with libpng-1.5.1, jpegsrc.v8c (which are used in the CRAN builds) and tiff-4.0.0beta6 (CRAN builds use 3.9.1). It is possible that they no longer work with older versions than libpng-1.4.5.
BUG FIXES
- Workaround for the incorrect values given by Windows’ casinh function on the branch cuts.
- Bug fixes for drawing raster objects on windows(). The symptom was the occasional raster image not being drawn, especially when drawing multiple raster images in a single expression. Thanks to Michael Sumner for report and testing.
- Printing extremely long string values could overflow the stack and cause the GUI to crash. (PR#14543)
Tonnes of changes!!
http://cran.at.r-project.org/src/base/NEWS
CHANGES IN R VERSION 2.13.0:
SIGNIFICANT USER-VISIBLE CHANGES:
• replicate() (by default) and vapply() (always) now return a
higher-dimensional array instead of a matrix in the case where
the inner function value is an array of dimension >= 2.
• Printing and formatting of floating point numbers is now using
the correct number of digits, where it previously rarely differed
by a few digits. (See “scientific” entry below.) This affects
_many_ *.Rout.save checks in packages.
NEW FEATURES:
• normalizePath() has been moved to the base package (from utils):
this is so it can be used by library() and friends.
It now does tilde expansion.
It gains new arguments winslash (to select the separator on
Windows) and mustWork to control the action if a canonical path
cannot be found.
• The previously barely documented limit of 256 bytes on a symbol
name has been raised to 10,000 bytes (a sanity check). Long
symbol names can sometimes occur when deparsing expressions (for
example, in model.frame).
• reformulate() gains an intercept argument.
• cmdscale(add = FALSE) now uses the more common definition that
there is a representation in n-1 or less dimensions, and only
dimensions corresponding to positive eigenvalues are used.
(Avoids confusion such as PR#14397.)
• Names used by c(), unlist(), cbind() and rbind() are marked with
an encoding when this can be ascertained.
• R colours are now defined to refer to the sRGB color space.
The PDF, PostScript, and Quartz graphics devices record this
fact. X11 (and Cairo) and Windows just assume that your screen
conforms.
• system.file() gains a mustWork argument (suggestion of Bill
Dunlap).
• new.env(hash = TRUE) is now the default.
• list2env(envir = NULL) defaults to hashing (with a suitably sized
environment) for lists of more than 100 elements.
• text() gains a formula method.
• IQR() now has a type argument which is passed to quantile().
• as.vector(), as.double() etc duplicate less when they leave the
mode unchanged but remove attributes.
as.vector(mode = "any") no longer duplicates when it does not
remove attributes. This helps memory usage in matrix() and
array().
matrix() duplicates less if data is an atomic vector with
attributes such as names (but no class).
dim(x) <- NULL duplicates less if x has neither dimensions nor
names (since this operation removes names and dimnames).
• setRepositories() gains an addURLs argument.
• chisq.test() now also returns a stdres component, for
standardized residuals (which have unit variance, unlike the
Pearson residuals).
• write.table() and friends gain a fileEncoding argument, to
simplify writing files for use on other OSes (e.g. a spreadsheet
intended for Windows or Mac OS X Excel).
• Assignment expressions of the form foo::bar(x) <- y and
foo:::bar(x) <- y now work; the replacement functions used are
foo::`bar<-` and foo:::`bar<-`.
• Sys.getenv() gains a names argument so Sys.getenv(x, names =
FALSE) can replace the common idiom of as.vector(Sys.getenv()).
The default has been changed to not name a length-one result.
• Lazy loading of environments now preserves attributes and locked
status. (The locked status of bindings and active bindings are
still not preserved; this may be addressed in the future).
• options("install.lock") may be set to FALSE so that
install.packages() defaults to --no-lock installs, or (on
Windows) to TRUE so that binary installs implement locking.
• sort(partial = p) for large p now tries Shellsort if quicksort is
not appropriate and so works for non-numeric atomic vectors.
• sapply() gets a new option simplify = "array" which returns a
“higher rank” array instead of just a matrix when FUN() returns a
dim() length of two or more.
replicate() has this option set by default, and vapply() now
behaves that way internally.
• aperm() becomes S3 generic and gets a table method which
preserves the class.
• merge() and as.hclust() methods for objects of class "dendrogram"
are now provided.
• as.POSIXlt.factor() now passes ... to the character method
(suggestion of Joshua Ulrich).
• The character method of as.POSIXlt() now tries to find a format
that works for all non-NA inputs, not just the first one.
• str() now has a method for class "Date" analogous to that for
class "POSIXt".
• New function file.link() to create hard links on those file
systems (POSIX, NTFS but not FAT) that support them.
• New Summary() group method for class "ordered" implements min(),
max() and range() for ordered factors.
• mostattributes<-() now consults the "dim" attribute and not the
dim() function, making it more useful for objects (such as data
frames) from classes with methods for dim(). It also uses
attr<-() in preference to the generics name<-(), dim<-() and
dimnames<-(). (Related to PR#14469.)
• There is a new option "browserNLdisabled" to disable the use of
an empty line (e.g. via the ‘Return’ key) as a synonym for c in
browser() or n under debug(). (Wish of PR#14472.)
• example() gains optional new arguments character.only and
give.lines enabling programmatic exploration.
• serialize() and unserialize() are no longer described as
‘experimental’. The interface is now regarded as stable,
although the serialization format may well change in future
releases. (serialize() has a new argument version which would
allow the current format to be written if that happens.)
New functions saveRDS() and readRDS() are public versions of the
‘internal’ functions .saveRDS() and .readRDS() made available for
general use. The dot-name versions remain available as several
package authors have made use of them, despite the documentation.
saveRDS() supports compress = "xz".
• Many functions when called with a not-open connection will now
ensure that the connection is left not-open in the event of
error. These include read.dcf(), dput(), dump(), load(),
parse(), readBin(), readChar(), readLines(), save(), writeBin(),
writeChar(), writeLines(), .readRDS(), .saveRDS() and
tools::parse_Rd(), as well as functions calling these.
• Public functions find.package() and path.package() replace the
internal dot-name versions.
• The default method for terms() now looks for a "terms" attribute
if it does not find a "terms" component, and so works for model
frames.
• httpd() handlers receive an additional argument containing the
full request headers as a raw vector (this can be used to parse
cookies, multi-part forms etc.). The recommended full signature
for handlers is therefore function(url, query, body, headers,
...).
• file.edit() gains a fileEncoding argument to specify the encoding
of the file(s).
• The format of the HTML package listings has changed. If there is
more than one library tree, a table of links to libraries is
provided at the top and bottom of the page. Where a library
contains more than 100 packages, an alphabetic index is given at
the top of the section for that library. (As a consequence,
package names are now sorted case-insensitively whatever the
locale.)
• isSeekable() now returns FALSE on connections which have
non-default encoding. Although documented to record if ‘in
principle’ the connection supports seeking, it seems safer to
report FALSE when it may not work.
• R CMD REMOVE and remove.packages() now remove file R.css when
removing all remaining packages in a library tree. (Related to
the wish of PR#14475: note that this file is no longer
installed.)
• unzip() now has a unzip argument like zip.file.extract(). This
allows an external unzip program to be used, which can be useful
to access features supported by Info-ZIP's unzip version 6 which
is now becoming more widely available.
• There is a simple zip() function, as wrapper for an external zip
command.
• bzfile() connections can now read from concatenated bzip2 files
(including files written with bzfile(open = "a")) and files
created by some other compressors (such as the example of
PR#14479).
• The primitive function c() is now of type BUILTIN.
• plot(<dendrogram>, .., nodePar=*) now obeys an optional xpd
specification (allowing clipping to be turned off completely).
• nls(algorithm="port") now shares more code with nlminb(), and is
more consistent with the other nls() algorithms in its return
value.
• xz has been updated to 5.0.1 (very minor bugfix release).
• image() has gained a logical useRaster argument allowing it to
use a bitmap raster for plotting a regular grid instead of
polygons. This can be more efficient, but may not be supported by
all devices. The default is FALSE.
• list.files()/dir() gain a new argument include.dirs to include
directories in the listing when recursive = TRUE.
• New function list.dirs() lists all directories (even empty
ones).
• file.copy() now (by default) copies read/write/execute
permissions on files, moderated by the current setting of
Sys.umask().
• Sys.umask() now accepts mode = NA and returns the current umask
value (visibly) without changing it.
• There is a ! method for classes "octmode" and "hexmode": this
allows xor(a, b) to work if both a and b are from one of those
classes.
• as.raster() no longer fails for vectors or matrices containing
NAs.
• New hook "before.new.plot" allows functions to be run just before
advancing the frame in plot.new, which is potentially useful for
custom figure layout implementations.
• Package tools has a new function compactPDF() to try to reduce
the size of PDF files _via_ qpdf or gs.
• tar() has a new argument extra_flags.
• dotchart() accepts more general objects x such as 1D tables which
can be coerced by as.numeric() to a numeric vector, with a
warning since that might not be appropriate.
• The previously internal function create.post() is now exported
from utils, and the documentation for bug.report() and
help.request() now refer to that for create.post().
It has a new method = "mailto" on Unix-alikes similar to that on
Windows: it invokes a default mailer via open (Mac OS X) or
xdg-open or the default browser (elsewhere).
The default for ccaddress is now getOption("ccaddress") which is
by default unset: using the username as a mailing address
nowadays rarely works as expected.
• The default for options("mailer") is now "mailto" on all
platforms.
• unlink() now does tilde-expansion (like most other file
functions).
• file.rename() now allows vector arguments (of the same length).
• The "glm" method for logLik() now returns an "nobs" attribute
(which stats4::BIC() assumed it did).
The "nls" method for logLik() gave incorrect results for zero
weights.
• There is a new generic function nobs() in package stats, to
extract from model objects a suitable value for use in BIC
calculations. An S4 generic derived from it is defined in
package stats4.
• Code for S4 reference-class methods is now examined for possible
errors in non-local assignments.
• findClasses, getGeneric, findMethods and hasMethods are revised
to deal consistently with the package= argument and be consistent
with soft namespace policy for finding objects.
• tools::Rdiff() now has the option to return not only the status
but a character vector of observed differences (which are still
by default sent to stdout).
• The startup environment variables R_ENVIRON_USER, R_ENVIRON,
R_PROFILE_USER and R_PROFILE are now treated more consistently.
In all cases an empty value is considered to be set and will stop
the default being used, and for the last two tilde expansion is
performed on the file name. (Note that setting an empty value is
probably impossible on Windows.)
• Using R --no-environ CMD, R --no-site-file CMD or R
--no-init-file CMD sets environment variables so these settings
are passed on to child R processes, notably those run by INSTALL,
check and build. R --vanilla CMD sets these three options (but
not --no-restore).
• smooth.spline() is somewhat faster. With cv=NA it allows some
leverage computations to be skipped.
• The internal (C) function scientific(), at the heart of R's
format.info(x), format(x), print(x), etc, for numeric x, has been
re-written in order to provide slightly more correct results,
fixing PR#14491, notably in border cases including when digits >=
16, thanks to substantial contributions (code and experiments)
from Petr Savicky. This affects a noticeable amount of numeric
output from R.
• A new function grepRaw() has been introduced for finding subsets
of raw vectors. It supports both literal searches and regular
expressions.
• Package compiler is now provided as a standard package. See
?compiler::compile for information on how to use the compiler.
This package implements a byte code compiler for R: by default
the compiler is not used in this release. See the ‘R
Installation and Administration Manual’ for how to compile the
base and recommended packages.
• Providing an exportPattern directive in a NAMESPACE file now
causes classes to be exported according to the same pattern, for
example the default from package.skeleton() to specify all names
starting with a letter. An explicit directive to
exportClassPattern will still over-ride.
• There is an additional marked encoding "bytes" for character
strings. This is intended to be used for non-ASCII strings which
should be treated as a set of bytes, and never re-encoded as if
they were in the encoding of the current locale: useBytes = TRUE
is automatically selected in functions such as writeBin(),
writeLines(), grep() and strsplit().
Only a few character operations are supported (such as substr()).
Printing, format() and cat() will represent non-ASCII bytes in
such strings by a \xab escape.
• The new function removeSource() removes the internally stored
source from a function.
• "srcref" attributes now include two additional line number
values, recording the line numbers in the order they were parsed.
• New functions have been added for source reference access:
getSrcFilename(), getSrcDirectory(), getSrcLocation() and
getSrcref().
• Sys.chmod() has an extra argument use_umask which defaults to
true and restricts the file mode by the current setting of umask.
This means that all the R functions which manipulate
file/directory permissions by default respect umask, notably R
CMD INSTALL.
• tempfile() has an extra argument fileext to create a temporary
filename with a specified extension. (Suggestion and initial
implementation by Dirk Eddelbuettel.)
• There are improvements in the way Sweave() and Stangle() handle
non-ASCII vignette sources, especially in a UTF-8 locale: see
‘Writing R Extensions’ which now has a subsection on this topic.
• factanal() now returns the rotation matrix if a rotation such as
"promax" is used, and hence factor correlations are displayed.
(Wish of PR#12754.)
• The gctorture2() function provides a more refined interface to
the GC torture process. Environment variables R_GCTORTURE,
R_GCTORTURE_WAIT, and R_GCTORTURE_INHIBIT_RELEASE can also be
used to control the GC torture process.
• file.copy(from, to) no longer regards it as an error to supply a
zero-length from: it now simply does nothing.
• rstandard.glm gains a type argument which can be used to request
standardized Pearson residuals.
• A start on a Turkish translation, thanks to Murat Alkan.
• .libPaths() calls normalizePath(winslash = "/") on the paths:
this helps (usually) present them in a user-friendly form and
should detect duplicate paths accessed via different symbolic
links.
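A few of the new features listed above can be tried together in a short session. The following is only a minimal sketch: the object names (slices, path, restored, pos, sizes, largest) are made up for illustration, and it assumes R 2.13.0 or later.

```r
# Illustrative sketch only; all object names here are hypothetical.

# sapply(simplify = "array"): each call returns a 2 x 2 matrix, so the
# results are stacked into a 2 x 2 x 3 array instead of a 4 x 3 matrix.
slices <- sapply(1:3, function(i) matrix(i, 2, 2), simplify = "array")
dim(slices)

# tempfile(fileext = ) together with saveRDS()/readRDS() and xz compression.
path <- tempfile(fileext = ".rds")
saveRDS(slices, path, compress = "xz")
restored <- readRDS(path)
identical(restored, slices)
unlink(path)

# grepRaw() reports the position of a match within a raw vector.
pos <- grepRaw("b", charToRaw("abc"), fixed = TRUE)

# The Summary group method lets max() work on an ordered factor.
sizes <- factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"), ordered = TRUE)
largest <- as.character(max(sizes))
```

Unlike save()/load(), readRDS() returns the object rather than restoring it under its original name, so the caller chooses the name it is assigned to.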
SWEAVE CHANGES:
• Sweave() has options to produce PNG and JPEG figures, and to use
a custom function to open a graphics device (see ?RweaveLatex).
(Based in part on the contribution of PR#14418.)
• The default for Sweave() is to produce only PDF figures (rather
than both EPS and PDF).
• Environment variable SWEAVE_OPTIONS can be used to supply
defaults for existing or new options to be applied after the
Sweave driver setup has been run.
• The Sweave manual is now included as a vignette in the utils
package.
• Sweave() handles keep.source=TRUE much better: it could duplicate
some lines and omit comments. (Reported by John Maindonald and
others.)
C-LEVEL FACILITIES:
• Because they use a C99 interface which a C++ compiler is not
required to support, Rvprintf and REvprintf are only defined by
R_ext/Print.h in C++ code if the macro R_USE_C99_IN_CXX is
defined when it is included.
• pythag duplicated the C99 function hypot. It is no longer
provided, but is used as a substitute for hypot in the very
unlikely event that the latter is not available.
• R_inspect(obj) and R_inspect3(obj, deep, pvec) are (hidden)
C-level entry points to the internal inspect function and can be
used for C-level debugging (e.g., in conjunction with the p
command in gdb).
• Compiling R with --enable-strict-barrier now also enables
additional checking for use of unprotected objects. In
combination with gctorture() or gctorture2() and a C-level
debugger this can be useful for tracking down memory protection
issues.
UTILITIES:
• R CMD Rdiff is now implemented in R on Unix-alikes (as it has
been on Windows since R 2.12.0).
• R CMD build no longer does any cleaning in the supplied package
directory: all the cleaning is done in the copy.
It has a new option --install-args to pass arguments to R CMD
INSTALL for --build (but not when installing to rebuild
vignettes).
There is a new option, --resave-data, to call
tools::resaveRdaFiles() on the data directory, to compress
tabular files (.tab, .csv etc) and to convert .R files to .rda
files. The default, --resave-data=gzip, is to do so in a way
compatible even with years-old versions of R, but better
compression is given by --resave-data=best, requiring R >=
2.10.0.
It now adds a datalist file for data directories of more than
1Mb.
Patterns in .Rbuildignore are now also matched against all
directory names (including those of empty directories).
There is a new option, --compact-vignettes, to try reducing the
size of PDF files in the inst/doc directory. Currently this
tries qpdf: other options may be used in future.
When re-building vignettes and an inst/doc/Makefile file is found,
make clean is run if the makefile has a clean: target.
After re-building vignettes the default clean-up operation will
remove any directories (and not just files) created during the
process: e.g. one package created a .R_cache directory.
Empty directories are now removed unless the option
--keep-empty-dirs is given (and a few packages do deliberately
include empty directories).
If there is a field BuildVignettes in the package DESCRIPTION
file with a false value, re-building the vignettes is skipped.
• R CMD check now also checks for filenames that are
case-insensitive matches to Windows' reserved file names with
extensions, such as nul.Rd, as these have caused problems on some
Windows systems.
It checks for inefficiently saved data/*.rda and data/*.RData
files, and reports on those larger than 100Kb. A more complete
check (including of the type of compression, but potentially much
slower) can be switched on by setting environment variable
_R_CHECK_COMPACT_DATA2_ to TRUE.
The types of files in the data directory are now checked, as
packages are _still_ misusing it for non-R data files.
It now extracts and runs the R code for each vignette in a
separate directory and R process: this is done in the package's
declared encoding. Rather than call tools::checkVignettes(), it
calls tools::buildVignettes() to see if the vignettes can be
re-built as they would be by R CMD build. Option --use-valgrind
now applies only to these runs, and not when running code to
rebuild the vignettes. This version does a much better job of
suppressing output from successful vignette tests.
The 00check.log file is a more complete record of what is output
to stdout: in particular it contains more details of the tests.
It now checks all syntactically valid Rd usage entries, and warns
about assignments (unless these give the usage of replacement
functions).
.tar.xz compressed tarballs are now allowed, if tar supports them
(and setting environment variable TAR to internal ensures so on
all platforms).
• R CMD check now warns if it finds inst/doc/makefile, and R CMD
build renames such a file to inst/doc/Makefile.
INSTALLATION:
• Installing R no longer tries to find perl, and R CMD no longer
tries to substitute a full path for awk or perl - this was a
legacy from the days when they were used by R itself. Because a
couple of packages do use awk, it is set as the make (rather than
environment) variable AWK.
• make check will now fail if there are differences from the
reference output when testing package examples and if environment
variable R_STRICT_PACKAGE_CHECK is set to a true value.
• The C99 double complex type is now required.
The C99 complex trigonometric functions (such as csin) are not
currently required (FreeBSD lacks most of them): substitutes are
used if they are missing.
• The C99 system call va_copy is now required.
• If environment variable R_LD_LIBRARY_PATH is set during
configuration (for example in config.site) it is used unchanged
in file etc/ldpaths rather than being appended to.
• configure looks for support for OpenMP and if found compiles R
with appropriate flags and also makes them available for use in
packages: see ‘Writing R Extensions’.
This is currently experimental, and is only used in R with a
single thread for colSums() and colMeans(). Expect it to be more
widely used in later versions of R.
This can be disabled by the --disable-openmp flag.
PACKAGE INSTALLATION:
• R CMD INSTALL --clean now removes copies of a src directory which
are created when multiple sub-architectures are in use.
(Following a comment from Berwin Turlach.)
• File R.css is now installed on a per-package basis (in the
package's html directory) rather than in each library tree, and
this is used for all the HTML pages in the package. This helps
when installing packages with static HTML pages for use on a
webserver. It will also allow future versions of R to use
different stylesheets for the packages they install.
• A top-level file .Rinstignore in the package sources can list (in
the same way as .Rbuildignore) files under inst that should not
be installed. (Why should there be any such files? Because all
the files needed to re-build vignettes need to be under inst/doc,
but they may not need to be installed.)
• R CMD INSTALL has a new option --compact-docs to compact any PDFs
under the inst/doc directory. Currently this uses qpdf, which
must be installed (see ‘Writing R Extensions’).
• There is a new option --lock which can be used to cancel the
effect of --no-lock or --pkglock earlier on the command line.
• Option --pkglock can now be used with more than one package, and
is now the default if only one package is specified.
• Argument lock of install.packages() can now be used for Mac binary
installs as well as for Windows ones. The value "pkglock" is now
accepted, as well as TRUE and FALSE (the default).
• There is a new option --no-clean-on-error for R CMD INSTALL to
retain a partially installed package for forensic analysis.
• Packages with names ending in . are not portable since Windows
does not work correctly with such directory names. This is now
warned about in R CMD check, and will not be allowed in R 2.14.x.
• The vignette indices are more comprehensive (in the style of
browseVignettes()).
DEPRECATED & DEFUNCT:
• require(save = TRUE) is defunct, and use of the save argument is
deprecated.
• R CMD check --no-latex is defunct: use --no-manual instead.
• R CMD Sd2Rd is defunct.
• The gamma argument to hsv(), rainbow(), and rgb2hsv() is
deprecated and no longer has any effect.
• The previous options for R CMD build --binary (--auto-zip,
--use-zip-data and --no-docs) are deprecated (or defunct): use
the new option --install-args instead.
• When a character value is used for the EXPR argument in switch(),
only a single unnamed alternative value is now allowed.
• The wrapper utils::link.html.help() is no longer available.
• Zip-ing data sets in packages (and hence R CMD INSTALL options
--use-zip-data and --auto-zip, as well as the ZipData: yes field
in a DESCRIPTION file) is defunct.
Installed packages with zip-ed data sets can still be used, but a
warning that they should be re-installed will be given.
• The ‘experimental’ alternative specification of a name space via
.Export() etc is now defunct.
• The option --unsafe to R CMD INSTALL is deprecated: use the
identical option --no-lock instead.
• The entry point pythag in Rmath.h is deprecated in favour of the
C99 function hypot. A wrapper for hypot is provided for R 2.13.x
only.
• Direct access to the "source" attribute of functions is
deprecated; use deparse(fn, control="useSource") to access it,
and removeSource(fn) to remove it.
• R CMD build --binary is now formally deprecated: R CMD INSTALL
--build has long been the preferred alternative.
• Single-character package names are deprecated (and R is already
disallowed to avoid confusion in Depends: fields).
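Two of the items above affect everyday code: the switch() rule and the move away from the "source" attribute. The sketch below shows a form that should remain valid; unit_scale(), f and g are hypothetical names used only for illustration.

```r
# Hypothetical helper: maps a unit name to a multiplier.
unit_scale <- function(unit) {
  switch(unit,
         km = 1000,
         m  = 1,
         NA_real_)  # the single unnamed alternative acts as the default
}
unit_scale("km")
unit_scale("furlong")  # no named match, so the default is returned

# removeSource() returns a copy of the function without stored source
# references, replacing direct manipulation of the "source" attribute.
f <- function(x) x + 1
g <- removeSource(f)
is.null(attr(g, "srcref"))
```

Giving switch() a second unnamed alternative is what the new rule forbids, so the default belongs in exactly one place, conventionally last.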
BUG FIXES:
• drop.terms and the [ method for class "terms" no longer add back
an intercept. (Reported by Niels Hansen.)
• aggregate preserves the class of a column (e.g. a date) under
some circumstances where it discarded the class previously.
• p.adjust() now always returns a vector result, as documented. In
previous versions it copied attributes (such as dimensions) from
the p argument: now it only copies names.
• On PDF and PostScript devices, a line width of zero was recorded
verbatim and this caused problems for some viewers (a very thin
line combined with a non-solid line dash pattern could also cause
a problem). On these devices, the line width is now limited at
0.01 and for very thin lines with complex dash patterns the
device may force the line dash pattern to be solid. (Reported by
Jari Oksanen.)
• The str() method for class "POSIXt" now gives sensible output for
0-length input.
• The one- and two-argument complex maths functions failed to warn
if NAs were generated (as their numeric analogues do).
• Added .requireCachedGenerics to the dont.mind list for library()
to avoid warnings about duplicates.
• $<-.data.frame messed with the class attribute, breaking any S4
subclass. The S4 data.frame class now has its own $<- method,
and turns dispatch on for this primitive.
• Map() did not look up a character argument f in the correct
frame, thanks to lazy evaluation. (PR#14495)
• file.copy() did not tilde-expand from and to when to was a
directory. (PR#14507)
• It was possible (but very rare) for the loading test in R CMD
INSTALL to crash a child R process and so leave around a lock
directory and a partially installed package. That test is now
done in a separate process.
• plot(<formula>, data=<matrix>,..) now works in more cases;
similarly for points(), lines() and text().
• edit.default() contained a manual dispatch for matrices (the
"matrix" class didn't really exist when it was written). This
caused an infinite recursion in the no-GUI case and has now been
removed.
• data.frame(check.rows = TRUE) sometimes worked when it should
have detected an error. (PR#14530)
• scan(sep= , strip.white=TRUE) sometimes stripped trailing spaces
from within quoted strings. (The real bug in PR#14522.)
• The rank-correlation methods for cor() and cov() with use =
"complete.obs" computed the ranks before removing missing values,
whereas the documentation implied incomplete cases were removed
first. (PR#14488)
They also failed for 1-row matrices.
• The perpendicular adjustment used in placing text and expressions
in the margins of plots was not scaled by par("mex"). (Part of
PR#14532.)
• Quartz Cocoa device now catches any Cocoa exceptions that occur
during the creation of the device window to prevent crashes. It
also imposes a limit of 144 ft^2 on the area used by a window to
catch user errors (unit misinterpretation) early.
• The browser (invoked by debug(), browser() or otherwise) would
display attributes such as "wholeSrcref" that were intended for
internal use only.
• R's internal filename completion now properly handles filenames
with spaces in them even when the readline library is used. This
resolves PR#14452 provided the internal filename completion is
used (e.g., by setting rc.settings(files = TRUE)).
• Inside uniroot(f, ...), -Inf function values are now replaced by
a maximally *negative* value.
• rowsum() could silently over/underflow on integer inputs
(reported by Bill Dunlap).
• as.matrix() did not handle "dist" objects with zero rows.
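The p.adjust() change above is easy to miss in code that passes a matrix of p-values. As a minimal sketch with made-up data (p_mat, adj, adj_mat and p_named are hypothetical names), the dimensions have to be restored by the caller if they are needed:

```r
# A 2 x 2 matrix of p-values (made-up numbers for illustration).
p_mat <- matrix(c(0.01, 0.02, 0.03, 0.04), nrow = 2)

# The result is a plain vector: dimensions are no longer copied over.
adj <- p.adjust(p_mat, method = "bonferroni")
is.null(dim(adj))

# Restore the shape explicitly when a matrix is wanted.
adj_mat <- matrix(adj, nrow = nrow(p_mat))

# Names, by contrast, are still carried through.
p_named <- c(a = 0.01, b = 0.04)
names(p.adjust(p_named, method = "holm"))
```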
CHANGES IN R VERSION 2.12.2 patched:
NEW FEATURES:
• max() and min() work harder to ensure that NA has precedence over
NaN, so e.g. min(NaN, NA) is NA. (This was not previously
documented except for within a single numeric vector, where
compiler optimizations often defeated the code.)
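The precedence rule above can be checked directly. This is a small sketch (lo and hi are illustrative names) that relies on is.nan() being FALSE for NA to tell the two kinds of missing value apart:

```r
lo <- min(NaN, NA)
hi <- max(NA, NaN)

# An NA result is missing but is not NaN.
is.na(lo) && !is.nan(lo)
is.na(hi) && !is.nan(hi)
```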
BUG FIXES:
• A change to the C function R_tryEval had broken error messages in
S4 method selection; the error message is now printed.
• PDF output with a non-RGB color model used RGB for the line
stroke color. (PR#14511)
• stats4::BIC() assumed without checking that an object of class
"logLik" has an "nobs" attribute: glm() fits did not and so BIC()
failed for them.
• In some circumstances a one-sided mantelhaen.test() reported the
p-value for the wrong tail. (PR#14514)
• Passing the invalid value lty = NULL to axis() sent an invalid
value to the graphics device, and might cause the device to
segfault.
• Sweave() with concordance=TRUE could lead to invalid PDF files;
Sweave.sty has been updated to avoid this.
• Non-ASCII characters in the titles of help pages were not
rendered properly in some locales, and could cause errors or
warnings.
• checkRd() gave a spurious error if the \href macro was used.









