The 3.0 Era for R starts today! Changes include better Big Data support.
Read the NEWS here
- install.packages() has a new argument quiet to reduce the amount of output shown.
- New functions citeNatbib() have been added, to allow generation of in-text citations from
cite() function may be added to
- merge() works in more cases where the data frames include matrices. (Wish of PR#14974.)
- sample.int() has some support for n >= 2^31: see its help for the limitations. A different algorithm is used for (n, size, replace = FALSE, prob = NULL) for n > 1e7 and size <= n/2. This is much faster and uses less memory, but does give different results.
dir()) gains a new optional argument no.. which allows to exclude
- Profiling via Rprof() now optionally records information at the statement level, not just the function level.
"license/restricts_use" filter which retains only packages for which installation can proceed solely based on packages which are guaranteed not to restrict use.
- File ‘share/licenses/licenses.db’ has some clarifications, especially as to which variants of ‘BSD’ and ‘MIT’ are intended and how to apply them to packages. The problematic licence ‘Artistic-1.0’ has been removed.
hist.default() can now be a function that returns the breakpoints to be used (previously it could only return the suggested number of breakpoints).
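Two of the items above are easy to try out. The following is a minimal sketch, not taken from the NEWS file: it assumes that the hist.default() change refers to the breaks argument (as documented in ?hist), and the helper half_unit_breaks() and the seed are made up for illustration.

```r
# sample.int(): draw a few values from a very large range.
# With n > 1e7 and size <= n/2 this does not need a vector of length n.
set.seed(42)                         # arbitrary seed, only for repeatability
draw <- sample.int(1e8, size = 5)    # five distinct integers in 1..1e8
print(draw)

# hist(): 'breaks' supplied as a function that returns the breakpoints
# themselves rather than a suggested number of cells.
# half_unit_breaks() is a hypothetical helper, not part of R.
half_unit_breaks <- function(v) seq(floor(min(v)), ceiling(max(v)), by = 0.5)

x <- c(0.2, 0.7, 1.1, 1.9, 2.4, 2.6)
h <- hist(x, breaks = half_unit_breaks, plot = FALSE)
print(h$breaks)   # breakpoints produced by the function
print(h$counts)   # observations per cell
```

Because the sampling algorithm differs for large n, results drawn with the same seed under earlier versions of R should not be expected to match.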
This section applies only to 64-bit platforms.
- There is support for vectors longer than 2^31 – 1 elements. This applies to raw, logical, integer, double, complex and character vectors, as well as lists. (Elements of character vectors remain limited to 2^31 – 1 bytes.)
- Most operations which can sensibly be done with long vectors work: others may return the error ‘long vectors not supported yet’. Most of these are because they explicitly work with integer indices (e.g. match()) or because other limits (e.g. of character strings or matrix dimensions) would be exceeded or the operations would be extremely slow.
- length() returns a double for long vectors, and lengths can be set to 2^31 or more by the replacement function with a double value.
- Most aspects of indexing are available. Generally double-valued indices can be used to access elements beyond 2^31 – 1.
- There is some support for matrices and arrays with each dimension less than 2^31 but total number of elements more than that. Only some aspects of matrix algebra work for such matrices, often taking a very long time. In other cases the underlying Fortran code has an unstated restriction (as was found for complex
- dist() can produce dissimilarity objects for more than 65536 rows (but for example hclust() cannot process such objects).
- serialize() to a raw vector is unlimited in size (except by resources).
- The C-level function R_alloc can now allocate 2^35 or more bytes.
grep() will return double vectors of indices for long vector inputs.
- Many calls to .C() have been replaced by .Call() to allow long vectors to be supported (now or in the future). Regrettably several packages had copied the non-API .C() calls and so failed.
.Fortran() do not accept long vector inputs. This is a precaution as it is very unlikely that existing code will have been written to handle long vectors (and the R wrappers often assume that length(x) is an integer).
- Most of the methods for sort() work for long vectors.
order() support long vectors (slowly except for radix sorting).
- sample() can do uniform sampling from a long vector.
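The practical consequence of the long-vector notes above is that R code can no longer assume length(x) fits in an integer. The sketch below illustrates the boundary without allocating a long vector; is_long_vector(), as_index() and demo_long_raw() are hypothetical helpers written for this example, not functions provided by R.

```r
# The largest integer R can represent is 2^31 - 1; anything beyond that
# has to be carried as a double.
int_max    <- .Machine$integer.max   # 2147483647
first_long <- 2^31                   # a double, one past the integer range

# Coercing a long length to integer silently loses it (NA plus a warning),
# which is why wrappers that assume integer lengths are unsafe.
overflow <- suppressWarnings(as.integer(first_long))

# Hypothetical helper: TRUE when x is a long vector.
is_long_vector <- function(x) length(x) > .Machine$integer.max

# Hypothetical helper: keep small counts as integers, large ones as doubles.
as_index <- function(n) {
  if (n > .Machine$integer.max) as.double(n) else as.integer(n)
}

print(typeof(as_index(10)))      # integer
print(typeof(as_index(2^31)))    # double
print(is_long_vector(1:10))      # FALSE

# Not run by default: needs roughly 2 GiB of RAM and a 64-bit build.
# According to the notes above, length() of the result is a double.
demo_long_raw <- function() {
  x <- raw(2^31)
  typeof(length(x))
}
```

Calling demo_long_raw() is left to the reader, since it only makes sense on a 64-bit platform with enough free memory.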
- More use has been made of R objects representing registered entry points, which is more efficient as the address is provided by the loader once only when the package is loaded.
This has been done for packages
tcltk: it was already in place for the other standard packages.
Since these entry points are always accessed by the R entry points they do not need to be in the load table which can be substantially smaller and hence searched faster. This does mean that .Call calls copied from earlier versions of R may no longer work – but they were never part of the API.
.Call() calls in package base have been migrated to
- solve() makes fewer copies, especially when b is a vector rather than a matrix.
- eigen() makes fewer copies if the input has dimnames.
- Most of the linear algebra functions make fewer copies when the input(s) are not double (e.g. integer or logical).
- A foreign function call (.C() etc) in a package without a PACKAGE argument will only look in the first DLL specified in the ‘NAMESPACE’ file of the package rather than searching all loaded DLLs. A few packages needed
@<- operator is now implemented as a primitive, which should reduce some copying of objects when used. Note that the operator object must now be in package base: do not try to import it explicitly from package methods.
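The performance items above do not change how the functions are called. As a rough illustration, the sketch below exercises solve() with a vector right-hand side and the slot-assignment operator; the matrix values and the "Point" class are invented for this example.

```r
library(methods)   # S4 classes; loaded explicitly in case the session lacks it

# solve() with b given as a plain vector: the result is a vector as well.
A <- matrix(c(2, 1,
              1, 3), nrow = 2)       # symmetric 2 x 2 system
b <- c(1, 2)
x_sol <- solve(A, b)
print(x_sol)                         # solution of A %*% x = b
print(as.vector(A %*% x_sol))        # reproduces b up to rounding

# Slot assignment goes through the `@<-` operator, which lives in base.
# "Point" is a hypothetical class used only for this illustration.
setClass("Point", representation(x = "numeric", y = "numeric"))
p <- new("Point", x = 1, y = 2)
p@x <- 10
print(p@x)
print(is.primitive(`@<-`))           # TRUE when the operator is a primitive
```

Package code that uses `@<-` should simply rely on base providing it, in line with the note about not importing it from methods.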
SIGNIFICANT USER-VISIBLE CHANGES
- Packages need to be (re-)installed under this version (3.0.0) of R.
- There is a subtle change in behaviour for numeric index values 2^31 and larger. These never used to be legitimate and so were treated as NA, sometimes with a warning. They are now legal for long vectors so there is no longer a warning, and x[2^31] <- y will now extend the vector on a 64-bit platform and give an error on a 32-bit one.
- It is now possible for 64-bit builds to allocate amounts of memory limited only by the OS. It may be wise to use OS facilities (e.g.
csh), to set limits on overall memory consumption of an R process, particularly in a multi-user environment. A number of packages need a limit of at least 4GB of virtual memory to load.
64-bit Windows builds of R are by default limited in memory usage to the amount of RAM installed: this limit can be changed by command-line option --max-mem-size or setting environment variable R_MAX_MEM_SIZE.
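Since an assignment such as x[2^31] <- y now extends the vector on 64-bit builds instead of being treated as NA, code that computes indices may want to guard against accidental growth. The following is one possible sketch; assign_guarded() and its max_len default are hypothetical and not part of R.

```r
# Hypothetical helper: refuse an assignment whose index would grow the
# vector beyond max_len elements (or beyond its current length, if larger).
assign_guarded <- function(x, i, value, max_len = 1e6) {
  stopifnot(is.numeric(i), length(i) == 1L, !is.na(i), i >= 1)
  if (i > max(length(x), max_len)) {
    stop("index ", format(i, scientific = FALSE),
         " would extend the vector beyond ", max_len, " elements")
  }
  x[i] <- value
  x
}

x <- c(1, 2, 3)
x <- assign_guarded(x, 5, 9)     # modest extension is allowed; gap filled with NA
print(x)

# An index of 2^31 is rejected before R tries to allocate anything.
res <- tryCatch(assign_guarded(x, 2^31, 0), error = function(e) "refused")
print(res)

# Reading (as opposed to assigning) past the end just yields NA.
peek <- x[2^31]
print(peek)
```

The threshold is a design choice: a limit tied to the expected problem size catches index bugs early, while genuine long-vector code would raise or remove it.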