STEM is cool

Lady Gaga holding a speech at National Equalit...
Image via Wikipedia

A good video created by my favorite social media people from a company in North Carolina.

STEM is cool (Science Technology Engineering Maths?)

No, Science is not kool aid- it is just COOL. and better paying than watching Justin Bieber or Lady Gaga videos. Get those lazy teenagers out of Glee clubs and back into Science clubs.

The video itself-

Disclaimer- I have no direct or indirect  financial relationship with the creators of this video. I think it is cool people express creativity in positive ways to help their favorite software,company, and even the world. Blah Blah Blah 🙂

Yeah, STEM is cool again.

 

 

Nice BI Tutorials

Tutorials screenshot.
Image via Wikipedia

Here is a set of very nice, screenshot enabled tutorials from SAP BI. They are a bit outdated (3 years old) but most of it is quite relevant- especially from a Tutorial Design Perspective –

Most people would rather see screenshot based step by step powerpoints, than cluttered or clever presentations , or even videos that force you to sit like a TV zombie. Unfortunately most tutorial presentations I see especially for BI are either slides with one or two points, that abruptly shift to “concepts” or videos that are atleast more than 10 minutes long. That works fine for scripting tutorials or hands on workshops, but cannot be reproduced for later instances of study.

The mode of tutorials especially for GUI software can vary, it may be Slideshare, Scribd, Google Presentation,Microsoft Powerpoint but a step by step screenshot by screenshot tutorial is much better for understanding than commando line jargon/ Youtub   Videos presentations, or Powerpoint with Points.

Have a look at these SAP BI 7 slideshares

and

Speaking of BI, the R Package called Brew is going to brew up something special especially combined with R Apache. However I wish R Apache, or R Web, or RServe had step by step install screenshot tutorials to increase their usage in Business Intelligence.

I tried searching for JMP GUI Tutorials too, but I believe putting all your content behind a registration wall is not so great. Do a Pareto Analysis of your training material, surely you can share a couple more tutorials without registration. It also will help new wanna-migrate users to get a test and feel for the installation complexities as well as final report GUI.

 

Interview James Dixon Pentaho

Here is an interview with James Dixon the founder of Pentaho, self confessed Chief Geek and CTO. Pentaho has been growing very rapidly and it makes open source Business Intelligence solutions- basically the biggest chunk of enterprise software market currently.

Ajay-  How would you describe Pentaho as a BI product for someone who is completely used to traditional BI vendors (read non open source). Do the Oracle lawsuits over Java bother you from a business perspective?

James-

Pentaho has a full suite of BI software:

* ETL: Pentaho Data Integration

* Reporting: Pentaho Reporting for desktop and web-based reporting

* OLAP: Mondrian ROLAP engine, and Analyzer or Jpivot for web-based OLAP client

* Dashboards: CDF and Dashboard Designer

* Predictive Analytics: Weka

* Server: Pentaho BI Server, handles web-access, security, scheduling, sharing, report bursting etc

We have all of the standard BI functionality.

The Oracle/Java issue does not bother me much. There are a lot of software companies dependent on Java. If Oracle abandons Java a lot resources will suddenly focus on OpenJDK. It would be good for OpenJDK and might be the best thing for Java in the long term.

Ajay-  What parts of Pentaho’s technology do you personally like the best as having an advantage over other similar proprietary packages.

Describe the latest Pentaho for Hadoop offering and Hadoop/HIVE ‘s advantage over say Map Reduce and SQL.

James- The coolest thing is that everything is pluggable:

* ETL: New data transformation steps can be added. New orchestration controls (job entries) can be added. New perspectives can be added to the design UI. New data sources and destinations can be added.

* Reporting: New content types and report objects can be added. New data sources can be added.

* BI Server: Every factory, engine, and layer can be extended or swapped out via configuration. BI components can be added. New visualizations can be added.

This means it is very easy for Pentaho, partners, customers, and community member to extend our software to do new things.

In addition every engine and component can be fully embedded into a desktop or web-based application. I made a youtube video about our philosophy: http://www.youtube.com/watch?v=uMyR-In5nKE

Our Hadoop offerings allow ETL developers to work in a familiar graphical design environment, instead of having to code MapReduce jobs in Java or Python.

90% of the Hadoop use cases we hear about are transformation/reporting/analysis of structured/semi-structured data, so an ETL tool is perfect for these situations.

Using Pentaho Data Integration reduces implementation and maintenance costs significantly. The fact that our ETL engine is Java and is embeddable means that we can deploy the engine to the Hadoop data nodes and transform the data within the nodes.

Ajay-  Do you think the combination of recession, outsourcing,cost cutting, and unemployment are a suitable environment for companies to cut technology costs by going out of their usual vendor lists and try open source for a change /test projects.

Jamie- Absolutely. Pentaho grew (downloads, installations, revenue) throughout the recession. We are on target to do 250% of what we did last year, while the established vendors are flat in terms of new license revenue.

Ajay-  How would you compare the user interface of reports using Pentaho versus other reporting software. Please feel free to be as specific.

James- We have all of the everyday, standard reporting features covered.

Over the years the old tools, like Crystal Reports, have become bloated and complicated.

We don’t aim to have 100% of their features, because we’d end us just as complicated.

The 80:20 rule applies here. 80% of the time people only use 20% of their features.

We aim for 80% feature parity, which should cover 95-99% of typical use cases.

Ajay-  Could you describe the Pentaho integration with R as well as your relationship with Weka. Jaspersoft already has a partnership with Revolution Analytics for RevoDeployR (R on a web server)-

Any  R plans for Pentaho as well?

James- The feature set of R and Weka overlap to a small extent – both of them include basic statistical functions. Weka is focused on predictive models and machine learning, whereas R is focused on a full suite of statistical models. The creator and main Weka developer is a Pentaho employee. We have integrated R into our ETL tool. (makes me happy 🙂 )

(probably not a good time to ask if SAS integration is done as well for a big chunk of legacy base SAS/ WPS users)

About-

As “Chief Geek” (CTO) at Pentaho, James Dixon is responsible for Pentaho’s architecture and technology roadmap. James has over 15 years of professional experience in software architecture, development and systems consulting. Prior to Pentaho, James held key technical roles at AppSource Corporation (acquired by Arbor Software which later merged into Hyperion Solutions) and Keyola (acquired by Lawson Software). Earlier in his career, James was a technology consultant working with large and small firms to deliver the benefits of innovative technology in real-world environments.

Business Intelligence and Stat Computing: The White Man's Last Stand

Unknown White Male
Image via Wikipedia

Name an industry in which top level executives are mostly white males, new recruits are mostly male (white or Indian/Chinese), women are primarily shunted into publicity relationships, social media or marketing.

Statistical Computing And Business Intelligence are the white man’s last stand to preserve an exclusive club of hail fellow well met and lets catch up after drinks culture. Newer startups are the exception in the business intelligence world , but  a whiter face helps (so do an Indian or Chinese male) to attract a mostly male white venture capital industry.

I have earlier talked about technology being totally dominated by Asian males at grad student level and ASA membership almost not representing minorities like blacks and yes women- but this is about corporate culture in the traditional BI world.

If you are connected to the BI or Stat Computing world, who would you rather hire AND who have you actually hired- with identical resumes

White Male or White Female or Brown Indian Male/Female or Yellow Male/Female or Black Male or Black Female

How many Black Grad Assistants do you see in tech corridors- (Nah- it is easier to get a  hard working Chinese /Indian- who smiles and does a great job at $12/hour)

How many non- Asian non white Authors do you see in technology and does that compare to pie chart below


racist image Pictures, Images and Photos

Note_ 2010 Census numbers arent available for STEM, and I was unable to find ethnic background for various technology companies, because though these numbers are collected for legal purposes, they are not publicly shared.

Any technology company which has more than 40% women , or more than 10% blacks would be fairly representative to the US population. Anecdotal evidence suggests European employment for minorities is worse (especially for Asians) but better for women.

Any data sources to support/ refute these hypothesis are welcome for purposes of scientific inquiry.

racist math image Pictures, Images and Photos

Cisco SocialMiner

A highly simplified version of the RSS feed ic...
Image via Wikipedia

A new product from Cisco to mine social media for analytics on sentiment-

http://www.cisco.com/en/US/products/ps11349/index.html

Cisco SocialMiner is a social media customer care solution that can help you proactively respond to customers and prospects communicating through public social media networks like Twitter, Facebook, or other public forums or blogging sites. By providing social media monitoring, queuing, and workflow to organize customer posts on social media networks and deliver them to your social media customer care team, your company can respond to customers in real time using the same social network they are using.

Cisco SocialMiner provides:

  • The ability to configure multiple campaigns to search for customer postings on the public social web about your company’s products, services, or area of expertise
  • Filtering of social contacts based on preconfigured campaign filters to focus campaign searches
  • Routing of social contacts to skilled customer care representatives in the contact center or to experts in the enterprise–multiple people can work together to handle responses to customer postings through shared work queues
  • Detailed metrics for social media customer care activities, campaign reports, and team reports

With Cisco SocialMiner, your company can listen and respond to customer conversations originating in the social web. Being proactive can help your company enhance its service, improve customer loyalty, garner new customers, and protect your brand.

Table 1. Features and Benefits of Cisco SocialMiner 8.5

Feature Benefits
Product Baseline Features
Social media feeds

• Feeds are configurable sources to capture public social contacts that contain specific words, terms, or phrases.

• Feeds enable you to search for information on the public social web about your company’s products, services, or area of expertise.

• Cisco SocialMiner supports the following types of feeds:

• Facebook

• Twitter
Campaign management

• Groups feeds into campaigns to organize all posting activity related to a product category or business objective

• Produces metrics on campaign activity

• Provides the ability to configure multiple campaigns to search for customer postings on specific products or services

• Groups social contacts for handling by the social media customer care team

• Enables filtering of social contacts based on preconfigured campaign filters to focus campaign searches
Route and queue social contacts

• Enables routing of social contacts to skilled customer care representatives in the contact center

• Draws on expertise in the enterprise by allowing multiple people in the enterprise to work together to handle responses to customer postings through shared work queues

• Enables automated distribution of work to improve efficiency and effectiveness of social media engagement
Tagging

• Allows work to be routed to the appropriate team by grouping each post or social contact into different categories; for example, a post can be marked with the “customer_support” tag; this post will then appear on a customer support agent’s queue for processing
Social media customer care metrics

• Provides detailed metrics on social media customer care activities, campaign reports, and team reports

• Measures work and results

• Manages to service-level goals

• Supports brand management

• Optimizes staffing

• Includes dashboarding of social media posting activity when Cisco Unified Intelligence Center is used
Reporting for social contacts

• Provides a reporting database that can be accessed using any reporting tool, including Cisco Unified Intelligence Center

• Enables customer care management to accurately report on and track social media interactions by the contact center
OpenSocial-compliant gadgets

Representational State Transfer (REST) application programming interfaces (APIs)

• Provides flexible user interface options

• Enables extensive opportunities for customization
Optional integration with full suite of Cisco Collaboration tools

• Allows you to take advantage of the full suite of Cisco Collaboration tools, including Cisco Quad, Cisco Show and Share, and Cisco Pulse technology, to help your social media customer care team quickly find answers to help customers efficiently and effectively

• Easy to maintain with existing IT personnel
Operating Environment
Cisco Unified Computing System(UCS) C-Series or B-Series Servers

• Requires a Cisco UCS C-Series or B-Series Server.

• Server consolidation means lower cost per server with Cisco UCS Servers.
Architecture
Scalability

• One server supports up to 30 simultaneous social media customer care users and 10,000 social contacts per hour.
Management
Cisco Unified Real-Time Monitoring Tool (RTMT)

• Operational management is enhanced through integration with the Cisco Unified RTMT, providing consistent application monitoring across Cisco Unified Communications Solutions.
Simple Network Management Protocol (SNMP)

• SNMP with an associated MIB is supported through the Cisco Voice Operating System (VOS).
Reporting
Cisco Unified Intelligence Center

• Create customizable reports of social media customer care events using Cisco Unified Intelligence Center (purchased separately).

 

 

Jim Goodnight on Open Source- and why he is right -sigh

Logo Open Source Initiative
Image via Wikipedia

Jim Goodnight – grand old man and Godfather of the Cosa Nostra of the BI/Database Analytics software industry said recently on open source in BI (btw R is generally termed in business analytics and NOT business intelligence software so these remarks were more apt to Pentaho and Jaspersoft )

Asked whether open source BI and data integration software from the likes of Jaspersoft, Pentaho and Talend is a growing threat, [Goodnight] said: “We haven’t noticed that a lot. Most of our companies need industrial strength software that has been tested, put through every possible scenario or failure to make sure everything works correctly.”

quotes from Jim Goodnight are courtesy Jason’s  story here:
http://www.cbronline.com/news/sas-ceo-says-cep-open-source-and-cloud-bi-have-limited-appeal

and the Pentaho follow-up reaction is here

http://bi.cbronline.com/news/pentaho-fires-back-across-sas-bows-over-limited-open-source-appeal

 

 

While you can rage and screech- here is the reality in terms of market share-

From Merv Adrian-‘s excellent article on market shares in BI

http://www.enterpriseirregulars.com/22444/decoding-bi-market-share-numbers-%E2%80%93-play-sudoku-with-analysts/

The first, labeled BI Platforms, is drawn fromGartner Market Share Analysis: Business Intelligence, Analytics and Performance Management Software, Worldwide, 2009, published May 2010 , and Gartner Dataquest Market Share: Business Intelligence, Analytics and Performance Management Software, Worldwide, 2009.

and

Advanced Analytics category.

and 

so whats the performance of Talend, Pentaho and Jaspersoft

From http://www.dbms2.com/category/products-and-vendors/talend/

It seems that Talend’s revenue was somewhat shy of $10 million in 2008.

and Talend itself says

http://www.talend.com/press/Talend-Announces-Record-2009-and-Continues-Growth-in-the-New-Year.php

Additional 2009 highlights include:

  • Achieved record revenue, more then doubling from 2008. The fourth quarter of 2009 was Talend’s tenth consecutive quarter of growth.
  • Grew customer base by 140% to over 1,000 customers, up from 420 at the end of 2008. Of these new customers, over 50% are Fortune 1000 companies.
  • Total downloads reached seven million, with over 300,000 users of the open source products.
  • Talend doubled its staff, increasing to 200 global employees. Continuing this trend, Talend has already hired 15 people in 2010 to support its rapid growth.

now for Jaspersoft numbers

http://www.dbms2.com/2008/09/14/jaspersoft-numbers/

Highlights include:

  • Revenue run rate in the double-digit millions.
  • 40% sequential growth most recent quarter. (I didn’t ask whether there was any reason to suspect seasonality.)
  • 130% annual revenue growth run rate.
  • “Not quite” profitable.
  • Several hundred commercial subscribers, at an average of $25K annually per, including >100 in Europe.
  • 9,000 paying customers of some kind.
  • 100,000+ total deployments, “very conservatively,” counting OEMs as one deployment each and not double-counting for OEMs’ customers. (Nick said Business Objects quotes 45,000 deployments by the same standards.)
  • 70% of revenue from the mid-market, defined as $100 million – $1 billion revenue. 30% from bigger enterprises. (Hmm. That begs a couple of questions, such as where OEM revenue comes in, and whether <$100 million enterprises were truly a negligible part of revenue.)

and for Pentaho numbers-

http://www.dbms2.com/2009/01/27/introduction-to-pentaho/

and http://www.monash.com/uploads/Pentaho-January-2009.pdf

suggests there are far far away from the top 5-6 vendors in BI

and a special mention  for postgreSQL– which is a non Profit but is seriously denting Oracle/MySQL

http://www.postgresql.org/about/

Limit Value
Maximum Database Size Unlimited
Maximum Table Size 32 TB
Maximum Row Size 1.6 TB
Maximum Field Size 1 GB
Maximum Rows per Table Unlimited
Maximum Columns per Table 250 – 1600 depending on column types
Maximum Indexes per Table Unlimited

and leading vendor is EnterpriseDB which is again IBM-partnering as well as IBM funded

http://www.sramanamitra.com/2009/05/18/enterprise-db/

and

http://www.enterprisedb.com/company/news_events/press_releases/2010_21.do

suggest it is still in early stages.

————————————————————–

So what do we conclude-

1) There is a complete lack of transparency in open source BI market shares as almost all these companies are privately held and do not disclose revenues.

2) What may be a pure play open source company may actually be a company funded by a big BI vendor (like Revolution Analytics is funded among others by Intel-Microsoft) and EnterpriseDB has IBM as an investor.MySQL and Sun of course are bought by Oracle

The degree of control by proprietary vendors on open source vendors is still not disclosed- whether they are holding a stake for strategic reasons or otherwise.

3) None of the Open Source Vendors are even close to a 1 Billion dollar revenue number.

Jim Goodnight is pointing out market reality when he says he has not seen much impact (in terms of market share). As for the rest of his remarks, well he’s got a job to do as CEO and thats talk up his company and trash the competition- which he as been doing for 3 decades and unlikely to change now unless there is severe market share impact. Unless you expect him to notice companies less than 5% of his size in revenue.

http://www.cbronline.com/news/sas-ceo-says-cep-open-source-and-cloud-bi-have-limited-appeal

http://bi.cbronline.com/news/pentaho-fires-back-across-sas-bows-over-limited-open-source-appeal

 

So what's new in R 2.12.0

PoissonCDF
Image via Wikipedia

and as per http://cran.r-project.org/src/base/NEWS

the answer is plenty is new in the newR.

While you and me, were busy writing and reading blogs, or generally writing code for earning more money, or our own research- Uncle Peter D and his band of merry men have been really busy in a much more upgraded R.

————————————–

CHANGES————————-

NEW FEATURES:

    • Reading a packages's CITATION file now defaults to ASCII rather
      than Latin-1: a package with a non-ASCII CITATION file should
      declare an encoding in its DESCRIPTION file and use that encoding
      for the CITATION file.

    • difftime() now defaults to the "tzone" attribute of "POSIXlt"
      objects rather than to the current timezone as set by the default
      for the tz argument.  (Wish of PR#14182.)

    • pretty() is now generic, with new methods for "Date" and "POSIXt"
      classes (based on code contributed by Felix Andrews).

    • unique() and match() are now faster on character vectors where
      all elements are in the global CHARSXP cache and have unmarked
      encoding (ASCII).  Thanks to Matthew Dowle for suggesting
      improvements to the way the hash code is generated in unique.c.

    • The enquote() utility, in use internally, is exported now.

    • .C() and .Fortran() now map non-zero return values (other than
      NA_LOGICAL) for logical vectors to TRUE: it has been an implicit
      assumption that they are treated as true.

    • The print() methods for "glm" and "lm" objects now insert
      linebreaks in long calls in the same way that the print() methods
      for "summary.[g]lm" objects have long done.  This does change the
      layout of the examples for a number of packages, e.g. MASS.
      (PR#14250)

    • constrOptim() can now be used with method "SANN".  (PR#14245)

      It gains an argument hessian to be passed to optim(), which
      allows all the ... arguments to be intended for f() and grad().
      (PR#14071)

    • curve() now allows expr to be an object of mode "expression" as
      well as "call" and "function".

    • The "POSIX[cl]t" methods for Axis() have been replaced by a
      single method for "POSIXt".

      There are no longer separate plot() methods for "POSIX[cl]t" and
      "Date": the default method has been able to handle those classes
      for a long time.  This _inter alia_ allows a single date-time
      object to be supplied, the wish of PR#14016.

      The methods had a different default ("") for xlab.

    • Classes "POSIXct", "POSIXlt" and "difftime" have generators
      .POSIXct(), .POSIXlt() and .difftime().  Package authors are
      advised to make use of them (they are available from R 2.11.0) to
      proof against planned future changes to the classes.

      The ordering of the classes has been changed, so "POSIXt" is now
      the second class.  See the document ‘Updating packages for
      changes in R 2.12.x’ on  for
      the consequences for a handful of CRAN packages.

    • The "POSIXct" method of as.Date() allows a timezone to be
      specified (but still defaults to UTC).

    • New list2env() utility function as an inverse of
      as.list() and for fast multi-assign() to existing
      environment.  as.environment() is now generic and uses list2env()
      as list method.

    • There are several small changes to output which ‘zap’ small
      numbers, e.g. in printing quantiles of residuals in summaries
      from "lm" and "glm" fits, and in test statisics in print.anova().

    • Special names such as "dim", "names", etc, are now allowed as
      slot names of S4 classes, with "class" the only remaining
      exception.

    • File .Renviron can have architecture-specific versions such as
      .Renviron.i386 on systems with sub-architectures.

    • installed.packages() has a new argument subarch to filter on
      sub-architecture.

    • The summary() method for packageStatus() now has a separate
      print() method.

    • The default summary() method returns an object inheriting from
      class "summaryDefault" which has a separate print() method that
      calls zapsmall() for numeric/complex values.

    • The startup message now includes the platform and if used,
      sub-architecture: this is useful where different
      (sub-)architectures run on the same OS.

    • The getGraphicsEvent() mechanism now allows multiple windows to
      return graphics events, through the new functions
      setGraphicsEventHandlers(), setGraphicsEventEnv(), and
      getGraphicsEventEnv().  (Currently implemented in the windows()
      and X11() devices.)

    • tools::texi2dvi() gains an index argument, mainly for use by R
      CMD Rd2pdf.

      It avoids the use of texindy by texinfo's texi2dvi >= 1.157,
      since that does not emulate 'makeindex' well enough to avoid
      problems with special characters (such as (, {, !) in indices.

    • The ability of readLines() and scan() to re-encode inputs to
      marked UTF-8 strings on Windows since R 2.7.0 is extended to
      non-UTF-8 locales on other OSes.

    • scan() gains a fileEncoding argument to match read.table().

    • points() and lines() gain "table" methods to match plot().  (Wish
      of PR#10472.)

    • Sys.chmod() allows argument mode to be a vector, recycled along
      paths.

    • There are |, & and xor() methods for classes "octmode" and
      "hexmode", which work bitwise.

    • Environment variables R_DVIPSCMD, R_LATEXCMD, R_MAKEINDEXCMD,
      R_PDFLATEXCMD are no longer used nor set in an R session.  (With
      the move to tools::texi2dvi(), the conventional environment
      variables LATEX, MAKEINDEX and PDFLATEX will be used.
      options("dvipscmd") defaults to the value of DVIPS, then to
      "dvips".)

    • New function isatty() to see if terminal connections are
      redirected.

    • summaryRprof() returns the sampling interval in component
      sample.interval and only returns in by.self data for functions
      with non-zero self times.

    • print(x) and str(x) now indicate if an empty list x is named.

    • install.packages() and remove.packages() with lib unspecified and
      multiple libraries in .libPaths() inform the user of the library
      location used with a message rather than a warning.

    • There is limited support for multiple compressed streams on a
      file: all of [bgx]zfile() allow streams to be appended to an
      existing file, but bzfile() reads only the first stream.

    • Function person() in package utils now uses a given/family scheme
      in preference to first/middle/last, is vectorized to handle an
      arbitrary number of persons, and gains a role argument to specify
      person roles using a controlled vocabulary (the MARC relator
      terms).

    • Package utils adds a new "bibentry" class for representing and
      manipulating bibliographic information in enhanced BibTeX style,
      unifying and enhancing the previously existing mechanisms.

    • A bibstyle() function has been added to the tools package with
      default JSS style for rendering "bibentry" objects, and a
      mechanism for registering other rendering styles.

    • Several aspects of the display of text help are now customizable
      using the new Rd2txt_options() function.
      options("help_text_width") is no longer used.

    • Added \href tag to the Rd format, to allow hyperlinks to URLs
      without displaying the full URL.

    • Added \newcommand and \renewcommand tags to the Rd format, to
      allow user-defined macros.

    • New toRd() generic in the tools package to convert objects to
      fragments of Rd code, and added "fragment" argument to Rd2txt(),
      Rd2HTML(), and Rd2latex() to support it.

    • Directory R_HOME/share/texmf now follows the TDS conventions, so
      can be set as a texmf tree (‘root directory’ in MiKTeX parlance).

    • S3 generic functions now use correct S4 inheritance when
      dispatching on an S4 object.  See ?Methods, section on “Methods
      for S3 Generic Functions” for recommendations and details.

    • format.pval() gains a ... argument to pass arguments such as
      nsmall to format().  (Wish of PR#9574)

    • legend() supports title.adj.  (Wish of PR#13415)

    • Added support for subsetting "raster" objects, plus assigning to
      a subset, conversion to a matrix (of colour strings), and
      comparisons (== and !=).

    • Added a new parseLatex() function (and related functions
      deparseLatex() and latexToUtf8()) to support conversion of
      bibliographic entries for display in R.

    • Text rendering of \itemize in help uses a Unicode bullet in UTF-8
      and most single-byte Windows locales.

    • Added support for polygons with holes to the graphics engine.
      This is implemented for the pdf(), postscript(),
      x11(type="cairo"), windows(), and quartz() devices (and
      associated raster formats), but not for x11(type="Xlib") or
      xfig() or pictex().  The user-level interface is the polypath()
      function in graphics and grid.path() in grid.

    • File NEWS is now generated at installation with a slightly
      different format: it will be in UTF-8 on platforms using UTF-8,
      and otherwise in ASCII.  There is also a PDF version, NEWS.pdf,
      installed at the top-level of the R distribution.

    • kmeans(x, 1) now works.  Further, kmeans now returns between and
      total sum of squares.

    • arrayInd() and which() gain an argument useNames.  For arrayInd,
      the default is now false, for speed reasons.

    • As is done for closures, the default print method for the formula
      class now displays the associated environment if it is not the
      global environment.

    • A new facility has been added for inserting code into a package
      without re-installing it, to facilitate testing changes which can
      be selectively added and backed out.  See ?insertSource.

    • New function readRenviron to (re-)read files in the format of
      ~/.Renviron and Renviron.site.

    • require() will now return FALSE (and not fail) if loading the
      package or one of its dependencies fails.

    • aperm() now allows argument perm to be a character vector when
      the array has named dimnames (as the results of table() calls
      do).  Similarly, array() allows MARGIN to be a character vector.
      (Based on suggestions of Michael Lachmann.)

    • Package utils now exports and documents functions
      aspell_package_Rd_files() and aspell_package_vignettes() for
      spell checking package Rd files and vignettes using Aspell,
      Ispell or Hunspell.

    • Package news can now be given in Rd format, and news() prefers
      these inst/NEWS.Rd files to old-style plain text NEWS or
      inst/NEWS files.

    • New simple function packageVersion().

    • The PCRE library has been updated to version 8.10.

    • The standard Unix-alike terminal interface declares its name to
      readline as 'R', so that can be used for conditional sections in
      ~/.inputrc files.

    • ‘Writing R Extensions’ now stresses that the standard sections in
      .Rd files (other than \alias, \keyword and \note) are intended to
      be unique, and the conversion tools now drop duplicates with a
      warning.

      The .Rd conversion tools also warn about an unrecognized type in
      a \docType section.

    • ecdf() objects now have a quantile() method.

    • format() methods for date-time objects now attempt to make use of
      a "tzone" attribute with "%Z" and "%z" formats, but it is not
      always possible.  (Wish of PR#14358.)

    • tools::texi2dvi(file, clean = TRUE) now works in more cases (e.g.
      where emulation is used and when file is not in the current
      directory).

    • New function droplevels() to remove unused factor levels.

    • system(command, intern = TRUE) now gives an error on a Unix-alike
      (as well as on Windows) if command cannot be run.  It reports a
      non-success exit status from running command as a warning.

      On a Unix-alike an attempt is made to return the actual exit
      status of the command in system(intern = FALSE): previously this
      had been system-dependent but on POSIX-compliant systems the
      value return was 256 times the status.

    • system() has a new argument ignore.stdout which can be used to
      (portably) ignore standard output.

    • system(intern = TRUE) and pipe() connections are guaranteed to be
      avaliable on all builds of R.

    • Sys.which() has been altered to return "" if the command is not
      found (even on Solaris).

    • A facility for defining reference-based S4 classes (in the OOP
      style of Java, C++, etc.) has been added experimentally to
      package methods; see ?ReferenceClasses.

    • The predict method for "loess" fits gains an na.action argument
      which defaults to na.pass rather than the previous default of
      na.omit.

      Predictions from "loess" fits are now named from the row names of
      newdata.

    • Parsing errors detected during Sweave() processing will now be
      reported referencing their original location in the source file.

    • New adjustcolor() utility, e.g., for simple translucent color
      schemes.

    • qr() now has a trivial lm method with a simple (fast) validity
      check.

    • An experimental new programming model has been added to package
      methods for reference (OOP-style) classes and methods.  See
      ?ReferenceClasses.

    • bzip2 has been updated to version 1.0.6 (bug-fix release).
      --with-system-bzlib now requires at least version 1.0.6.

    • R now provides jss.cls and jss.bst (the class and bib style file
      for the Journal of Statistical Software) as well as RJournal.bib
      and Rnews.bib, and R CMD ensures that the .bst and .bib files are
      found by BibTeX.

    • Functions using the TAR environment variable no longer quote the
      value when making system calls.  This allows values such as tar
      --force-local, but does require additional quotes in, e.g., TAR =
      "'/path with spaces/mytar'".

  DEPRECATED & DEFUNCT:

    • Supplying the parser with a character string containing both
      octal/hex and Unicode escapes is now an error.

    • File extension .C for C++ code files in packages is now defunct.

    • R CMD check no longer supports configuration files containing
      Perl configuration variables: use the environment variables
      documented in ‘R Internals’ instead.

    • The save argument of require() now defaults to FALSE and save =
      TRUE is now deprecated.  (This facility is very rarely actually
      used, and was superseded by the Depends field of the DESCRIPTION
      file long ago.)

    • R CMD check --no-latex is deprecated in favour of --no-manual.

    • R CMD Sd2Rd is formally deprecated and will be removed in R
      2.13.0.

  PACKAGE INSTALLATION:

    • install.packages() has a new argument libs_only to optionally
      pass --libs-only to R CMD INSTALL and works analogously for
      Windows binary installs (to add support for 64- or 32-bit
      Windows).

    • When sub-architectures are in use, the installed architectures
      are recorded in the Archs field of the DESCRIPTION file.  There
      is a new default filter, "subarch", in available.packages() to
      make use of this.

      Code is compiled in a copy of the src directory when a package is
      installed for more than one sub-architecture: this avoid problems
      with cleaning the sources between building sub-architectures.

    • R CMD INSTALL --libs-only no longer overrides the setting of
      locking, so a previous version of the package will be restored
      unless --no-lock is specified.

  UTILITIES:

    • R CMD Rprof|build|check are now based on R rather than Perl
      scripts.  The only remaining Perl scripts are the deprecated R
      CMD Sd2Rd and install-info.pl (used only if install-info is not
      found) as well as some maintainer-mode-only scripts.

      *NB:* because these have been completely rewritten, users should
      not expect undocumented details of previous implementations to
      have been duplicated.

      R CMD no longer manipulates the environment variables PERL5LIB
      and PERLLIB.

    • R CMD check has a new argument --extra-arch to confine tests to
      those needed to check an additional sub-architecture.

      Its check for “Subdirectory 'inst' contains no files” is more
      thorough: it looks for files, and warns if there are only empty
      directories.

      Environment variables such as R_LIBS and those used for
      customization can be set for the duration of checking _via_ a
      file ~/.R/check.Renviron (in the format used by .Renviron, and
      with sub-architecture specific versions such as
      ~/.R/check.Renviron.i386 taking precedence).

      There are new options --multiarch to check the package under all
      of the installed sub-architectures and --no-multiarch to confine
      checking to the sub-architecture under which check is invoked.
      If neither option is supplied, a test is done of installed
      sub-architectures and all those which can be run on the current
      OS are used.

      Unless multiple sub-architectures are selected, the install done
      by check for testing purposes is only of the current
      sub-architecture (_via_ R CMD INSTALL --no-multiarch).

      It will skip the check for non-ascii characters in code or data
      if the environment variables _R_CHECK_ASCII_CODE_ or
      _R_CHECK_ASCII_DATA_ are respectively set to FALSE.  (Suggestion
      of Vince Carey.)

    • R CMD build no longer creates an INDEX file (R CMD INSTALL does
      so), and --force removes (rather than overwrites) an existing
      INDEX file.

      It supports a file ~/.R/build.Renviron analogously to check.

      It now runs build-time \Sexpr expressions in help files.

    • R CMD Rd2dvi makes use of tools::texi2dvi() to process the
      package manual.  It is now implemented entirely in R (rather than
      partially as a shell script).

    • R CMD Rprof now uses utils::summaryRprof() rather than Perl.  It
      has new arguments to select one of the tables and to limit the
      number of entries printed.

    • R CMD Sweave now runs R with --vanilla so the environment setting
      of R_LIBS will always be used.

  C-LEVEL FACILITIES:

    • lang5() and lang6() (in addition to pre-existing lang[1-4]())
      convenience functions for easier construction of eval() calls.
      If you have your own definition, do wrap it inside #ifndef lang5
      .... #endif to keep it working with old and new R.

    • Header R.h now includes only the C headers it itself needs, hence
      no longer includes errno.h.  (This helps avoid problems when it
      is included from C++ source files.)

    • Headers Rinternals.h and R_ext/Print.h include the C++ versions
      of stdio.h and stdarg.h respectively if included from a C++
      source file.

  INSTALLATION:

    • A C99 compiler is now required, and more C99 language features
      will be used in the R sources.

    • Tcl/Tk >= 8.4 is now required (increased from 8.3).

    • System functions access, chdir and getcwd are now essential to
      configure R.  (In practice they have been required for some
      time.)

    • make check compares the output of the examples from several of
      the base packages to reference output rather than the previous
      output (if any).  Expect some differences due to differences in
      floating-point computations between platforms.

    • File NEWS is no longer in the sources, but generated as part of
      the installation.  The primary source for changes is now
      doc/NEWS.Rd.

    • The popen system call is now required to build R.  This ensures
      the availability of system(intern = TRUE), pipe() connections and
      printing from postscript().

    • The pkg-config file libR.pc now also works when R is installed
      using a sub-architecture.

    • R has always required a BLAS that conforms to IE60559 arithmetic,
      but after discovery of more real-world problems caused by a BLAS
      that did not, this is tested more thoroughly in this version.

  BUG FIXES:

    • Calls to selectMethod() by default no longer cache inherited
      methods.  This could previously corrupt methods used by as().

    • The densities of non-central chi-squared are now more accurate in
      some cases in the extreme tails, e.g. dchisq(2000, 2, 1000), as a
      series expansion was truncated too early.  (PR#14105)

    • pt() is more accurate in the left tail for ncp large, e.g.
      pt(-1000, 3, 200).  (PR#14069)

    • The default C function (R_binary) for binary ops now sets the S4
      bit in the result if either argument is an S4 object.  (PR#13209)

    • source(echo=TRUE) failed to echo comments that followed the last
      statement in a file.

    • S4 classes that contained one of "matrix", "array" or "ts" and
      also another class now accept superclass objects in new().  Also
      fixes failure to call validObject() for these classes.

    • Conditional inheritance defined by argument test in
      methods::setIs() will no longer be used in S4 method selection
      (caching these methods could give incorrect results).  See
      ?setIs.

    • The signature of an implicit generic is now used by setGeneric()
      when that does not use a definition nor explicitly set a
      signature.

    • A bug in callNextMethod() for some examples with "..." in the
      arguments has been fixed.  See file
      src/library/methods/tests/nextWithDots.R in the sources.

    • match(x, table) (and hence %in%) now treat "POSIXlt" consistently
      with, e.g., "POSIXct".

    • Built-in code dealing with environments (get(), assign(),
      parent.env(), is.environment() and others) now behave
      consistently to recognize S4 subclasses; is.name() also
      recognizes subclasses.

    • The abs.tol control parameter to nlminb() now defaults to 0.0 to
      avoid false declarations of convergence in objective functions
      that may go negative.

    • The standard Unix-alike termination dialog to ask whether to save
      the workspace takes a EOF response as n to avoid problems with a
      damaged terminal connection.  (PR#14332)

    • Added warn.unused argument to hist.default() to allow suppression
      of spurious warnings about graphical parameters used with
      plot=FALSE.  (PR#14341)

    • predict.lm(), summary.lm(), and indeed lm() itself had issues
      with residual DF in zero-weighted cases (the latter two only in
      connection with empty models). (Thanks to Bill Dunlap for
      spotting the predict() case.)

    • aperm() treated resize = NA as resize = TRUE.

    • constrOptim() now has an improved convergence criterion, notably
      for cases where the minimum was (very close to) zero; further,
      other tweaks inspired from code proposals by Ravi Varadhan.

    • Rendering of S3 and S4 methods in man pages has been corrected
      and made consistent across output formats.

    • Simple markup is now allowed in \title sections in .Rd files.

    • The behaviour of as.logical() on factors (to use the levels) was
      lost in R 2.6.0 and has been restored.

    • prompt() did not backquote some default arguments in the \usage
      section.  (Reported by Claudia Beleites.)

    • writeBin() disallows attempts to write 2GB or more in a single
      call. (PR#14362)

    • new() and getClass() will now work if Class is a subclass of
      "classRepresentation" and should also be faster in typical calls.

    • The summary() method for data frames makes a better job of names
      containing characters invalid in the current locale.

    • [[ sub-assignment for factors could create an invalid factor
      (reported by Bill Dunlap).

    • Negate(f) would not evaluate argument f until first use of
      returned function (reported by Olaf Mersmann).

    • quietly=FALSE is now also an optional argument of library(), and
      consequently, quietly is now propagated also for loading
      dependent packages, e.g., in require(*, quietly=TRUE).

    • If the loop variable in a for loop was deleted, it would be
      recreated as a global variable.  (Reported by Radford Neal; the
      fix includes his optimizations as well.)

    • Task callbacks could report the wrong expression when the task
      involved parsing new code. (PR#14368)

    • getNamespaceVersion() failed; this was an accidental change in
      2.11.0. (PR#14374)

    • identical() returned FALSE for external pointer objects even when
      the pointer addresses were the same.

    • L$a@x[] <- val did not duplicate in a case it should have.

    • tempfile() now always gives a random file name (even if the
      directory is specified) when called directly after startup and
      before the R RNG had been used.  (PR#14381)

    • quantile(type=6) behaved inconsistently.  (PR#14383)

    • backSpline(.) behaved incorrectly when the knot sequence was
      decreasing.  (PR#14386)

    • The reference BLAS included in R was assuming that 0*x and x*0
      were always zero (whereas they could be NA or NaN in IEC 60559
      arithmetic).  This was seen in results from tcrossprod, and for
      example that log(0) %*% 0 gave 0.

    • The calculation of whether text was completely outside the device
      region (in which case, you draw nothing) was wrong for screen
      devices (which have [0, 0] at top-left).  The symptom was (long)
      text disappearing when resizing a screen window (to make it
      smaller).  (PR#14391)

    • model.frame(drop.unused.levels = TRUE) did not take into account
      NA values of factors when deciding to drop levels. (PR#14393)

    • library.dynam.unload required an absolute path for libpath.
      (PR#14385)

      Both library() and loadNamespace() now record absolute paths for
      use by searchpaths() and getNamespaceInfo(ns, "path").

    • The self-starting model NLSstClosestX failed if some deviation
      was exactly zero.  (PR#14384)

    • X11(type = "cairo") (and other devices such as png using
      cairographics) and which use Pango font selection now work around
      a bug in Pango when very small fonts (those with sizes between 0
      and 1 in Pango's internal units) are requested.  (PR#14369)

    • Added workaround for the font problem with X11(type = "cairo")
      and similar on Mac OS X whereby italic and bold styles were
      interchanged.  (PR#13463 amongst many other reports.)

    • source(chdir = TRUE) failed to reset the working directory if it
      could not be determined - that is now an error.

    • Fix for crash of example(rasterImage) on x11(type="Xlib").

    • Force Quartz to bring the on-screen display up-to-date
      immediately before the snapshot is taken by grid.cap() in the
      Cocoa implementation. (PR#14260)

    • model.frame had an unstated 500 byte limit on variable names.
      (Example reported by Terry Therneau.)

    • The 256-byte limit on names is now documented.    • Subassignment by [, [[ or $ on an expression object with value
      NULL coerced the object to a list.