Predictive Analytics World Conference –New York City and London, UK

Please use the following code to get a 15% discount on the 2-Day Conference Pass: AJAYNY11.

October 17-21, 2011 – New York City, NY (pawcon.com/nyc)
Nov 30 – Dec 1, 2011 – London, UK (pawcon.com/london)

Predictive Analytics World (pawcon.com) is the business-focused event for predictive analytics
professionals, managers and commercial practitioners, covering today’s commercial deployment of
predictive analytics, across industries and across software vendors. The conference delivers case
studies, expertise, and resources to achieve two objectives:

1) Bigger wins: Strengthen the business impact delivered by predictive analytics

2) Broader capabilities: Establish new opportunities with predictive analytics

Case Studies: How the Leading Enterprises Do It

Predictive Analytics World focuses on concrete examples of deployed predictive analytics. The leading
enterprises have signed up to tell their stories, so you can hear from the horse’s mouth precisely how
Fortune 500 analytics competitors and other top practitioners deploy predictive modeling, and what
kind of business impact it delivers.

PAW NEW YORK CITY 2011

PAW’s NYC program is the richest and most diverse yet, featuring over 40 sessions across three tracks
– including both X and Y tracks, and an “Expert/Practitioner” track — so you can witness how predictive
analytics is applied at major companies.

PAW NYC’s agenda covers hot topics and advanced methods such as ensemble models, social data,
search marketing, crowdsourcing, black-box trading, fraud detection, risk management, survey analysis,
and other innovative applications that benefit organizations in new and creative ways.

WORKSHOPS: PAW NYC also features five full-day pre- and post-conference workshops that
complement the core conference program. Workshop agendas include advanced predictive modeling
methods, hands-on training, an intro to R (the open source analytics system), and enterprise decision
management.

For more see http://www.predictiveanalyticsworld.com/newyork/2011/

PAW LONDON 2011

PAW London’s agenda covers hot topics and advanced methods such as risk management, uplift
(incremental lift) modeling, open source analytics, and crowdsourcing data mining. Case study
presentations cover campaign targeting, churn modeling, next-best-offer, selecting marketing channels,
global analytics deployment, email marketing, HR candidate search, and other innovative applications
that benefit organizations in new and creative ways.

Join PAW and access the best keynotes, sessions, workshops, exposition, expert panel, live demos,
networking coffee breaks, reception, birds-of-a-feather lunches, brand-name enterprise leaders, and
industry heavyweights in the business.

For more see http://www.predictiveanalyticsworld.com/london

CROSS-INDUSTRY APPLICATIONS

Predictive Analytics World is the only conference of its kind, delivering vendor-neutral sessions across
verticals such as banking, financial services, e-commerce, education, government, healthcare, high
technology, insurance, non-profits, publishing, social gaming, retail, and telecommunications.

And PAW covers the gamut of commercial applications of predictive analytics, including response
modeling, customer retention with churn modeling, product recommendations, fraud detection, online
marketing optimization, human resource decision-making, law enforcement, sales forecasting, and
credit scoring.

Why bring together such a wide range of endeavors? No matter how you use predictive analytics, the
story is the same: Predictively scoring customers optimizes business performance. Predictive analytics
initiatives across industries leverage the same core predictive modeling technology, share similar project
overhead and data requirements, and face common process challenges and analytical hurdles.

RAVE REVIEWS:

“Hands down, the best applied analytics conference I have ever attended. Great exposure to cutting-edge
predictive techniques and I was able to turn around and apply some of those learnings to my work
immediately. I’ve never been able to say that after any conference I’ve attended before!”

Jon Francis
Senior Statistician
T-Mobile

Read more: Articles and blog entries about PAW can be found at http://www.predictiveanalyticsworld.com/
pressroom.php

VENDORS. Meet the vendors and learn about their solutions, software, and services. Discover the best
predictive analytics vendors available to serve your needs – learn what they do and see how they
compare.

COLLEAGUES. Mingle, network and hang out with your best and brightest colleagues. Exchange
experiences over lunch, coffee breaks and the conference reception connecting with those professionals
who face the same challenges as you.

GET STARTED. If you’re new to predictive analytics, kicking off a new initiative, or exploring new ways
to position it at your organization, there’s no better place to get your bearings than Predictive Analytics
World. See what other companies are doing, witness vendor demos, participate in discussions with the
experts, network with your colleagues and weigh your options!

For more information:
http://www.predictiveanalyticsworld.com

View videos of PAW Washington DC, Oct 2010 — now available on-demand:
http://www.predictiveanalyticsworld.com/online-video.php

What is predictive analytics? See the Predictive Analytics Guide:
http://www.predictiveanalyticsworld.com/predictive_analytics.php

If you’d like our informative event updates, sign up at:
http://www.predictiveanalyticsworld.com/signup-us.php

To sign up for the PAW group on LinkedIn, see:
http://www.linkedin.com/e/gis/1005097

For inquiries e-mail regsupport@risingmedia.com or call (717) 798-3495.

Contest for SAS Users and Students

Here's a new contest for SAS users. The prizes are books, so students should be interested as well.

From http://www.sascommunity.org/mwiki/images/b/bc/PointsforprizesRules.pdf

HOW TO ENTER: To qualify for entry, go to the sasCommunity.org web site located at http://www.sascommunity.org/wiki/Main_Page
between April 11, 2011 and May 9, 2011 and either add or edit valid content as described herein to earn award points.
Creation of a first time profile on www.sascommunity.org will earn 1,000 points. For each valid article creation or edit, 100
points will be earned. Articles and subsequent edits should adhere to the sasCommunity.org terms of use as outlined on
http://www.sascommunity.org/wiki/sasCommunity:Terms_of_Use. All points’ accumulation will end at 5:00 PM GMT on
May 9, 2011 and only those points earned between 8:00 AM GMT on April 11, 2011 and 5:00 PM GMT on May 9, 2011
will be counted in this contest. Contest entries made through the Internet will be declared made by the registered user of
the sasCommunity.org profile account. Sponsor is not responsible for phone, technical, network, electronic, computer
hardware or software failures of any kind, misdirected, incomplete, garbled or delayed transmissions. Sponsor will not be
responsible for incorrect or inaccurate entry information, whether caused by entrants or by any of the equipment or
programming associated with or utilized in the contest.
ELIGIBILITY: The contest is open to all sasCommunity.org members 18 years of age or older on the start date of the
contest. Void where prohibited by law. Employees of the Sponsor (including immediate family members and/or those
living in the same household of each), members of the sasCommunity.org Advisory Board, SAS Global Users Group
Executive Board, their advertising, promotion and production agencies, the affiliated companies of each, and the
immediate family members of each are not eligible.

PRIZE: Three (3) prizes will be awarded based on total points accumulated during the contest as follows:
  1st Place: 3 SAS® Press books – not to exceed $250 in combined retail value;
  2nd Place: 2 SAS® Press books – not to exceed $150 in combined retail value; and
  3rd Place: 1 SAS® Press book – not to exceed $100 in retail value.

What’s New

http://www.sascommunity.org/wiki/Main_Page

Points for Prizes Contest
Win SAS books!
Contribute content or SAS code to sasCommunity.org for your chance to WIN! To qualify, simply add or edit articles between April 11, 2011 and May 9, 2011 (GMT). Creation of a first-time profile on sasCommunity.org gives you 1,000 points. For each valid article creation or edit, 100 points will be earned. The user with the most points collected during this time wins SAS Press Books!
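The arithmetic behind the leaderboard is simple, as a quick sketch shows (the entrant names and edit counts below are made up for illustration):

```python
# Toy tally of the contest scoring described above: 1,000 points for a
# first-time profile, plus 100 points per valid article creation or edit.
# The entrants and their edit counts are hypothetical.
def contest_points(new_profile, edits):
    return (1000 if new_profile else 0) + 100 * edits

entrants = {"new_user": contest_points(True, 12),   # 1000 + 1200 = 2200
            "veteran":  contest_points(False, 25)}  # 0 + 2500 = 2500
winner = max(entrants, key=entrants.get)
print(entrants, winner)
```

A brand-new profile is worth ten edits, so a steady editor can still out-score a newcomer.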

Become a sasCommunity Guru
Thanks for Contributing to sasCommunity.org!
New sasCommunity.org Point System
The sasCommunity support team has been hard at work adding new features and is pleased to announce a points system that recognizes each user's contributions to the site. Every time you contribute by creating a page, updating it, or just doing a little wiki gardening, you earn points. Earning points is automatic and simple – all you have to do is contribute! Creating your account starts you with 1,000 points, and all current users have been credited with points dating back to the site coming online in April 2007.

Augustus – a PMML model producer, consumer, and scoring engine


I just checked out this new software for making PMML models. It is called Augustus and is created by the Open Data Group (http://opendatagroup.com/), which is headed by Robert Grossman, who was the first proponent of using R on Amazon EC2.

Someone like Zementis (http://adapasupport.zementis.com/) could probably use this to further test, enhance, or benchmark it on EC2. They recently held a joint webinar with Revolution Analytics.

https://code.google.com/p/augustus/

Recent News

  • Augustus v 0.4.3.1 has been released
  • Added a guide (pdf) for including Augustus in the Windows System Properties.
  • Updated the install documentation.
  • Augustus 2010.II (Summer) release is available. This is v 0.4.2.0. More information is here.
  • Added performance discussion concerning the optional cyclic garbage collection.

See Recent News for more details and all recent news.

Augustus

Augustus is a PMML 4-compliant scoring engine that works with segmented models. Augustus is designed for use with statistical and data mining models. The new release provides Baseline, Tree and Naive-Bayes producers and consumers.

There is also a version for use with PMML 3 models. It is able to produce and consume models with 10,000s of segments and conforms to a PMML draft RFC for segmented models and ensembles of models. It supports Baseline, Regression, Tree and Naive-Bayes.

Augustus is written in Python and is freely available under the GNU General Public License, version 2.

See the page “Which version is right for me” for more details regarding the different versions.

PMML

Predictive Model Markup Language (PMML) is an XML mark up language to describe statistical and data mining models. PMML describes the inputs to data mining models, the transformations used to prepare data for data mining, and the parameters which define the models themselves. It is used for a wide variety of applications, including applications in finance, e-business, direct marketing, manufacturing, and defense. PMML is often used so that systems which create statistical and data mining models (“PMML Producers”) can easily inter-operate with systems which deploy PMML models for scoring or other operational purposes (“PMML Consumers”).
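As a minimal sketch of that producer/consumer split (the field name and coefficients below are invented, and this is only a PMML-like fragment, not a complete schema-valid PMML 4 document), a producer can emit the model as XML and a consumer can recover the inputs and parameters to score with:

```python
# Sketch: a "PMML producer" writes a model as XML; a "PMML consumer" reads
# the inputs and parameters back and scores with them. Field names and
# coefficients are illustrative only.
import xml.etree.ElementTree as ET

pmml = ET.Element("PMML", version="4.0")
dd = ET.SubElement(pmml, "DataDictionary")
ET.SubElement(dd, "DataField", name="x", optype="continuous", dataType="double")
model = ET.SubElement(pmml, "RegressionModel", functionName="regression")
table = ET.SubElement(model, "RegressionTable", intercept="1.5")
ET.SubElement(table, "NumericPredictor", name="x", coefficient="2.0")

doc = ET.tostring(pmml, encoding="unicode")  # what the producer hands off

# The consumer parses the document and reconstructs the scoring function.
root = ET.fromstring(doc)
fields = [f.get("name") for f in root.iter("DataField")]
intercept = float(root.find(".//RegressionTable").get("intercept"))
coef = float(root.find(".//NumericPredictor").get("coefficient"))

def score(x):
    return intercept + coef * x

print(fields, score(3.0))  # ['x'] 7.5
```

The point is that the producer and consumer share nothing but the XML document, which is exactly the interoperability PMML is designed for.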

Change Detection using Augustus

For information regarding using Augustus with Change Detection and Health and Status Monitoring, please see change-detection.

Open Data

Open Data Group provides management consulting services, outsourced analytical services, analytic staffing, and expert witnesses broadly related to data and analytics. It has experience with customer data, supplier data, financial and trading data, and data from internal business processes.

It has staff in Chicago and San Francisco and clients throughout the U.S. Open Data Group began operations in 2002.


Overview

The above example contains plots generated in R of scoring results from Augustus. Each point on the graph represents a use of the scoring engine and a chart is an aggregation of multiple Augustus runs. A Baseline (Change Detection) model was used to score data with multiple segments.

Typical Use

Augustus is typically used to construct models and score data with models. Augustus includes a dedicated application for creating, or producing, predictive models rendered as PMML-compliant files. Scoring is accomplished by consuming PMML-compliant files describing an appropriate model. Augustus provides a dedicated application for scoring data with four classes of models: Baseline (Change Detection) models, Tree models, Regression models, and Naive Bayes models. The typical model development and use cycle with Augustus is as follows:

  1. Identify suitable data with which to construct a new model.
  2. Provide a model schema which prescribes the requirements for the model.
  3. Run the Augustus producer to obtain a new model.
  4. Run the Augustus consumer on new data to effect scoring.

Separate consumer and producer applications are supplied for Baseline (Change Detection), Tree, Regression, and Naive Bayes models. The producer and consumer applications require configuration with XML-formatted files. The specification of the configuration files and model schema are detailed below. The consumers provide for some configurability of the output, but users will often add post-processing to render the output according to their needs. A variety of mechanisms exist for transmitting data, but users may need to provide their own preprocessing to accommodate their particular data source.
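Augustus itself is driven by PMML and XML configuration files, but the produce-then-consume cycle can be illustrated with a toy z-score baseline model in plain Python (the segment names, data, and threshold below are invented, and this API is not Augustus's):

```python
# Toy version of the cycle: a "producer" learns a per-segment baseline
# (mean and standard deviation) from training data; a "consumer" scores
# new observations against that baseline and flags changes.
from statistics import mean, stdev

def produce_baseline(training):
    """{segment: [values]} -> {segment: (mu, sigma)}"""
    return {seg: (mean(vals), stdev(vals)) for seg, vals in training.items()}

def consume(model, segment, value, threshold=3.0):
    """Return (z-score, change-detected?) for one observation."""
    mu, sigma = model[segment]
    z = (value - mu) / sigma
    return z, abs(z) > threshold

model = produce_baseline({"east": [10, 11, 9, 10], "west": [50, 52, 48, 50]})
print(consume(model, "east", 10.5))  # small deviation: no change flagged
print(consume(model, "west", 70))    # large deviation: change flagged
```

In Augustus the same two roles are played by the producer and consumer applications, with the model persisted as a segmented PMML file in between.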

In addition to the producer and consumer applications, Augustus is conceptually structured and provided with libraries which are relevant to the development and use of Predictive Models. Broadly speaking, these consist of components that address the use of PMML and components that are specific to Augustus.

Post Processing

Augustus can accommodate a post-processing step. While not necessary, it is often useful to:

  • Re-normalize the scoring results or perform an additional transformation.
  • Supplement the results with global metadata such as timestamps.
  • Format the results.
  • Select certain interesting values from the results.
  • Restructure the data for use with other applications.
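A post-processing pass of that kind might look like the following sketch (the record layout and field names are assumptions for illustration, not actual Augustus output):

```python
# Sketch of a post-processing step: select fields, re-normalize scores,
# and stamp each row with global metadata. Field names are hypothetical.
from datetime import datetime, timezone

def post_process(results, keep=("segment", "score")):
    total = sum(r["score"] for r in results) or 1.0
    out = []
    for r in results:
        row = {k: r[k] for k in keep}         # select interesting values
        row["score"] = r["score"] / total     # re-normalize
        row["ts"] = datetime.now(timezone.utc).isoformat()  # global metadata
        out.append(row)
    return out

scored = [{"segment": "east", "score": 2.0, "raw": [1, 2]},
          {"segment": "west", "score": 6.0, "raw": [5, 7]}]
for row in post_process(scored):
    print(row)
```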

Changes in R software

The newest version of R is now available for download – R 2.13.0 is ready!

http://cran.at.r-project.org/bin/windows/base/CHANGES.R-2.13.0.html

Windows-specific changes to R

CHANGES IN R VERSION 2.13.0

WINDOWS VERSION

  • Windows 2000 is no longer supported. (It went end-of-life in July 2010.)

NEW FEATURES

  • win_iconv has been updated: this version has a change in the behaviour with BOMs on UTF-16 and UTF-32 files – it removes BOMs when reading and adds them when writing. (This is consistent with Microsoft applications, but Unix versions of iconv usually ignore them.)

  • Support for repository type win64.binary (used for 64-bit Windows binaries for R 2.11.x only) has been removed.

  • The installers no longer put an ‘Uninstall’ item on the start menu (to conform to current Microsoft UI guidelines).

  • Running R always sets the environment variable R_ARCH (as it does on a Unix-alike from the shell-script front-end).

  • The defaults for options("browser") and options("pdfviewer") are now set from environment variables R_BROWSER and R_PDFVIEWER respectively (as on a Unix-alike). A value of "false" suppresses display (even if there is no false.exe present on the path).

  • If options("install.lock") is set to TRUE, binary package installs are protected against failure similar to the way source package installs are protected.

  • file.exists() and unlink() have more support for files > 2GB.

  • The versions of R.exe in ‘R_HOME/bin/i386,x64/bin’ now support options such as R --vanilla CMD: there is no comparable interface for ‘Rcmd.exe’.

  • A few more file operations will now work with >2GB files.

  • The environment variable R_HOME in an R session now uses slash as the path separator (as it always has when set by Rcmd.exe).

  • Rgui has a new menu item for the PDF ‘Sweave User Manual’.

DEPRECATED

  • zip.unpack() is deprecated: use unzip().

INSTALLATION

  • There is support for libjpeg-turbo via setting JPEGDIR to that value in ‘MkRules.local’. Support for jpeg-6b has been removed.

  • The sources now work with libpng-1.5.1, jpegsrc.v8c (which are used in the CRAN builds) and tiff-4.0.0beta6 (CRAN builds use 3.9.1). It is possible that they no longer work with older versions than libpng-1.4.5.

BUG FIXES

  • Workaround for the incorrect values given by Windows’ casinh function on the branch cuts.

  • Bug fixes for drawing raster objects on windows(). The symptom was the occasional raster image not being drawn, especially when drawing multiple raster images in a single expression. Thanks to Michael Sumner for report and testing.

  • Printing extremely long string values could overflow the stack and cause the GUI to crash. (PR#14543)

Tonnes of changes!!

http://cran.at.r-project.org/src/base/NEWS

CHANGES IN R VERSION 2.13.0:

  SIGNIFICANT USER-VISIBLE CHANGES:

    • replicate() (by default) and vapply() (always) now return a
      higher-dimensional array instead of a matrix in the case where
      the inner function value is an array of dimension >= 2.

    • Printing and formatting of floating point numbers is now using
      the correct number of digits, where it previously rarely differed
      by a few digits. (See “scientific” entry below.)  This affects
      _many_ *.Rout.save checks in packages.

  NEW FEATURES:

    • normalizePath() has been moved to the base package (from utils):
      this is so it can be used by library() and friends.

      It now does tilde expansion.

      It gains new arguments winslash (to select the separator on
      Windows) and mustWork to control the action if a canonical path
      cannot be found.

    • The previously barely documented limit of 256 bytes on a symbol
      name has been raised to 10,000 bytes (a sanity check).  Long
      symbol names can sometimes occur when deparsing expressions (for
      example, in model.frame).

    • reformulate() gains an intercept argument.

    • cmdscale(add = FALSE) now uses the more common definition that
      there is a representation in n-1 or less dimensions, and only
      dimensions corresponding to positive eigenvalues are used.
      (Avoids confusion such as PR#14397.)

    • Names used by c(), unlist(), cbind() and rbind() are marked with
      an encoding when this can be ascertained.

    • R colours are now defined to refer to the sRGB color space.

      The PDF, PostScript, and Quartz graphics devices record this
      fact.  X11 (and Cairo) and Windows just assume that your screen
      conforms.

    • system.file() gains a mustWork argument (suggestion of Bill
      Dunlap).

    • new.env(hash = TRUE) is now the default.

    • list2env(envir = NULL) defaults to hashing (with a suitably sized
      environment) for lists of more than 100 elements.

    • text() gains a formula method.

    • IQR() now has a type argument which is passed to quantile().

    • as.vector(), as.double() etc duplicate less when they leave the
      mode unchanged but remove attributes.

      as.vector(mode = "any") no longer duplicates when it does not
      remove attributes.  This helps memory usage in matrix() and
      array().

      matrix() duplicates less if data is an atomic vector with
      attributes such as names (but no class).

      dim(x) <- NULL duplicates less if x has neither dimensions nor
      names (since this operation removes names and dimnames).

    • setRepositories() gains an addURLs argument.

    • chisq.test() now also returns a stdres component, for
      standardized residuals (which have unit variance, unlike the
      Pearson residuals).

    • write.table() and friends gain a fileEncoding argument, to
      simplify writing files for use on other OSes (e.g. a spreadsheet
      intended for Windows or Mac OS X Excel).

    • Assignment expressions of the form foo::bar(x) <- y and
      foo:::bar(x) <- y now work; the replacement functions used are
      foo::`bar<-` and foo:::`bar<-`.

    • Sys.getenv() gains a names argument so Sys.getenv(x, names =
      FALSE) can replace the common idiom of as.vector(Sys.getenv()).
      The default has been changed to not name a length-one result.

    • Lazy loading of environments now preserves attributes and locked
      status. (The locked status of bindings and active bindings are
      still not preserved; this may be addressed in the future).

    • options("install.lock") may be set to FALSE so that
      install.packages() defaults to --no-lock installs, or (on
      Windows) to TRUE so that binary installs implement locking.

    • sort(partial = p) for large p now tries Shellsort if quicksort is
      not appropriate and so works for non-numeric atomic vectors.

    • sapply() gets a new option simplify = "array" which returns a
      “higher rank” array instead of just a matrix when FUN() returns a
      dim() length of two or more.

      replicate() has this option set by default, and vapply() now
      behaves that way internally.

    • aperm() becomes S3 generic and gets a table method which
      preserves the class.

    • merge() and as.hclust() methods for objects of class "dendrogram"
      are now provided.

    • as.POSIXlt.factor() now passes ... to the character method
      (suggestion of Joshua Ulrich).

    • The character method of as.POSIXlt() now tries to find a format
      that works for all non-NA inputs, not just the first one.

    • str() now has a method for class "Date" analogous to that for
      class "POSIXt".

    • New function file.link() to create hard links on those file
      systems (POSIX, NTFS but not FAT) that support them.

    • New Summary() group method for class "ordered" implements min(),
      max() and range() for ordered factors.

    • mostattributes<-() now consults the "dim" attribute and not the
      dim() function, making it more useful for objects (such as data
      frames) from classes with methods for dim().  It also uses
      attr<-() in preference to the generics name<-(), dim<-() and
      dimnames<-().  (Related to PR#14469.)

    • There is a new option "browserNLdisabled" to disable the use of
      an empty (e.g. via the ‘Return’ key) as a synonym for c in
      browser() or n under debug().  (Wish of PR#14472.)

    • example() gains optional new arguments character.only and
      give.lines enabling programmatic exploration.

    • serialize() and unserialize() are no longer described as
      ‘experimental’.  The interface is now regarded as stable,
      although the serialization format may well change in future
      releases.  (serialize() has a new argument version which would
      allow the current format to be written if that happens.)

      New functions saveRDS() and readRDS() are public versions of the
      ‘internal’ functions .saveRDS() and .readRDS() made available for
      general use.  The dot-name versions remain available as several
      package authors have made use of them, despite the documentation.

      saveRDS() supports compress = "xz".

    • Many functions when called with a not-open connection will now
      ensure that the connection is left not-open in the event of
      error.  These include read.dcf(), dput(), dump(), load(),
      parse(), readBin(), readChar(), readLines(), save(), writeBin(),
      writeChar(), writeLines(), .readRDS(), .saveRDS() and
      tools::parse_Rd(), as well as functions calling these.

    • Public functions find.package() and path.package() replace the
      internal dot-name versions.

    • The default method for terms() now looks for a "terms" attribute
      if it does not find a "terms" component, and so works for model
      frames.

    • httpd() handlers receive an additional argument containing the
      full request headers as a raw vector (this can be used to parse
      cookies, multi-part forms etc.). The recommended full signature
      for handlers is therefore function(url, query, body, headers,
      ...).

    • file.edit() gains a fileEncoding argument to specify the encoding
      of the file(s).

    • The format of the HTML package listings has changed.  If there is
      more than one library tree, a table of links to libraries is
      provided at the top and bottom of the page.  Where a library
      contains more than 100 packages, an alphabetic index is given at
      the top of the section for that library.  (As a consequence,
      package names are now sorted case-insensitively whatever the
      locale.)

    • isSeekable() now returns FALSE on connections which have
      non-default encoding.  Although documented to record if ‘in
      principle’ the connection supports seeking, it seems safer to
      report FALSE when it may not work.

    • R CMD REMOVE and remove.packages() now remove file R.css when
      removing all remaining packages in a library tree.  (Related to
      the wish of PR#14475: note that this file is no longer
      installed.)

    • unzip() now has a unzip argument like zip.file.extract().  This
      allows an external unzip program to be used, which can be useful
      to access features supported by Info-ZIP's unzip version 6 which
      is now becoming more widely available.

    • There is a simple zip() function, as wrapper for an external zip
      command.

    • bzfile() connections can now read from concatenated bzip2 files
      (including files written with bzfile(open = "a")) and files
      created by some other compressors (such as the example of
      PR#14479).

    • The primitive function c() is now of type BUILTIN.

    • plot(<dendrogram>, .., nodePar=*) now obeys an optional xpd
      specification (allowing clipping to be turned off completely).

    • nls(algorithm="port") now shares more code with nlminb(), and is
      more consistent with the other nls() algorithms in its return
      value.

    • xz has been updated to 5.0.1 (very minor bugfix release).

    • image() has gained a logical useRaster argument allowing it to
      use a bitmap raster for plotting a regular grid instead of
      polygons. This can be more efficient, but may not be supported by
      all devices. The default is FALSE.

    • list.files()/dir() gains a new argument include.dirs to include
      directories in the listing when recursive = TRUE.

    • New function list.dirs() lists all directories (even empty
      ones).

    • file.copy() now (by default) copies read/write/execute
      permissions on files, moderated by the current setting of
      Sys.umask().

    • Sys.umask() now accepts mode = NA and returns the current umask
      value (visibly) without changing it.

    • There is a ! method for classes "octmode" and "hexmode": this
      allows xor(a, b) to work if both a and b are from one of those
      classes.

    • as.raster() no longer fails for vectors or matrices containing
      NAs.

    • New hook "before.new.plot" allows functions to be run just before
      advancing the frame in plot.new, which is potentially useful for
      custom figure layout implementations.

    • Package tools has a new function compactPDF() to try to reduce
      the size of PDF files _via_ qpdf or gs.

    • tar() has a new argument extra_flags.

    • dotchart() accepts more general objects x such as 1D tables which
      can be coerced by as.numeric() to a numeric vector, with a
      warning since that might not be appropriate.

    • The previously internal function create.post() is now exported
      from utils, and the documentation for bug.report() and
      help.request() now refer to that for create.post().

      It has a new method = "mailto" on Unix-alikes similar to that on
      Windows: it invokes a default mailer via open (Mac OS X) or
      xdg-open or the default browser (elsewhere).

      The default for ccaddress is now getOption("ccaddress") which is
      by default unset: using the username as a mailing address
      nowadays rarely works as expected.

    • The default for options("mailer") is now "mailto" on all
      platforms.

    • unlink() now does tilde-expansion (like most other file
      functions).

    • file.rename() now allows vector arguments (of the same length).

    • The "glm" method for logLik() now returns an "nobs" attribute
      (which stats4::BIC() assumed it did).

      The "nls" method for logLik() gave incorrect results for zero
      weights.

    • There is a new generic function nobs() in package stats, to
      extract from model objects a suitable value for use in BIC
      calculations.  An S4 generic derived from it is defined in
      package stats4.

    • Code for S4 reference-class methods is now examined for possible
      errors in non-local assignments.

    • findClasses, getGeneric, findMethods and hasMethods are revised
      to deal consistently with the package= argument and be consistent
      with soft namespace policy for finding objects.

    • tools::Rdiff() now has the option to return not only the status
      but a character vector of observed differences (which are still
      by default sent to stdout).

    • The startup environment variables R_ENVIRON_USER, R_ENVIRON,
      R_PROFILE_USER and R_PROFILE are now treated more consistently.
      In all cases an empty value is considered to be set and will stop
      the default being used, and for the last two tilde expansion is
      performed on the file name.  (Note that setting an empty value is
      probably impossible on Windows.)

    • Using R --no-environ CMD, R --no-site-file CMD or R
      --no-init-file CMD sets environment variables so these settings
      are passed on to child R processes, notably those run by INSTALL,
      check and build. R --vanilla CMD sets these three options (but
      not --no-restore).

    • smooth.spline() is somewhat faster.  With cv=NA it allows some
      leverage computations to be skipped.

    • The internal (C) function scientific(), at the heart of R's
      format.info(x), format(x), print(x), etc, for numeric x, has been
      re-written in order to provide slightly more correct results,
      fixing PR#14491, notably in border cases including when digits >=
      16, thanks to substantial contributions (code and experiments)
      from Petr Savicky.  This affects a noticeable amount of numeric
      output from R.

    • A new function grepRaw() has been introduced for finding subsets
      of raw vectors. It supports both literal searches and regular
      expressions.

    • Package compiler is now provided as a standard package.  See
      ?compiler::compile for information on how to use the compiler.
      This package implements a byte code compiler for R: by default
      the compiler is not used in this release.  See the ‘R
      Installation and Administration Manual’ for how to compile the
      base and recommended packages.

    • Providing an exportPattern directive in a NAMESPACE file now
      causes classes to be exported according to the same pattern, for
      example the default from package.skeleton() to specify all names
      starting with a letter.  An explicit directive to
      exportClassPattern will still over-ride.

    • There is an additional marked encoding "bytes" for character
      strings.  This is intended to be used for non-ASCII strings which
      should be treated as a set of bytes, and never re-encoded as if
      they were in the encoding of the current locale: useBytes = TRUE
      is automatically selected in functions such as writeBin(),
      writeLines(), grep() and strsplit().

      Only a few character operations are supported (such as substr()).

      Printing, format() and cat() will represent non-ASCII bytes in
      such strings by a \xab escape.
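A small sketch of marking a string as "bytes" via Encoding<- (note that this NEWS item is for R 2.13.0; the behaviour shown here is as in later R versions too):

```r
# Mark a string as raw bytes so it is never re-encoded.
x <- "fa\xe7ade"          # contains the single byte 0xE7 (Latin-1 c-cedilla)
Encoding(x) <- "bytes"

Encoding(x)               # "bytes"
nchar(x, type = "bytes")  # 6 -- counted as bytes, not characters
print(x)                  # non-ASCII byte shown as a \xe7 escape
```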

    • The new function removeSource() removes the internally stored
      source from a function.
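For illustration (a sketch; in R 2.13.x the stored source lived in the "source" attribute, while later versions store a "srcref"), removing the stored source makes the function print from its deparsed code alone:

```r
# Keep source so comments are stored with the function, then strip it.
options(keep.source = TRUE)
f <- eval(parse(text = "function(x) {\n  x + 1  # a stored comment\n}",
                keep.source = TRUE))
g <- removeSource(f)  # same function, without the stored source

print(f)  # shows the comment, from the stored source
print(g)  # deparsed from the code alone; the comment is gone
```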

    • "srcref" attributes now include two additional line number
      values, recording the line numbers in the order they were parsed.

    • New functions have been added for source reference access:
      getSrcFilename(), getSrcDirectory(), getSrcLocation() and
      getSrcref().

    • Sys.chmod() has an extra argument use_umask which defaults to
      true and restricts the file mode by the current setting of umask.
      This means that all the R functions which manipulate
      file/directory permissions by default respect umask, notably R
      CMD INSTALL.

    • tempfile() has an extra argument fileext to create a temporary
      filename with a specified extension.  (Suggestion and initial
      implementation by Dirk Eddelbuettel.)
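A one-liner showing the new argument (the pattern name "plot" is just an illustrative choice):

```r
# Temporary file name with a guaranteed extension, e.g. for a PDF device.
tf <- tempfile(pattern = "plot", fileext = ".pdf")
basename(tf)  # e.g. "plot1a2b3c4d.pdf"
```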

      There are improvements in the way Sweave() and Stangle() handle
      non-ASCII vignette sources, especially in a UTF-8 locale: see
      ‘Writing R Extensions’ which now has a subsection on this topic.

    • factanal() now returns the rotation matrix if a rotation such as
      "promax" is used, and hence factor correlations are displayed.
      (Wish of PR#12754.)

    • The gctorture2() function provides a more refined interface to
      the GC torture process.  Environment variables R_GCTORTURE,
      R_GCTORTURE_WAIT, and R_GCTORTURE_INHIBIT_RELEASE can also be
      used to control the GC torture process.

    • file.copy(from, to) no longer regards it as an error to supply a
      zero-length from: it now simply does nothing.

    • rstandard.glm gains a type argument which can be used to request
      standardized Pearson residuals.
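A short sketch of the new type argument, using the Poisson regression example from ?glm:

```r
# Poisson example from ?glm
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- gl(3, 1, 9)
treatment <- gl(3, 3)
fit <- glm(counts ~ outcome + treatment, family = poisson())

rstandard(fit)                    # standardized deviance residuals (default)
rstandard(fit, type = "pearson")  # standardized Pearson residuals
```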

    • A start on a Turkish translation, thanks to Murat Alkan.

    • .libPaths() calls normalizePath(winslash = "/") on the paths:
      this helps (usually) present them in a user-friendly form and
      should detect duplicate paths accessed via different symbolic
      links.

  SWEAVE CHANGES:

    • Sweave() has options to produce PNG and JPEG figures, and to use
      a custom function to open a graphics device (see ?RweaveLatex).
      (Based in part on the contribution of PR#14418.)

    • The default for Sweave() is to produce only PDF figures (rather
      than both EPS and PDF).

    • Environment variable SWEAVE_OPTIONS can be used to supply
      defaults for existing or new options to be applied after the
      Sweave driver setup has been run.

    • The Sweave manual is now included as a vignette in the utils
      package.

    • Sweave() handles keep.source=TRUE much better: it could duplicate
      some lines and omit comments. (Reported by John Maindonald and
      others.)

  C-LEVEL FACILITIES:

    • Because they use a C99 interface which a C++ compiler is not
      required to support, Rvprintf and REvprintf are only defined by
      R_ext/Print.h in C++ code if the macro R_USE_C99_IN_CXX is
      defined when it is included.

    • pythag duplicated the C99 function hypot.  It is no longer
      provided, but is used as a substitute for hypot in the very
      unlikely event that the latter is not available.

    • R_inspect(obj) and R_inspect3(obj, deep, pvec) are (hidden)
      C-level entry points to the internal inspect function and can be
      used for C-level debugging (e.g., in conjunction with the p
      command in gdb).

    • Compiling R with --enable-strict-barrier now also enables
      additional checking for use of unprotected objects. In
      combination with gctorture() or gctorture2() and a C-level
      debugger this can be useful for tracking down memory protection
      issues.

  UTILITIES:

    • R CMD Rdiff is now implemented in R on Unix-alikes (as it has
      been on Windows since R 2.12.0).

    • R CMD build no longer does any cleaning in the supplied package
      directory: all the cleaning is done in the copy.

      It has a new option --install-args to pass arguments to R CMD
      INSTALL for --build (but not when installing to rebuild
      vignettes).

      There is new option, --resave-data, to call
      tools::resaveRdaFiles() on the data directory, to compress
      tabular files (.tab, .csv etc) and to convert .R files to .rda
      files.  The default, --resave-data=gzip, is to do so in a way
      compatible even with years-old versions of R, but better
      compression is given by --resave-data=best, requiring R >=
      2.10.0.

      It now adds a datalist file for data directories of more than
      1Mb.

      Patterns in .Rbuildignore are now also matched against all
      directory names (including those of empty directories).

      There is a new option, --compact-vignettes, to try reducing the
      size of PDF files in the inst/doc directory.  Currently this
      tries qpdf: other options may be used in future.

      When re-building vignettes and an inst/doc/Makefile file is
      found,
      make clean is run if the makefile has a clean: target.

      After re-building vignettes the default clean-up operation will
      remove any directories (and not just files) created during the
      process: e.g. one package created a .R_cache directory.

      Empty directories are now removed unless the option
      --keep-empty-dirs is given (and a few packages do deliberately
      include empty directories).

      If there is a field BuildVignettes in the package DESCRIPTION
      file with a false value, re-building the vignettes is skipped.

    • R CMD check now also checks for filenames that are
      case-insensitive matches to Windows' reserved file names with
      extensions, such as nul.Rd, as these have caused problems on some
      Windows systems.

      It checks for inefficiently saved data/*.rda and data/*.RData
      files, and reports on those larger than 100Kb.  A more complete
      check (including of the type of compression, but potentially much
      slower) can be switched on by setting environment variable
      _R_CHECK_COMPACT_DATA2_ to TRUE.

      The types of files in the data directory are now checked, as
      packages are _still_ misusing it for non-R data files.

      It now extracts and runs the R code for each vignette in a
      separate directory and R process: this is done in the package's
      declared encoding.  Rather than call tools::checkVignettes(), it
      calls tools::buildVignettes() to see if the vignettes can be
      re-built as they would be by R CMD build.  Option --use-valgrind
      now applies only to these runs, and not when running code to
      rebuild the vignettes.  This version does a much better job of
      suppressing output from successful vignette tests.

      The 00check.log file is a more complete record of what is output
      to stdout: in particular it contains more details of the tests.

      It now checks all syntactically valid Rd usage entries, and warns
      about assignments (unless these give the usage of replacement
      functions).

      .tar.xz compressed tarballs are now allowed, if tar supports them
      (and setting environment variable TAR to internal ensures so on
      all platforms).

    • R CMD check now warns if it finds inst/doc/makefile, and R CMD
      build renames such a file to inst/doc/Makefile.

  INSTALLATION:

    • Installing R no longer tries to find perl, and R CMD no longer
      tries to substitute a full path for awk or perl - this was a
      legacy from the days when they were used by R itself.  Because a
      couple of packages do use awk, it is set as the make (rather than
      environment) variable AWK.

    • make check will now fail if there are differences from the
      reference output when testing package examples and if environment
      variable R_STRICT_PACKAGE_CHECK is set to a true value.

    • The C99 double complex type is now required.

      The C99 complex trigonometric functions (such as csin) are not
      currently required (FreeBSD lacks most of them): substitutes are
      used if they are missing.

    • The C99 system call va_copy is now required.

    • If environment variable R_LD_LIBRARY_PATH is set during
      configuration (for example in config.site) it is used unchanged
      in file etc/ldpaths rather than being appended to.

    • configure looks for support for OpenMP and if found compiles R
      with appropriate flags and also makes them available for use in
      packages: see ‘Writing R Extensions’.

      This is currently experimental, and is only used in R with a
      single thread for colSums() and colMeans().  Expect it to be more
      widely used in later versions of R.

      This can be disabled by the --disable-openmp flag.

  PACKAGE INSTALLATION:

    • R CMD INSTALL --clean now removes copies of a src directory which
      are created when multiple sub-architectures are in use.
      (Following a comment from Berwin Turlach.)

    • File R.css is now installed on a per-package basis (in the
      package's html directory) rather than in each library tree, and
      this is used for all the HTML pages in the package.  This helps
      when installing packages with static HTML pages for use on a
      webserver.  It will also allow future versions of R to use
      different stylesheets for the packages they install.

    • A top-level file .Rinstignore in the package sources can list (in
      the same way as .Rbuildignore) files under inst that should not
      be installed.  (Why should there be any such files?  Because all
      the files needed to re-build vignettes need to be under inst/doc,
      but they may not need to be installed.)

    • R CMD INSTALL has a new option --compact-docs to compact any PDFs
      under the inst/doc directory.  Currently this uses qpdf, which
      must be installed (see ‘Writing R Extensions’).

    • There is a new option --lock which can be used to cancel the
      effect of --no-lock or --pkglock earlier on the command line.

    • Option --pkglock can now be used with more than one package, and
      is now the default if only one package is specified.

    • Argument lock of install.packages() can now be used for Mac
      binary
      installs as well as for Windows ones.  The value "pkglock" is now
      accepted, as well as TRUE and FALSE (the default).

    • There is a new option --no-clean-on-error for R CMD INSTALL to
      retain a partially installed package for forensic analysis.

    • Packages with names ending in . are not portable since Windows
      does not work correctly with such directory names.  This is now
      warned about in R CMD check, and will not be allowed in R 2.14.x.

    • The vignette indices are more comprehensive (in the style of
      browseVignettes()).

  DEPRECATED & DEFUNCT:

    • require(save = TRUE) is defunct, and use of the save argument is
      deprecated.

    • R CMD check --no-latex is defunct: use --no-manual instead.

    • R CMD Sd2Rd is defunct.

    • The gamma argument to hsv(), rainbow(), and rgb2hsv() is
      deprecated and no longer has any effect.

    • The previous options for R CMD build --binary (--auto-zip,
      --use-zip-data and --no-docs) are deprecated (or defunct): use
      the new option --install-args instead.

    • When a character value is used for the EXPR argument in switch(),
      only a single unnamed alternative value is now allowed.
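A sketch of the form that remains allowed: named alternatives select by name, and at most one unnamed alternative acts as the default (the grade() helper is just an illustration):

```r
# With a character EXPR, at most one unnamed alternative (the default)
# is allowed alongside the named ones.
grade <- function(x)
  switch(x,
         a = "excellent",
         b = "good",
         "ungraded")  # single unnamed default

grade("a")  # "excellent"
grade("z")  # falls through to "ungraded"
```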

    • The wrapper utils::link.html.help() is no longer available.

    • Zip-ing data sets in packages (and hence R CMD INSTALL options
      --use-zip-data and --auto-zip, as well as the ZipData: yes field
      in a DESCRIPTION file) is defunct.

      Installed packages with zip-ed data sets can still be used, but a
      warning that they should be re-installed will be given.

    • The ‘experimental’ alternative specification of a name space via
      .Export() etc is now defunct.

    • The option --unsafe to R CMD INSTALL is deprecated: use the
      identical option --no-lock instead.

    • The entry point pythag in Rmath.h is deprecated in favour of the
      C99 function hypot.  A wrapper for hypot is provided for R 2.13.x
      only.

    • Direct access to the "source" attribute of functions is
      deprecated; use deparse(fn, control="useSource") to access it,
      and removeSource(fn) to remove it.

    • R CMD build --binary is now formally deprecated: R CMD INSTALL
      --build has long been the preferred alternative.

    • Single-character package names are deprecated (and R is already
      disallowed to avoid confusion in Depends: fields).

  BUG FIXES:

    • drop.terms and the [ method for class "terms" no longer add back
      an intercept.  (Reported by Niels Hansen.)

    • aggregate preserves the class of a column (e.g. a date) under
      some circumstances where it discarded the class previously.

    • p.adjust() now always returns a vector result, as documented.  In
      previous versions it copied attributes (such as dimensions) from
      the p argument: now it only copies names.

    • On PDF and PostScript devices, a line width of zero was recorded
      verbatim and this caused problems for some viewers (a very thin
      line combined with a non-solid line dash pattern could also cause
      a problem).  On these devices, the line width is now limited at
      0.01 and for very thin lines with complex dash patterns the
      device may force the line dash pattern to be solid.  (Reported by
      Jari Oksanen.)

    • The str() method for class "POSIXt" now gives sensible output for
      0-length input.

    • The one- and two-argument complex maths functions failed to warn
      if NAs were generated (as their numeric analogues do).

    • Added .requireCachedGenerics to the dont.mind list for library()
      to avoid warnings about duplicates.

    • $<-.data.frame messed with the class attribute, breaking any S4
      subclass.  The S4 data.frame class now has its own $<- method,
      and turns dispatch on for this primitive.

    • Map() did not look up a character argument f in the correct
      frame, thanks to lazy evaluation.  (PR#14495)

    • file.copy() did not tilde-expand from and to when to was a
      directory.  (PR#14507)

    • It was possible (but very rare) for the loading test in R CMD
      INSTALL to crash a child R process and so leave around a lock
      directory and a partially installed package.  That test is now
      done in a separate process.

    • plot(<formula>, data=<matrix>,..) now works in more cases;
      similarly for points(), lines() and text().

    • edit.default() contained a manual dispatch for matrices (the
      "matrix" class didn't really exist when it was written).  This
      caused an infinite recursion in the no-GUI case and has now been
      removed.

    • data.frame(check.rows = TRUE) sometimes worked when it should
      have detected an error.  (PR#14530)

    • scan(sep= , strip.white=TRUE) sometimes stripped trailing spaces
      from within quoted strings.  (The real bug in PR#14522.)

    • The rank-correlation methods for cor() and cov() with use =
      "complete.obs" computed the ranks before removing missing values,
      whereas the documentation implied incomplete cases were removed
      first.  (PR#14488)

      They also failed for 1-row matrices.

    • The perpendicular adjustment used in placing text and expressions
      in the margins of plots was not scaled by par("mex"). (Part of
      PR#14532.)

    • Quartz Cocoa device now catches any Cocoa exceptions that occur
      during the creation of the device window to prevent crashes.  It
      also imposes a limit of 144 ft^2 on the area used by a window to
      catch user errors (unit misinterpretation) early.

    • The browser (invoked by debug(), browser() or otherwise) would
      display attributes such as "wholeSrcref" that were intended for
      internal use only.

    • R's internal filename completion now properly handles filenames
      with spaces in them even when the readline library is used.  This
      resolves PR#14452 provided the internal filename completion is
      used (e.g., by setting rc.settings(files = TRUE)).

    • Inside uniroot(f, ...), -Inf function values are now replaced by
      a maximally *negative* value.

    • rowsum() could silently over/underflow on integer inputs
      (reported by Bill Dunlap).

    • as.matrix() did not handle "dist" objects with zero rows.

CHANGES IN R VERSION 2.12.2 patched:

  NEW FEATURES:

    • max() and min() work harder to ensure that NA has precedence over
      NaN, so e.g. min(NaN, NA) is NA.  (This was not previously
      documented except for within a single numeric vector, where
      compiler optimizations often defeated the code.)
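The documented precedence, in a two-line sketch:

```r
# NA has precedence over NaN in min()/max(), per the documented semantics.
min(NaN, NA)  # NA, not NaN
max(NA, NaN)  # NA
```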

  BUG FIXES:

    • A change to the C function R_tryEval had broken error messages in
      S4 method selection; the error message is now printed.

    • PDF output with a non-RGB color model used RGB for the line
      stroke color.  (PR#14511)

    • stats4::BIC() assumed without checking that an object of class
      "logLik" has an "nobs" attribute: glm() fits did not and so BIC()
      failed for them.

    • In some circumstances a one-sided mantelhaen.test() reported the
      p-value for the wrong tail.  (PR#14514)

    • Passing the invalid value lty = NULL to axis() sent an invalid
      value to the graphics device, and might cause the device to
      segfault.

    • Sweave() with concordance=TRUE could lead to invalid PDF files;
      Sweave.sty has been updated to avoid this.

    • Non-ASCII characters in the titles of help pages were not
      rendered properly in some locales, and could cause errors or
      warnings.

    • checkRd() gave a spurious error if the \href macro was used.

Oracle launches XBRL extension for financial domains

What is XBRL and how does it work?

http://www.xbrl.org/HowXBRLWorks/

How XBRL Works
XBRL is a member of the family of languages based on XML, or Extensible Markup Language, which is a standard for the electronic exchange of data between businesses and on the internet.  Under XML, identifying tags are applied to items of data so that they can be processed efficiently by computer software.

XBRL is a powerful and flexible version of XML which has been defined specifically to meet the requirements of business and financial information.  It enables unique identifying tags to be applied to items of financial data, such as ‘net profit’.  However, these are more than simple identifiers.  They provide a range of information about the item, such as whether it is a monetary item, percentage or fraction.  XBRL allows labels in any language to be applied to items, as well as accounting references or other subsidiary information.

XBRL can show how items are related to one another.  It can thus represent how they are calculated.  It can also identify whether they fall into particular groupings for organisational or presentational purposes.  Most importantly, XBRL is easily extensible, so companies and other organisations can adapt it to meet a variety of special requirements.

The rich and powerful structure of XBRL allows very efficient handling of business data by computer software.  It supports all the standard tasks involved in compiling, storing and using business data.  Such information can be converted into XBRL by suitable mapping processes or generated in XBRL by software.  It can then be searched, selected, exchanged or analysed by computer, or published for ordinary viewing.

also see

http://www.xbrl.org/Example1/

 

 

 

and from:

http://www.oracle.com/us/dm/xbrlextension-354972.html?msgid=3-3856862107

With more than 7,000 new U.S. companies facing extensible business reporting language (XBRL) filing mandates in 2011, Oracle has released a free XBRL extension on top of the latest release of Oracle Database.

Oracle’s XBRL extension leverages Oracle Database 11g Release 2 XML to manage the collection, validation, storage, and analysis of XBRL data. It enables organizations to create one or more back-end XBRL repositories based on Oracle Database, providing secure XBRL storage and query-ability with a set of XBRL-specific services.

In addition, the extension integrates easily with Oracle Business Intelligence Suite Enterprise Edition to provide analytics, plus interactive development environments (IDEs) and design tools for creating and editing XBRL taxonomies.

The Other Side of XBRL
“While the XBRL mandate continues to grow, the feedback we keep hearing from the ‘other side’ of XBRL—regulators, academics, financial analysts, and investors—is that they lack sufficient tools and historic data to leverage the full potential of XBRL,” says John O’Rourke, vice president of product marketing, Oracle.

However, O’Rourke says this is quickly changing as XBRL mandates enter their third year—and more and more companies have to comply. While the new extension should be attractive to organizations that produce XBRL filings, O’Rourke expects it will prove particularly valuable to regulators, stock exchanges, universities, and other organizations that need to collect, analyze, and disseminate XBRL-based filings.

Outsourcing, a Bolt-on Solution, or Integrated XBRL Tagging
Until recently, reporting organizations had to choose between expensive third-party outsourcing or manual, in-house tagging with bolt-on solutions—both of which introduce the possibility of error.

In response, Oracle launched Oracle Hyperion Disclosure Management, which provides an XBRL tagging solution that is integrated with the financial close and reporting process for fast and reliable XBRL report submission—without relying on third-party providers. The solution enables organizations to

  • Author regulatory filings in Microsoft Office and “hot link” them directly to financial reporting systems so they can be easily updated
  • Graphically perform XBRL tagging at several levels—within Microsoft Office, within EPM system reports, or in the data source metadata
  • Modify or extend XBRL taxonomies before the mapping process, as well as set up multiple taxonomies
  • Create and validate final XBRL instance documents before submission

 

Divorced Reality

if I could just shut my eyes tight
escape the world for a while I might
me and my divorced reality
enhanced by enfeebled diminished mortality

must come to terms with this news
having played my cards I accept and lose
surrender asunder to all those events
over whom i have no power to peruse

take a gun and shoot me in the head
watch me twitch until I am dead
perhaps that would gladden those estranged
historical affection now much deranged

melancholy I must reminiscence I will
to sleep perchance one more pop a pill

The long tail of the internet

On a whim, I took the all-time stats of my blog posts (more than 1,000 posts) and tried to plot their distribution.

Basically I copied and pasted all the data into a Google Docs spreadsheet and created dummy codes (like URL1, URL2 … URL500).

Next I  downloaded the….

I wasn't in the mood for downloading and uploading stuff, so I decided to use ggplot2 via Jeroen's application at http://www.stat.ucla.edu/~jeroen/

I used the mirror server that Dataspora provides as I have had latency issues with Jeroen’s website.

I got this error while trying to connect the Dataspora App to my Google spreadsheet

The page you have requested cannot be displayed. Another site was requesting access to your Google Account, but sent a malformed request. Please contact the site that you were trying to use when you received this message to inform them of the error. A detailed error message follows:

The site “http://dataspora.com” has not been registered.

Oh dear! Back to Jeroen’s/UCLA’s page.

http://rweb.stat.ucla.edu/ggplot2/

I get this warning but it still manages to log in

This website has not registered with Google to establish a secure connection for authorization requests. We recommend that you continue the process only if you trust the following destination:

http://rweb.stat.ucla.edu/R/googleLogin?domain=rweb.stat.ucla.edu

Wow, it works! That's cloud computing. So I wonder why Google and Amazon continue to ignore rApache and Jeroen's cloud app. Surely their Google Fusion Tables could always be improved or tweaked, not to mention the next-gen version of R, which will have its own server.

Pretty cool screenshot.

I get the following pretty graph. Hadley Wickham would be ashamed of me by now.

What went wrong? Well, one page has 36,000 views, and scale is the key to graphical coherence. So I redo it: delete the home page in the Google spreadsheet, re-import, re-plot. (I didn't know how to modify data in the cloud app; maybe we need a cloud plyr.) I redo it again, as I have another big outlier: the top 10 statistical GUIs article, which ironically has only 5 GUIs in it, but hush, don't tell the high-quality search engine.

So, again belatedly, I discover something called a layer in ggplot2.

The base graphics engine has really spoilt me into writing short functions for plots.

I give up; I rather prefer hist(). I go to my favorite GUI, Rattle, but it has some dating issues with the GTK+ DLL.

So I go to John Fox's simple GUI, R Commander. It is the best GUI if you use Occam's Razor, and I am using Occam's Chainsaw now.

I get the analysis I want in 12 seconds.


Summary: ggplot2 is more complicated than the base graphics engine.

The Deducer GUI is not as simple either.

R Commander is the best GUI because it retains simplicity.

Ignore the long tail of the internet only at your peril.

Almost two-thirds of my daily traffic of 400+ comes from old archived content. That is why Search Engine Optimization and keyword alerts are critical for any poor soul trying to write on a blog (which has neither journal-like prestige nor rewards).

If you make life easier for the search engine, it, being a fair chap, rewards you well.

Existing web traffic estimates like Comscore and Google Trends ignore this long tail.

Comments are welcome (the data is pasted below, 500 rows by 2 columns, if you can come up with a better analysis).

Since SAS has ignored web analytics and Google Analytics is hmm hmm, this could be an area of opportunity for R developers to create a web analytics package.

Title
Views
Home page 36,185
Top 10 Graphical User Interfaces in Statistical Software 8,264
Matlab-Mathematica-R and GPU Computing 2,166
Wealth = function (numeracy, memory recall) 2,162
The Top Statistical Softwares (GUI) 2,118
About DecisionStats 1,902
Libre Office 1,770
Using Facebook Analytics (Updated) 1,446
Windows Azure vs Amazon EC2 (and Google Storage) 1,386
Interview Hadley Wickham R Project Data Visualization Guru 1,204
Test drive a Chrome notebook. 1,201
Interview Professor John Fox Creator R Commander 1,190
Top ten RRReasons R is bad for you ? 1,178
SAS Institute files first lawsuit against WPS- Episode 1 1,131
R Package Creating 1,104
Interfaces to R 1,039
Using Red R- R with a Visual Interface 950
Google Maps – Jet Ski across Pacific Ocean 922
Norman Nie: R GUI and More 851
Not so AWkward after all: R GUI RKWard 805
Running R on Amazon EC2 786
Startups for Geeks 785
Creating a Blog Aggregator for free 749
Cloud Computing with R 676
Rapid Miner- R Extension 671
Parallel Programming using R in Windows 664
Revolution R for Linux 645
Red R 1.8- Pretty GUI 638
John Sall sets JMP 9 free to tango with R 601
Wordle.net 597
Funny Images from India 571
R is an epic fail or is it just overhyped 568
Great article on Notepad++ and R in R Journal 564
Certifications in Analytics and Business Intelligence 548
R Excel :Updated 542
Enterprise Linux rises rapidly:New Report 537
So which software is the best analytical software? Sigh- It depends 520
Funny Photo :It happens only In India 518
Creating 3D Graphs with Data in R 507
SPSS /PASW Certification – Free until Sept 15 497
Interview :Dr Graham Williams 476
GNU PSPP- The Open Source SPSS 474
Professors and Patches: For a Betterrrr R 467
Running R on Amazon EC2 :Windows 462
WPS response to SAS Lawsuit 458
R language on the GPU 450
KXEN and a Data Mining Survey 449
News on R Commercial Development -Rattle- R Data Mining Tool 449
WPS ( Alternative SAS Language Software) Pricing 447
Kill R? Wait a sec 445
SAS Institute lawsuit against WPS Episode 2 The Clone Wars 442
How to be a BAD blogger? 435
ROC Curve 431
Bulls ,Bears ,Tigers and Asses 424
Trrrouble in land of R…and Open Source Suggestions 422
Interview- BI Dashboards dMINE Sanjay Patel 417
Top Seven Reasons :Why Outsourcing is Bad for India 408
Interviews @Decisionstats 407
Running a R GUI,and parallel programming on Amazon EC2 394
Unbreakable Oracle Linux- and Unshakable-Libre Office- 393
IBM SPSS 19: Marketing Analytics and RFM 387
Analyzing SAS Institute-WPS Lawsuit 377
Hive Tutorial: Cloud Computing 377
R and Hadoop 374
Graphics Presentations 373
Sector/ Sphere – Faster than Hadoop/Mapreduce at Terasort 370
Benchmarking GNU R: DirkE’s view and a Ninja wishlist 363
Webfocus RStat: Pervasive BI using R 363
Open Source Business Intelligence: Pentaho and Jaspersoft 362
How to do Logistic Regression 362
CommeRcial R- Integration in software 359
So what’s new in R 2.12.0 357
Interview Michael J. A. Berry Data Miners, Inc 356
Data Mining through the Android 352
Newer version of Alternative SAS / WPS 2.4 launched 350
How to Analyze Wikileaks Data – R SPARQL 348
JMP 9 releasing on Oct 12 343
The R Online WikiBook 340
Hadley’s tutorials on R Visualization 340
Interview Tasso Argyros CTO Aster Data Systems 339
Parsing XML files easily 337
A Software Called Rattle 335
Which software do we buy? -It depends 329
Jim Goodnight on Open Source- and why he is right -sigh 328
SAS/Blades/Servers/ GPU Benchmarks 326
R Commander Plugins-20 and growing! 324
10 iPhone Apps you can actually use ( and dont have to pay for) 316
R Modeling with huge data 315
The Popularity of Data Analysis Software 315
Interview Donald Farmer Microsoft 307
Learning SAS for free 305
Comparing Base SAS and SPSS 304
Towards better Statistical Interfaces 302
Making NeW R 301
Using Code Snippets in Revolution R 300
R Apache – The next frontier of R Computing 298
Using JMP 9 and R together 297
Doing Time Series using a R GUI 295
Amazon announces Micro Instances for cloud computing 295
Top 5 Free Music Websites 295
Web R- Elastic R and RevoDeploy R 291
R for Stats : Updated 290
Heritage Health Prize- Data Mining Contest for 3mill USD 289
Google AppInventor -Android and Business Intelligence 281
Top R Interviews 278
An Introduction to Data Mining-online book 272
Interview Jim Davis SAS Institute 272
Economic: Indian Caste System -Simplification 271
Rattle Re-Introduced 271
KXEN – Automated Regression Modeling 267
Movie Review- Inglorious Basterds 267
Interview :Doug Savage ,Creator SavageChickens.com 261
IPSUR – A Free R Textbook 258
SAS with the GUI Enterprise Guide (Updated) 256
Trying out Google Prediction API from R 256
Segmenting Models : When and Why 253
Using R and Excel Together 253
R Oracle Data Mining 253
KNIME 253
Using PostgreSQL and MySQL databases in R 2.12 for Windows 250
Fighting Back -The Net, Social Media, Spam, Identity Theft, Terrorism 249
Libre Office (Beta) 3 Launched 248
India to make own DoS -citing cyber security 247
Interview Dominic Pouzin Data Applied 242
R releases new version R 2.9.2 240
SAS to launch SAS/IML with R ( updated) 239
Playing with Playwith- R Package for Interactive Data Visualizations 234
Predictive Analytics World Conference 231
Analytics and BI for small biz 231
Interview Jeanne Harris Co-Author -Analytics at Work and Competing with Analytics 230
Using R for Time Series in SAS 228
General Electric ‘s breach of the spirit and letter of integrity 227
Interview Luis Torgo Author Data Mining with R 222
Browser Based Model Creation 222
Interview James Dixon Pentaho 221
Thoughts on WPS, SAS , R 220
Choosing R for business – What to consider? 220
Buying SAS Institute More stats 219
Google: Prediction API and other cool stuff More stats 218
Interview : R For Stata Users More stats 216
Viva Libre Office More stats 216
Top 10 Games on Linux -sudo update More stats 214
When China overtook India- using DEDUCER More stats 214
KDD 2009 : Demos More stats 211
Interview Dean Abbott Abbott Analytics More stats 210
Statistically Speaking More stats 203
Data Visualization using Tableau More stats 203
SAS and JMP : Visual Data Discovery More stats 203
High Performance Computing and R More stats 200
Troubleshooting Rattle Installation- Data Mining R GUI More stats 194
Google Realtime Live Updates on Egypt Yemen Tunisia Jordan.. More stats 192
New Deal in Statistical Training More stats 191
Interview Ken O Connor Business Intelligence Consultant More stats 190
Karmic Koala versus Windows 7 More stats 189
Interview Shawn Kung Sr Director Aster Data More stats 189
Pun on Putin More stats 189
Towards better analytical software More stats 188
Dryad- Microsoft’s answer to MR More stats 188
Analyzing Indian – Chinese Relationships More stats 188
LibreOffice News and Google Musings More stats 186
Special Issue of JSS on R GUIs More stats 184
Using Google Docs for Web Scraping More stats 181
Using Reshape2 for transposing datasets in R More stats 180
IBM Buys Netezza More stats 180
Libreoffice 3.3 released More stats 180
Google moving on from MapReduce: rest of world still catching up More stats 179
Linux= Who did what and how much? More stats 176
Interview Carole Jesse Experienced Analytics Professional More stats 176
HIRE ME (175 views)
Test Drive a Google Chrome Notebook: Last Two Days left (174 views)
Q&A with David Smith, Revolution Analytics (174 views)
R, Ubuntu, RCmdr Updates (173 views)
Interview KNIME Fabian Dill (173 views)
Big Data and R: New Product Release by Revolution Analytics (173 views)
Automated Content Aggregation (173 views)
R or SAS — R and SAS? (170 views)
Graphs (169 views)
How to use Oracle for Data Mining (169 views)
Carolina and SAS (166 views)
Interview John Sall Founder JMP/SAS Institute (165 views)
Aster Data hires Quentin Gallivan as CEO (165 views)
Oracle for possible takeover of REvolution Computing (164 views)
The Best and Worst Graphs Ever (163 views)
Statistical Analysis with R- by John M Quick (163 views)
Growing Rapidly: Rapid Miner 4.5 (161 views)
SAP and BI on Demand (161 views)
Google Snappy (161 views)
Google Refine (161 views)
Scoring SAS and SPSS Models in the cloud (159 views)
Hey Professor, I am not a Monkey (157 views)
REVolution Computing fails to create a Revolution (156 views)
SAS Lawsuit against WPS- Application Dismissed (156 views)
KDNuggets Poll on SAS: Churn in Analytics Users (154 views)
SAS Early Days (154 views)
Interview James Taylor Decision Management Expert (Updated) (151 views)
Google Books Ngram Viewer (148 views)
Review – R for SAS and SPSS Users (148 views)
New R Journal Edition (146 views)
Here comes PySpread- 85,899,345 rows and 14,316,555 columns (145 views)
Interview Karl Rexer -Rexer Analytics (144 views)
Poem: The Extroverted Engineer (144 views)
Hearst DataMining Challenge (144 views)
This Is It (142 views)
Interview Timo Elliott SAP (141 views)
The Blind Side – Movie Review (141 views)
Data Mining Survey Results: Tools and Offshoring (140 views)
Going Deap: Algols in Python (140 views)
ADVERTISE (139 views)
Interview Jeff Bass, Bass Institute (Part 2) (139 views)
Interview Jim Harris Data Quality Expert OCDQ Blog (139 views)
Do Monkeys Pay for Sex? (138 views)
Privacy Browsing Extensions in Google Chrome (137 views)
China biggest threat to Indian Software in 5 years: Indian Tech CEO (136 views)
Software HIStory: Bass Institute Part 1 (135 views)
Grenier’s Theory for Competitiveness (134 views)
Interview Charlie Berger Oracle Data Mining (134 views)
Karmic Koala Ubuntu/Linux 9.2 Preview (133 views)
Analytics and Journals (133 views)
Using Code Editors in R (132 views)
Interview Stephanie McReynolds Director Product Marketing, AsterData (132 views)
Amcharts- Cool Charts Web Editor (130 views)
Mapreduce Book (128 views)
Interesting R competition at Reddit (127 views)
Color of Statistics (127 views)
Amazon goes free for users next month (127 views)
Interview Sarah Blow – Girly Geekdom Founder (126 views)
Social Network Analysis: Using R (126 views)
Interview Thomas C. Redman Author Data Driven (126 views)
Audio Interview Anne Milley, Part 1 (124 views)
Advanced Analytics on Multi-Terabyte Datasets- Conferences (123 views)
Geek Humour (123 views)
John M. Chambers Statistical Software Award – 2011 (122 views)
My friend -The Computer (120 views)
M2009 Interview Peter Pawlowski AsterData (118 views)
R Journal Dec 2010 and R for Business Analytics (118 views)
Top ten RRReasons R is bad for you? (116 views)
Interview Michael Zeller, CEO Zementis on PMML (115 views)
Fast R Graphics (114 views)
New Google Ad Planner (114 views)
Making Sense: Hadoop and MapReduce (114 views)
Using SAS/IML with R (114 views)
Facebook App by SAP Crystal Reports (113 views)
Whats behind that pretty SAS Blog? (113 views)
Interview Alison Bolen SAS.com (113 views)
Ajay @ arts (112 views)
My latest creation (112 views)
Indian Crabs – A story (112 views)
Open Source’s worst enemy is itself not Microsoft/SAS/SAP/Oracle (112 views)
Google Cloud Print -print documents from the internet (111 views)
WPS and SAS- A rah-rah comparison (110 views)
Facebook Gmail Killer Threatens to commit Hara Kari live on AOL Techcrunch if unsucessful (110 views)
Open Source and Software Strategy (109 views)
Windows Azure and Amazon Free offer (108 views)
R for Analytics is now live (108 views)
Open Source Compiler for SAS language/GNU -DAP (107 views)
Using Chromium/Chrome on Ubuntu Linux (107 views)
Interview John Moore CTO, Swimfish (106 views)
Nice BI Tutorials (106 views)
Creating Customized Packages in SAS Software (106 views)
Business Analytics Analyst Relations/Ethics/White Papers (105 views)
Web Crawling Automation (105 views)
The SAS-WPS Lawsuit- Preliminary Hearing (105 views)
Handling time and date in R (105 views)
KXEN Update (104 views)
MapReduce Analytics Apps- AsterData’s Developer Express Plugin (104 views)
+ 1 your website -updated (103 views)
Movie Review- Peepli Live (103 views)
Better Data Visualization in WordPress.com Stats (102 views)
Customizing your R software startup (102 views)
LibreOffice Beta 2 (Office Fork off Oracle) launches! (102 views)
KXEN Case Studies: Financial Sector (102 views)
Deleting Twitter, Facebook, LinkedIn- Accepting Life (102 views)
Google Street View shows Gladiators fighting (101 views)
Carole-Ann’s 2011 Predictions for Decision Management (101 views)
Amazon goes HPC and GPU: Dirk E to revise his R HPC book (101 views)
Happy Thanksgiving Id (101 views)
Interview Phil Rack WPS Consultant and Developer (100 views)
SPSS launches two more PASWs (99 views)
Interview David Smith REvolution Computing (99 views)
Data Mining with R (97 views)
Dataset too big for R? (97 views)
How Jesus saved my Butt (97 views)
Interview Evan Levy Baseline Consulting (97 views)
The Latest GUI for R- BioR (96 views)
WPS Version 2.5.1 Released – can still run SAS language/data and R (96 views)
SAS legal falls flat against WPS again: Technical Grounds (95 views)
World Programming System: 300 pounds for The power of SAS language (94 views)
KNIME and Zementis shake hands (93 views)
Interview Eric Siegel, Phd President Prediction Impact (93 views)
Interview Sarah Burnett BI Analyst, Ovum group (92 views)
Quantifying Analytics ROI (92 views)
PSPP – SPSS’s Open Source Counterpart (91 views)
PySpread Magic (91 views)
Interview SPSS Olivier Jouve (91 views)
Interesting Data Visualization: Friendwheels (91 views)
R on Windows HPC Server (90 views)
The declining market for Telecommunication Churn Models (90 views)
Getting Inside R (90 views)
The Big Data Summit Agenda (90 views)
Review: Clash of the Titans (89 views)
Red Hat worth 7.8 Billion now (89 views)
Movie Review: Rajneeti (Politics) (89 views)
3 Idiots: Insight to Indian Engineer Campus Life (89 views)
The Comic Water Games (aka Common Wealth Games) (88 views)
Computer Education grants from Google (88 views)
Challenges of Analyzing a dataset (with R) (87 views)
Input Data in R using the top 3 R GUI (86 views)
Complex Event Processing- SASE Language (85 views)
Interview with Anne Milley, SAS II (85 views)
Data Mining Presentation at M2009 by Dr Vincent Granville (85 views)
Brief Interview Timo Elliott (85 views)
Mapping Health Statistics at CDC.gov (85 views)
Amazon’s Turks Mturk.com (84 views)
Business Intelligence and Stat Computing: The White Man’s Last Stand (84 views)
Movie Review- Dabangg (84 views)
Movie Review: Sherlock Holmes (84 views)
SAS Data Mining 2009 Las Vegas (83 views)
Chinese Fortune Cookies (83 views)
SPSS and R (83 views)
Manjunath- A Batchmate on my mine (82 views)
Data Mining 2010: SAS Conference in Vegas (81 views)
DirkE and JD swoon about Shane’s MOM in Room 106 while writing R code (81 views)
SAS to R Challenge: Unique benchmarking (81 views)
S A S GOOD LIFE UNDER SIEGE – NYT (81 views)
Pentaho and R: working together (81 views)
Interview John F Moore CEO The Lab (80 views)
Ways to use both Windows and Linux together (80 views)
Brief Interview with James G Kobielus (80 views)
For R Writers- Inside R (79 views)
Using Ipod and Iphone with your Ubuntu Laptop (79 views)
Webcasts: Oracle Data Mining (79 views)
The Cloud OS is finally here or is it?: Karmic Koala (79 views)
Movie Review: Lafangey Parinday (Rouge Birds) (79 views)
SAS announcement in education initiatives (78 views)
Using R from within Python (78 views)
Event: Predictive analytics with R, PMML and ADAPA (78 views)
Interesting R and BI Web Event (78 views)
Bruno Aziza, Microsoft Global BI Lead joins PAW Keynote (77 views)
Common Analytical Tasks (77 views)
RWui: Creating R Web Interfaces on the go (77 views)
R Successor Language ‘Tea’ announced (76 views)
Learning SPSS for SAS users (76 views)
Protovis a graphical toolkit for visualization (76 views)
Interview Paul van Eikeren Inference for R (75 views)
Data Visualization: Central Banks (75 views)
Oracle Data Mining 11 G R2 (75 views)
Interview Peter J Thomas -Award Winning BI Expert (75 views)
Weak Security in Internet Databases for Statisticians (74 views)
Open Source Cartoon (74 views)
Top Ten Graphs for Business Analytics -Pie Charts (1/10) (74 views)
SAS Sentiment Analysis wins Award (74 views)
JMP Genomics 5 released (74 views)
Short Interview Jill Dyche (73 views)
Interview David Katz, Dataspora/David Katz Consulting (73 views)
PMML 4.0 (73 views)
Ponder This: IBM Research (72 views)
PAW Videos (71 views)
PASW 13: The preview (71 views)
Cisco SocialMiner (70 views)
Review-The Dark knight (70 views)
MapReduce Patent Granted (70 views)
Cloud Computing and GPU (and some stats softwares) (70 views)
IBM Business Analytics Forum (70 views)
And now- The Business Analytics Summit (70 views)
Creating an Anonymous Bot (69 views)
R and SAS in Twitter Land (69 views)
Interview: Richard Schultz, CEO REvolution Computing (69 views)
China -United States -The Third Opium War (68 views)
Quick-R and Statmethods.net (68 views)
R Node- and other Web Interfaces to R (68 views)
Life Mojo – A Health Startup (68 views)
Using Views in R and comparing functions across multiple packages (68 views)
Another R Tutorial (67 views)
Interview Karen Lopez Data Modeling Expert (67 views)
QGIS and R (66 views)
Christmas Carol: The Best Software (BI-Stats-Analytics) (66 views)
Software Lawsuits: Ergo (66 views)
STEM is cool (65 views)
Date Night (65 views)
More Advanced SAS Modeling Procs (65 views)
The Big Data Event- Why am I here? (65 views)
Interview Gary Cokins SAS Institute (65 views)
Browser based Music Creation (64 views)
Interview Steve Sarsfield Author The Data Governance Imperative (63 views)
GrapheR (63 views)
Google Web Intelligence (Beta) (61 views)
Data Mining 2009 Interviews- Terry Whitlock, BlueCross BlueShield of TN (60 views)
Audio Interviews -Dr. Colleen McCue National Security Expert (60 views)
Red R- A new beginning (59 views)
YouTube Features: Audio Swap, Mobile posts and Themes (59 views)
R for Predictive Modeling: Workshop (59 views)
KDD2009: Papers Research and Industrial (58 views)
Chapman/Hall announces new series on R (58 views)
Data Visualization and Politics (58 views)
T Shirts Design (58 views)
Jump to JMP: Using Data Analysis in a visual manner (58 views)
Aster Analytics and MapReduce.org (57 views)
OK Cupid Data Visualization- Flow Chart to your Heart (57 views)
R for SAS and SPSS Users (57 views)
Carbon Footprints in the snow (57 views)
Summer School on Uncertainty Quantification (57 views)
High Performance Computing within R: Tutorial (57 views)
Running Stats Softwares on Clouds (57 views)
Amazing Data Visualization- UN Counter Terrorism (56 views)
Cloud MapReduce (56 views)
Statistical Features in WPS (56 views)
An R Package only for SAS Users (56 views)
R is Ready for Business™ (55 views)
A Google App for Sales- ERPLY (55 views)
Rexer Analytics Annual Data Miner Survey (55 views)
Cartoons on R (55 views)
American Decline- Why outsourcing doesnt make sense (55 views)
Friday Cartoon Series- New (55 views)
What softwares do you plan to use/learn in the next one year? (54 views)
Great App for Online Sketching (54 views)
September Roundup by Revolution (54 views)
Using Firesheep on Campus, Caltrain and beyond (54 views)
Decisionstats Interview at Big Data Summit, AsterData (53 views)
Learning Hadoop (53 views)
The White Man’s Burden-Poem (53 views)
Curt Monash on Analytics with MapReduce (53 views)
To R or Not to R: Data Mining and CRM for Free (52 views)
Algorithms and Ads: No Free Lunches and Hill Climbing (52 views)
Interview: Roger Haddad, Founder of KXEN Automated Modeling Software (52 views)
Google and Me on Privacy and Openness (52 views)
MapReduce.org (52 views)
Why do bloggers blog? (52 views)
Live Streaming for Free: UStream (51 views)
Light Cycle of Tron review (51 views)
Lyx Releases 2 (51 views)
Interview – Anne Milley, SAS Part 1 (51 views)
SAS News (51 views)
KXEN EMEA User Conference 2010-Success in Business Analytics (51 views)
2011 Forecast-ying (51 views)
Kill Analytics (50 views)
Social Media Analysis Toolkit (50 views)
Multi State Models (50 views)
R and Cloud Computing (50 views)
Dataists shake up R community with a rocking contest (50 views)
Interview Anne Milley JMP (49 views)
Movie Review: Between the Folds (49 views)
Jokes in Economics (49 views)
Interview Ajay Ohri Decisionstats.com with DMR (49 views)
One more Y Tube Video (49 views)
Happy Diwali/Google Music (48 views)
SPSS Directions: Rexer Survey Results (48 views)
Redlining in Internet Access and notes on Regression Models (48 views)
Poem: A Poets Life (48 views)
Predictive Analytics World (48 views)
Interview- Phil Rack (48 views)
Building KXEN Models on Ubuntu (48 views)
New Year Resolution Presentation (48 views)
Adobe gulps Omniture (47 views)
SAS Modeling Procs (47 views)
Oracle Open World/RODM package (47 views)
KDNuggets Survey on R (47 views)
IBM and Revolution team to create new in-database R (47 views)
SAS Institute invests in R project (46 views)
Not just a Cloud (46 views)
New Version of R released: R 2.10.1 (46 views)
Review- Iron Man2 (46 views)
Online Analytics: Monte Carlo Simulation (45 views)
Predictive Forecasting in Commercial Applications (45 views)
The Race -by D.H Groberg (45 views)
SAS Scoring Accelerators (45 views)
IBM launches Smart Analytics Cloud (45 views)
Reactions to IBM-SPSS takeover (45 views)
Zementis partners with R Analytics Vendor- Revo (44 views)
A Missing Mandelbrot Who Dun It (44 views)
Downloading your Facebook Photos (44 views)
Android Tutorial (44 views)
The Mommy Track (44 views)
My First You Tube Video: Courtesy the competiton on VOLNIGHT by Univ of Tennessee (44 views)
Born in the USA? (43 views)
Interview Eric A. King President The Modeling Agency (43 views)
Interview Augusto Albeghi (Straycat) – Founder Straysoft (43 views)
Why Cloud? (43 views)
Innovative ways of Calculus: Gifting a comic set for Christmas (43 views)
To find the best chaat or paan shop (43 views)
Google unleashes Fusion Tables (42 views)
Using SAS and C/C++ together (42 views)
Whats new in the latest version of R (42 views)
Bollywood 101 (42 views)
Who will forecast for the forecasters? (42 views)
Learning R Easily: Two GUI’s (41 views)
Harvard DropOut Writes Open Letter- His Startup has 350m users (41 views)
BI Software (41 views)
How to read blogs in Indonesian and Chinese! (41 views)
Window to a Blue Cloud: Azure Pricing (41 views)
China bans Chinese Food for Googleplex (41 views)
SAS Program for Students (41 views)
The Year 2010 (40 views)
What do you want to know in data analytics? (40 views)
America’s Data Book: Census Abstract 2011 (40 views)
Big Data Management and Advanced Analytics (40 views)
AsterData partners with Tableau (40 views)
Using R from other Software (40 views)
SAS on Fraud (40 views)