Color

once in a blue moon
you may experience a black day
out of the blue
your plans and your blueprint will falter

your blue blood is no longer a guarantee
for continued red carpets and
being born to the purple no cure
to never feeling blue

you may blanche, turn pale, show a yellow liver,
you may be caught in an area so grey,
your face grows red in surprise,
your friends declare you a white elephant,
your friendship terminated without a pink slip,
a black sheep, with the balance sheet in red

wait till it snows
for white Christmas to come
for the white lies to fade
soon spring will be here
greener pastures, still waters beckon
to a world of color not just black and white.

Tips to Play Farmville Really Well

Here are some tips to play Farmville really well-

1) Keep your Farmville friends in a separate friend list by creating a list at http://www.facebook.com/friends/edit/

This ensures friendship, work and Farmville don't mess around with each other. You also don't need a lot of friends in Farmville (max 40 actives), unlike Mafia Wars.

2) Register at http://rewards.zynga.com/ to get free or double rewards by doing the same work. These rewards can be redeemed in-game

3) Set a time budget as well as a money budget, like $10 per month and 1 hour on weekends plus 15 minutes on weekdays, with at most 3-4 logins.

Changes in R software

The newest version of R is now available for download: R 2.13.0 is ready!

http://cran.at.r-project.org/bin/windows/base/CHANGES.R-2.13.0.html

Windows-specific changes to R

CHANGES IN R VERSION 2.13.0

WINDOWS VERSION

  • Windows 2000 is no longer supported. (It went end-of-life in July 2010.)

NEW FEATURES

  • win_iconv has been updated: this version has a change in the behaviour with BOMs on UTF-16 and UTF-32 files – it removes BOMs when reading and adds them when writing. (This is consistent with Microsoft applications, but Unix versions of iconv usually ignore them.)

  • Support for repository type win64.binary (used for 64-bit Windows binaries for R 2.11.x only) has been removed.

  • The installers no longer put an ‘Uninstall’ item on the start menu (to conform to current Microsoft UI guidelines).

  • Running R always sets the environment variable R_ARCH (as it does on a Unix-alike from the shell-script front-end).

  • The defaults for options("browser") and options("pdfviewer") are now set from environment variables R_BROWSER and R_PDFVIEWER respectively (as on a Unix-alike). A value of "false" suppresses display (even if there is no false.exe present on the path).

  • If options("install.lock") is set to TRUE, binary package installs are protected against failure similar to the way source package installs are protected.

  • file.exists() and unlink() have more support for files > 2GB.

  • The versions of R.exe in ‘R_HOME/bin/i386,x64/bin’ now support options such as R --vanilla CMD: there is no comparable interface for ‘Rcmd.exe’.

  • A few more file operations will now work with >2GB files.

  • The environment variable R_HOME in an R session now uses slash as the path separator (as it always has when set by Rcmd.exe).

  • Rgui has a new menu item for the PDF ‘Sweave User Manual’.

DEPRECATED

  • zip.unpack() is deprecated: use unzip().

INSTALLATION

  • There is support for libjpeg-turbo via setting JPEGDIR to that value in ‘MkRules.local’.

    Support for jpeg-6b has been removed.

  • The sources now work with libpng-1.5.1, jpegsrc.v8c (which are used in the CRAN builds) and tiff-4.0.0beta6 (CRAN builds use 3.9.1). It is possible that they no longer work with older versions than libpng-1.4.5.

BUG FIXES

  • Workaround for the incorrect values given by Windows’ casinh function on the branch cuts.

  • Bug fixes for drawing raster objects on windows(). The symptom was the occasional raster image not being drawn, especially when drawing multiple raster images in a single expression. Thanks to Michael Sumner for report and testing.

  • Printing extremely long string values could overflow the stack and cause the GUI to crash. (PR#14543)

Tonnes of changes!!

http://cran.at.r-project.org/src/base/NEWS

CHANGES IN R VERSION 2.13.0:

  SIGNIFICANT USER-VISIBLE CHANGES:

    • replicate() (by default) and vapply() (always) now return a
      higher-dimensional array instead of a matrix in the case where
      the inner function value is an array of dimension >= 2.
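
      A minimal sketch of the new behaviour (the toy matrices are
      illustrative only, assuming R >= 2.13.0):

      r <- replicate(4, matrix(rnorm(6), nrow = 2))
      dim(r)   # 2 3 4 -- a 3-d array, no longer a 6 x 4 matrix

      v <- vapply(1:4, function(i) diag(2), FUN.VALUE = matrix(0, 2, 2))
      dim(v)   # 2 2 4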

    • Printing and formatting of floating point numbers is now using
      the correct number of digits, where it previously rarely differed
      by a few digits. (See “scientific” entry below.)  This affects
      _many_ *.Rout.save checks in packages.

  NEW FEATURES:

    • normalizePath() has been moved to the base package (from utils):
      this is so it can be used by library() and friends.

      It now does tilde expansion.

      It gains new arguments winslash (to select the separator on
      Windows) and mustWork to control the action if a canonical path
      cannot be found.
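
      A small sketch of the new arguments (the path shown is
      illustrative):

      normalizePath("~/R/library", winslash = "/", mustWork = FALSE)
      # tilde expansion is applied; with mustWork = FALSE no error is
      # raised if the path cannot be resolved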

    • The previously barely documented limit of 256 bytes on a symbol
      name has been raised to 10,000 bytes (a sanity check).  Long
      symbol names can sometimes occur when deparsing expressions (for
      example, in model.frame).

    • reformulate() gains an intercept argument.
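
      For example (a brief sketch):

      reformulate(c("x1", "x2"), response = "y", intercept = FALSE)
      # y ~ x1 + x2 - 1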

    • cmdscale(add = FALSE) now uses the more common definition that
      there is a representation in n-1 or less dimensions, and only
      dimensions corresponding to positive eigenvalues are used.
      (Avoids confusion such as PR#14397.)

    • Names used by c(), unlist(), cbind() and rbind() are marked with
      an encoding when this can be ascertained.

    • R colours are now defined to refer to the sRGB color space.

      The PDF, PostScript, and Quartz graphics devices record this
      fact.  X11 (and Cairo) and Windows just assume that your screen
      conforms.

    • system.file() gains a mustWork argument (suggestion of Bill
      Dunlap).

    • new.env(hash = TRUE) is now the default.

    • list2env(envir = NULL) defaults to hashing (with a suitably sized
      environment) for lists of more than 100 elements.

    • text() gains a formula method.

    • IQR() now has a type argument which is passed to quantile().

    • as.vector(), as.double() etc duplicate less when they leave the
      mode unchanged but remove attributes.

      as.vector(mode = "any") no longer duplicates when it does not
      remove attributes.  This helps memory usage in matrix() and
      array().

      matrix() duplicates less if data is an atomic vector with
      attributes such as names (but no class).

      dim(x) <- NULL duplicates less if x has neither dimensions nor
      names (since this operation removes names and dimnames).

    • setRepositories() gains an addURLs argument.

    • chisq.test() now also returns a stdres component, for
      standardized residuals (which have unit variance, unlike the
      Pearson residuals).
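
      Illustrative use on a toy 2 x 2 table:

      tab <- matrix(c(12, 5, 7, 7), nrow = 2)
      out <- chisq.test(tab)
      out$stdres      # standardized residuals (unit variance)
      out$residuals   # Pearson residuals, for comparison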

    • write.table() and friends gain a fileEncoding argument, to
      simplify writing files for use on other OSes (e.g. a spreadsheet
      intended for Windows or Mac OS X Excel).

    • Assignment expressions of the form foo::bar(x) <- y and
      foo:::bar(x) <- y now work; the replacement functions used are
      foo::`bar<-` and foo:::`bar<-`.

    • Sys.getenv() gains a names argument so Sys.getenv(x, names =
      FALSE) can replace the common idiom of as.vector(Sys.getenv()).
      The default has been changed to not name a length-one result.
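
      For example (the variable name is illustrative):

      Sys.getenv("R_HOME")                 # length-one result, now unnamed
      Sys.getenv("R_HOME", names = TRUE)   # ask for the name back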

    • Lazy loading of environments now preserves attributes and locked
      status. (The locked status of bindings and active bindings are
      still not preserved; this may be addressed in the future).

    • options("install.lock") may be set to FALSE so that
      install.packages() defaults to --no-lock installs, or (on
      Windows) to TRUE so that binary installs implement locking.

    • sort(partial = p) for large p now tries Shellsort if quicksort is
      not appropriate and so works for non-numeric atomic vectors.

    • sapply() gets a new option simplify = "array" which returns a
      “higher rank” array instead of just a matrix when FUN() returns a
      dim() length of two or more.

      replicate() has this option set by default, and vapply() now
      behaves that way internally.

    • aperm() becomes S3 generic and gets a table method which
      preserves the class.

    • merge() and as.hclust() methods for objects of class "dendrogram"
      are now provided.

    • as.POSIXlt.factor() now passes ... to the character method
      (suggestion of Joshua Ulrich).

    • The character method of as.POSIXlt() now tries to find a format
      that works for all non-NA inputs, not just the first one.

    • str() now has a method for class "Date" analogous to that for
      class "POSIXt".

    • New function file.link() to create hard links on those file
      systems (POSIX, NTFS but not FAT) that support them.

    • New Summary() group method for class "ordered" implements min(),
      max() and range() for ordered factors.
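
      A small sketch:

      sizes <- factor(c("small", "large", "medium"),
                      levels = c("small", "medium", "large"),
                      ordered = TRUE)
      min(sizes)     # small
      range(sizes)   # small large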

    • mostattributes<-() now consults the "dim" attribute and not the
      dim() function, making it more useful for objects (such as data
      frames) from classes with methods for dim().  It also uses
      attr<-() in preference to the generics name<-(), dim<-() and
      dimnames<-().  (Related to PR#14469.)

    • There is a new option "browserNLdisabled" to disable the use of
      an empty line (e.g. via the ‘Return’ key) as a synonym for c in
      browser() or n under debug().  (Wish of PR#14472.)

    • example() gains optional new arguments character.only and
      give.lines enabling programmatic exploration.

    • serialize() and unserialize() are no longer described as
      ‘experimental’.  The interface is now regarded as stable,
      although the serialization format may well change in future
      releases.  (serialize() has a new argument version which would
      allow the current format to be written if that happens.)

      New functions saveRDS() and readRDS() are public versions of the
      ‘internal’ functions .saveRDS() and .readRDS() made available for
      general use.  The dot-name versions remain available as several
      package authors have made use of them, despite the documentation.

      saveRDS() supports compress = "xz".
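
      Typical use of the public functions (the file name is
      illustrative):

      saveRDS(mtcars, file = "mtcars.rds", compress = "xz")
      dat <- readRDS("mtcars.rds")
      identical(dat, mtcars)   # TRUE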

    • Many functions when called with a not-open connection will now
      ensure that the connection is left not-open in the event of
      error.  These include read.dcf(), dput(), dump(), load(),
      parse(), readBin(), readChar(), readLines(), save(), writeBin(),
      writeChar(), writeLines(), .readRDS(), .saveRDS() and
      tools::parse_Rd(), as well as functions calling these.

    • Public functions find.package() and path.package() replace the
      internal dot-name versions.

    • The default method for terms() now looks for a "terms" attribute
      if it does not find a "terms" component, and so works for model
      frames.

    • httpd() handlers receive an additional argument containing the
      full request headers as a raw vector (this can be used to parse
      cookies, multi-part forms etc.). The recommended full signature
      for handlers is therefore function(url, query, body, headers,
      ...).

    • file.edit() gains a fileEncoding argument to specify the encoding
      of the file(s).

    • The format of the HTML package listings has changed.  If there is
      more than one library tree, a table of links to libraries is
      provided at the top and bottom of the page.  Where a library
      contains more than 100 packages, an alphabetic index is given at
      the top of the section for that library.  (As a consequence,
      package names are now sorted case-insensitively whatever the
      locale.)

    • isSeekable() now returns FALSE on connections which have
      non-default encoding.  Although documented to record if ‘in
      principle’ the connection supports seeking, it seems safer to
      report FALSE when it may not work.

    • R CMD REMOVE and remove.packages() now remove file R.css when
      removing all remaining packages in a library tree.  (Related to
      the wish of PR#14475: note that this file is no longer
      installed.)

    • unzip() now has a unzip argument like zip.file.extract().  This
      allows an external unzip program to be used, which can be useful
      to access features supported by Info-ZIP's unzip version 6 which
      is now becoming more widely available.

    • There is a simple zip() function, as wrapper for an external zip
      command.

    • bzfile() connections can now read from concatenated bzip2 files
      (including files written with bzfile(open = "a")) and files
      created by some other compressors (such as the example of
      PR#14479).

    • The primitive function c() is now of type BUILTIN.

    • plot(<dendrogram>, .., nodePar=*) now obeys an optional xpd
      specification (allowing clipping to be turned off completely).

    • nls(algorithm="port") now shares more code with nlminb(), and is
      more consistent with the other nls() algorithms in its return
      value.

    • xz has been updated to 5.0.1 (very minor bugfix release).

    • image() has gained a logical useRaster argument allowing it to
      use a bitmap raster for plotting a regular grid instead of
      polygons. This can be more efficient, but may not be supported by
      all devices. The default is FALSE.
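
      Sketch (assuming a device that supports raster images):

      z <- outer(1:100, 1:100, function(i, j) sin(i / 10) * cos(j / 10))
      image(z, useRaster = TRUE)   # one bitmap instead of 10,000 polygons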

    • list.files()/dir() gains a new argument include.dirs to include
      directories in the listing when recursive = TRUE.

    • New function list.dirs() lists all directories, (even empty
      ones).

    • file.copy() now (by default) copies read/write/execute
      permissions on files, moderated by the current setting of
      Sys.umask().

    • Sys.umask() now accepts mode = NA and returns the current umask
      value (visibly) without changing it.
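
      For example:

      Sys.umask(mode = NA)      # query the current umask without changing it
      old <- Sys.umask("022")   # set a new umask, keeping the previous value
      Sys.umask(old)            # restore it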

    • There is a ! method for classes "octmode" and "hexmode": this
      allows xor(a, b) to work if both a and b are from one of those
      classes.

    • as.raster() no longer fails for vectors or matrices containing
      NAs.

    • New hook "before.new.plot" allows functions to be run just before
      advancing the frame in plot.new, which is potentially useful for
      custom figure layout implementations.

    • Package tools has a new function compactPDF() to try to reduce
      the size of PDF files _via_ qpdf or gs.

    • tar() has a new argument extra_flags.

    • dotchart() accepts more general objects x such as 1D tables which
      can be coerced by as.numeric() to a numeric vector, with a
      warning since that might not be appropriate.

    • The previously internal function create.post() is now exported
      from utils, and the documentation for bug.report() and
      help.request() now refer to that for create.post().

      It has a new method = "mailto" on Unix-alikes similar to that on
      Windows: it invokes a default mailer via open (Mac OS X) or
      xdg-open or the default browser (elsewhere).

      The default for ccaddress is now getOption("ccaddress") which is
      by default unset: using the username as a mailing address
      nowadays rarely works as expected.

    • The default for options("mailer") is now "mailto" on all
      platforms.

    • unlink() now does tilde-expansion (like most other file
      functions).

    • file.rename() now allows vector arguments (of the same length).

    • The "glm" method for logLik() now returns an "nobs" attribute
      (which stats4::BIC() assumed it did).

      The "nls" method for logLik() gave incorrect results for zero
      weights.

    • There is a new generic function nobs() in package stats, to
      extract from model objects a suitable value for use in BIC
      calculations.  An S4 generic derived from it is defined in
      package stats4.
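
      For example:

      fit <- lm(mpg ~ wt, data = mtcars)
      nobs(fit)   # 32 observations used in the fit
      BIC(fit)    # BIC calculations can now rely on a consistent nobs()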

    • Code for S4 reference-class methods is now examined for possible
      errors in non-local assignments.

    • findClasses, getGeneric, findMethods and hasMethods are revised
      to deal consistently with the package= argument and be consistent
      with soft namespace policy for finding objects.

    • tools::Rdiff() now has the option to return not only the status
      but a character vector of observed differences (which are still
      by default sent to stdout).

    • The startup environment variables R_ENVIRON_USER, R_ENVIRON,
      R_PROFILE_USER and R_PROFILE are now treated more consistently.
      In all cases an empty value is considered to be set and will stop
      the default being used, and for the last two tilde expansion is
      performed on the file name.  (Note that setting an empty value is
      probably impossible on Windows.)

    • Using R --no-environ CMD, R --no-site-file CMD or R
      --no-init-file CMD sets environment variables so these settings
      are passed on to child R processes, notably those run by INSTALL,
      check and build. R --vanilla CMD sets these three options (but
      not --no-restore).

    • smooth.spline() is somewhat faster.  With cv=NA it allows some
      leverage computations to be skipped.

    • The internal (C) function scientific(), at the heart of R's
      format.info(x), format(x), print(x), etc, for numeric x, has been
      re-written in order to provide slightly more correct results,
      fixing PR#14491, notably in border cases including when digits >=
      16, thanks to substantial contributions (code and experiments)
      from Petr Savicky.  This affects a noticeable amount of numeric
      output from R.

    • A new function grepRaw() has been introduced for finding subsets
      of raw vectors. It supports both literal searches and regular
      expressions.
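
      A small sketch:

      x <- charToRaw("abc needle abc")
      grepRaw("needle", x)                 # 5: offset of the first match
      grepRaw("a.c", x, all = TRUE)        # offsets of every match
      grepRaw("needle", x, value = TRUE)   # the matching bytes themselves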

    • Package compiler is now provided as a standard package.  See
      ?compiler::compile for information on how to use the compiler.
      This package implements a byte code compiler for R: by default
      the compiler is not used in this release.  See the ‘R
      Installation and Administration Manual’ for how to compile the
      base and recommended packages.
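
      A minimal sketch of byte-compiling a single function:

      library(compiler)
      f <- function(n) { s <- 0; for (i in seq_len(n)) s <- s + i; s }
      fc <- cmpfun(f)   # byte-compiled copy of f
      fc(100)           # 5050 -- same result, usually faster in tight loops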

    • Providing an exportPattern directive in a NAMESPACE file now
      causes classes to be exported according to the same pattern, for
      example the default from package.skeleton() to specify all names
      starting with a letter.  An explicit directive to
      exportClassPattern will still over-ride.

    • There is an additional marked encoding "bytes" for character
      strings.  This is intended to be used for non-ASCII strings which
      should be treated as a set of bytes, and never re-encoded as if
      they were in the encoding of the current locale: useBytes = TRUE
      is automatically selected in functions such as writeBin(),
      writeLines(), grep() and strsplit().

      Only a few character operations are supported (such as substr()).

      Printing, format() and cat() will represent non-ASCII bytes in
      such strings by a \xab escape.

    • The new function removeSource() removes the internally stored
      source from a function.

    • "srcref" attributes now include two additional line number
      values, recording the line numbers in the order they were parsed.

    • New functions have been added for source reference access:
      getSrcFilename(), getSrcDirectory(), getSrcLocation() and
      getSrcref().

    • Sys.chmod() has an extra argument use_umask which defaults to
      true and restricts the file mode by the current setting of umask.
      This means that all the R functions which manipulate
      file/directory permissions by default respect umask, notably R
      CMD INSTALL.

    • tempfile() has an extra argument fileext to create a temporary
      filename with a specified extension.  (Suggestion and initial
      implementation by Dirk Eddelbuettel.)
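
      For example (a brief sketch):

      tf <- tempfile(fileext = ".csv")
      write.csv(head(iris), tf, row.names = FALSE)
      read.csv(tf)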

    • There are improvements in the way Sweave() and Stangle() handle
      non-ASCII vignette sources, especially in a UTF-8 locale: see
      ‘Writing R Extensions’ which now has a subsection on this topic.

    • factanal() now returns the rotation matrix if a rotation such as
      "promax" is used, and hence factor correlations are displayed.
      (Wish of PR#12754.)

    • The gctorture2() function provides a more refined interface to
      the GC torture process.  Environment variables R_GCTORTURE,
      R_GCTORTURE_WAIT, and R_GCTORTURE_INHIBIT_RELEASE can also be
      used to control the GC torture process.

    • file.copy(from, to) no longer regards it as an error to supply a
      zero-length from: it now simply does nothing.

    • rstandard.glm gains a type argument which can be used to request
      standardized Pearson residuals.

    • A start on a Turkish translation, thanks to Murat Alkan.

    • .libPaths() calls normalizePath(winslash = "/") on the paths:
      this helps (usually) present them in a user-friendly form and
      should detect duplicate paths accessed via different symbolic
      links.

  SWEAVE CHANGES:

    • Sweave() has options to produce PNG and JPEG figures, and to use
      a custom function to open a graphics device (see ?RweaveLatex).
      (Based in part on the contribution of PR#14418.)

    • The default for Sweave() is to produce only PDF figures (rather
      than both EPS and PDF).

    • Environment variable SWEAVE_OPTIONS can be used to supply
      defaults for existing or new options to be applied after the
      Sweave driver setup has been run.

    • The Sweave manual is now included as a vignette in the utils
      package.

    • Sweave() handles keep.source=TRUE much better: it could duplicate
      some lines and omit comments. (Reported by John Maindonald and
      others.)

  C-LEVEL FACILITIES:

    • Because they use a C99 interface which a C++ compiler is not
      required to support, Rvprintf and REvprintf are only defined by
      R_ext/Print.h in C++ code if the macro R_USE_C99_IN_CXX is
      defined when it is included.

    • pythag duplicated the C99 function hypot.  It is no longer
      provided, but is used as a substitute for hypot in the very
      unlikely event that the latter is not available.

    • R_inspect(obj) and R_inspect3(obj, deep, pvec) are (hidden)
      C-level entry points to the internal inspect function and can be
      used for C-level debugging (e.g., in conjunction with the p
      command in gdb).

    • Compiling R with --enable-strict-barrier now also enables
      additional checking for use of unprotected objects. In
      combination with gctorture() or gctorture2() and a C-level
      debugger this can be useful for tracking down memory protection
      issues.

  UTILITIES:

    • R CMD Rdiff is now implemented in R on Unix-alikes (as it has
      been on Windows since R 2.12.0).

    • R CMD build no longer does any cleaning in the supplied package
      directory: all the cleaning is done in the copy.

      It has a new option --install-args to pass arguments to R CMD
      INSTALL for --build (but not when installing to rebuild
      vignettes).

      There is a new option, --resave-data, to call
      tools::resaveRdaFiles() on the data directory, to compress
      tabular files (.tab, .csv etc) and to convert .R files to .rda
      files.  The default, --resave-data=gzip, is to do so in a way
      compatible even with years-old versions of R, but better
      compression is given by --resave-data=best, requiring R >=
      2.10.0.

      It now adds a datalist file for data directories of more than
      1Mb.

      Patterns in .Rbuildignore are now also matched against all
      directory names (including those of empty directories).

      There is a new option, --compact-vignettes, to try reducing the
      size of PDF files in the inst/doc directory.  Currently this
      tries qpdf: other options may be used in future.

      When re-building vignettes and an inst/doc/Makefile file is found,
      make clean is run if the makefile has a clean: target.

      After re-building vignettes the default clean-up operation will
      remove any directories (and not just files) created during the
      process: e.g. one package created a .R_cache directory.

      Empty directories are now removed unless the option
      --keep-empty-dirs is given (and a few packages do deliberately
      include empty directories).

      If there is a field BuildVignettes in the package DESCRIPTION
      file with a false value, re-building the vignettes is skipped.

    • R CMD check now also checks for filenames that are
      case-insensitive matches to Windows' reserved file names with
      extensions, such as nul.Rd, as these have caused problems on some
      Windows systems.

      It checks for inefficiently saved data/*.rda and data/*.RData
      files, and reports on those larger than 100Kb.  A more complete
      check (including of the type of compression, but potentially much
      slower) can be switched on by setting environment variable
      _R_CHECK_COMPACT_DATA2_ to TRUE.

      The types of files in the data directory are now checked, as
      packages are _still_ misusing it for non-R data files.

      It now extracts and runs the R code for each vignette in a
      separate directory and R process: this is done in the package's
      declared encoding.  Rather than call tools::checkVignettes(), it
      calls tools::buildVignettes() to see if the vignettes can be
      re-built as they would be by R CMD build.  Option --use-valgrind
      now applies only to these runs, and not when running code to
      rebuild the vignettes.  This version does a much better job of
      suppressing output from successful vignette tests.

      The 00check.log file is a more complete record of what is output
      to stdout: in particular it contains more details of the tests.

      It now checks all syntactically valid Rd usage entries, and warns
      about assignments (unless these give the usage of replacement
      functions).

      .tar.xz compressed tarballs are now allowed, if tar supports them
      (and setting environment variable TAR to internal ensures so on
      all platforms).

    • R CMD check now warns if it finds inst/doc/makefile, and R CMD
      build renames such a file to inst/doc/Makefile.

  INSTALLATION:

    • Installing R no longer tries to find perl, and R CMD no longer
      tries to substitute a full path for awk nor perl - this was a
      legacy from the days when they were used by R itself.  Because a
      couple of packages do use awk, it is set as the make (rather than
      environment) variable AWK.

    • make check will now fail if there are differences from the
      reference output when testing package examples and if environment
      variable R_STRICT_PACKAGE_CHECK is set to a true value.

    • The C99 double complex type is now required.

      The C99 complex trigonometric functions (such as csin) are not
      currently required (FreeBSD lacks most of them): substitutes are
      used if they are missing.

    • The C99 system call va_copy is now required.

    • If environment variable R_LD_LIBRARY_PATH is set during
      configuration (for example in config.site) it is used unchanged
      in file etc/ldpaths rather than being appended to.

    • configure looks for support for OpenMP and if found compiles R
      with appropriate flags and also makes them available for use in
      packages: see ‘Writing R Extensions’.

      This is currently experimental, and is only used in R with a
      single thread for colSums() and colMeans().  Expect it to be more
      widely used in later versions of R.

      This can be disabled by the --disable-openmp flag.

  PACKAGE INSTALLATION:

    • R CMD INSTALL --clean now removes copies of a src directory which
      are created when multiple sub-architectures are in use.
      (Following a comment from Berwin Turlach.)

    • File R.css is now installed on a per-package basis (in the
      package's html directory) rather than in each library tree, and
      this is used for all the HTML pages in the package.  This helps
      when installing packages with static HTML pages for use on a
      webserver.  It will also allow future versions of R to use
      different stylesheets for the packages they install.

    • A top-level file .Rinstignore in the package sources can list (in
      the same way as .Rbuildignore) files under inst that should not
      be installed.  (Why should there be any such files?  Because all
      the files needed to re-build vignettes need to be under inst/doc,
      but they may not need to be installed.)

    • R CMD INSTALL has a new option --compact-docs to compact any PDFs
      under the inst/doc directory.  Currently this uses qpdf, which
      must be installed (see ‘Writing R Extensions’).

    • There is a new option --lock which can be used to cancel the
      effect of --no-lock or --pkglock earlier on the command line.

    • Option --pkglock can now be used with more than one package, and
      is now the default if only one package is specified.

    • Argument lock of install.packages() can now be used for Mac binary
      installs as well as for Windows ones.  The value "pkglock" is now
      accepted, as well as TRUE and FALSE (the default).

    • There is a new option --no-clean-on-error for R CMD INSTALL to
      retain a partially installed package for forensic analysis.

    • Packages with names ending in . are not portable since Windows
      does not work correctly with such directory names.  This is now
      warned about in R CMD check, and will not be allowed in R 2.14.x.

    • The vignette indices are more comprehensive (in the style of
      browseVignettes()).

  DEPRECATED & DEFUNCT:

    • require(save = TRUE) is defunct, and use of the save argument is
      deprecated.

    • R CMD check --no-latex is defunct: use --no-manual instead.

    • R CMD Sd2Rd is defunct.

    • The gamma argument to hsv(), rainbow(), and rgb2hsv() is
      deprecated and no longer has any effect.

    • The previous options for R CMD build --binary (--auto-zip,
      --use-zip-data and --no-docs) are deprecated (or defunct): use
      the new option --install-args instead.

    • When a character value is used for the EXPR argument in switch(),
      only a single unnamed alternative value is now allowed.

    • The wrapper utils::link.html.help() is no longer available.

    • Zip-ing data sets in packages (and hence R CMD INSTALL options
      --use-zip-data and --auto-zip, as well as the ZipData: yes field
      in a DESCRIPTION file) is defunct.

      Installed packages with zip-ed data sets can still be used, but a
      warning that they should be re-installed will be given.

    • The ‘experimental’ alternative specification of a name space via
      .Export() etc is now defunct.

    • The option --unsafe to R CMD INSTALL is deprecated: use the
      identical option --no-lock instead.

    • The entry point pythag in Rmath.h is deprecated in favour of the
      C99 function hypot.  A wrapper for hypot is provided for R 2.13.x
      only.

    • Direct access to the "source" attribute of functions is
      deprecated; use deparse(fn, control="useSource") to access it,
      and removeSource(fn) to remove it.

    • R CMD build --binary is now formally deprecated: R CMD INSTALL
      --build has long been the preferred alternative.

    • Single-character package names are deprecated (and R is already
      disallowed to avoid confusion in Depends: fields).

  BUG FIXES:

    • drop.terms and the [ method for class "terms" no longer add back
      an intercept.  (Reported by Niels Hansen.)

    • aggregate preserves the class of a column (e.g. a date) under
      some circumstances where it discarded the class previously.

    • p.adjust() now always returns a vector result, as documented.  In
      previous versions it copied attributes (such as dimensions) from
      the p argument: now it only copies names.

    • On PDF and PostScript devices, a line width of zero was recorded
      verbatim and this caused problems for some viewers (a very thin
      line combined with a non-solid line dash pattern could also cause
      a problem).  On these devices, the line width is now limited at
      0.01 and for very thin lines with complex dash patterns the
      device may force the line dash pattern to be solid.  (Reported by
      Jari Oksanen.)

    • The str() method for class "POSIXt" now gives sensible output for
      0-length input.

    • The one- and two-argument complex maths functions failed to warn
      if NAs were generated (as their numeric analogues do).

    • Added .requireCachedGenerics to the dont.mind list for library()
      to avoid warnings about duplicates.

    • $<-.data.frame messed with the class attribute, breaking any S4
      subclass.  The S4 data.frame class now has its own $<- method,
      and turns dispatch on for this primitive.

    • Map() did not look up a character argument f in the correct
      frame, thanks to lazy evaluation.  (PR#14495)

    • file.copy() did not tilde-expand from and to when to was a
      directory.  (PR#14507)

    • It was possible (but very rare) for the loading test in R CMD
      INSTALL to crash a child R process and so leave around a lock
      directory and a partially installed package.  That test is now
      done in a separate process.

    • plot(<formula>, data=<matrix>,..) now works in more cases;
      similarly for points(), lines() and text().

    • edit.default() contained a manual dispatch for matrices (the
      "matrix" class didn't really exist when it was written).  This
      caused an infinite recursion in the no-GUI case and has now been
      removed.

    • data.frame(check.rows = TRUE) sometimes worked when it should
      have detected an error.  (PR#14530)

    • scan(sep= , strip.white=TRUE) sometimes stripped trailing spaces
      from within quoted strings.  (The real bug in PR#14522.)

    • The rank-correlation methods for cor() and cov() with use =
      "complete.obs" computed the ranks before removing missing values,
      whereas the documentation implied incomplete cases were removed
      first.  (PR#14488)

      They also failed for 1-row matrices.

    • The perpendicular adjustment used in placing text and expressions
      in the margins of plots was not scaled by par("mex"). (Part of
      PR#14532.)

    • Quartz Cocoa device now catches any Cocoa exceptions that occur
      during the creation of the device window to prevent crashes.  It
      also imposes a limit of 144 ft^2 on the area used by a window to
      catch user errors (unit misinterpretation) early.

    • The browser (invoked by debug(), browser() or otherwise) would
      display attributes such as "wholeSrcref" that were intended for
      internal use only.

    • R's internal filename completion now properly handles filenames
      with spaces in them even when the readline library is used.  This
      resolves PR#14452 provided the internal filename completion is
      used (e.g., by setting rc.settings(files = TRUE)).

    • Inside uniroot(f, ...), -Inf function values are now replaced by
      a maximally *negative* value.

    • rowsum() could silently over/underflow on integer inputs
      (reported by Bill Dunlap).

    • as.matrix() did not handle "dist" objects with zero rows.

CHANGES IN R VERSION 2.12.2 patched:

  NEW FEATURES:

    • max() and min() work harder to ensure that NA has precedence over
      NaN, so e.g. min(NaN, NA) is NA.  (This was not previously
      documented except for within a single numeric vector, where
      compiler optimizations often defeated the code.)

  BUG FIXES:

    • A change to the C function R_tryEval had broken error messages in
      S4 method selection; the error message is now printed.

    • PDF output with a non-RGB color model used RGB for the line
      stroke color.  (PR#14511)

    • stats4::BIC() assumed without checking that an object of class
      "logLik" has an "nobs" attribute: glm() fits did not and so BIC()
      failed for them.

    • In some circumstances a one-sided mantelhaen.test() reported the
      p-value for the wrong tail.  (PR#14514)

    • Passing the invalid value lty = NULL to axis() sent an invalid
      value to the graphics device, and might cause the device to
      segfault.

    • Sweave() with concordance=TRUE could lead to invalid PDF files;
      Sweave.sty has been updated to avoid this.

    • Non-ASCII characters in the titles of help pages were not
      rendered properly in some locales, and could cause errors or
      warnings.

    • checkRd() gave a spurious error if the \href macro was used.

Interview Luis Torgo Author Data Mining with R

Example of k-nearest neighbour classification
Image via Wikipedia

Here is an interview with Prof Luis Torgo, author of the recent best seller “Data Mining with R: Learning with Case Studies”.

Ajay- Describe your career in science. How do you think more young people can be made interested in science?

Luis- My interest in science only started after I’ve finished my degree. I’ve entered a research lab at the University of Porto and started working on Machine Learning, around 1990. Since then I’ve been involved generally in data analysis topics both from a research perspective as well as from a more applied point of view through interactions with industry partners on several projects. I’ve spent most of my career at the Faculty of Economics of the University of Porto, but since 2008 I’m at the department of Computer Science of the Faculty of Sciences of the same university. At the same time I’ve been a researcher at LIAAD / Inesc Porto LA (www.liaad.up.pt).

I like a lot what I do and like science and the “scientific way of thinking”, but I cannot say that I’ve always thought of this area as my “place”. Most of all I like solving challenging problems through data analysis. If that translates into some scientific outcome then I’m more satisfied, but that is not my main goal, though I’m kind of “forced” to think about that because of the constraints of an academic career.

That does not mean I’m not passionate about science, I just think there are many more ways of “doing science” than what is reflected in the usual “scientific indicators” that most institutions seem to be more and more obsessed about.

Regards interesting young people in science that is a hard question that I’m not sure I’m qualified to answer. I do tend to think that young people are more sensible to concrete examples of problems they think are interesting and that science helps in solving, as a way of finding a motivation for facing the hard work they will encounter in a scientific career. I do believe in case studies as a nice way to learn and motivate, and thus my book 😉

Ajay- Describe your new book “Data Mining with R: Learning with Case Studies”. Why did you choose a case study based approach? Who is the target audience? What is your favorite case study from the book?

Luis- This book is about learning how to use R for data mining. The book follows a “learn by doing it” approach to data mining instead of the more common theoretical description of the available techniques in this discipline. This is accomplished by presenting a series of illustrative case studies for which all necessary steps, code and data are provided to the reader. Moreover, the book has an associated web page (www.liaad.up.pt/~ltorgo/DataMiningWithR) where all code inside the book is given so that easy copy-paste is possible for the more lazy readers.

The language used in the book is very informal without many theoretical details on the used data mining techniques. For obtaining these theoretical insights there are already many good data mining books some of which are referred in “further readings” sections given throughout the book. The decision of following this writing style had to do with the intended target audience of the book.

In effect, the objective was to write a monograph that could be used as a supplemental book for practical classes on data mining that exist in several courses, but at the same time that could be attractive to professionals working on data mining in non-academic environments, and thus the choice of this more practically oriented approach.

Regards my favorite case study that is a hard question for an author… still I would probably choose the “Predicting Stock Market Returns” case study (Chapter 3). Not only because I like this challenging problem, but mainly because the case study addresses all aspects of knowledge discovery in a real world scenario and not only the construction of predictive models. It tackles data collection, data pre-processing, model construction, transforming predictions into actions using different trading policies, using business-related performance metrics, implementing a trading simulator for “real-world” evaluation, and laying out grounds for constructing an online trading system.

Obviously, for all these steps there are far too many options to be possible to describe/evaluate all of them in a chapter, still I do believe that for the reader it is important to see the overall picture, and read about the relevant questions on this problem and some possible paths that can be followed at these different steps.

In other words: do not expect to become rich with the solution I describe in the chapter !

Ajay- Apart from R, what other data mining software do you use or have used in the past? How would you compare their advantages and disadvantages with R?

Luis- I’ve played around with Clementine, Weka, RapidMiner and Knime, but really only playing with teaching goals, and no serious use/evaluation in the context of data mining projects. For the latter I mainly use R or software developed by myself (either in R or other languages). In this context, I do not think it is fair to compare R with these or other tools as I lack serious experience with them. I can however, tell you about what I see as the main pros and cons of R. The main reason for using R is really not only the power of the tool that does not stop surprising me in terms of what already exists and keeps appearing as contributions of an ever growing community, but mainly the ability of rapidly transforming ideas into prototypes. Regards some of its drawbacks I would probably mention the lack of efficiency when compared to other alternatives and the problem of data set sizes being limited by main memory.

I know that there are several efforts around for solving this latter issue not only from the community (e.g. http://cran.at.r-project.org/web/views/HighPerformanceComputing.html), but also from the industry (e.g. Revolution Analytics), but I would prefer that at this stage this would be a standard feature of the language so that the “normal” user need not worry about it. But then this is a community effort and if I’m not happy with the current status instead of complaining I should do something about it!

Ajay- Describe your writing habit- How do you set about writing the book- did you write a fixed amount daily, or do you write in bursts, etc.?

Luis- Unfortunately, I write in bursts whenever I find some time for it. This is much more tiring and time consuming as I need to read back material far too often, but I cannot afford dedicating too much consecutive time to a single task. Actually, I frequently tease my PhD students when they “complain” about the lack of time for doing what they have to, that they should learn to appreciate the luxury of having a single task to complete because it will probably be the last time in their professional life!

Ajay- What do you do to relax or unwind when not working?

Luis- For me, the best way to relax from work is by playing sports. When I’m involved in some game I reset my mind and forget about all other things and this is very relaxing for me. Apart from sports I enjoy a lot spending time with my family and friends. A good and long dinner with friends over a good bottle of wine can do miracles when I’m too stressed with work! Finally, I do love traveling around with my family.

Luis Torgo

Short Bio: Luis Torgo has a degree in Systems and Informatics Engineering and a PhD in Computer Science. He is an Associate Professor of the Department of Computer Science of the Faculty of Sciences of the University of Porto. He is also a researcher of the Laboratory of Artificial Intelligence and Data Analysis (LIAAD) belonging to INESC Porto LA. Luis Torgo has been an active researcher in Machine Learning and Data Mining for more than 20 years. He has led several academic and industrial Data Mining research projects. Luis Torgo has accompanied the R project almost since its beginning, using it in his research activities. He teaches R at different levels and has given several courses in different countries.

To read “Data Mining with R” you can visit this site, and also avail of a 20% discount the publishers have generously given (message below)-

For more information and to place an order, visit us at http://www.crcpress.com.  Order online and apply 20% Off discount code 907HM at checkout.  CRC is pleased to offer free standard shipping on all online orders!

link to the book page  http://www.crcpress.com/product/isbn/9781439810187

Price: $79.95
Cat. #: K10510
ISBN: 9781439810187
ISBN 10: 1439810184
Publication Date: November 09, 2010
Number of Pages: 305
Availability: In Stock
Binding(s): Hardback 

Interview Ajay Ohri Decisionstats.com with DMR

From-

http://www.dataminingblog.com/data-mining-research-interview-ajay-ohri/

Here is the winner of the Data Mining Research People Award 2010: Ajay Ohri! Thanks to Ajay for giving some time to answer Data Mining Research questions. And all the best to his blog, Decision Stat!

Data Mining Research (DMR): Could you please introduce yourself to the readers of Data Mining Research?

Ajay Ohri (AO): I am a business consultant and writer based out of Delhi- India. I have been working in and around the field of business analytics since 2004, and have worked with some very good and big companies primarily in financial analytics and outsourced analytics. Since 2007, I have been writing my blog at http://decisionstats.com which now has almost 10,000 views monthly.

All in all, I wrote about data, and my hobby is also writing (poetry). Both my hobby and my profession stem from my education ( a masters in business, and a bachelors in mechanical engineering).

My research interests in data mining are interfaces (simpler interfaces to enable better data mining), education (making data mining less complex and accessible to more people and students), and time series and regression (specifically ARIMAX).
In business, my research interests include software marketing strategies (open source, Software as a Service, advertising supported versus traditional licensing) and the creation of technology and entrepreneurial hubs (like Palo Alto and Research Triangle, or Bangalore, India).

DMR: I know you have worked with both SAS and R. Could you give your opinion about these two data mining tools?

AO: As per my understanding, SAS stands for SAS language, SAS Institute and SAS software platform. The terms are interchangeably used by people in industry and academia- but there have been some branding issues on this.
I have not worked much with SAS Enterprise Miner , probably because I could not afford it as business consultant, and organizations I worked with did not have a budget for Enterprise Miner.
I have worked alone and in teams with Base SAS, SAS Stat, SAS Access, and SAS ETS- and JMP. Also I worked with SAS BI but as a user to extract information.
You could say my use of the SAS platform was mostly in predictive analytics and reporting, but I have a couple of projects under my belt for knowledge discovery and data mining, and pattern analysis. Again, some of my SAS experience is a bit dated, from almost a year ago.

I really like specific parts of SAS platform – as in the interface design of JMP (which is better than Enterprise Guide or Base SAS ) -and Proc Sort in Base SAS- I guess sequential processing of data makes SAS way faster- though with computing evolving from Desktops/Servers to even cheaper time shared cloud computers- I am not sure how long Base SAS and SAS Stat can hold this unique selling proposition.

I dislike the clutter in SAS Stat output, it confuses me with too much information, and I dislike the shoddy graphics in the rendering output of SAS’s graphical engine. It’s shoddy coding work in SAS/Graph, and if JMP can give better graphics, why is legacy source code preventing the SAS platform from doing a better job of it?

I sometimes think the best part of SAS is actually code written by Goodnight and Sall in the 1970s; the latest procs don’t impress me much.

SAS as a company is something I admire especially for its way of treating employees globally- but it is strange to see the rest of tech industry not following it. Also I don’t like over aggression and the SAS versus Rest of the Analytics /Data Mining World mentality that I sometimes pick up when I deal with industry thought leaders.

I think making SAS Enterprise Miner, JMP, and Base SAS available in a completely new web interface priced at per-hour rates is on my wishlist, but I guess I am a bit sentimental here- most data miners I know from the early 2000s did start with SAS as their first bread-earning software. Also I think SAS needs to be better priced in Business Intelligence- it seems quite cheap in BI compared to Cognos/IBM but expensive in analytical licensing.

If you are a new stats or business student, chances are you may know much more R than SAS today. The shift in education at least has been very rapid, and I guess R is also more of a platform than an analytics or data mining software.

I like a lot of things in R- from graphics, to better data mining packages, to the modular design of the software, but above all I like the can-do, kick-ass spirit of the R community. Lots of young people collaborating with lots of young to old professors, and the energy is infectious. Everybody is a CEO in R’s world. The latest data mining algorithms will probably start in R and get published in journals.

Which is better for data mining SAS or R? It depends on your data and your deadline. The golden rule of management and business is -it depends.

Also I have worked with a lot of KXEN, SQL, SPSS.

DMR: Can you tell us more about Decision Stats? You have a traffic of 120′000 for 2010. How did you reach such a success?

AO: I don’t think 120,000 is a success. It’s not a failure either. It just happened- the more I wrote, the more people read. In 2007-2008 I used to obsess over traffic. I tried SEO, comments, back linking, and I did some black hat experimental stuff. Some of it worked- some didn’t.

In the end, I started asking questions and interviewing people. To my surprise, senior management is almost always more candid, frank and honest about their views, while middle managers, public relations and marketing folks can be defensive.

Social Media helped a bit- Twitter, Linkedin, Facebook really helped my network of friends who I suppose acted as informal ambassadors to spread the word.
Again I was constrained more by necessity than by choice- my middle class finances (I also had a baby son in 2007- my current laptop still has some broken keys :), my inability to afford traveling to conferences, and my location- Delhi isn’t really a tech hub.

The more questions I asked around the internet, the more people responded, and I wrote it all down.

I guess I just was lucky to meet a lot of nice people on the internet who took time to mentor and educate me.

I tried building other websites but didn’t succeed, so I guess I really don’t know. I am not a smart coder, not very clever at writing, but I do try to be honest.

Basic economics says pricing is proportional to demand and inversely proportional to supply. Honest and candid opinions have infinite demand and an uncertain supply.

DMR: There is a rumor about a R book you plan to publish in 2011 :-) Can you confirm the rumor and tell us more?

AO: I just signed a contract with Springer for “R for Business Analytics”. R is great software, and there are lots of books for statistically trained people, but I felt like writing a book for the MBAs and existing analytics users- on how to easily transition to R for analytics.

Like any language there are tricks and tweaks in R, and with a focus on code editors, IDE, GUI, web interfaces, R’s famous learning curve can be bent a bit.

Making analytics beautiful, and simpler to use is always a passion for me. With 3000 packages, R can be used for a lot more things and a lot more simply than is commonly understood.
The target audience however is business analysts- or people working in corporate environments.

Brief Bio-
Ajay Ohri has been working in the field of analytics since 2004, when it was a still-nascent, emerging industry in India. He has worked with the top two Indian outsourcers listed on the NYSE, and with Citigroup on cross-sell analytics, where he helped sell an extra 50,000 credit cards through cross-sell analytics. He was one of the very first independent data mining consultants in India working on analytics products and domestic Indian market analytics. He regularly writes on analytics topics on his web site www.decisionstats.com and is currently working on open source analytical tools like R besides analytical software like SPSS and SAS.

2011 Forecast-ying

Free twitter badge
Image via Wikipedia

I had recently asked some friends from my Twitter lists for their take on 2011; at least 3 of them responded back with the answer, 1 said they were still on it, and 1 claimed a recent office event.

Anyways- I take note of the view of forecasting from

http://www.uiah.fi/projekti/metodi/190.htm

The most primitive method of forecasting is guessing. The result may be rated acceptable if the person making the guess is an expert in the matter.

Ajay- People will forecast in end 2010 and 2011. Many of them will get forecasts wrong, some very wrong, but by Dec 2011 most of them would be writing forecasts on 2012. Almost no one will get called on by irate users-readers (hey, you got 4 out of 7 wrong in last year’s forecast!)- it just won’t happen. People thrive on hope. So does marketing. In 2011- and before.

and some forecasts from Tom Davenport’s The International Institute for Analytics (IIA) at

http://iianalytics.com/2010/12/2011-predictions-for-the-analytics-industry/

Regulatory and privacy constraints will continue to hamper growth of marketing analytics.

(I wonder how privacy and analytics can coexist in peace forever- one view is that model building can use anonymized data: suppose your IP address was anonymized using a standard secret Coca-Cola formula- then whatever model does get built would not be of concern to you individually, as your privacy is protected by the anonymization formula)

Anyway- back to the question I asked-

What are the top 5 events in your industry (events as in things that occurred, not conferences) and what are the top 3 trends in 2011?

I define my industry as being online technology writing- research (with a heavy skew on stat computing)

My top 5 events for 2010 were-

1) Consolidation- The Big 5 software providers in BI and Analytics bought more, sued more, and consolidated more. The valuations rose, and rose, leading to even more smaller players entering. Thus consolidation proved an oxymoron, as the total number of influential AND disruptive players grew.

2) Cloudy Computing- Computing shifted from the desktop, but to the mobile and more to the tablet than to the cloud. iPad front end with Amazon EC2 backend- yup, it happened.

3) Open Source grew louder- yes, it got more clients, and more revenue. Did it get more market share? That depends on whether you define market share by revenues or by users.

Both Open Source and Closed Source had a good year- the pie grew faster and bigger, so no one minded as long as their slices grew bigger.

4) We didn’t see that coming-

Technology continued to surprise with events (that’s what we love! the surprises).

Revolution Analytics broke through R’s Big Data barrier, Tableau Software created a big buzz, and Wikileaks and Chinese firewalls gave technology an entirely new dimension (though not a universally popular one).

People fought wars on emails and servers and social media- unfortunately the ones fighting real wars in 2009 continued to fight them in 2010 too.

5) Money-

SAP, SAS, IBM, Oracle, Google and Microsoft made more money than ever before. Only Facebook got a movie named after it. Venture capitalists pumped money into promising startups- really as if in a hurry to park money before tax cuts expired in some countries.

2011 Top Three Forecasts

1) Surprises- Expect to get surprised at least 10% of the time in business events. As the internet grows, the communication cycle shortens and the hype cycle amplifies buzz-

more unstructured data is created (especially for marketing analytics), leading to enhanced volatility.

2) Growth- Yes, we predict technology will grow faster than the automobile industry. Game changers may happen in the form of Chrome OS- really, it’s Linux, guys- and customer adaptability to new USER INTERFACES. Design will matter much more in technology- on your phone, on your desktop and on your internet. Packaging sells.

False Top Trend 3) I will write a book on business analytics in 2011. Yes, it is true, and I am working with a publisher. No, it is not really going to be a top 3 event for anyone except me, the publisher and the lucky guys who read it.

3) Creating technology and technically enabling creativity will converge at an accelerated rate. Use of widgets, GUIs, snippets and IDEs will ensure creative left brains can code more easily, and right brains can design faster and better due to a global supply chain of techie and artsy professionals.

How to Analyze Wikileaks Data – R SPARQL

Logo for R
Image via Wikipedia

Drew Conway- one of the very, very few Project R voices I used to respect until recently- declared on his blog http://www.drewconway.com/zia/

Why I Will Not Analyze The New WikiLeaks Data

and followed it up with how HE analyzed the post announcing the non-analysis.

“If you have not visited the site in a week or so you will have missed my previous post on analyzing WikiLeaks data, which from the traffic and 35 Comments and 255 Reactions was at least somewhat controversial. Given this rare spotlight I thought it would be fun to use the infochimps API to map out the geo-location of everyone that visited the blog post over the last few days. Unfortunately, after nearly two years with the same web hosting service, only today did I realize that I was not capturing daily log files for my domain”

Anyways – non-American users of the R Project can analyze the Wikileaks data using the R SPARQL package. I would advise American friends not to use this approach or attempt to analyze any of this data, because technically the data is still classified and its possession is illegal (which is the reason Federal employees and organizations receiving federal funds have been advised not to use this or any WikiLeaks dataset).

https://code.google.com/p/r-sparql/

Overview

R is a programming language designed for statistics.

R Sparql allows you to run SPARQL queries inside R and store the result as an R data frame.

The main objective is to allow the integration of Ontologies with Statistics.

It requires Java and the rJava package to be installed.

Example (in R console):

> library(sparql)
> data <- query("<SPARQL query>", "<RDF file or remote SPARQL Endpoint>")

and the data is in a remote SPARQL endpoint: http://www.ckan.net/package/cablegate

SPARQL is an easy language to pick up, but dammit, I am not supposed to blog on my vacations.

http://code.google.com/p/r-sparql/wiki/GettingStarted

Getting Started

1. Installation

1.1 Make sure Java is installed and is the default JVM:

$ sudo apt-get install sun-java6-bin sun-java6-jre sun-java6-jdk
$ sudo update-java-alternatives -s java-6-sun

1.2 Configure R to use the correct version of Java

$ sudo R CMD javareconf

1.3 Install the rJava library

$ R
> install.packages("rJava")
> q()

1.4 Download and install the sparql library

Download: http://code.google.com/p/r-sparql/downloads/list

$ R CMD INSTALL sparql-0.1-X.tar.gz

2. Executing a SPARQL query

2.1 Start R

# Load the library
library(sparql)

# Run the query
result <- query("SELECT ... ", "http://...")

# Print the result
print(result)

3. Examples

3.1 The Query can be a string or a local file:

query("SELECT ?date ?number ?season WHERE {  ... }", "local-file.rdf")
query("my-query.rq", "local-file.rdf")

The package will detect if my-query.rq exists and will load it from the file.

3.3 The uri can be a file or an url (for remote queries):

query("SELECT ... ","local-file.db")
query("SELECT ... ","http://dbpedia.org/sparql")

3.4 Get some examples here: http://code.google.com/p/r-sparql/downloads/list

SPARQL Tutorial-

http://openjena.org/ARQ/Tutorial/index.html

Also read-

http://webr3.org/blog/linked-data/virtuoso-6-sparqlgeo-and-linked-data/

and from the favorite blog of Project R- Also known as NY Times

http://bits.blogs.nytimes.com/2010/11/15/sorting-through-the-government-data-explosion/?twt=nytimesbits

In May 2009, the Obama administration started putting raw government data on the Web. It started with 47 data sets. Today, there are more than 270,000 government data sets, spanning every imaginable category from public health to foreign aid.