Handling time and date in R

John Harrison's famous chronometer
Image via Wikipedia

One of the most frustrating things I had to do while working as financial business analysts was working with Data Time Formats in Base SAS. The syntax was simple enough and SAS was quite good with handing queries to the Oracle data base that the client was using, but remembering the different types of formats in SAS language was a challenge (there was a date9. and date6 and mmddyy etc )

Data and Time variables are particularly important variables in financial industry as almost everything is derived variable from the time (which varies) while other inputs are mostly constants. This includes interest as well as late fees and finance fees.

In R, date and time are handled quite simply-

Use the strptime( dataset, format) function to convert the character into string

For example if the variable dob is “01/04/1977) then following will convert into a date object

z=strptime(dob,”%d/%m/%Y”)

and if the same date is 01Apr1977

z=strptime(dob,"%d%b%Y")

 

does the same

For troubleshooting help with date and time, remember to enclose the formats

%d,%b,%m and % Y in the same exact order as the original string- and if there are any delimiters like ” -” or “/” then these delimiters are entered in exactly the same order in the format statement of the strptime

Sys.time() gives you the current date-time while the function difftime(time1,time2) gives you the time intervals( say if you have two columns as date-time variables)

 

What are the various formats for inputs in date time?

%a
Abbreviated weekday name in the current locale. (Also matches full name on input.)
%A
Full weekday name in the current locale. (Also matches abbreviated name on input.)
%b
Abbreviated month name in the current locale. (Also matches full name on input.)
%B
Full month name in the current locale. (Also matches abbreviated name on input.)
%c
Date and time. Locale-specific on output, "%a %b %e %H:%M:%S %Y" on input.
%d
Day of the month as decimal number (01–31).
%H
Hours as decimal number (00–23).
%I
Hours as decimal number (01–12).
%j
Day of year as decimal number (001–366).
%m
Month as decimal number (01–12).
%M
Minute as decimal number (00–59).
%p
AM/PM indicator in the locale. Used in conjunction with %I and not with %H. An empty string in some locales.
%S
Second as decimal number (00–61), allowing for up to two leap-seconds (but POSIX-compliant implementations will ignore leap seconds).
%U
Week of the year as decimal number (00–53) using Sunday as the first day 1 of the week (and typically with the first Sunday of the year as day 1 of week 1). The US convention.
%w
Weekday as decimal number (0–6, Sunday is 0).
%W
Week of the year as decimal number (00–53) using Monday as the first day of week (and typically with the first Monday of the year as day 1 of week 1). The UK convention.
%x
Date. Locale-specific on output, "%y/%m/%d" on input.
%X
Time. Locale-specific on output, "%H:%M:%S" on input.
%y
Year without century (00–99). Values 00 to 68 are prefixed by 20 and 69 to 99 by 19 – that is the behaviour specified by the 2004 POSIX standard, but it does also say ‘it is expected that in a future version the default century inferred from a 2-digit year will change’.
%Y
Year with century.
%z
Signed offset in hours and minutes from UTC, so -0800 is 8 hours behind UTC.
%Z
(output only.) Time zone as a character string (empty if not available).

Also to read the helpful documentation (especially for time zone level, and leap year seconds and differences)
http://stat.ethz.ch/R-manual/R-patched/library/base/html/difftime.html
http://stat.ethz.ch/R-manual/R-patched/library/base/html/strptime.html
http://stat.ethz.ch/R-manual/R-patched/library/base/html/Ops.Date.html
http://stat.ethz.ch/R-manual/R-patched/library/base/html/Dates.html

 

Troubleshooting Rattle Installation- Data Mining R GUI

Screenshot of Synaptic Package Manager running...
Image via Wikipedia

I really find the Rattle GUI very very nice and easy to do any data mining task. The software is available from http://rattle.togaware.com/

The only issue is Rattle can be quite difficult to install due to dependencies on GTK+

After fiddling for a couple of years- this is what I did

1) Created dual boot OS- Basically downloaded the netbook remix from http://ubuntu.com I created a dual boot OS so you can choose at the beginning whether to use Windows or Ubuntu Linux in that session.  Alternatively you can download VM Player www.vmware.com/products/player/ if you want to do both

2) Download R packages using Ubuntu packages and Install GTK+ dependencies before that.

GTK + Requires

  1. Libglade
  2. Glib
  3. Cairo
  4. Pango
  5. ATK

If  you are a Linux newbie like me who doesnt get the sudo apt get, tar, cd, make , install rigmarole – scoot over to synaptic software packages or just the main ubuntu software centre and download these packages one by one.

For R Dependencies, you need

  • PMML
  • XML
  • RGTK2

Again use r-cran as the prefix to these package names and simply install (almost the same way Windows does it easily -double click)

see http://packages.ubuntu.com/search?suite=lucid&searchon=names&keywords=r-cran

4) Install Rattle from source

http://rattle.togaware.com/rattle-download.html

Advanced users can download the Rattle source packages directly:

Save theses to your hard disk (e.g., to your Desktop) but don’t extract them. Then, on GNU/Linux run the install command shown below. This command is entered into a terminal window:

  • R CMD INSTALL rattle_2.6.0.tar.gz

After installation-

5) Type library(rattle) and rattle.info to get messages on what R packages to update for a proper functioning

</code>

> library(rattle)
Rattle: Graphical interface for data mining using R.
Version 2.6.0 Copyright (c) 2006-2010 Togaware Pty Ltd.
Type 'rattle()' to shake, rattle, and roll your data.
> rattle.info()
Rattle: version 2.6.0
R: version 2.11.1 (2010-05-31) (Revision 52157)

Sysname: Linux
Release: 2.6.35-23-generic
Version: #41-Ubuntu SMP Wed Nov 24 10:18:49 UTC 2010
Nodename: k1-M725R
Machine: i686
Login: k1ng
User: k1ng

Installed Dependencies
RGtk2: version 2.20.3
pmml: version 1.2.26
colorspace: version 1.0-1
cairoDevice: version 2.14
doBy: version 4.1.2
e1071: version 1.5-24
ellipse: version 0.3-5
foreign: version 0.8-41
gdata: version 2.8.1
gtools: version 2.6.2
gplots: version 2.8.0
gWidgetsRGtk2: version 0.0-69
Hmisc: version 3.8-3
kernlab: version 0.9-12
latticist: version 0.9-43
Matrix: version 0.999375-46
mice: version 2.4
network: version 1.5-1
nnet: version 7.3-1
party: version 0.9-99991
playwith: version 0.9-53
randomForest: version 4.5-36 upgrade available 4.6-2
rggobi: version 2.1.16
survival: version 2.36-2
XML: version 3.2-0
bitops: version 1.0-4.1

Upgrade the packages with:

 > install.packages(c("randomForest"))

<code>

Now upgrade whatever package rattle.info tells to upgrade.

This is much simpler and less frustrating than some of the other ways to install Rattle.

If all goes well, you will see this familiar screen popup when you type

>rattle()