Togaware DATA MINING
Desktop Survival Guide
by Graham Williams


Dates and Times

The date() function returns the current date and time as a character string, for example: Wed Oct 20 06:48:06 2004.

To calculate the difference between two times, use difftime.
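
As a minimal sketch of difftime, assuming two made-up timestamps (start and end are illustrative names, not from any data set used here), the difference can be requested in specific units and converted to a plain number:

```r
# Two illustrative time points as POSIXct objects (UTC avoids daylight-saving effects).
start <- as.POSIXct("2004-10-20 06:48:06", tz = "UTC")
end   <- as.POSIXct("2004-10-21 18:48:06", tz = "UTC")

# difftime() returns an object of class "difftime"; the units can be chosen.
d_hours <- difftime(end, start, units = "hours")
d_days  <- difftime(end, start, units = "days")

print(d_hours)   # Time difference of 36 hours
print(d_days)    # Time difference of 1.5 days

# as.numeric() drops the class and gives a plain number in the chosen units.
as.numeric(d_hours)
```

Specifying units explicitly is usually safer than relying on the default, which picks a unit based on the size of the difference.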

When importing data from a CSV file, for example, dates are simply read as factors (or as character strings from R 4.0.0 onwards, where stringsAsFactors defaults to FALSE). Either way they can easily be converted to date objects using as.Date:

> ds <- read.csv("authors.csv")
> ds$Notified
   [1]            2005/06/05 2005/06/05 
> as.Date(ds$Notified, format="%Y/%m/%d")
   [1] NA           "2005-06-05" "2005-06-05"

When no format is given, as.Date tries "%Y-%m-%d" and then "%Y/%m/%d". See the help for strftime for an explanation of the format codes. Any text remaining in the string after the format has been matched is simply ignored, but if the string does not begin with text matching the format then NA is returned.

> ds <- c("2005-05-22 12:35:00", "2005-05-23 abc","abc 2005-05-24")
> ds
[1] "2005-05-22 12:35:00" "2005-05-23 abc"      "abc 2005-05-24"     
> class(ds)
[1] "character"
> ds <- as.Date(ds)
> ds
[1] "2005-05-22" "2005-05-23" NA
> class(ds)
[1] "Date"
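
When a date is preceded by other text, as in the third string above, one possible workaround is to extract the date pattern first and then convert it. The following is only a sketch, assuming every date is written as yyyy-mm-dd; the names x, pattern, pos, m and parsed are illustrative.

```r
x <- c("2005-05-22 12:35:00", "2005-05-23 abc", "abc 2005-05-24")

# Locate the first yyyy-mm-dd pattern in each string (-1 where there is none).
pattern <- "[0-9]{4}-[0-9]{2}-[0-9]{2}"
pos <- regexpr(pattern, x)

# Keep the result aligned with x: strings without a match stay NA.
m <- rep(NA_character_, length(x))
m[pos > 0] <- regmatches(x, pos)

parsed <- as.Date(m, format = "%Y-%m-%d")
parsed
```

Filling a vector of NA values before assigning the matches keeps the result the same length as the input, since regmatches drops elements that do not match.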

Dates can be compared with the usual operators once both sides are Date objects, so convert the comparison value with as.Date:

> ds > as.Date("2005-05-22")
[1] FALSE  TRUE    NA

To view the methods associated with the Date class:

> methods(class = "Date")
 [1] as.character.Date  as.data.frame.Date as.POSIXct.Date    Axis.Date*        
 [5] c.Date             cut.Date           -.Date             [<-.Date          
 [9] [.Date             [[.Date            +.Date             diff.Date         
[13] format.Date        hist.Date*         is.numeric.Date    julian.Date       
[17] Math.Date          mean.Date          months.Date        Ops.Date          
[21] plot.Date*         print.Date         quarters.Date      rep.Date          
[25] round.Date         seq.Date           summary.Date       Summary.Date      
[29] trunc.Date         weekdays.Date     

   Non-visible functions are asterisked

Some alternatives for aggregating by month, using the chron and zoo packages:

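> library(chron)
> dts <- seq.dates("1/1/01", "12/31/03")   # daily chron dates, 2001 to 2003
> rnum <- rnorm(length(dts))               # one random observation per day
> df <- data.frame(date=dts, obs=rnum)
> # Alternative 1: group by chron's years() and months()
> aggregate(df[,2], list(year=years(df[,1]), month=months(df[,1])), sum)
> library(zoo)
> # Alternative 2: build a zoo series and aggregate by year-month
> aggregate(zoo(rnum, dts), as.yearmon, sum)
> # Alternative 3: plain aggregate() grouped by year-month
> aggregate(rnum, list(dts = as.yearmon(dts)), sum)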
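
A further alternative, sketched here using only base R, is to group on a year-month string produced by format. The data are random and the names d, obs and monthly are illustrative; the seed is arbitrary and only makes the run repeatable.

```r
set.seed(42)  # arbitrary seed, for repeatability only

# Daily Date sequence covering three years, with one random observation per day.
d   <- seq(as.Date("2001-01-01"), as.Date("2003-12-31"), by = "day")
obs <- rnorm(length(d))

# Group on a "YYYY-MM" label and sum the observations within each month.
monthly <- aggregate(obs, by = list(month = format(d, "%Y-%m")), FUN = sum)

head(monthly)
nrow(monthly)   # 36 months
```

The "YYYY-MM" labels sort correctly as text, so the rows come out in chronological order without any further work.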


Extract the year from a vector of dates:

> dates <- c("26 Jan 1974", "April 3, 2002", "23 June, 1999", "2007")
> gsub(".*([1-9][0-9]{3}).*", "\\1", dates)
[1] "1974" "2002" "1999" "2007"
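
The regular expression above works on free-form strings. When the values are already Date objects, the year can instead be taken with format. A small sketch, with d and yrs as illustrative names:

```r
d <- as.Date(c("1974-01-26", "2002-04-03", "1999-06-23"))

# "%Y" is the four-digit year; format() returns text, so convert to integer.
yrs <- as.integer(format(d, "%Y"))
yrs   # 1974 2002 1999
```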



A date-time can be broken into its components by converting it to POSIXlt:

> as.POSIXlt('2005-7-1')
[1] "2005-07-01"
> unlist(as.POSIXlt('2005-7-1'))
  sec   min  hour  mday   mon  year  wday  yday isdst 
    0     0     0     1     6   105     5   181     0
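
Note that in the output above mon counts from 0 (so 6 is July), year counts from 1900 (so 105 is 2005), wday counts from 0 for Sunday, and yday counts from 0 for 1 January. A short sketch of recovering the conventional values, with lt as an illustrative name and the time zone fixed to UTC as an assumption:

```r
lt <- as.POSIXlt("2005-07-01", tz = "UTC")

year  <- lt$year + 1900   # years since 1900, so 105 becomes 2005
month <- lt$mon + 1       # months are 0-11, so 6 becomes 7 (July)
day   <- lt$mday          # day of the month, already 1-based
wday  <- lt$wday          # 0 = Sunday, so 5 is Friday
doy   <- lt$yday + 1      # day of the year, 0-based, so 181 becomes 182

c(year = year, month = month, day = day, wday = wday, doy = doy)
```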

From Carlos Hernandez on r-help, 15 Jan 2009 (modified): count the number of times each day of the week occurs in each month over a number of years.



> library(zoo) # as.yearmon
> dd <- seq(as.Date("2000-01-01"), as.Date("2004-12-31"), "day")
> dow <- as.numeric(format(dd, "%w"))
> ym <- as.yearmon(dd)
> tab <- do.call(rbind, tapply(dow, ym, table))
> #rownames(tab) <- format(as.yearmon(as.numeric(rownames(tab))))
> colnames(tab) <- c("s", "m", "t", "w", "t", "f", "s")
> head(tab)
         s m t w t f s
Jan 2000 5 5 4 4 4 4 5
Feb 2000 4 4 5 4 4 4 4
Mar 2000 4 4 4 5 5 5 4
Apr 2000 5 4 4 4 4 4 5
May 2000 4 5 5 5 4 4 4
Jun 2000 4 4 4 4 5 5 4

Copyright © Togaware Pty Ltd
Brought to you by Togaware. This page generated: Sunday, 22 August 2010