DATA MINING
Desktop Survival Guide by Graham Williams
The date() function returns the current date and time as a character string, such as "Wed Oct 20 06:48:06 2004". Sys.Date() returns the current date as a Date object.
To calculate the difference between two times, use difftime.
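As a minimal sketch (the two time stamps are made-up values), difftime takes two date-times and an optional units argument. Subtracting one Date object from another also gives a difftime, measured in days:

```r
# Two made-up time points as POSIXct date-times in UTC
t1 <- as.POSIXct("2004-10-20 06:48:06", tz = "UTC")
t2 <- as.POSIXct("2004-10-21 08:18:06", tz = "UTC")

# difftime() returns an object of class "difftime"; fix the units explicitly
d_hours <- difftime(t2, t1, units = "hours")
d_mins  <- difftime(t2, t1, units = "mins")
print(d_hours)        # Time difference of 25.5 hours
as.numeric(d_mins)    # 1530

# Subtracting Date objects yields a difftime in days
d_days <- as.Date("2005-06-05") - as.Date("2005-05-22")
print(d_days)         # Time difference of 14 days
```

Fixing units avoids relying on the automatic choice difftime makes when units is left at its default.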
When importing data from a CSV file, for example, dates are simply
read as factors (or, since R 4.0.0, as character strings, because
stringsAsFactors now defaults to FALSE). These can easily be converted
to date objects using as.Date:
> ds <- read.csv("authors.csv")
> ds$Notified
[1] 2005/06/05 2005/06/05
> as.Date(ds$Notified, format="%Y/%m/%d")
[1] NA           "2005-06-05" "2005-06-05"
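The conversion can be tried without an authors.csv file. In this sketch a small inline CSV with hypothetical Author and Notified columns stands in for the file, and stringsAsFactors = TRUE mimics the older default of reading text as factors:

```r
# Hypothetical inline data standing in for authors.csv
csv <- "Author,Notified
Smith,2005/06/05
Jones,2005/05/22"

# stringsAsFactors = TRUE reproduces the pre-R 4.0.0 default
ds <- read.csv(text = csv, stringsAsFactors = TRUE)
class(ds$Notified)    # "factor"

# as.Date() accepts the factor directly; the format describes the input text
ds$Notified <- as.Date(ds$Notified, format = "%Y/%m/%d")
class(ds$Notified)    # "Date"
ds$Notified           # "2005-06-05" "2005-05-22"
```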
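The default format is "%Y-%m-%d"; when no format is given, as.Date
also tries "%Y/%m/%d". See the help for strftime for an explanation of
the format codes. Any extra text left in the string after the format
has been matched is simply ignored, but if the format does not match
at the beginning of the string, NA is returned for that element.
(Without an explicit format, the format is chosen from the first
non-NA element; if that element matches neither default, as.Date
stops with an error instead.)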
> ds <- c("2005-05-22 12:35:00", "2005-05-23 abc", "abc 2005-05-24")
> ds
[1] "2005-05-22 12:35:00" "2005-05-23 abc"      "abc 2005-05-24"
> class(ds)
[1] "character"
> ds <- as.Date(ds)
> ds
[1] "2005-05-22" "2005-05-23" NA
> class(ds)
[1] "Date"
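A few more format strings, as a sketch with made-up input strings. The string beginning with "on" is given an explicit format on purpose: without one, as.Date cannot pick a format from it and stops with an error rather than returning NA.

```r
# Explicit formats for non-ISO input
d1 <- as.Date("05/06/2005", format = "%d/%m/%Y")     # "2005-06-05"
d2 <- as.Date("20050605",   format = "%Y%m%d")       # "2005-06-05"

# Trailing text is ignored...
d3 <- as.Date("2005-06-05 was a Sunday")             # "2005-06-05"
# ...but a mismatch at the start gives NA
d4 <- as.Date("on 2005-06-05", format = "%Y-%m-%d")  # NA

# format() turns a Date back into text using the same % codes
s <- format(d1, "%d/%m/%Y")                          # "05/06/2005"
```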
To compare dates, convert the other value to a Date with as.Date too:
> ds > as.Date("2005-05-22")
[1] FALSE  TRUE    NA
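Comparison and arithmetic on Date objects work element-wise. In this sketch (values chosen to match the example above), adding a number to a Date adds days, and which() drops the NA produced by a comparison:

```r
ds <- as.Date(c("2005-05-22", "2005-05-23", NA))
cutoff <- as.Date("2005-05-22")

ds > cutoff               # FALSE  TRUE    NA
ds >= cutoff              #  TRUE  TRUE    NA
ds[which(ds > cutoff)]    # "2005-05-23" (which() skips the NA)
cutoff + 30               # "2005-06-21" (a number is added as days)
range(ds, na.rm = TRUE)   # "2005-05-22" "2005-05-23"
```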
To view the methods associated with the Date class (the exact list varies between R versions):
> methods(class = "Date")
 [1] as.character.Date  as.data.frame.Date as.POSIXct.Date    Axis.Date*
 [5] c.Date             cut.Date           -.Date             [<-.Date
 [9] [.Date             [[.Date            +.Date             diff.Date
[13] format.Date        hist.Date*         is.numeric.Date    julian.Date
[17] Math.Date          mean.Date          months.Date        Ops.Date
[21] plot.Date*         print.Date         quarters.Date      rep.Date
[25] round.Date         seq.Date           summary.Date       Summary.Date
[29] trunc.Date         weekdays.Date

   Non-visible functions are asterisked
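Several of the listed methods can be exercised directly. This sketch (dates chosen arbitrarily) uses seq.Date, cut.Date, julian.Date and diff.Date; weekdays and months are left out only because their output depends on the locale:

```r
# seq.Date: five consecutive days spanning a month boundary
d <- seq(as.Date("2005-01-30"), by = "day", length.out = 5)

# cut.Date: a factor labelled by the first day of each month
m <- cut(d, "month")
levels(m)                        # "2005-01-01" "2005-02-01"

# julian.Date: days since the default origin 1970-01-01
julian(as.Date("1970-01-10"))    # 9 (with an "origin" attribute)

# diff.Date: differences between successive dates, in days
diff(d)
```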
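To aggregate by month (here, summing daily observations within each month of each year), some alternatives using the chron and zoo packages:
> library(chron)
> dts <- seq.dates("1/1/01", "12/31/03")   # daily chron dates, m/d/y
> rnum <- rnorm(length(dts))               # one random value per day
> df <- data.frame(date = dts, obs = rnum)
> # 1. aggregate() grouped by chron's years() and months()
> aggregate(df[, 2], list(year = years(df[, 1]), month = months(df[, 1])), sum)
> # 2. zoo: aggregate a zoo series by year-month
> library(zoo)
> aggregate(zoo(rnum, dts), as.yearmon, sum)
> # 3. aggregate() grouped by zoo's yearmon class
> aggregate(rnum, list(dts = as.yearmon(dts)), sum)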
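If neither chron nor zoo is available, base R can do the same monthly aggregation. In this sketch, an alternative to the approaches above rather than part of them, format(dts, "%Y-%m") builds a year-month key that also sorts chronologically; set.seed is there only to make the random data reproducible:

```r
set.seed(42)
dts  <- seq(as.Date("2001-01-01"), as.Date("2003-12-31"), by = "day")
rnum <- rnorm(length(dts))

# "%Y-%m" gives keys such as "2001-01" that sort in date order
ym <- format(dts, "%Y-%m")

# Data frame with one row per month: columns month and x
monthly <- aggregate(rnum, by = list(month = ym), FUN = sum)
head(monthly)

# tapply() gives the same monthly sums as a named array
monthly2 <- tapply(rnum, ym, sum)
```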
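Extract the year from a vector of date strings in assorted formats, using a regular expression that keeps only the four-digit year (strings with no match are returned unchanged):
> dates <- c("26 Jan 1974", "April 3, 2002", "23 June, 1999", "2007")
> gsub(".*([1-9][0-9]{3}).*", "\\1", dates)
[1] "1974" "2002" "1999" "2007"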
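When the values are already Date objects rather than free-form strings, no regular expression is needed. A sketch with arbitrary dates:

```r
d <- as.Date(c("1974-01-26", "2002-04-03", "1999-06-23"))

yrs_chr <- format(d, "%Y")               # "1974" "2002" "1999"
yrs_num <- as.POSIXlt(d)$year + 1900     # 1974 2002 1999 (year counts from 1900)
```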
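A POSIXlt object stores a date-time as a list of components. Note that mon counts from 0 (January), year counts from 1900, wday counts from 0 (Sunday) and yday counts from 0:
> as.POSIXlt('2005-7-1')
[1] "2005-07-01"
> unlist(as.POSIXlt('2005-7-1'))
  sec   min  hour  mday   mon  year  wday  yday isdst
    0     0     0     1     6   105     5   181     0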
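The individual components can be read with $, adjusting for their origins. A sketch, assuming UTC to avoid any time zone effects:

```r
lt <- as.POSIXlt("2005-07-01", tz = "UTC")

lt$year + 1900    # 2005
lt$mon + 1        # 7 (July)
lt$mday           # 1
lt$wday           # 5 (0 = Sunday, so Friday)
lt$yday + 1       # 182 (1-based day of the year)
```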
From Carlos Hernandez on r-help, 15 Jan 2009 (modified): calculate the number of times each day of the week appears in each month over a number of years.
> library(zoo)   # for as.yearmon
> dd <- seq(as.Date("2000-01-01"), as.Date("2004-12-31"), "day")
> dow <- as.numeric(format(dd, "%w"))   # day of week, 0 = Sunday
> ym <- as.yearmon(dd)
> tab <- do.call(rbind, tapply(dow, ym, table))   # one row of counts per month
> # rownames(tab) <- format(as.yearmon(as.numeric(rownames(tab))))   # only if rownames are numeric
> colnames(tab) <- c("s", "m", "t", "w", "t", "f", "s")
> head(tab)
         s m t w t f s
Jan 2000 5 5 4 4 4 4 5
Feb 2000 4 4 5 4 4 4 4
Mar 2000 4 4 4 5 5 5 4
Apr 2000 5 4 4 4 4 4 5
May 2000 4 5 5 5 4 4 4
Jun 2000 4 4 4 4 5 5 4
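The same table can also be built without zoo. This base-R sketch (an alternative, not the approach from the r-help post) uses format() for both the month key and the day of the week, and gives the columns distinct labels so they can be indexed by name:

```r
dd <- seq(as.Date("2000-01-01"), as.Date("2004-12-31"), by = "day")

# Day of week as a factor with a fixed Sunday-first order
dow <- factor(format(dd, "%w"), levels = as.character(0:6),
              labels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))
ym  <- format(dd, "%Y-%m")

# Cross-tabulate: one row per month, one column per weekday
tab <- table(ym, dow)
head(tab, 2)    # row 2000-01 reads 5 5 4 4 4 4 5
```

Because table() counts every level of the dow factor, a weekday that never occurs in a month would still get a column (with a zero), although every month has at least 28 days, so each weekday appears at least four times.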
Copyright © Togaware Pty Ltd. Support further development through the purchase of the PDF version of the book.