DATA MINING
Desktop Survival Guide by Graham Williams |
|||||
Removing Duplicates |
The function duplicated identifies elements of a data structure that are duplicated:
> x <- c(1, 1, 1, 2, 2, 2, 3, 3, 3) > duplicated(x) |
[1] FALSE TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE |
> x <- x[!duplicated(x)] > x |
[1] 1 2 3 |
This is a simple example, but works just as well to remove duplicated rows from a matrix or data frame.
For whatever reason, suppose we have loaded the audit dataset into Rattle and want to remove duplicated hours, keeping just the first one of each. This process is performed external to Rattle and we need to have Rattle reset its view of the data, through a click of the Execute button, with the resulting smaller dataset as shown in Figure 23.11.
> crs$dataset <- crs$dataset[!duplicated(crs$dataset$Hours),] |
|