DATA MINING
Desktop Survival Guide by Graham Williams
A data frame is essentially a list of named vectors of equal length.
Unlike a matrix, in which all elements must be of the same data type,
the vectors (or columns) of a data frame need not share a data type.
A data frame is thus analogous to a database table: each column has a
single data type, but different columns can have different data types.
> age <- c(35, 23, 56, 18)
> gender <- c("m", "m", "f", "f")
> people <- data.frame(Age=age, Gender=gender)
> people
  Age Gender
1  35      m
2  23      m
3  56      f
4  18      f
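The contrast with a matrix can be checked directly. The following is a minimal sketch that reuses the age and gender vectors from the example; the name m is only illustrative.

```r
age <- c(35, 23, 56, 18)
gender <- c("m", "m", "f", "f")

# cbind() on plain vectors builds a matrix, which has a single type,
# so the numbers are coerced to character strings.
m <- cbind(age, gender)
class(m[, "age"])

# data.frame() keeps each column's own type, so Age stays numeric.
people <- data.frame(Age = age, Gender = gender)
class(people$Age)
```

Whether Gender is stored as a factor or as a character vector depends on the R version and on the stringsAsFactors argument, so the sketch does not rely on it.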
The columns of a data frame have names, which can be assigned when the data frame is created, as in the example above. The names can also be changed at any time by assigning to the result of a call to colnames:
> colnames(people)
[1] "Age"    "Gender"
> colnames(people)[2] <- "Sex"
> colnames(people)
[1] "Age" "Sex"
> people
  Age Sex
1  35   m
2  23   m
3  56   f
4  18   f
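Renaming by position breaks if the column order changes. As a hedged alternative, the sketch below selects the column by its current name instead. It rebuilds the people data frame so that it runs on its own.

```r
people <- data.frame(Age = c(35, 23, 56, 18),
                     Gender = c("m", "m", "f", "f"))

# Rename by matching the old name rather than relying on column position.
colnames(people)[colnames(people) == "Gender"] <- "Sex"
colnames(people)

# For a data frame, names() reports the same column names as colnames().
identical(names(people), colnames(people))
```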
If the datasets we wish to combine are held in a single list, we can
use the do.call function to apply rbind to that list, so that each
element of the list becomes one argument to the rbind function:
j <- list()  # Generate a list of data frames
for (i in letters[1:26])
{
  j[[i]] <- data.frame(rep(i,25),matrix(rnorm(250),nrow=25))
}
j[[1]]
allj <- do.call("rbind", j)  # Combine list of data frames into one.
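The same idea is easier to inspect on a smaller case. This sketch uses three tiny data frames with made-up contents; the names parts and combined are illustrative and not part of the example above.

```r
# Three small data frames with identical column names (illustrative data).
parts <- list(
  a = data.frame(id = 1:2, group = "a"),
  b = data.frame(id = 3:4, group = "b"),
  c = data.frame(id = 5:6, group = "c")
)

# do.call() passes each list element as a separate argument, so this is
# the same as calling rbind(a = parts$a, b = parts$b, c = parts$c).
combined <- do.call(rbind, parts)

nrow(combined)       # one row per input row
rownames(combined)   # row names are derived from the list names
```

Note that rbind matches columns by name, so the data frames in the list need the same column names.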
You can reshape data in a data frame using unstack:
> ds <- data.frame(type=c('x', 'y', 'x', 'x', 'x', 'y', 'y', 'x', 'y', 'y'),
                   value=c(10, 5, 2, 6, 4, 8, 3, 6, 6, 8))
> ds
   type value
1     x    10
2     y     5
3     x     2
4     x     6
5     x     4
6     y     8
7     y     3
8     x     6
9     y     6
10    y     8
> unstack(ds, value ~ type)
   x y
1 10 5
2  2 8
3  6 3
4  4 6
5  6 8
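Two properties of unstack are worth checking on the same data: the companion function stack reverses the reshaping, and unstack only returns a data frame when every group has the same number of values. The sketch below assumes the ds data frame from the example; the names wide, long and uneven are illustrative.

```r
ds <- data.frame(type = c("x", "y", "x", "x", "x", "y", "y", "x", "y", "y"),
                 value = c(10, 5, 2, 6, 4, 8, 3, 6, 6, 8))
wide <- unstack(ds, value ~ type)

# stack() goes the other way: one column of values, one of group labels.
long <- stack(wide)
names(long)

# Dropping one row leaves groups of unequal size, so unstack() can no
# longer build a rectangular data frame and returns a plain list instead.
uneven <- unstack(ds[-1, ], value ~ type)
is.data.frame(uneven)
```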
To go a step further and make the values available as variables named
after the types, you could use attach:
> attach(unstack(ds, value ~ type))
> x
[1] 10  2  6  4  6
> y
[1] 5 8 3 6 8
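Because attach places the data frame on the search path until it is detached, a scoped alternative is often easier to reason about. The following sketch, which assumes the same ds data, uses with to evaluate an expression against the columns; the names wide and diff_means are illustrative.

```r
ds <- data.frame(type = c("x", "y", "x", "x", "x", "y", "y", "x", "y", "y"),
                 value = c(10, 5, 2, 6, 4, 8, 3, 6, 6, 8))
wide <- unstack(ds, value ~ type)

# with() evaluates the expression using the columns as variables,
# without adding anything to the search path.
diff_means <- with(wide, mean(x) - mean(y))
diff_means

# If attach() is used, detach() removes the data frame again afterwards.
attach(wide)
x
detach(wide)
```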
We can see that a data frame is just a list using a combination of the unclass and str functions:
> str(unclass(iris))
List of 5
 $ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "row.names")= int [1:150] 1 2 3 4 5 6 7 8 9 10 ...
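The list nature of a data frame also shows up in everyday operations. As a brief sketch using the built-in iris dataset, list-style functions work on a data frame column by column; the names first_three and col_classes are illustrative.

```r
# A data frame passes the list test and has one element per column.
is.list(iris)
length(iris)

# List-style extraction with [[ ]] returns a column as a plain vector.
first_three <- head(iris[["Sepal.Length"]], 3)
first_three

# sapply() iterates over the columns just as it would over list elements.
col_classes <- sapply(iris, class)
col_classes
```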