Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Factors



> ds <- data.frame(age=c(34, 56, 23, 72, 48),
                   risk=c("high", "low", "high", "low", "high"),
                   stringsAsFactors=TRUE)
> ds
  age risk
1  34 high
2  56  low
3  23 high
4  72  low
5  48 high
> levels(ds$risk)
[1] "high" "low"

By default, the levels of a factor are not ordered, so comparing them is not meaningful:

> ds$age[1] < ds$age[2]
[1] TRUE
> ds$risk[1] < ds$risk[2]
[1] NA
Warning message:
< not meaningful for factors in: Ops.factor(ds$risk[1], ds$risk[2])

We can order the levels using the ordered function:

> ds$risk <- ordered(ds$risk)
> levels(ds$risk)
[1] "high" "low" 
> ds$risk[1] < ds$risk[2]
[1] TRUE

Saying that high is less than low is probably not what we wanted. The ordering used is the order in which levels returns the levels, which by default is alphabetical.
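
To get the intended ordering, the levels can be listed explicitly from lowest to highest. The following is a minimal sketch, assuming that low should rank below high; the vector name risk is only illustrative and stands in for the ds$risk column.

```r
# Illustrative data: the same values as the risk column above.
risk <- c("high", "low", "high", "low", "high")

# List the levels from lowest to highest rather than relying on the
# default alphabetical order.
risk <- ordered(risk, levels=c("low", "high"))

levels(risk)        # "low" "high"
risk[1] < risk[2]   # FALSE: high is no longer less than low
```

The same levels argument can be used when the factor is first created, which avoids a separate reordering step.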

You can rename the levels by assigning a new vector of names to levels:

> levels(ds$risk) <- c("upper", "lower")
> ds
  age  risk
1  34 upper
2  56 lower
3  23 upper
4  72 lower
5  48 upper
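
Assigning to levels matches the new names to the existing levels by position, so a mistake in the order silently swaps the labels. One way to guard against this, sketched here with an illustrative vector named risk, is to select each level by its current name before renaming it.

```r
# Illustrative factor with the default (alphabetical) levels.
risk <- factor(c("high", "low", "high", "low", "high"))

# Rename each level by matching its current name, not its position.
levels(risk)[levels(risk) == "high"] <- "upper"
levels(risk)[levels(risk) == "low"]  <- "lower"

levels(risk)        # "upper" "lower"
as.character(risk)  # "upper" "lower" "upper" "lower" "upper"
```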



Copyright © Togaware Pty Ltd
Brought to you by Togaware. This page generated: Sunday, 22 August 2010