Data Mining Survivor: Data_Cleaning - Selectively Changing Vector Values

DATA MINING
Desktop Survival Guide
by Graham Williams

Selectively Changing Vector Values

The next example changes the values in one vector (weights) according to conditions on the values in another vector (data). The data vector is randomly sampled from letters, R's built-in vector of the lowercase letters of the alphabet. Both vectors are the same length. Where the value in data sorts after m, the weight is set to 2. Where it lies between d and m (inclusive), the weight is set to 3. All other weights keep their initial value of 1.

> weights <- rep(1, 10)
> data <- letters[sample(seq(1,length(letters)), 10)]
> data
[1] "y" "b" "j" "m" "c" "q" "o" "a" "i" "p"
> weights[data > "m"] <- 2
> weights
[1] 2 1 1 1 1 2 2 1 1 2
> weights[data <= "m" & data >= "d"] <- 3
> weights
[1] 2 1 3 3 1 2 2 1 3 2
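
Because the session above draws a random sample, rerunning it gives different letters each time. The following is a minimal sketch of the same technique on fixed input (the ten letters shown in the session), so the result is repeatable. The names after_m, d_to_m and weights2 are illustrative and not from the original text, and the comparisons assume a locale in which lowercase letters collate in alphabetical order.

```r
# Fixed input: the ten letters from the session above, so the result is repeatable.
data <- c("y", "b", "j", "m", "c", "q", "o", "a", "i", "p")

# Start with a default weight of 1 for every element.
weights <- rep(1, length(data))

# Logical masks, one TRUE/FALSE per element of data (illustrative names).
after_m <- data > "m"
d_to_m  <- data >= "d" & data <= "m"

# Assign only at the positions where the mask is TRUE.
weights[after_m] <- 2
weights[d_to_m]  <- 3
print(weights)

# The same rule written as a nested ifelse(), building the vector in one step.
weights2 <- ifelse(data > "m", 2, ifelse(data >= "d", 3, 1))
print(identical(weights, weights2))
```

Indexed assignment suits cases where an existing vector is updated step by step, while the nested ifelse() form builds the whole vector at once from the conditions.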

An example of where this might be useful is in data mining pre-processing, where we wish to selectively change the weights associated with entities in a modelling exercise. The weights might indicate the relative importance of the specific entities. An example of this transformation is included in the chapter covering the usage of rpart.
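
As a rough illustration of how selectively set weights could feed into a model, the sketch below passes a weight vector to rpart through its weights argument. It is not the example from the rpart chapter: the kyphosis data set (shipped with the rpart package), the choice of a weight of 3 for the "present" cases, and the name wts are all assumptions made for illustration.

```r
library(rpart)

# One weight per observation, defaulting to 1.
wts <- rep(1, nrow(kyphosis))

# Selectively raise the weight of the cases where kyphosis is present.
# The value 3 is arbitrary and chosen only for illustration.
wts[kyphosis$Kyphosis == "present"] <- 3

# Fit a classification tree that takes the case weights into account.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, weights = wts)
print(fit)
```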

Brought to you by Togaware. This page generated: Sunday, 22 August 2010