DATA MINING
Desktop Survival Guide by Graham Williams
R |
> library(rpart)
> weather.rpart <- rpart(RainTomorrow ~ RainToday, data=weather)
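You can find which terminal node (leaf) each observation in the training
dataset ends up in with the where component of the rpart object. Each value
is the row of the object's frame component that describes the leaf, not the
node number itself.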
> wine.rpart$where
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
  3   3   3   3   6   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3
 21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40
  3   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3
[...]
161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178
  9   8   9   9   9   9   9   9   9   9   9   9   9   9   9   4   4   9
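As a rough illustration of how the where component can be used, the sketch below counts the observations in each leaf and looks up the corresponding node numbers. It assumes the built-in iris dataset as a stand-in for the wine data, and the names fit, leaf.rows, leaf.counts and leaf.nodes are made up for this example.

```r
library(rpart)

# Assumption: iris stands in for the wine data so the sketch runs as-is.
fit <- rpart(Species ~ ., data = iris)

# For each training observation, where gives the row of fit$frame
# that describes the leaf the observation falls into.
leaf.rows <- fit$where

# Count the observations that end up in each leaf.
leaf.counts <- table(leaf.rows)
print(leaf.counts)

# The row names of fit$frame are the node numbers, so indexing them
# by where gives the node number of each observation's leaf.
leaf.nodes <- as.integer(rownames(fit$frame))[leaf.rows]
print(table(leaf.nodes))
```

The node numbers obtained this way should match those shown when the tree itself is printed.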
The predict function applies the model to data. The data must contain all
of the variables on which the model was built; if not, an error is
generated. This is a common problem when applying the model to a new
dataset that does not contain all of the same variables, but does contain
the variables you are interested in.
> vars <- c("Type", "Dilution", "Flavanoids", "Hue", "Proline")
> predict(wine.rpart, wine[,vars])
Error in eval(expr, envir, enclos) : Object "Alcohol" not found
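Before calling predict it can help to check which variables the model expects but the new dataset lacks. The following is only a sketch of one way to do that, assuming the built-in iris dataset in place of the wine data; fit, new.data, needed and missing.vars are hypothetical names.

```r
library(rpart)

# Assumption: iris stands in for the wine data so the sketch runs as-is.
fit <- rpart(Species ~ ., data = iris)

# A "new" dataset that lacks one of the variables the model was built on.
new.data <- iris[, c("Sepal.Width", "Petal.Length", "Petal.Width")]

# Predictor names recorded in the model's terms (response dropped).
needed <- all.vars(delete.response(terms(fit)))

# Variables the model expects but the new dataset does not supply.
missing.vars <- setdiff(needed, names(new.data))
print(missing.vars)

# Only attempt the prediction when nothing is missing.
if (length(missing.vars) == 0) {
  print(head(predict(fit, new.data)))
}
```

Note that a variable can be reported as missing even if no split in the tree uses it, since predict looks for every variable named in the model formula.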
One fix is to rebuild the model using only the variables that are present
in the new dataset:
> wine.rpart <- rpart(Type ~ Dilution + Flavanoids + Hue + Proline, data=wine)
> predict(wine.rpart, wine[,vars])
             1          2          3
1   0.96610169 0.03389831 0.00000000
2   0.96610169 0.03389831 0.00000000
[...]
70  0.03076923 0.93846154 0.03076923
71  0.00000000 0.25000000 0.75000000
[...]
177 0.00000000 0.25000000 0.75000000
178 0.00000000 0.02564103 0.97435897
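The output above is a matrix of class probabilities, one column per class. If a single predicted class per observation is wanted instead, predict accepts type="class" for classification trees. A minimal sketch, again assuming iris in place of the wine data and using the made-up names fit, probs and classes:

```r
library(rpart)

# Assumption: iris stands in for the wine data so the sketch runs as-is.
fit <- rpart(Species ~ Petal.Length + Petal.Width, data = iris)

# Default for a classification tree: a matrix of class probabilities.
probs <- predict(fit, iris)
print(head(probs))

# type = "class" returns the predicted class as a factor instead.
classes <- predict(fit, iris, type = "class")

# Cross-tabulate predicted against actual classes on the training data.
print(table(predicted = classes, actual = iris$Species))
```

Because the predictions here are made on the training data itself, the table gives an optimistic picture of how well the tree performs.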