DATA MINING
Desktop Survival Guide by Graham Williams
Basics
Use printcp to view the performance of the model.
> printcp(wine.rpart)

Classification tree:
rpart(formula = Type ~ ., data = wine)

Variables actually used in tree construction:
[1] Dilution   Flavanoids Hue        Proline

Root node error: 107/178 = 0.60112

n= 178

        CP nsplit rel error  xerror     xstd
1 0.495327      0   1.00000 1.00000 0.061056
2 0.317757      1   0.50467 0.47664 0.056376
3 0.056075      2   0.18692 0.28037 0.046676
4 0.028037      3   0.13084 0.23364 0.043323
5 0.010000      4   0.10280 0.21495 0.041825
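The rel error and xerror columns are expressed relative to the root node error, so multiplying by the root node error gives absolute error rates. The following is a minimal sketch of that arithmetic, with the figures copied by hand from the output above. The names root_error and cp_table are illustrative only and are not part of rpart.

```r
# Root node error as printed by printcp: 107 of 178 observations.
root_error <- 107 / 178

# Columns copied by hand from the printcp output (not extracted from the model).
cp_table <- data.frame(
  nsplit    = c(0, 1, 2, 3, 4),
  rel_error = c(1.00000, 0.50467, 0.18692, 0.13084, 0.10280),
  xerror    = c(1.00000, 0.47664, 0.28037, 0.23364, 0.21495)
)

# Scale the relative errors by the root node error to get absolute rates.
cp_table$train_error <- cp_table$rel_error * root_error
cp_table$cv_error    <- cp_table$xerror * root_error

print(round(cp_table, 4))
```

For the final row (4 splits) this gives a training error of roughly 6% and a cross-validated error of roughly 13%.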
The predict function applies the model to data. The data must
contain every variable named in the formula used to build the model,
not only those that appear in the tree. If any are missing, an error
is generated. This is a common problem when applying the model to a
new dataset that lacks some of the original variables, even though it
contains all the variables the tree actually uses.
> vars <- c("Type", "Dilution", "Flavanoids", "Hue", "Proline")
> predict(wine.rpart, wine[,vars])
Error in eval(expr, envir, enclos) : Object "Alcohol" not found
Fix this by rebuilding the model using only the variables that
appear in the tree:
> wine.rpart <- rpart(Type ~ Dilution + Flavanoids + Hue + Proline, data=wine)
> predict(wine.rpart, wine[,vars])
             1          2          3
1   0.96610169 0.03389831 0.00000000
2   0.96610169 0.03389831 0.00000000
[...]
70  0.03076923 0.93846154 0.03076923
71  0.00000000 0.25000000 0.75000000
[...]
177 0.00000000 0.25000000 0.75000000
178 0.00000000 0.02564103 0.97435897
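Before calling predict on a new dataset it can help to list which variables from the model formula are absent. The sketch below is one possible check, not part of rpart. The helper missing_vars is hypothetical, and the built-in iris data stands in for wine, since loading the wine data is not shown here.

```r
library(rpart)

# Assumption: iris stands in for the wine data used in the text.
fit <- rpart(Species ~ ., data = iris)

# Hypothetical helper: predictors named in the model formula
# that are not columns of newdata.
missing_vars <- function(model, newdata) {
  # model$terms holds the formula with the "." already expanded.
  needed <- all.vars(delete.response(model$terms))
  setdiff(needed, names(newdata))
}

# A reduced dataset, analogous to wine[,vars] in the text.
subset_data <- iris[, c("Species", "Petal.Length", "Petal.Width")]

missing_vars(fit, subset_data)  # lists the absent Sepal.* variables
missing_vars(fit, iris)         # empty: nothing is missing
```

An empty result means every variable in the formula is present in the data. Any names returned are the ones predict would complain about.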
Display a confusion matrix of predicted classes (rows) against actual
classes (columns).
> table(predict(wine.rpart, wine, type="class"), wine$Type)

     1  2  3
  1 57  2  0
  2  2 66  4
  3  0  3 44
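The diagonal of the confusion matrix holds the correctly classified observations, so overall accuracy is the diagonal sum divided by the total. The sketch below re-enters the counts from the matrix above by hand. The names confusion, accuracy, error_rate and per_class are illustrative only.

```r
# Counts copied from the confusion matrix above:
# rows are predicted classes, columns are actual classes.
confusion <- matrix(
  c(57,  2,  0,
     2, 66,  4,
     0,  3, 44),
  nrow = 3, byrow = TRUE,
  dimnames = list(predicted = c("1", "2", "3"),
                  actual    = c("1", "2", "3"))
)

# Proportion of observations on the diagonal (167 of 178).
accuracy <- sum(diag(confusion)) / sum(confusion)

# Proportion misclassified (11 of 178).
error_rate <- 1 - accuracy

# Per-class proportion of actual cases predicted correctly.
per_class <- diag(confusion) / colSums(confusion)

print(round(c(accuracy = accuracy, error = error_rate), 4))
print(round(per_class, 4))
```

The 11 misclassified observations are consistent with the final rel error reported by printcp, since 0.10280 x 107 is approximately 11.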