DATA MINING
Desktop Survival Guide by Graham Williams
Min Split (minsplit)
The minsplit argument specifies the minimum number of observations that must exist at a node in the tree before any further splitting will be attempted.
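As a rough check of this definition, the following sketch fits two trees and lists the sizes of the nodes that were actually split. It assumes the kyphosis data that ships with rpart rather than the weather data used in this section, chosen only so that the code is self-contained. The helper split_sizes is a hypothetical name introduced for illustration. In both trees, no node that was split should hold fewer observations than minsplit.

```r
library(rpart)

# Tree grown with the default controls (minsplit defaults to 20).
default_fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Tree grown with a smaller minsplit, passed through rpart.control().
small_fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                   control = rpart.control(minsplit = 5))

# Hypothetical helper: observation counts of the internal (split) nodes.
# Rows of fit$frame whose var is "<leaf>" are terminal nodes.
split_sizes <- function(fit) {
  fr <- fit$frame
  fr$n[fr$var != "<leaf>"]
}

split_sizes(default_fit)
split_sizes(small_fit)

# Every split node should be at least minsplit in size.
all(split_sizes(default_fit) >= 20)
all(split_sizes(small_fit) >= 5)
```

Lowering minsplit only permits further splits. Whether they appear in the final tree still depends on the other controls, such as cp and minbucket.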
When using rpart directly, we specify minsplit within an argument called control, which takes the result of a call to the function rpart.control. In this example we first build a tree with the default settings and then rebuild it with minsplit=10.
> set.seed(42)
> w.train <- sample(nrow(weather), 0.5*nrow(weather))
> w.rpart <- rpart(RainTomorrow ~ Sunshine, data=weather[w.train,])
> w.rpart

n=181 (2 observations deleted due to missingness)

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 181 25 No (0.8618785 0.1381215) *
> table(predict(w.rpart, weatherAUS[-w.train,], type="class"),
        weatherAUS[-w.train, "RainTomorrow"])

         No   Yes
  No  21956  6204
  Yes     0     0
> w.rpart <- rpart(RainTomorrow ~ Sunshine, data=weather[w.train,],
                   control=rpart.control(minsplit=10))
> w.rpart

n=181 (2 observations deleted due to missingness)

node), split, n, loss, yval, (yprob)
      * denotes terminal node

 1) root 181 25 No (0.86187845 0.13812155)
   2) Sunshine>=6.45 133 10 No (0.92481203 0.07518797) *
   3) Sunshine< 6.45 48 15 No (0.68750000 0.31250000)
     6) Sunshine< 3.15 21 4 No (0.80952381 0.19047619)
      12) Sunshine>=1.25 9 0 No (1.00000000 0.00000000) *
      13) Sunshine< 1.25 12 4 No (0.66666667 0.33333333)
        26) Sunshine< 0.65 7 1 No (0.85714286 0.14285714) *
        27) Sunshine>=0.65 5 2 Yes (0.40000000 0.60000000) *
     7) Sunshine>=3.15 27 11 No (0.59259259 0.40740741)
      14) Sunshine< 5.95 19 7 No (0.63157895 0.36842105)
        28) Sunshine>=5.5 5 1 No (0.80000000 0.20000000) *
        29) Sunshine< 5.5 14 6 No (0.57142857 0.42857143)
          58) Sunshine< 4.8 11 4 No (0.63636364 0.36363636) *
          59) Sunshine>=4.8 3 1 Yes (0.33333333 0.66666667) *
      15) Sunshine>=5.95 8 4 No (0.50000000 0.50000000) *
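The printed tree can be checked against the controls that produced it. The sketch below assumes the node sizes were transcribed by hand from the output above (the vectors internal_n and leaf_n are illustrative names, not part of rpart). It also relies on the fact that rpart.control, when minbucket is not given, sets it to round(minsplit/3).

```r
library(rpart)

# Control object as used in the text. minbucket is not supplied, so
# rpart.control() derives it as round(minsplit / 3).
ctrl <- rpart.control(minsplit = 10)
ctrl$minbucket

# Node sizes transcribed by hand from the printed tree (an assumption of
# this sketch; they are not extracted from a fitted model here).
internal_n <- c(181, 48, 21, 12, 27, 19, 14)  # nodes that were split
leaf_n     <- c(133, 9, 7, 5, 5, 11, 3, 8)    # terminal nodes, marked *

# Every split node holds at least minsplit observations.
all(internal_n >= ctrl$minsplit)

# Every terminal node holds at least minbucket observations.
all(leaf_n >= ctrl$minbucket)

# The terminal nodes partition the 181 training observations.
sum(leaf_n) == 181
```

Nodes 12, 15, 26, 27, 28 and 59 each hold fewer than 10 observations, so they could not have been split further whatever their class mix.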
> table(predict(w.rpart, weatherAUS[-w.train,], type="class"),
        weatherAUS[-w.train, "RainTomorrow"])

         No   Yes
  No  21310  5730
  Yes   646   474
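To compare the two confusion matrices, we can work out the overall accuracy and the proportion of actual rain days that each model detects. The sketch below re-enters the counts by hand from the two tables above, with rows as predictions and columns as actual values. The names default_tab, minsplit_tab, accuracy and recall_yes are illustrative and introduced only for this example.

```r
# Confusion matrices transcribed from the output above.
# matrix() fills column by column: first the actual "No" column,
# then the actual "Yes" column.
labels <- list(predicted = c("No", "Yes"), actual = c("No", "Yes"))
default_tab  <- matrix(c(21956, 0, 6204, 0), nrow = 2, dimnames = labels)
minsplit_tab <- matrix(c(21310, 646, 5730, 474), nrow = 2, dimnames = labels)

# Proportion of all observations predicted correctly.
accuracy <- function(tab) sum(diag(tab)) / sum(tab)

# Proportion of actual "Yes" observations predicted as "Yes".
recall_yes <- function(tab) tab["Yes", "Yes"] / sum(tab[, "Yes"])

accuracy(default_tab)
accuracy(minsplit_tab)
recall_yes(default_tab)
recall_yes(minsplit_tab)
```

On these figures the overall accuracy slips from about 78.0% to about 77.4%, while the share of actual rain days detected rises from none to about 7.6%.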