DATA MINING
Desktop Survival Guide by Graham Williams |
|||||
Decision Tree |
The rpart function can also be used with a Surv object as the target. Here we build a decision tree.
> library(rpart) > l.rpart <- rpart(l.Surv ~ age + sex + ph.ecog + ph.karno + pat.karno + meal.cal + wt.loss, data=lung) > l.rpart |
n= 228 node), split, n, deviance, yval * denotes terminal node 1) root 228 307.928100 1.0000000 2) ph.ecog< 1.5 176 223.146500 0.8607050 4) sex>=1.5 69 79.663710 0.6067812 8) pat.karno>=85 41 38.552160 0.4954413 * 9) pat.karno< 85 28 38.961790 0.7912681 18) age< 52.5 7 8.297501 0.3610969 * 19) age>=52.5 21 26.638230 1.0017890 * 5) sex< 1.5 107 134.504400 1.0669060 10) pat.karno>=65 98 123.426900 1.0085900 20) age< 64.5 53 63.983660 0.8433300 40) ph.ecog< 0.5 20 22.029470 0.6019784 * 41) ph.ecog>=0.5 33 38.488990 1.0838490 * 21) age>=64.5 45 56.194870 1.2707780 42) wt.loss< 7.5 21 22.817280 0.9491270 * 43) wt.loss>=7.5 24 29.796110 1.7206300 * 11) pat.karno< 65 9 7.162132 2.0298660 * 3) ph.ecog>=1.5 52 70.309440 1.7041700 6) wt.loss>=21 15 18.094340 0.8363971 * 7) wt.loss< 21 37 38.904150 2.5294840 14) sex>=1.5 16 8.883448 1.8424400 * 15) sex< 1.5 21 27.485660 3.1026240 30) age< 59 7 4.587466 1.7378480 * 31) age>=59 14 17.993450 4.3012260 * |
The summary command will provide fuller details of the resulting model.
> summary(l.rpart) |
The usual predict can be used to obtain the predictions of the estimated rate. Todo: What is actually being predicted here? These are compared to the same predictions from the coxph model.
> l.rpart.pred <- predict(l.rpart, lung) > results <- cbind(l.coxph.age.results, rpart=l.rpart.pred) > head(results) |
time status lp risk expected sex 1 306 2 0.2096146 1.233203 0.8280030 0.2096146 2 455 2 0.2096146 1.233203 1.3880051 0.2096146 3 1010 1 0.2096146 1.233203 3.5060567 0.2096146 4 210 2 0.2096146 1.233203 0.5185439 0.2096146 5 883 2 0.2096146 1.233203 3.5060567 0.2096146 6 1022 1 0.2096146 1.233203 3.5060567 0.2096146 rpart 1 0.9491270 2 1.7206297 3 0.6019784 4 2.0298656 5 0.6019784 6 0.9491270 |
> plot(l.rpart) > text(l.rpart, use.n=TRUE) |