DATA MINING
Desktop Survival Guide by Graham Williams
Tuning Parameters
For the Two Class paradigm of Rattle, the random forest model builder builds a classification model. Each tree in the resulting ensemble is used to predict the class of an observation, and the proportion of trees predicting the positive class is taken as the probability that the observation belongs to the positive class.
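The voting scheme described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Rattle's own code: the function name `positive_class_probability` and the list of votes are hypothetical, and the positive class is assumed to be coded as 1.

```python
def positive_class_probability(tree_votes, positive_class=1):
    """Return the proportion of trees that voted for the positive class.

    tree_votes is a hypothetical list holding one predicted class label
    per tree in the ensemble, all for the same observation.
    """
    if not tree_votes:
        raise ValueError("need at least one tree vote")
    positive = sum(1 for vote in tree_votes if vote == positive_class)
    return positive / len(tree_votes)


# Made-up votes from a small 10-tree ensemble for a single observation.
votes = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
probability = positive_class_probability(votes)
print(probability)
```

Here 7 of the 10 hypothetical trees vote for the positive class, so the observation would be scored at 0.7.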
Rattle provides access to just three parameters (Figure 13.1) for tuning the models built by the random forest model builder: the number of trees, the sample size, and the number of variables. As is generally the case with Rattle, the defaults are a very good starting point! The defaults are to build 500 trees, not to sample the training dataset, and to choose from a number of variables equal to the square root of the number of variables available. In Figure 13.1 we see that the number of variables has automatically been set to 3 for the audit_auto.csv dataset, which has 9 input variables.
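The default for the number of variables can be checked with a little arithmetic. The sketch below assumes the square root is rounded down to a whole number when the count of input variables is not a perfect square; the function name `default_num_variables` and the dictionary keys are hypothetical and are not Rattle settings.

```python
import math


def default_num_variables(num_inputs):
    """Square root of the number of input variables, rounded down.

    Rounding down is an assumption made for this sketch; at least one
    variable is always returned.
    """
    if num_inputs < 1:
        raise ValueError("need at least one input variable")
    return max(1, math.isqrt(num_inputs))


# The three defaults described in the text, for a dataset with 9 inputs.
defaults = {
    "num_trees": 500,        # build 500 trees
    "sample_size": None,     # no sampling of the training dataset
    "num_variables": default_num_variables(9),
}
print(defaults)
```

For 9 input variables this gives 3, which agrees with the value shown in Figure 13.1.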