DATA MINING
Desktop Survival Guide by Graham Williams
Overfitting is more of a problem when training on smaller datasets.
A characteristic of the random forest algorithm is that it will often overfit the training data. For any model builder this may, at first, be a little disconcerting, the usual thought being that the model will therefore not generalise to new data. However, for random forests, this overfitting is not usually a problem. Applying the model to a test dataset will usually show that it does generalise quite well, and that it does not suffer the usual consequence of a model that has overfit the training dataset.
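The behaviour described above can be illustrated with a small experiment. What follows is a minimal sketch in plain Python, not the implementation used elsewhere in this guide. The synthetic two-variable dataset, the 10% label noise, the 25 trees, the seed and every function name are assumptions made purely for illustration. Each tree is grown on a bootstrap sample until its leaves are pure, so it effectively memorises its sample, noise included. The forest then predicts by majority vote, and accuracy is compared on the training data and on a separate test dataset.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow_tree(rows, rng):
    """Grow an unpruned tree, splitting until every leaf is pure."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:
        return labels[0]                          # pure leaf
    feature = rng.randrange(len(rows[0]) - 1)     # one random variable per split
    values = sorted({r[feature] for r in rows})
    best = None
    for t in values[:-1]:                         # candidate thresholds
        left = [r for r in rows if r[feature] <= t]
        right = [r for r in rows if r[feature] > t]
        score = (len(left) * gini([r[-1] for r in left])
                 + len(right) * gini([r[-1] for r in right]))
        if best is None or score < best[0]:
            best = (score, t, left, right)
    if best is None:                              # no split possible on this variable
        return Counter(labels).most_common(1)[0][0]
    _, t, left, right = best
    return (feature, t, grow_tree(left, rng), grow_tree(right, rng))

def predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf's class."""
    while isinstance(node, tuple):
        feature, t, left, right = node
        node = left if x[feature] <= t else right
    return node

def predict_forest(forest, x):
    """Majority vote over all trees in the forest."""
    votes = Counter(predict_tree(tree, x) for tree in forest)
    return votes.most_common(1)[0][0]

def accuracy(forest, rows):
    """Proportion of rows whose predicted class matches the recorded class."""
    hits = sum(predict_forest(forest, r[:-1]) == r[-1] for r in rows)
    return hits / len(rows)

def make_data(n, rng, noise=0.10):
    """Synthetic data (an assumption): class is 1 when x0 + x1 > 1, with some labels flipped."""
    rows = []
    for _ in range(n):
        x0, x1 = rng.random(), rng.random()
        label = int(x0 + x1 > 1.0)
        if rng.random() < noise:
            label = 1 - label                     # inject label noise
        rows.append((x0, x1, label))
    return rows

rng = random.Random(42)
train = make_data(200, rng)
test = make_data(200, rng)

# Each tree sees a bootstrap sample: drawn with replacement, same size as the training data.
forest = [grow_tree([rng.choice(train) for _ in train], rng) for _ in range(25)]

train_acc = accuracy(forest, train)
test_acc = accuracy(forest, test)
print(f"training accuracy: {train_acc:.3f}")
print(f"test accuracy:     {test_acc:.3f}")
```

Because the test labels carry the same 10% noise, no model can be expected to score much above 90% on the test dataset. The comparison of interest is how close the test accuracy comes to that ceiling even though the training accuracy is much higher.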