Data Mining Survivor: Boosting

DATA MINING
Desktop Survival Guide
by Graham Williams

Summary

Usage: Classification, regression, and other modelling tasks.

Input: Training data consisting of entities (observations) expressed as attribute-value pairs, with a class associated with each observation.

Output: An ensemble of models that are deployed together, with their individual decisions combined (for AdaBoost, by a weighted vote) into a single joint decision.

Complexity: Depends on the complexity of the weak learner employed. The weak learner is generally quite simple (e.g., OneR or Decision Stumps), and training requires roughly one weak-learner fit per boosting round, so scalability is generally good.

Availability: Freely available in Weka (see Chapter 53) and in R (see Chapter 50). Commercial data mining toolkits implementing AdaBoost include TreeNet (see Chapter 61), Statistica (see Chapter 60), and Virtual Predict (see Chapter 62).
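To make the summary concrete, here is a minimal sketch of discrete AdaBoost using decision stumps as the weak learner. It is not the Weka or R implementation. It assumes binary class labels coded as -1 and +1 and numeric attributes, and it does not cover regression. The function names (fit_stump, stump_predict, adaboost, predict) are hypothetical. Each round fits the stump with the lowest weighted error, gives it a vote weight alpha = 0.5 ln((1 - err) / err), and then increases the weights of the observations it misclassified. The joint decision is the sign of the alpha-weighted vote.

```python
import math

def stump_predict(stump, row):
    """Predict `polarity` when row[feature] <= threshold, else -polarity."""
    j, thr, polarity = stump
    return polarity if row[j] <= thr else -polarity

def fit_stump(X, y, w):
    """Return (weighted_error, (feature, threshold, polarity)) of the best stump."""
    best_err, best_stump = float("inf"), None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                stump = (j, thr, polarity)
                # Weighted error: total weight of the misclassified observations.
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(stump, row) != yi)
                if err < best_err:
                    best_err, best_stump = err, stump
    return best_err, best_stump

def adaboost(X, y, rounds=10):
    """Discrete AdaBoost for labels in {-1, +1}; returns [(alpha, stump), ...]."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform weights
    ensemble = []
    for _ in range(rounds):
        err, stump = fit_stump(X, y, w)
        if err >= 0.5:
            break                          # no better than chance: stop boosting
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-10))
        ensemble.append((alpha, stump))
        if err == 0:
            break                          # perfect stump: more rounds would repeat it
        # Up-weight misclassified observations, down-weight the rest, renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Joint decision: sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single stump can separate.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost(X, y, rounds=3)
print([predict(model, row) for row in X])  # [1, 1, -1, -1, 1, 1]
```

On this toy data the best single stump still misclassifies a third of the observations, while three boosted stumps combined by their weighted vote classify all six correctly. This illustrates why very simple weak learners are sufficient.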


The PDF version is a formatted comprehensive draft book (with over 800 pages).
Brought to you by Togaware. This page generated: Sunday, 22 August 2010