Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Data Mining

Statistics is one of the fundamental tools for the data miner. Statistics is essentially about uncertainty: understanding it and thereby making allowance for it. It also provides a framework for understanding the discoveries made in data mining. Discoveries need to be statistically sound and statistically significant, which means that the uncertainty associated with modelling needs to be understood.

We might also note the controversy around the relationship between machine learning and statistics. Brian D. Ripley, a leading computational statistician and a member of the R Core team, provocatively suggests that machine learning is statistics minus any checking of models and assumptions.



Copyright © Togaware Pty Ltd
Brought to you by Togaware. This page generated: Sunday, 22 August 2010