Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Removing Outliers

Tests for outliers have largely been superseded by the use of robust methods. Outlier tests are poor in that outliers tend to damage results long before they are detected. Robust methods attempt to compensate for outliers rather than reject them. Random forest modelling helps avoid the issue of outliers.
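
As a rough illustration of what "compensate rather than reject" can mean, the sketch below compares the mean and standard deviation with their robust counterparts, the median and the median absolute deviation, using base R only. The vector x is made-up data chosen for illustration (it is not taken from the wine dataset), and this is only one simple example of a robust approach, not necessarily the one the text has in mind.

```r
# Hypothetical measurements with one extreme value (9.0).
# These numbers are invented for illustration only.
x <- c(2.1, 2.3, 2.4, 2.2, 2.5, 2.3, 2.6, 2.4, 2.2, 9.0)

# Classical estimates: both are pulled towards the extreme value.
classical <- c(centre = mean(x), spread = sd(x))

# Robust estimates: the median and the median absolute deviation
# (mad) are barely affected by a single extreme value.
robust <- c(centre = median(x), spread = mad(x))

print(rbind(classical, robust))
```

Nothing is removed from x here. The extreme value stays in the data, but the robust summaries give it little weight.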

You can get a list of what the boxplot function thinks are outliers:

> load("wine.RData")
> bp <- boxplot(wine$Ash, plot=FALSE)
> bp$out
[1] 3.22 1.36 3.23
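
The rule behind bp$out can be sketched by hand. By default, boxplot relies on boxplot.stats, which flags points lying more than coef = 1.5 times the spread between the hinges beyond either hinge. The example below uses an invented vector y rather than the wine data, so that it runs without wine.RData. The names y, fence, by_hand and y_clean are illustrative assumptions only.

```r
# Hypothetical data, invented for illustration (not the wine dataset).
y <- c(2.1, 2.3, 2.4, 2.2, 2.5, 2.3, 2.6, 2.4, 3.9, 0.8)

# boxplot.stats() computes the statistics that boxplot() reports,
# including the points it treats as outliers.
stats <- boxplot.stats(y, coef = 1.5)

# The same rule by hand: take the lower and upper hinges from
# fivenum(), then flag anything beyond 1.5 times their difference.
hinges  <- fivenum(y)[c(2, 4)]
fence   <- 1.5 * diff(hinges)
by_hand <- y[y < hinges[1] - fence | y > hinges[2] + fence]

# Both approaches flag the same points.
print(stats$out)
print(by_hand)

# If removal really is wanted, drop the flagged values.
y_clean <- y[!y %in% stats$out]
```

Raising coef makes the rule more conservative, so fewer points are flagged. Whether the flagged points should actually be dropped is a separate judgement that depends on the data.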



Copyright © Togaware Pty Ltd
Brought to you by Togaware. This page generated: Saturday, 16 January 2010