DATA MINING
Desktop Survival Guide by Graham Williams
A boxplot (also known as a box-and-whisker plot) provides a graphical overview of how data is distributed over the number line. R's boxplot function draws a graphical counterpart of the textual summary of the data, such as the one produced by summary(). Any skewness in the distribution becomes easy to see.
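The sketch below sets the textual summary beside the numbers a boxplot actually draws. It uses a small made-up vector `x`, which is hypothetical and not the wine data. boxplot() relies on the helper boxplot.stats() to compute these values. For some sample sizes, its hinges (Tukey's fivenum()) differ slightly from the quartiles that summary() reports, although here they agree.

```r
# Hypothetical example data, chosen for illustration only (not the wine data).
x <- c(2, 4, 4, 5, 5, 6, 6, 7, 7, 8, 15)

# Textual summary: minimum, quartiles, median, mean and maximum.
print(summary(x))

# Numbers a boxplot draws, computed without plotting: lower whisker end,
# lower hinge, median, upper hinge and upper whisker end.
bs <- boxplot.stats(x)
print(bs$stats)   # 2.0 4.5 6.0 7.0 8.0

# Points that the boxplot would draw individually as outliers.
print(bs$out)     # 15
```

summary() reports a maximum of 15, but the upper whisker stops at 8. That is because 15 lies more than 1.5 times the interquartile range above the box, so it is plotted as an outlier instead.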
A boxplot shows the median (the second quartile, or the 50th
percentile) as the thicker line within the box. The top and bottom
extents of the box identify the upper quartile (the third quartile,
or the 75th percentile) and the lower quartile (the first quartile,
or the 25th percentile) respectively. The height of the box is known
as the interquartile range. The dashed lines (the whiskers) extend to
the most extreme data points that are no more than 1.5 times the
interquartile range beyond the box. Outliers (points further than
1.5 times the interquartile range beyond the box) are then
individually plotted (at 3.23, 3.22, and 1.36). Our plot here adds
faint horizontal lines to make the various values easier to read off.

    load("wine.Rdata")            # loads the wine dataset
    boxplot(wine$Ash, xlab="Ash")
    # Faint dotted reference lines to help read off values
    abline(h=seq(1.4, 3.2, 0.1), col="lightgray", lty="dotted")
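The whisker and outlier rule can be checked by hand. The sketch below again uses a hypothetical vector `x` rather than the wine data. It places fences at 1.5 times the interquartile range beyond each end of the box, then compares the result with R's own boxplot.stats(). The multiplier 1.5 is the default of boxplot()'s `range` argument (and of boxplot.stats()'s `coef` argument), and changing it moves the fences.

```r
# Hypothetical example data (not the wine dataset).
x <- c(2, 4, 4, 5, 5, 6, 6, 7, 7, 8, 15)

# Lower and upper hinges, as used by boxplot() (Tukey's five-number summary).
hinges <- fivenum(x)[c(2, 4)]
iqr <- diff(hinges)                  # height of the box

# Fences sit 1.5 * IQR beyond each end of the box (not the median).
lower_fence <- hinges[1] - 1.5 * iqr
upper_fence <- hinges[2] + 1.5 * iqr

# Points outside the fences are plotted individually as outliers.
outliers <- x[x < lower_fence | x > upper_fence]

# Whiskers reach the most extreme points still inside the fences.
whiskers <- range(x[x >= lower_fence & x <= upper_fence])

print(outliers)   # 15
print(whiskers)   # 2 8

# Cross-check against R's own computation.
bs <- boxplot.stats(x)
stopifnot(all.equal(bs$out, outliers),
          all.equal(bs$stats[c(1, 5)], whiskers))
```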