DATA MINING
Desktop Survival Guide by Graham Williams
Basic Data Summary
Once a dataset has been loaded into Rattle, we can start to get an idea of the shape of the data from the simple summary that is displayed, as in Figure 5.1. For example, the first variable, Date, is recognised as a unique identifier for each observation. It has 366 unique values, which equals the number of observations. Rattle uses a heuristic to flag such variables as identifiers.
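The identifier heuristic described above can be sketched in a few lines. This is only an illustration in Python, not Rattle's actual implementation (Rattle itself is written in R); the function name looks_like_identifier and the toy values are hypothetical, and the exact rule Rattle applies may differ.

```python
def looks_like_identifier(values):
    """Return True when every observation has a distinct, non-missing value.

    Assumed rule: a variable is treated as an identifier when its number of
    unique values equals the number of observations.
    """
    if not values:
        return False  # an empty column carries no evidence either way
    present = [v for v in values if v is not None]
    return len(present) == len(values) and len(set(present)) == len(values)


# Hypothetical toy data standing in for a loaded dataset.
dates = ["2007-11-01", "2007-11-02", "2007-11-03", "2007-11-04"]
wind_gust_dir = ["NW", "ENE", "NW", "NW"]

date_is_id = looks_like_identifier(dates)          # 4 unique values, 4 observations
gust_is_id = looks_like_identifier(wind_gust_dir)  # 2 unique values, 4 observations
```

Under this rule the date column is flagged because its unique count matches the observation count, while the wind direction column is not, since its values repeat.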
The next five variables are all identified as Numeric, followed by the Categoric variable WindGustDir, and so on.
The comment column also reports the number of missing values for each variable. Dealing with missing values is covered in Chapter .
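To make the summary concrete, the sketch below builds a similar per-variable summary (type, unique count, missing count) in plain Python. It is a rough approximation under stated assumptions, not Rattle's code: the function summarise, the rule for deciding Numeric versus Categoric, the use of None for missing values, and the sample columns (apart from the name WindGustDir) are all hypothetical.

```python
def summarise(columns):
    """Build one summary row per variable: type, unique count, missing count."""
    summary = {}
    for name, values in columns.items():
        # None marks a missing value in this sketch.
        present = [v for v in values if v is not None]
        # Assumed rule: a variable is Numeric if all present values are numbers.
        is_numeric = bool(present) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in present
        )
        summary[name] = {
            "type": "Numeric" if is_numeric else "Categoric",
            "unique": len(set(present)),
            "missing": len(values) - len(present),
        }
    return summary


# Hypothetical toy data, four observations per variable.
columns = {
    "MinTemp": [8.0, 14.0, None, 13.3],
    "WindGustDir": ["NW", None, "NW", "ENE"],
}

summary = summarise(columns)
for name, row in summary.items():
    print(f"{name:12} {row['type']:10} "
          f"unique={row['unique']} missing={row['missing']}")
```

Each variable here has one missing value, which is the kind of count the comment column reports.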