DATA MINING
Desktop Survival Guide by Graham Williams
CSV Options
The Rattle interface provides a couple of options for tuning how we
read the data from a CSV file, as we might have noticed in
Figure 5.2. We can choose the
field delimiter through the Separator entry. A comma is the
default. To load a .txt file
which uses a tab as the field separator we replace the comma with the
special code \\t
(that is, two backslashes followed by a
t) to represent a tab. We can also leave the separator empty,
in which case any white space will be used as the separator.
From the read.csv viewpoint the effect is to include the appropriate argument in the call to the function:
> ds <- read.csv("mydata.txt", sep="\t")
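To try the separator options without a data file to hand, the following minimal sketch writes a small tab-delimited file to a temporary location and reads it back. The file contents and the names tab_file, ds_tab and ds_ws are made up for illustration. The second call assumes that an empty Separator entry corresponds to sep="" in read.csv, which makes R split fields on any run of white space.

```r
# Write a tiny tab-delimited file to a temporary location (illustrative data).
tab_file <- tempfile(fileext = ".txt")
writeLines(c("id\tname\tscore",
             "1\tann\t3.5",
             "2\tbob\t4.0"),
           tab_file)

# Explicit tab separator, as in the call above.
ds_tab <- read.csv(tab_file, sep = "\t")

# An empty separator asks R to split on any run of white space
# (assumed here to mirror leaving the Separator entry empty).
ds_ws <- read.csv(tab_file, sep = "")

# Inspect the structure of the loaded data frame.
str(ds_tab)
```

Both calls should produce a data frame with the same three columns here, because none of the fields contains embedded spaces. For data whose values may contain spaces, the explicit tab separator is the safer choice.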
The other option of interest when loading a dataset is the Header check box. Generally, a CSV file will have a list of column names as its first row. These names will be used by R and Rattle as the names of the variables. However, not all CSV files include headers, and if that is the case then un-check the Header check box. On loading a CSV file that does not contain headers, R will generate variable names for the columns. The check box translates to the header argument in the call to read.csv:
> ds <- read.csv("mydata.csv", header=FALSE)
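As a small illustration of the generated names, the sketch below writes a header-less CSV file to a temporary location and loads it with header=FALSE. The data and the names csv_file, ds and ds_named are invented for the example. The final call uses the col.names argument of read.csv to supply our own names, which is one possible follow-up and not something the Rattle interface is claimed to do.

```r
# Write a small CSV file with no header row (illustrative data).
csv_file <- tempfile(fileext = ".csv")
writeLines(c("1,ann,3.5",
             "2,bob,4.0"),
           csv_file)

# With header=FALSE the first row is treated as data and
# R generates the variable names V1, V2, V3.
ds <- read.csv(csv_file, header = FALSE)
names(ds)

# Optionally supply more meaningful names at load time.
ds_named <- read.csv(csv_file, header = FALSE,
                     col.names = c("id", "name", "score"))
names(ds_named)
```

If the Header check box is left checked for such a file, the first data row would be consumed as column names, so the dataset would silently lose an observation.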
Copyright © Togaware Pty Ltd