Data Mining Survivor: CSV_Data0

DATA MINING
Desktop Survival Guide
by Graham Williams

CSV Options

The Rattle interface provides a couple of options for tuning how we read the data from a CSV file, as we might have noticed in Figure 5.2. We can choose the field delimiter through the Separator entry. A comma is the default. To load a .txt file which uses a tab as the field separator we replace the comma with the special code \\t (that is, two backslashes followed by a t) to represent a tab. We can also leave the separator empty, in which case any white space will be used as the separator.

From the read.csv viewpoint, the effect is to pass the corresponding sep argument in the call to the function:

> ds <- read.csv("mydata.txt", sep="\t")
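The effect of the separator can be tried out without an existing data file. The following is a minimal sketch, assuming a small made-up dataset written to temporary files; the file contents and the variable names (tab_file, ws_file, ds_tab, ds_ws) are hypothetical and chosen only for illustration. It reads one file with a tab separator and one with an empty separator, which read.csv treats as any run of white space.

```r
# Hypothetical data: write a small tab-separated file to a temporary location.
tab_file <- tempfile(fileext = ".txt")
writeLines(c("name\tage", "alice\t34", "bob\t27"), tab_file)

# In R source code the escape "\t" inside a string denotes a tab character.
ds_tab <- read.csv(tab_file, sep = "\t")

# The same hypothetical data, with columns separated by runs of spaces.
ws_file <- tempfile(fileext = ".txt")
writeLines(c("name   age", "alice  34", "bob    27"), ws_file)

# An empty separator tells read.csv to split fields on any white space.
ds_ws <- read.csv(ws_file, sep = "")

print(ds_tab)
print(ds_ws)
```

Both calls should produce a data frame with two rows and the columns name and age, since only the delimiter differs between the two files.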

The other option of interest when loading a dataset is the Header check box. Generally, a CSV file will have a list of column names as its first row. These names will be used by R and Rattle as the names of the variables. However, not all CSV files include headers, and if that is the case then un-check the Header check box. On loading a CSV file that does not contain headers, R will generate variable names for the columns. The check box translates to the header argument in the call to read.csv:

> ds <- read.csv("mydata.csv", header=FALSE)
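To see what the generated variable names look like, here is a small sketch that assumes a made-up two-column file without a header row; the file contents and the names no_header, ds_auto, ds_named and ds_wrong are hypothetical. It also uses the col.names argument of read.csv, which is not discussed above, as one possible way to supply our own names at load time.

```r
# Hypothetical data: a CSV file whose first row is data, not column names.
no_header <- tempfile(fileext = ".csv")
writeLines(c("alice,34", "bob,27"), no_header)

# With header=FALSE every row is treated as data and R names the columns itself.
ds_auto <- read.csv(no_header, header = FALSE)
print(names(ds_auto))   # "V1" "V2"

# Optionally supply meaningful names through the col.names argument.
ds_named <- read.csv(no_header, header = FALSE, col.names = c("name", "age"))
print(ds_named)

# Leaving header at its default (TRUE) would use the first data row as the
# column names, so only one row of data would remain.
ds_wrong <- read.csv(no_header)
print(nrow(ds_wrong))   # 1
```

The last call illustrates why the Header check box matters: with the default setting the first observation is silently consumed as column names.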

Brought to you by Togaware. This page generated: Sunday, 22 August 2010