DATA MINING
Desktop Survival Guide by Graham Williams
Locating and Loading Data
Using the Spreadsheet option of Rattle's Data tab we can directly load data from a .csv file. Click the Filename button (Figure 5.2) to display the file chooser dialogue (Figure 5.3).
We can browse to the .csv file we wish to load, highlight it, and click the Open button.
We have told Rattle the location and the name of the file to load. We now need to actually load the data with a click on the Execute button (or press the F2 key). This loads the contents of the file from the hard disk into the computer's memory, for processing by Rattle.
As mentioned above, rattle supplies a number of sample CSV files, including the weather.csv data file. The file itself was installed when rattle was installed. We can ask R for its actual location using the system.file function, which we type into the R Console:
> system.file("csv", "weather.csv", package = "rattle")
[1] "/usr/local/lib/R/site-library/rattle/csv/weather.csv"
The location reported will depend on your particular installation and operating system. Here the location is as on my own installation, which is a standard GNU/Linux system.
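Because the location varies between installations, it can be worth checking that the file was actually found before using it. The following is a minimal sketch, assuming only base R: system.file returns an empty string when nothing matches (for example, when rattle is not installed), and nzchar tests for that case. The messages printed are illustrative only.

```r
# Ask R where rattle's sample weather.csv lives on this installation.
fn <- system.file("csv", "weather.csv", package = "rattle")

# system.file() returns "" when the package or file cannot be found.
if (nzchar(fn)) {
  cat("Found weather.csv at:", fn, "\n")
} else {
  cat("weather.csv not found; is the rattle package installed?\n")
}
```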
We can review the contents of the file using the file.show function. This will pop up a window displaying the contents of the file.
> fn <- system.file("csv", "weather.csv", package = "rattle")
> file.show(fn)
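If a pop-up window is not convenient (in a script, say), a rough alternative is to print just the first few lines in the console. This sketch uses the base R functions readLines and writeLines; the choice of three lines is arbitrary, and the variable name top is our own.

```r
# Locate the sample file; fn is "" if rattle is not installed.
fn <- system.file("csv", "weather.csv", package = "rattle")

# Read at most the first three lines (header plus two observations),
# or nothing at all if the file could not be located.
top <- if (nzchar(fn)) readLines(fn, n = 3) else character(0)

# Print the lines exactly as they appear in the file.
writeLines(top)
```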
The file contents can also be viewed directly outside of R and Rattle, with any simple text editor. If you aren't familiar with CSV files, it is instructive to do so. The top of the file will appear as:
Date,Location,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine...
2007-11-01,Canberra,8,24.3,0,3.4,6.3,NW,30,SW,NW,6,20,68...
2007-11-02,Canberra,14,26.9,3.6,4.4,9.7,ENE,39,E,W,4,17,80...
2007-11-03,Canberra,13.7,23.4,3.6,5.8,3.3,NW,85,N,NNE,6,6,82...
2007-11-04,Canberra,13.3,15.5,39.8,7.2,9.1,NW,54,WNW,W,30,24,62...
2007-11-05,Canberra,7.6,16.1,2.8,5.6,10.6,SSE,50,SSE,ESE,20,28,68...
2007-11-06,Canberra,6.2,16.9,0,5.8,8.2,SE,44,SE,E,20,24,70...
A CSV file is actually a normal text file that begins with a header row, listing the names of the variables, each separated by a comma. The remainder of the file after the header is expected to consist of rows of data that record the observations, again with fields separated by commas recording the values of the variables for each observation.
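To see how the header row and the comma-separated fields map onto variables and observations, we can read a small CSV fragment directly in R. This is only an illustrative sketch: the text below is cut down to the first seven columns and first three observations of the listing above (the real file has more of both), and the names csv_text and ds are our own. It uses the base R function read.csv with its text argument, so no file on disk is needed.

```r
# A cut-down CSV fragment: a header row, then one row per observation.
csv_text <- "Date,Location,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine
2007-11-01,Canberra,8,24.3,0,3.4,6.3
2007-11-02,Canberra,14,26.9,3.6,4.4,9.7
2007-11-03,Canberra,13.7,23.4,3.6,5.8,3.3"

# header = TRUE tells R that the first row names the variables.
ds <- read.csv(text = csv_text, header = TRUE)

dim(ds)    # number of observations (rows) and variables (columns)
names(ds)  # the variable names, taken from the header row
str(ds)    # the type R has inferred for each variable
```

Each comma-separated field in the header becomes a column name, and each later row becomes one observation in the resulting data frame.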
Copyright © Togaware Pty Ltd Support further development through the purchase of the PDF version of the book.