DATA MINING
Desktop Survival Guide by Graham Williams |
|||||
Excel |
The simplest way to transfer data from Excel, or any spreadsheet in fact, is to save the data in CSV (Comma Separated Value) format, usually into a file with extension .csv. This is supported in all spreadsheet applications and is effective in that if we are fluent with data manipulation in Excel, then we can get our data into shape using Excel, and then load it into Rattle for data mining.
The read.xls function in the gdata package can read specified sheets from an Excel spreadsheet.
Alternatively, on MS/Windows Excel spreadsheetscan be directly accessed and manipulated through ODBCusing odbcConnectExcel. Available sheets can be listed with sqlTables and individual sheets can be queried through the sqlQuery function or else imported with sqlFetch. To use a spreadsheet as a database though, the first row of the spreadsheet must be the column names! If not, we will find that we end up reading from the second row of our data.
In this example we open a connection to a spreadsheet and then give a
sample query:
> library(RODBC) > channel <- odbcConnectExcel("h:/audit.xls") > ds <- sqlQuery(channel, "SELECT * FROM `Sheet1$` WHERE Type = "TOC" AND Valve="5010-05"") > odbcClose(channel) |
To simply fetch the full contents of a single sheet of a spreedsheet
we can use the sqlFetch query:
library(RODBC) channel <- odbcConnectExcel("h:/audit.xls") ds <- sqlFetch(xlsConnect, "Sheet1") odbcClose(xlsConnect) |
On MS/Windows you can also use the xlsReadWrite
package
to directly access and manipulate an Excel spreadsheet. For example,
to read a spreadsheet we can use read.xls:
library(xlsReadWrite) ds <- read.xls("audit.xls", colNames=TRUE, sheet=6, colClasses=c("factor","integer","double")) |
Copyright © Togaware Pty Ltd Support further development through the purchase of the PDF version of the book.