DATA MINING
Desktop Survival Guide by Graham Williams

Manipulating Data As SQL
The Structured Query Language (SQL) is a declarative language commonly used to query and manipulate data in relational databases. Many data analysts already know SQL and can manipulate data easily with it. The sqldf package lets analysts familiar with SQL run SQL statements directly against R data frames, returning the result as a data frame.
> library(sqldf)

# Simple count
> sqldf("select count(*) from iris")
  count(*)
1      150

> sqldf("select * from iris order by Sepal_Length desc limit 3")
  Sepal_Length Sepal_Width Petal_Length Petal_Width   Species
1          7.9         3.8          6.4         2.0 virginica
2          7.7         3.8          6.7         2.2 virginica
3          7.7         2.6          6.9         2.3 virginica

# New data frame with Species2 a factor with two levels.
> sqldf("select Sepal_Length, Sepal_Width, Petal_Length, Petal_Width, Species as Species2 from iris where Species <> 'setosa'")
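Aggregation is another common SQL task that carries over directly. The following is a minimal sketch, not taken from the original text: the data frame `sales` and its columns `region` and `amount` are hypothetical names invented for illustration. The column names deliberately contain no dots, since the handling of dots in column names may differ between sqldf versions.

```r
library(sqldf)

# Hypothetical toy data frame (not from the book); column names are
# dot-free so they can be used unquoted in the SQL statement.
sales <- data.frame(region = c("north", "south", "north", "east"),
                    amount = c(10, 20, 30, 45),
                    stringsAsFactors = FALSE)

# Total and count per region, largest total first.
totals <- sqldf("select region, sum(amount) as total, count(*) as n
                 from sales
                 group by region
                 order by total desc")

print(totals)
```

The result is an ordinary data frame with one row per region, so it can be passed on to any other R function.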
See http://code.google.com/p/sqldf/ for further examples.