DATA MINING
Desktop Survival Guide by Graham Williams
Manipulating Data As SQL
The Structured Query Language (SQL) is a declarative language commonly used to query and manipulate data in relational databases. Many data analysts already know SQL and can manipulate data easily with it. The sqldf package lets analysts who are familiar with SQL manipulate R data frames by writing SQL statements.
> library(sqldf)
# Simple count
> sqldf("select count(*) from iris")
count(*)
1 150
> sqldf("select * from iris order by Sepal_Length desc limit 3")
Sepal_Length Sepal_Width Petal_Length Petal_Width Species
1 7.9 3.8 6.4 2.0 virginica
2 7.7 3.8 6.7 2.2 virginica
3 7.7 2.6 6.9 2.3 virginica
# New data frame with Species2 a factor with two levels.
> sqldf("select Sepal_Length, Sepal_Width,
Petal_Length, Petal_Width,
Species as Species2
from iris where Species <> 'setosa'")
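
The same approach extends to aggregation. The following is a minimal sketch, not taken from the original text, that summarises the data by species with a GROUP BY clause. The names flowers, by_species, n and mean_sepal_length are made up for illustration. How sqldf treats dots in column names (such as Sepal.Length) appears to depend on the installed version, so the sketch first copies iris and renames its columns with underscores rather than relying on either behaviour.

```r
library(sqldf)

# Copy iris and replace the dots in its column names with underscores, so
# the SQL below does not depend on how sqldf handles dotted names.
# (flowers is a hypothetical name used only for this sketch.)
flowers <- iris
names(flowers) <- gsub(".", "_", names(flowers), fixed = TRUE)

# One row per species: the number of flowers and the mean sepal length.
by_species <- sqldf("select Species,
                            count(*) as n,
                            avg(Sepal_Length) as mean_sepal_length
                     from flowers
                     group by Species
                     order by Species")
print(by_species)
```

The result is an ordinary R data frame, so it can be passed straight on to other R functions for plotting or modelling.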
See http://code.google.com/p/sqldf/ for further examples.