DATA MINING
Desktop Survival Guide by Graham Williams |
|||||
Overlays and Point in Polygon |
Overlays can be used to locate points within polygons. For example, suppose we have a collection of earthquake locations (geocoded to longitude and latitude) and want to locate the area within which the earthquake is located (e.g., states of Australia).
We first construct a data frame containing the spatial coordinates of the earthquakes (from http://www.ga.gov.au/bin/listQuakes), together with their magnitude and a textual description of the location.
Todo: Move this data into a dataset within Rattle.
> earthquakes <- data.frame(latitude = c(-38.3624, -38.42, -30.26, -30.205, -16.314, -31.4918, -30.306, -32.635, -30.233, -37.423, -31.633, -30.286, -31.213, -30.237, -33.39), longitude = c(145.8692, 145.91, 117.716, 117.783, 121.252, 138.5274, 117.646, 138.066, 117.73, 146.04, 138.741, 117.711, 138.559, 117.742, 116.64), magnitude = c(3.3, 3.1, 2.5, 2.6, 5.1, 3.3, 2.3, 3.5, 2.5, 2.2, 2.1, 2.4, 2, 2.3, 2.6), location = c("NW of Korumburra VIC", "Near Korumburra VIC", "NW of Beacon WA", "N of Beacon WA", "NW of Broome WA", "NE of Hawker SA", "NW of Beacon WA", "SE of Pt Augusta SA", "NW of Beacon WA", "SW of Jamieson VIC", "E of Wilpena Pound SA", "NW of Beacon WA", "South of Hawker SA", "NW of Beacon WA", "SW of Darkan WA")) |
We can then convert the xy data (the longitude and latitude) in the data frame into a spatial object of class SpatialPointsDataFrame from the sp package.
> library(sp) > head(earthquakes) |
latitude longitude magnitude location 1 -38.3624 145.8692 3.3 NW of Korumburra VIC 2 -38.4200 145.9100 3.1 Near Korumburra VIC 3 -30.2600 117.7160 2.5 NW of Beacon WA 4 -30.2050 117.7830 2.6 N of Beacon WA 5 -16.3140 121.2520 5.1 NW of Broome WA 6 -31.4918 138.5274 3.3 NE of Hawker SA |
> class(earthquakes) |
[1] "data.frame" |
> coordinates(earthquakes) <- ~longitude+latitude > head(earthquakes) |
coordinates magnitude 1 (145.869, -38.3624) 3.3 2 (145.91, -38.42) 3.1 3 (117.716, -30.26) 2.5 4 (117.783, -30.205) 2.6 5 (121.252, -16.314) 5.1 6 (138.527, -31.4918) 3.3 7 (117.646, -30.306) 2.3 8 (138.066, -32.635) 3.5 9 (117.73, -30.233) 2.5 10 (146.04, -37.423) 2.2 11 (138.741, -31.633) 2.1 12 (117.711, -30.286) 2.4 13 (138.559, -31.213) 2.0 14 (117.742, -30.237) 2.3 15 (116.64, -33.39) 2.6 |
> class(earthquakes) |
[1] "SpatialPointsDataFrame" attr(,"package") [1] "sp" |
> str(earthquakes) |
Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots ..@ data :'data.frame': 15 obs. of 2 variables: .. ..$ magnitude: num [1:15] 3.3 3.1 2.5 2.6 5.1 3.3 2.3 3.5 2.5 2.2 ... .. ..$ location : Factor w/ 11 levels "E of Wilpena Pound SA",..: 7 2 5 4 6 3 5 8 5 11 ... ..@ coords.nrs : int [1:2] 2 1 ..@ coords : num [1:15, 1:2] 146 146 118 118 121 ... .. ..- attr(*, "dimnames")=List of 2 .. .. ..$ : NULL .. .. ..$ : chr [1:2] "longitude" "latitude" ..@ bbox : num [1:2, 1:2] 116.6 -38.4 146 -16.3 .. ..- attr(*, "dimnames")=List of 2 .. .. ..$ : chr [1:2] "longitude" "latitude" .. .. ..$ : chr [1:2] "min" "max" ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slots .. .. ..@ projargs: chr NA |
We also need some region data, or some other overlay, so we can map the points to the regions. Using freely available data we might use the Australian states data.
Now we can use overlay to find within which state each earthquake belongs to.
> proj4string(earthquakes) <- CRS("+proj=longlat +datum=WGS84") > overlay(aus, earthquakes)$ADMIN_NAME |
[1] Victoria Victoria Western Australia [4] Western Australia <NA> South Australia [7] Western Australia South Australia Western Australia [10] Victoria South Australia Western Australia [13] South Australia Western Australia Western Australia 8 Levels: Australian Capital Territory ... Western Australia |
For reassurance we can visually match the names in the original data above to the list of states here. The missing value turns out to lay outside the state, in the ocean.
Copyright © Togaware Pty Ltd Support further development through the purchase of the PDF version of the book.