DATA MINING
Desktop Survival Guide by Graham Williams
Record Linkage
Data linkage is often not as straightforward as joining on common columns. The data sources may store data in very different ways, so the linking may need to probabilistically match entries that appear to relate to the same entity. A typical example is matching names and addresses from different data sources, where spelling, abbreviations, and formatting can differ. The entities we are attempting to match could be businesses, patients, or clients.
A very useful tool for this process is the open source Febrl (Freely Extensible Biomedical Record Linkage).
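To make the idea of approximate matching concrete, the following is a minimal sketch in Python using only the standard library. It does not use Febrl and is not a full probabilistic linkage model; it simply scores name and address similarity with `difflib.SequenceMatcher` and keeps the best candidate above a cutoff. The function names, the record layout, the sample records, and the 0.8 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalise(text):
    """Lower-case, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower()
    )
    return " ".join(cleaned.split())


def similarity(a, b):
    """Return a similarity score between 0 and 1 for two strings."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def link_records(left, right, threshold=0.8):
    """Link each left record to its best-scoring right record.

    The score is the plain average of name and address similarity
    (an illustrative choice, not a calibrated match weight). A link
    is kept only if the score reaches the threshold.
    """
    links = []
    for left_id, left_rec in left.items():
        best_id, best_score = None, 0.0
        for right_id, right_rec in right.items():
            score = (
                similarity(left_rec["name"], right_rec["name"])
                + similarity(left_rec["address"], right_rec["address"])
            ) / 2
            if score > best_score:
                best_id, best_score = right_id, score
        if best_id is not None and best_score >= threshold:
            links.append((left_id, best_id, best_score))
    return links


# Hypothetical records from two sources with no shared key.
source_a = {
    1: {"name": "John Smith", "address": "12 High Street, Canberra"},
    2: {"name": "Mary Jones", "address": "7 Lake Road, Sydney"},
}
source_b = {
    "a": {"name": "Jon Smith", "address": "12 High St. Canberra"},
    "b": {"name": "Peter Brown", "address": "99 Ocean Drive, Perth"},
}

links = link_records(source_a, source_b)
for left_id, right_id, score in links:
    print(left_id, right_id, round(score, 2))
```

Comparing every record against every other record, as this sketch does, scales poorly to large data sets. Dedicated record linkage tools typically reduce the number of comparisons and weight each field separately.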