Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Record Linkage

Data linkage is often not as straightforward as joining on common columns. The data sources may store data in very different ways, and the linking may need to probabilistically match entries that appear to relate to the same entity. A typical example is matching names and addresses from different data sources, where the same entity may be recorded with different spellings or abbreviations. The entities we are attempting to match could be businesses, patients, or clients.

A very useful tool to help with this process is the open source Febrl (Freely Extensible Biomedical Record Linkage).
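
To make the idea of approximate matching on names and addresses concrete, here is a minimal sketch using only Python's standard library. It is not Febrl's API and is far simpler than a full probabilistic linkage: it just scores string similarity with difflib. The records, the field names (id, name, address), the weights, and the 0.8 threshold are all hypothetical, chosen purely for illustration.

```python
from difflib import SequenceMatcher


def normalise(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " "
                      for ch in text.lower())
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity in [0, 1] between two strings after normalisation."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def match_score(rec_a, rec_b, name_weight=0.6, address_weight=0.4):
    """Weighted similarity over name and address (weights are assumptions)."""
    return (name_weight * similarity(rec_a["name"], rec_b["name"])
            + address_weight * similarity(rec_a["address"], rec_b["address"]))


def link(source_a, source_b, threshold=0.8):
    """For each record in source_a, keep its best match in source_b
    if the score reaches the (hypothetical) threshold."""
    links = []
    for a in source_a:
        best = max(source_b, key=lambda b: match_score(a, b))
        score = match_score(a, best)
        if score >= threshold:
            links.append((a["id"], best["id"], round(score, 2)))
    return links


# Two made-up data sources describing overlapping sets of people.
clinic = [
    {"id": "A1", "name": "Jon Smith", "address": "12 High St, Canberra"},
    {"id": "A2", "name": "Mary O'Brien", "address": "4 Lake Road, Sydney"},
]
billing = [
    {"id": "B1", "name": "John Smith", "address": "12 High Street Canberra"},
    {"id": "B2", "name": "Peter Nguyen", "address": "77 Hill Ave, Perth"},
]

links = link(clinic, billing)
print(links)
```

Comparing every record against every other record, as this sketch does, scales poorly. Dedicated record linkage tools typically reduce the number of comparisons and use field-specific comparison methods, so treat this only as an illustration of the matching idea.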



Copyright © Togaware Pty Ltd
Brought to you by Togaware. This page generated: Sunday, 22 August 2010