DATA MINING
Desktop Survival Guide by Graham Williams
Nearest Neighbours
We might, more reasonably, use the average value of the variable among the nearest neighbours, where the neighbours are determined by comparing the other variables (not yet implemented in Rattle).
Another approach to filling in the missing values is to look at the observations that are closest to the observation with a missing value, and to use the values these nearby neighbours have for the missing variable to fill in the value for this observation. Refer to Data Mining With R, page 48 and following, for example R code to do this.
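The idea can be sketched in a few lines. The following is a minimal illustration in Python, not the R code from Data Mining With R and not anything provided by Rattle. The function name knn_impute, the choice of Euclidean distance, the default k, and the toy data are all assumptions made for the example. It also assumes the other variables are numeric, on comparable scales, and have no missing values themselves.

```python
import math

def knn_impute(rows, target, k=2):
    """Fill None values in column `target` with the mean of that column
    over the k nearest rows that have a value for it.

    Hypothetical helper for illustration; distance is Euclidean over
    all columns except `target`, which are assumed complete and numeric.
    """
    # Rows with a known target value can act as neighbours (donors).
    donors = [r for r in rows if r[target] is not None]
    filled = []
    for r in rows:
        if r[target] is not None:
            filled.append(list(r))
            continue

        # Distance uses every column except the one being imputed.
        def dist(d, r=r):
            return math.sqrt(sum((r[j] - d[j]) ** 2
                                 for j in range(len(r)) if j != target))

        nearest = sorted(donors, key=dist)[:k]
        new_row = list(r)
        new_row[target] = sum(d[target] for d in nearest) / len(nearest)
        filled.append(new_row)
    return filled

# Toy data: the last observation is missing its third variable.
data = [
    [1.0, 2.0, 10.0],
    [1.1, 2.1, 12.0],
    [8.0, 9.0, 50.0],
    [1.05, 2.05, None],
]
result = knn_impute(data, target=2, k=2)
print(result[3])
```

Here the two closest observations are the first two, so the missing value is filled with the average of 10.0 and 12.0 rather than being pulled towards the distant third observation. In practice the variables would usually be rescaled first so that no single variable dominates the distance.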
Nearest neighbour models tend to sit at the opposite end of the bias-variance scale from linear regression: they have low bias but high variance, whereas linear regression has high bias but low variance.
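A small simulation can make the contrast concrete. The sketch below is only illustrative: the quadratic target function, the noise level, the fixed grid of inputs, the use of a single nearest neighbour, and the helper names fit_line and one_nn are all assumptions chosen for the example. It repeatedly draws noisy training data, predicts at one point with both a straight-line fit and a 1-nearest-neighbour rule, and compares the bias and variance of those predictions.

```python
import random
import statistics

def true_f(x):
    # Hypothetical nonlinear target, chosen only for illustration.
    return x * x

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def one_nn(xs, ys, x0):
    """Predict with the value of the single nearest training point."""
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[i]

random.seed(0)
xs = [i / 10 for i in range(-10, 11)]  # fixed grid on [-1, 1]
x0, sigma, reps = 0.0, 0.5, 200        # query point, noise sd, repetitions

lin_preds, nn_preds = [], []
for _ in range(reps):
    # Fresh noisy training sample on the same grid each repetition.
    ys = [true_f(x) + random.gauss(0, sigma) for x in xs]
    a, b = fit_line(xs, ys)
    lin_preds.append(a + b * x0)
    nn_preds.append(one_nn(xs, ys, x0))

# Bias: distance of the average prediction from the truth at x0.
lin_bias = abs(statistics.fmean(lin_preds) - true_f(x0))
nn_bias = abs(statistics.fmean(nn_preds) - true_f(x0))
# Variance: spread of the predictions across training samples.
lin_var = statistics.variance(lin_preds)
nn_var = statistics.variance(nn_preds)

print(f"linear: bias={lin_bias:.3f} variance={lin_var:.3f}")
print(f"1-NN:   bias={nn_bias:.3f} variance={nn_var:.3f}")
```

Under these assumptions the straight line cannot follow the curve, so its predictions are consistently off (bias) but barely change between samples, while the nearest neighbour prediction is right on average but inherits the full noise of a single observation (variance). Averaging over more neighbours would trade some of that variance for bias.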