DATA MINING
Desktop Survival Guide by Graham Williams
Cleaning the Survey Dataset
We summarise a number of cleaning operations that might be performed on the survey dataset.
Remove entities (rows) with missing (NA) values:
> load("survey.RData")
> survey <- na.omit(survey)
> dim(survey)
[1] 30162    15
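To see what na.omit does on a small scale, the following sketch uses a made-up data frame. The name toy and its columns are hypothetical and stand in for the survey dataset, which is not reproduced here. It counts the incomplete rows with complete.cases before dropping them.

```r
# Hypothetical toy data; not the survey dataset.
toy <- data.frame(
  Age       = c(39, NA, 52, 28),
  Workclass = factor(c("Private", "State-gov", NA, "Private")),
  Hours     = c(40, 13, 45, 40)
)

# Count the rows that contain at least one NA.
n_incomplete <- sum(!complete.cases(toy))

# na.omit() drops every row that has an NA in any column.
clean <- na.omit(toy)

print(n_incomplete)
print(dim(clean))
```

Counting the incomplete rows first is a cheap way to check how much data will be lost before committing to the removal.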
Remove the categorical (factor) columns, keeping only the numeric ones:
> load("survey.RData")
> rmcols <- rev(seq(1,ncol(survey))[as.logical(lapply(survey, is.factor))])
> for (i in rmcols) survey[[i]] <- NULL
> dim(survey)
[1] 32561     6
> colnames(survey)
[1] "Age"            "fnlwgt"         "Education.Num"  "Capital.Gain"
[5] "Capital.Loss"   "Hours.Per.Week"
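The loop above deletes the factor columns one at a time, working from the last to the first so that the remaining indices stay valid. A more compact alternative, sketched here on a made-up data frame (toy and its columns are hypothetical, not the survey dataset), selects the non-factor columns in a single subsetting step.

```r
# Hypothetical toy data; not the survey dataset.
toy <- data.frame(
  Age       = c(39, 50, 38),
  Workclass = factor(c("Private", "State-gov", "Private")),
  Hours     = c(40, 13, 40),
  Sex       = factor(c("Male", "Male", "Female"))
)

# TRUE for each column that is a factor.
is_fac <- sapply(toy, is.factor)

# Keep only the non-factor columns; drop = FALSE keeps the result
# a data frame even if only one column remains.
numeric_only <- toy[, !is_fac, drop = FALSE]

print(colnames(numeric_only))
```

Because this builds a new data frame instead of modifying the original in place, the order of deletion no longer matters and the original data remain available for comparison.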