DATA MINING
Desktop Survival Guide by Graham Williams
Cleaning the Survey Dataset
We summarise a number of cleaning operations that might be performed on the survey dataset.
Remove entities (rows) that have a missing (NA) value in any column:
> load("survey.RData")
> survey <- na.omit(survey)   # drop every row containing at least one NA
> dim(survey)
[1] 30162 15
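The same operation can be tried without `survey.RData`. The sketch below uses a small hypothetical data frame, `toy`, invented here purely for illustration. It also shows `complete.cases()`, which returns `TRUE` for rows with no NA and so selects the same rows that `na.omit()` keeps.

```r
# Hypothetical toy data standing in for survey.RData
toy <- data.frame(
  Age            = c(39, 50, NA, 53),
  Workclass      = factor(c("State-gov", NA, "Private", "Private")),
  Hours.Per.Week = c(40, 13, 40, 40)
)

# na.omit() drops rows 2 and 3, each of which has one NA
clean <- na.omit(toy)
dim(clean)   # [1] 2 3

# complete.cases() flags rows with no NA; indexing with it keeps the same rows
kept <- toy[complete.cases(toy), ]

# Number of rows that would be discarded
n_dropped <- sum(!complete.cases(toy))
```

Note that `na.omit()` also attaches an `na.action` attribute recording the dropped rows, which the `complete.cases()` version does not.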
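Starting again from the original data, remove the categorical (factor) columns, keeping only the numeric ones: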
> load("survey.RData")
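> # Indices of the factor columns, in decreasing order so that deleting one
> # column does not shift the positions of those still to be deleted
> rmcols <- rev(seq(1, ncol(survey))[as.logical(lapply(survey, is.factor))])
> for (i in rmcols) survey[[i]] <- NULL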
> dim(survey)
[1] 32561 6
> colnames(survey)
[1] "Age" "fnlwgt" "Education.Num" "Capital.Gain"
[5] "Capital.Loss" "Hours.Per.Week"