DATA MINING
Desktop Survival Guide by Graham Williams
Basic Measures
True positives (TPs) are those records which are correctly classified by a model as positive instances of the concept being modelled (e.g., the model identifies them as a case of fraud, and they indeed are a case of fraud). False positives (FPs) are classified as positive instances by the model, but are in fact known not to be. Similarly, true negatives (TNs) are those records correctly classified by the model as not being instances of the concept, and false negatives (FNs) are classified as not being instances, but are in fact known to be. These four counts are the basic measures of the performance of a model. They are often presented in the form of a confusion matrix, produced using a contingency table.
In the following example a simple decision tree model is built with rpart on the survey dataset to predict Salary.Group. The model is then applied to the full dataset using predict, with the type option set to "class" so that a class is returned for each observation rather than the default probabilities for each class. A confusion matrix is then constructed using table to build a contingency table. Note the use of named arguments (Actual and Predicted) to have these names appear as the dimension labels in the table.
> library(rpart)
> load("survey.RData")
> survey.rp <- rpart(Salary.Group ~ ., data=survey)
> survey.pred <- predict(survey.rp, newdata=survey, type="class")
> head(survey.pred)
[1] <=50K >50K <=50K <=50K >50K >50K
Levels: <=50K >50K
> table(Actual=survey$Salary.Group, Predicted=survey.pred)
       Predicted
Actual  <=50K  >50K
  <=50K 23473  1247
  >50K   3816  4025
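To connect the table back to the four basic measures, the counts can be read off by row (actual class) and column (predicted class). The following is a minimal sketch, not part of the original example: it rebuilds the matrix from the printed counts so that it runs without survey.RData, and it assumes that ">50K" is the positive class (the concept being modelled). The names cm, positive and negative are introduced here for illustration only.

```r
# Counts copied from the confusion matrix printed above.
cm <- matrix(c(23473, 1247,
                3816, 4025),
             nrow = 2, byrow = TRUE,
             dimnames = list(Actual    = c("<=50K", ">50K"),
                             Predicted = c("<=50K", ">50K")))

# Assumption: ">50K" is treated as the positive class.
positive <- ">50K"
negative <- "<=50K"

# Rows index the actual class, columns the predicted class.
TP <- cm[positive, positive]  # actual >50K,  predicted >50K
FN <- cm[positive, negative]  # actual >50K,  predicted <=50K
FP <- cm[negative, positive]  # actual <=50K, predicted >50K
TN <- cm[negative, negative]  # actual <=50K, predicted <=50K

print(c(TP = TP, FP = FP, TN = TN, FN = FN))
```

If "<=50K" were taken as the positive class instead, the roles of TP and TN, and of FP and FN, would simply swap.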
Rather than the raw numbers we usually prefer to express these in
terms of percentages or rates. The accuracy of a model can, for
example, be calculated as the number of entities correctly classified
over the total number of entities classified:
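The formula that belongs here is missing from this copy of the text. Written in terms of the basic measures defined earlier, the standard definition of accuracy is sketched below. The worked figures use the counts from the confusion matrix in the example (23473 and 4025 correctly classified out of 32561 observations); the symbols TP, TN, FP and FN are shorthand for the counts of true positives, true negatives, false positives and false negatives.

```latex
\[
\text{accuracy}
  = \frac{TP + TN}{TP + TN + FP + FN}
  = \frac{4025 + 23473}{4025 + 23473 + 1247 + 3816}
  = \frac{27498}{32561}
  \approx 0.8445
\]
```

Accuracy does not depend on which class is labelled positive, since both diagonal cells of the confusion matrix appear in the numerator.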
The recall or true positive rate is the proportion of positive
entities which are classified as positive by the model:
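The formula that belongs here is also missing from this copy. The standard definition of recall, in terms of the basic measures, is sketched below. The worked figures assume that ">50K" is the positive class in the example confusion matrix, which the text does not state explicitly: 4025 of the 7841 actual ">50K" observations were predicted as ">50K".

```latex
\[
\text{recall}
  = \frac{TP}{TP + FN}
  = \frac{4025}{4025 + 3816}
  = \frac{4025}{7841}
  \approx 0.5133
\]
```

Under this assumption the model finds only about half of the positive cases, even though its overall accuracy is considerably higher.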
Copyright © Togaware Pty Ltd