Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Priors (prior)

Sometimes the proportions of classes in a training set do not reflect their true proportions in the population. You can inform Rattle of the population proportions, and the resulting model will reflect these.

The so-called priors can also be used to "boost" a particularly important class, by giving it a higher prior probability, although this might best be done through the Loss Matrix.

In Rattle the priors are expressed as a comma-separated list of numbers that sum to 1. The list must have one entry for each unique class in the training dataset. An example for binary classification is 0.5,0.5.
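The two constraints above (one entry per class, entries summing to 1) are easy to check before fitting a model. The following is a minimal sketch only: check_priors is a hypothetical helper, not a function from Rattle or rpart, and the target vector is made-up toy data. It also assumes the priors are listed in the order of the factor levels of the target.

```r
# Hypothetical helper (not part of Rattle or rpart): validate a priors
# vector against the classes of a target variable.
check_priors <- function(priors, target) {
  classes <- levels(factor(target))
  stopifnot(is.numeric(priors),
            length(priors) == length(classes),  # one prior per class
            all(priors >= 0),
            isTRUE(all.equal(sum(priors), 1)))  # priors must sum to 1
  setNames(priors, classes)                     # label in factor-level order
}

# Toy target, invented for illustration.
target <- factor(c("No", "No", "No", "Yes"))

# Turn a comma-separated string such as "0.5,0.5" into a numeric vector.
priors_text <- "0.5,0.5"
priors <- check_priors(as.numeric(strsplit(priors_text, ",")[[1]]), target)
priors
```

A vector such as c(0.7, 0.7), or one with the wrong number of entries, would stop with an error rather than be passed on silently.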

The default priors are set to be the class proportions as found in the training dataset.
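To see what those default priors would be for a given dataset, the class proportions can be tabulated directly. This is a sketch using base R on a made-up target vector; the variable names are illustrative only.

```r
# Toy target, invented for illustration: 7 "No" and 3 "Yes".
target <- factor(c(rep("No", 7), rep("Yes", 3)))

# Class proportions in the training data, in factor-level order.
default_priors <- prop.table(table(target))
default_priors
```

Comparing these proportions with the known population proportions shows whether supplying explicit priors is worthwhile.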

Using rpart directly we specify prior within an option called parms:



> library(rpart)
> library(rattle)   # provides the weatherAUS dataset
> set.seed(42)
> wa.train <- sample(nrow(weatherAUS), 0.5*nrow(weatherAUS))
> wa.rpart <- rpart(RainTomorrow ~ RainToday, data=weatherAUS[wa.train,])
> wa.rpart



n=14049 (360 observations deleted due to missingness)

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 14049 3116 No (0.7782049 0.2217951) *



> table(predict(wa.rpart, weatherAUS[-wa.train,], type="class"), 
        weatherAUS[-wa.train, "RainTomorrow"])



         No   Yes
  No  11090  3081
  Yes     0     0



> wa.rpart <- rpart(RainTomorrow ~ RainToday, data=weatherAUS[wa.train,], 
                   parms=list(prior=c(0.5, 0.5)))
> wa.rpart



n=14049 (360 observations deleted due to missingness)

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 14049 7024.500 No (0.5000000 0.5000000)  
  2) RainToday=No 10972 3654.273 No (0.6218021 0.3781979) *
  3) RainToday=Yes 3077 1016.442 Yes (0.2317116 0.7682884) *
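The class probabilities printed for nodes 2 and 3 are no longer raw proportions. A sketch of where they come from, assuming rpart weights each class by its prior divided by its observed frequency (as its documentation describes for altered priors). The per-node class counts below are not shown in the output: they are inferred from the n and loss columns of the tree above, and node_prob is a hypothetical helper.

```r
# Class counts inferred from the printed tree (not read from the data).
n_class <- c(No = 10933, Yes = 3116)   # root: 14049 - 3116 and 3116
n_node2 <- c(No = 9351,  Yes = 1621)   # RainToday=No,  n = 10972
n_node3 <- c(No = 1582,  Yes = 1495)   # RainToday=Yes, n = 3077
prior   <- c(No = 0.5,   Yes = 0.5)

# Hypothetical helper: class probabilities in a node under altered priors.
# Each class contributes prior * (share of that class falling in the node).
node_prob <- function(n_node, n_class, prior) {
  w <- prior * n_node / n_class
  w / sum(w)
}

p_node2 <- node_prob(n_node2, n_class, prior)
p_node3 <- node_prob(n_node3, n_class, prior)
round(p_node2, 7)
round(p_node3, 7)
```

With equal priors, node 3 is labelled Yes even though it holds slightly more No observations (1582 against 1495), because each Yes observation now carries more weight.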



> table(predict(wa.rpart, weatherAUS[-wa.train,], type="class"), 
        weatherAUS[-wa.train, "RainTomorrow"])



        No  Yes
  No  9460 1632
  Yes 1630 1449
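The effect of the priors is easier to read off as per-class rates. The sketch below re-enters the two tables above by hand (rows are predictions, columns are actual classes) and computes the proportion of each actual class that is predicted correctly, along with overall accuracy. The helper names are illustrative only.

```r
# Confusion matrices copied from the two tables above.
# matrix() fills column by column: actual No first, then actual Yes.
labels <- list(predicted = c("No", "Yes"), actual = c("No", "Yes"))
cm_default <- matrix(c(11090, 0, 3081, 0),      nrow = 2, dimnames = labels)
cm_prior   <- matrix(c(9460, 1630, 1632, 1449), nrow = 2, dimnames = labels)

# Proportion of each actual class predicted correctly.
class_recall <- function(cm) diag(cm) / colSums(cm)

# Proportion of all observations predicted correctly.
accuracy <- function(cm) sum(diag(cm)) / sum(cm)

round(class_recall(cm_default), 3)
round(class_recall(cm_prior), 3)
round(c(default = accuracy(cm_default), prior = accuracy(cm_prior)), 3)
```

With the default priors the model never predicts Yes. With equal priors it identifies 1449 of the 3081 rainy days, at the cost of misclassifying 1630 of the 11090 dry days.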



Copyright © Togaware Pty Ltd
Brought to you by Togaware. This page generated: Sunday, 22 August 2010