DATA MINING
Desktop Survival Guide by Graham Williams
Todo: ABS example of fitting a linear model (no kernel) for job category classification from a text description.
A Support Vector Machine (SVM) searches for so-called support vectors, which are data points found to lie at the edge of an area in space forming the boundary between one class of points and another. In SVM terminology we talk of the space between the regions containing data points of different classes as the margin between those classes. The support vectors are used to identify a hyperplane (or a line, if the data has only two dimensions) that separates the classes with the maximum margin. An advantage of the method is that the resulting model depends only on these support vectors rather than on the whole training dataset, and so the size of the training set is not usually an issue. If the data is not linearly separable, a kernel is used to map the data into a higher dimensional space in which the classes are linearly separable. Support vector machines have also been found to perform well on problems that are non-linear, sparse, and high dimensional. A disadvantage is that the algorithm is sensitive to the choice of tuning parameters, making it harder to use and time consuming to identify the best settings.
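To illustrate, the following is a minimal sketch of building an SVM in R, assuming the kernlab package (the package Rattle itself uses for its SVM models); the iris dataset here simply stands in for our own data:

library(kernlab)

# Fit an SVM using the default radial basis function kernel.
model <- ksvm(Species ~ ., data = iris)

# The fitted model depends only on the support vectors.
nSV(model)                  # number of support vectors
predict(model, head(iris))  # predicted classes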
Support vector machines do not directly predict probabilities but rather produce normalised distances to the decision boundary. A common approach to transforming these distances into probabilities is to pass them through a sigmoid function fitted to the training data (known as Platt scaling).
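As a sketch, again assuming kernlab, setting prob.model=TRUE asks ksvm to fit such a sigmoid during training, after which class probabilities can be requested from predict:

library(kernlab)

# prob.model=TRUE fits a sigmoid to the decision values (Platt scaling).
model <- ksvm(Species ~ ., data = iris, prob.model = TRUE)
predict(model, head(iris), type = "probabilities")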
c("0"=4, "1"=1)
resulted in the best risk chart. For a
50% caseload we are recovering 94% of the adjustments and 96% of
the revenue.
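Class weights are supplied to ksvm through its class.weights argument. The following is a minimal sketch, assuming kernlab and a training data frame named audit with a binary target TARGET_Adjusted (these names echo the audit example but are assumptions here):

library(kernlab)

# Penalise errors on class "0" four times as heavily as on class "1".
model <- ksvm(as.factor(TARGET_Adjusted) ~ ., data = audit,
              class.weights = c("0" = 4, "1" = 1),
              prob.model = TRUE)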
There are other general parameters that can be set for ksvm, including the cost of constraint violation, C, which controls the trade-off between the training error and the margin. It typically ranges from 1 to 1000 and defaults to 1.
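A sketch of varying this cost, assuming kernlab; here the cross argument adds k-fold cross-validation so that different settings can be compared:

library(kernlab)

# Larger C penalises training errors more heavily.
m1 <- ksvm(Species ~ ., data = iris, C = 1,   cross = 5)
m2 <- ksvm(Species ~ ., data = iris, C = 100, cross = 5)
cross(m1)  # 5-fold cross-validation error with C = 1
cross(m2)  # 5-fold cross-validation error with C = 100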
For best results it is often a good idea to scale the numeric variables to have a mean of 0 and a standard deviation of 1.
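Note that ksvm scales numeric variables in this way by default (its scaled argument); the following sketch shows the equivalent explicit transformation in R:

# Standardise the numeric columns to mean 0 and standard deviation 1.
num <- sapply(iris, is.numeric)
iris.std <- iris
iris.std[num] <- scale(iris[num])
colMeans(iris.std[num])  # all approximately 0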
For polynomial kernels you can choose the degree of the polynomial; the default is 1. For the audit data, as the degree increases the model becomes much more accurate on the training data but generalises less well, as exhibited by its performance on the test dataset.
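A sketch of setting the degree, assuming kernlab; error() reports the training error, which falls as the degree increases even while test performance may degrade:

library(kernlab)

# A degree 3 polynomial kernel.
m <- ksvm(Species ~ ., data = iris,
          kernel = "polydot", kpar = list(degree = 3))
error(m)  # training error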
Another parameter that often needs setting for a radial basis function kernel is the sigma value. Rattle uses automatic sigma estimation (sigest) for this kernel to find a good sigma, and so the user need not set one. If we want to experiment with various sigma values, we can copy the R code from the Log tab, paste it into the R console, add in the additional settings, and run the model. Assuming the results are assigned into the variable crs$ksvm, as in the Log, we can then evaluate the performance of this new model using the Evaluate tab.
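The following is a sketch of such an experiment, assuming kernlab (sigest and the rbfdot kernel are kernlab functions; the crs$ksvm naming follows Rattle's Log, with iris standing in for our own data):

library(kernlab)

# sigest suggests a range (low, median, high) of suitable sigma values.
sigest(Species ~ ., data = iris)

# Refit with an explicit sigma rather than the automatic estimate.
crs <- list()
crs$ksvm <- ksvm(Species ~ ., data = iris,
                 kernel = "rbfdot", kpar = list(sigma = 0.05))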