DATA MINING
Desktop Survival Guide by Graham Williams
New ideas are often most effectively understood and appreciated by actually doing something with them. And so it is with data mining. Fundamentally, data mining is about practical application--application of the algorithms developed by researchers in Artificial Intelligence, Machine Learning, Computer Science and Statistics. This chapter is about getting started with data mining--hands-on.
Our aim throughout this book is to provide hands-on practice in data mining. For this we need good tools, ideally tools that are freely available to everyone and can be freely modified by anyone (known as open source software). For our purposes we use Rattle, a free and open source data mining tool built on R, the free and open source statistical software environment. Rattle is available for download from http://rattle.togaware.com.
With Rattle's support we can build our first data mining model quite quickly. Be careful though: getting our data into shape usually requires a lot of effort. Once the data is ready, Rattle can build a model with just four mouse clicks.
In this chapter we use the Rattle GUI (Graphical User Interface) to build our first data mining model: a simple decision tree, one of the most common models in data mining. Along the way we work through the key elements of the Rattle user interface. We begin with some background on what Rattle is, then review the overall concept of data mining, using Rattle to illustrate each of its major functional components.
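For readers curious about what a decision tree looks like outside the GUI, the following is a minimal sketch in plain R. It is not the chapter's Rattle walkthrough: it assumes R is already installed, uses the rpart package directly, and substitutes R's built-in iris dataset for whatever data the chapter loads. The 70/30 train/test split and the seed are illustrative choices, not settings taken from the text.

```r
# A minimal decision tree sketch in plain R (assumption: rpart is available,
# as it is in standard R distributions).
library(rpart)

# iris is a built-in dataset, used here only as a stand-in example.
set.seed(42)                                   # make the random split repeatable
n <- nrow(iris)
train_idx <- sample(n, size = round(0.7 * n))  # illustrative 70% training sample
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Fit a classification tree predicting Species from all other columns.
fit <- rpart(Species ~ ., data = train, method = "class")
print(fit)                                     # textual view of the tree's splits

# Score the held-out observations and report the proportion classified correctly.
pred <- predict(fit, newdata = test, type = "class")
accuracy <- mean(pred == test$Species)
cat("Held-out accuracy:", accuracy, "\n")
```

The printed tree lists each split as a rule on one variable, which is the kind of model the Rattle interface builds and displays for us.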
Before proceeding we need to download and install R and the requisite packages. This is covered in detail in Appendix A, where instructions for GNU/Linux, MS/Windows, and Mac/OSX can be found. Now is a good time to install R: much of what follows in this chapter, and in the rest of the book, relies on interacting with R and Rattle.
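As a rough sketch of what the final step typically looks like once R itself is installed, the commands below install the rattle package from CRAN and start the GUI from the R console. Appendix A remains the authoritative guide; platform-specific prerequisites (such as the graphical toolkit Rattle depends on) are not shown here and may be needed on your system.

```r
# Install Rattle from CRAN (assumes a working R installation and network access).
install.packages("rattle")

# Load the package into the current R session.
library(rattle)

# Start the Rattle graphical user interface.
rattle()
```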