A Survival Guide to Data Science with R

These draft chapters weave together a collection of tools for the data scientist—tools that are all part of the R Statistical Software Suite.

Each chapter is a collection of one or more pages that cover particular aspects of the topic. The chapters can be worked through as a hands-on guide to a specific task and then used as a reference guide. Each page aims to be a bite-sized chunk for hands-on learning, building on what has gone before. Many chapters also have a lecture pack and a laboratory session where a number of tasks can be completed. The R code behind each chapter is also provided and can be run standalone to replicate the material presented in the chapter.

The material begins with an overview of how an organization should go about setting up its analytics capability and then introduces the data scientist to R.

The material here is in various stages of completeness and is always under development! Chapters will change (improve) regularly. All of the material is provided at no cost under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, which allows anyone to use it for any non-commercial purpose. Refer to the Data Mining Survival Guide or my book on Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery (Use R) for related material.

The data used across the chapters is available for download as



Part 1: Data Science

  1. Data Mining, Analytics, and Data Science: Chapter – R – Lecture
  2. Rattle to R: Chapter – R
  3. Literate Data Science with KnitR: Chapter – R – Lecture
  4. A Template for Preparing Data: Chapter – R
  5. A Template for Building Models: Chapter – R
  6. Case Studies: Chapter – R

Part 2: R Programming

  1. Doing R with Style: Chapter – R
  2. The Basics of R: Chapter – R

Part 3: Dealing With Data

  1. Reading Data into R: *Chapter – *R
  2. Exploring and Summarising Data: *Chapter – *R
  3. Visualising Data with GGPlot2: *Chapter – *R
  4. Transforming Data: *Chapter – *R

Part 4: Descriptive Analytics

  1. Cluster Analysis: Chapter – R – Lecture
  2. Association Analysis: Chapter – R – Lecture

Part 5: Predictive Analytics

  1. Decision Trees: *Lecture – *Chapter – *R – *Rattle
  2. Ensembles of Decision Trees: *Lecture – *Chapter – *R
  3. Support Vector Machines
  4. Neural Networks
  5. Naive Bayes: Chapter – R
  6. Multivariate Adaptive Regression Splines: Chapter – R
  7. Evaluating Models: *Chapter – *R
  8. Scoring (R)
  9. PMML (R) Exporting Models for Deployment

Part 6: Advanced Analytics

  1. Text Mining: *Chapter – *R – Corpus as tar.gz or zip
  2. Social Network Analysis: Chapter – R
  3. Genetic Programming: Chapter – R

Part 7: Advanced R

  1. Strings: Chapter – R
  2. Dates and Time: *Chapter – *R
  3. Spatial Data: *Chapter – *R
  4. Big Data: *Chapter – *R
  5. Exploring Different Plots: Chapter – R
  6. Writing Functions: Chapter – R
  7. Parallel Processing: Chapter – R
  8. Environments: *Chapter – R

Part 8: Expert R

  1. Packaging (R) Pulling it Together into a Package

Other great resources for modular approaches to learning R include:

Other Togaware resources:

Local package archive:

install.packages("rattle", repos="", type="source")
install.packages("wsrf", repos="", type="source")
install.packages("wsrpart", repos="", type="source")
install.packages("wskm", repos="", type="source")
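
After installing, it can help to confirm which of these packages are actually available before working through the chapters. The following is a minimal sketch and not part of the original material: the helper name `check_packages` is hypothetical, and it relies only on base R's `requireNamespace()`, which reports availability without attaching the package.

```r
# Hypothetical helper (an assumption, not from the guide): report which of
# the packages listed above can be loaded from the current R library.
check_packages <- function(pkgs) {
  # requireNamespace() returns TRUE or FALSE without attaching the package;
  # quietly = TRUE suppresses the message for a package that is not found.
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

pkgs <- c("rattle", "wsrf", "wsrpart", "wskm")
status <- check_packages(pkgs)
print(status)

# Names of any packages that still need to be installed.
not_installed <- names(status)[!status]
```

If `not_installed` is non-empty, the corresponding `install.packages()` calls above would need to be run (or re-run) before the chapter code that depends on those packages.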

