The Data Science Desktop Survival Guide (R Edition) is a one-page-per-concept guide to navigating the world of data science using Free (Libre) and Open Source Software. The book is continually updated and its recipes are regularly verified. Feedback is always welcome by email to survivor@togaware.com.

To encourage its ongoing development and to defray the costs of supporting and hosting the material, financial donations are welcome. With a donation you will also get access to an electronic PDF version. Donations of $40 (or multiples of $40) can be made using PayPal (including credit card payments). Thank you for your interest.

 

The Essentials of Data Science web site has additional material.

The GNU/Linux Desktop Survival Guide is also available from Togaware.

The material below is archival whilst it migrates to the Survival Guide. It weaves together a collection of documents that introduce tools for the data scientist—tools that are all part of the R Statistical Software Suite.

Each module is a collection of one-page sections that cover particular aspects of the topic. The modules aim to be a hands-on guide to a specific task that a new user can work through and then use as a reference guide. Each page aims to be a bite-sized chunk for hands-on learning, building on what has gone before. Many modules also have a lecture pack and a laboratory session where a number of tasks can be completed. The R code behind each chapter is also provided and can be run standalone to replicate the material presented in the chapter.

The material begins with an overview of how an organisation should go about setting up its Analytics capability and then introduces the Data Scientist to R.

The material here is in various stages of completeness and is always under development. Chapters will change (improve) regularly. All of the material is provided at no cost under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, which allows anyone to use it for any non-commercial purpose. Refer to the Data Mining Survival Guide or my book Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery (Use R) for related material.

Enjoy!

The data used across the chapters is available for download as data.zip.

Data Science Templates

My book, The Essentials of Data Science: Knowledge Discovery Through R, introduces the concept of templates for supporting the data scientist. Below are a variety of templates covering different tasks. The data.R and model.R scripts collect together all of the other component scripts.

Data: data.R, 00_setup.R, 10_ingest.R, 20_observe.R, 30_prepare.R, 40_meta.R, 50_save.R
Classification: model.R, 60_model.R, 62_rpart.R, 64_randomForest.R, 66_xgboost.R, 68_dnn.R
Regression: 70_model.R, 72_lm.R, 74_rpart, 76_dnn.R
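
As a rough illustration of how a collector script such as data.R might pull the component scripts together, the sketch below sources a list of numbered scripts in order. It assumes the scripts sit in a single directory and that the numeric prefixes give the intended run order; neither is confirmed by the templates themselves. The helper name run_components is hypothetical and is not part of the templates.

```r
# Hypothetical helper: source numbered component scripts in order.
# Assumes the scripts live in `dir` and that the numeric prefix
# reflects the intended execution order.
run_components <- function(scripts, dir = ".") {
  paths <- file.path(dir, scripts)
  found <- file.exists(paths)

  # Report, rather than fail on, any scripts that are not present.
  if (!all(found)) {
    warning("Skipping missing scripts: ",
            paste(scripts[!found], collapse = ", "))
  }

  # Evaluate each script in the global environment so that later
  # scripts can see objects created by earlier ones.
  for (p in paths[found]) {
    message("Sourcing ", p)
    source(p, local = FALSE)
  }

  invisible(scripts[found])
}

# Component scripts from the Data column of the templates.
data_scripts <- c("00_setup.R", "10_ingest.R", "20_observe.R",
                  "30_prepare.R", "40_meta.R", "50_save.R")
```

Calling run_components(data_scripts) from the directory holding the scripts would then run the data pipeline from setup through to save.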

Part 1: Data Science

  1. Data Mining, Analytics, and Data Science: Chapter – R – Lecture
  2. Rattle to R: Chapter – R
  3. An Introduction to R Programming: Chapter – R
  4. Literate Data Science with KnitR: Chapter – R – Lecture
  5. More Basics of R: Chapter – R

Part 2: Dealing With Data

  1. A Template for Preparing Data: Chapter – R
  2. Reading Data into R: Chapter – R
  3. Open Access Data via the CKAN API: Chapter – R
  4. Exploring and Summarising Data: Chapter – R
  5. Visualising Data with GGPlot2: Chapter – R
  6. Transforming Data: Chapter – R
  7. Case Study: Analysis of Sea Ports: Chapter – R
  8. Case Study: Web Log Analysis: Chapter – R

Part 3: Building Models

  1. A Template for Building Models: Chapter – R
  2. Cluster Analysis: Chapter – R – Lecture
  3. Association Analysis: Chapter – R – Lecture
  4. Decision Trees: Lecture – Chapter – R – Rattle
  5. Ensembles of Decision Trees: Lecture – Chapter – R
  6. Support Vector Machines
  7. Neural Networks
  8. Naive Bayes: Chapter – R
  9. Multivariate Adaptive Regression Splines: Chapter – R
  10. Evaluating Models: Chapter – R
  11. Scoring (R)
  12. PMML (R) Exporting Models for Deployment

Part 4: Advanced R and Analytics

  1. Strings: Chapter – R
  2. Dates and Time: Chapter – R
  3. Spatial Data: Chapter – R
  4. Big Data: Chapter – R
  5. Exploring Different Plots: Chapter – R
  6. Writing Functions: Chapter – R
  7. Parallel Processing: Chapter – R
  8. Environments: Chapter – R
  9. Text Mining: Chapter – R – Corpus as tar.gz or zip
  10. Social Network Analysis: Chapter – R
  11. Genetic Programming: Chapter – R
  12. Time Series Analysis: Chapter – R

Part 5: Appendices

  1. Doing R with Style: Chapter – R
  2. Packaging (R) Pulling it Together into a Package: Chapter

Local package archive:

install.packages("rattle", repos="http://rattle.togaware.com", type="source")
install.packages("wsrf", repos="http://rattle.togaware.com", type="source")
install.packages("wsrpart", repos="http://rattle.togaware.com", type="source")
install.packages("wskm", repos="http://rattle.togaware.com", type="source")
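
The commands above reinstall each package every time they are run. As a minimal sketch, and only one way of doing it, the helpers below check which of the four packages are already installed and install just the missing ones from the same repository. The function names missing_packages and install_missing are hypothetical; installing still requires network access and the tools needed to build source packages.

```r
# Packages listed in the commands above.
pkgs <- c("rattle", "wsrf", "wsrpart", "wskm")

# Hypothetical helper: return the subset of `pkgs` not yet installed.
missing_packages <- function(pkgs, lib.loc = NULL) {
  installed <- rownames(installed.packages(lib.loc = lib.loc))
  setdiff(pkgs, installed)
}

# Hypothetical helper: install only the missing packages from source,
# using the repository URL given in the commands above.
install_missing <- function(pkgs, repos = "http://rattle.togaware.com") {
  todo <- missing_packages(pkgs)
  if (length(todo) > 0) {
    install.packages(todo, repos = repos, type = "source")
  }
  invisible(todo)
}
```

Running install_missing(pkgs) would then fetch only the packages not already in the library.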