The Data Science Desktop Survival Guide (R Edition) provides a one-page-per-concept style guide to navigating your way around the world of data science using Free (Libre) and Open Source Software. The book is continually being updated and the recipes presented are verified. Feedback is always welcome by emailing email@example.com.
To support its ongoing development and to defray the costs of hosting the material, financial donations are welcome. With a donation you will get access to an electronic PDF version. Donations of $40 (or multiples of $40) can be made using PayPal (including credit card payments). Thank you for your interest.
The Essentials of Data Science web site has additional material.
The GNU/Linux Desktop Survival Guide is also available from Togaware.
The material below is archival whilst it migrates to the Survival Guide. It weaves together a collection of documents that introduce tools for the data scientist—tools that are all part of the R Statistical Software Suite.
Each module is a collection of one-page sections that cover particular aspects of the topic. Each module aims to be a hands-on guide to a specific task that a new user can work through and then use as a reference guide. Each page aims to be a bite-sized chunk for hands-on learning, building on what has gone before. Many modules also have a lecture pack and a laboratory session where a number of tasks can be completed. The R code sitting behind each chapter is also provided and can easily be run standalone to replicate the material presented in the chapter.
The material begins with an overview of how an organisation should go about setting up its Analytics capability and then introduces the Data Scientist to R.
The material here is in various stages of completeness and is always under development! Chapters will change (improve) regularly. All of the material is provided under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License allowing access to everyone for any purpose (except commercial) and is provided at no cost. Refer to the Data Mining Survival Guide or my book on Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery (Use R) for related material.
The data used across the chapters is available for download as data.zip.
Data Science Templates
My book, The Essentials of Data Science: Knowledge Discovery Through R, introduces the concept of templates for supporting the data scientist. Below are a variety of templates covering different tasks. The data.R and model.R scripts collect together all of the other component scripts.
Part 1: Data Science
- Data Mining, Analytics, and Data Science: Chapter – R – Lecture
- Rattle to R: Chapter – R
- An Introduction to R Programming: Chapter – R
- Literate Data Science with KnitR: Chapter – R – Lecture
- More Basics of R: Chapter – R
Part 2: Dealing With Data
- A Template for Preparing Data: Chapter – R
- Reading Data into R: Chapter – R
- Open Access Data via the CKAN API: Chapter – R
- Exploring and Summarising Data: Chapter – R
- Visualising Data with GGPlot2: Chapter – R
- Transforming Data: Chapter – R
- Case Study: Analysis of Sea Ports: Chapter – R
- Case Study: Web Log Analysis: Chapter – R
Part 3: Building Models
- A Template for Building Models: Chapter – R
- Cluster Analysis: Chapter – R – Lecture
- Association Analysis: Chapter – R – Lecture
- Decision Trees: Lecture – Chapter – R – Rattle
- Ensembles of Decision Trees: Lecture – Chapter – R
- Support Vector Machines
- Neural Networks
- Naive Bayes: Chapter – R
- Multivariate Adaptive Regression Splines: Chapter – R
- Evaluating Models: Chapter – R
- Scoring (R)
- PMML (R) Exporting Models for Deployment
Part 4: Advanced R and Analytics
- Strings: Chapter – R
- Dates and Time: Chapter – R
- Spatial Data: Chapter – R
- Big Data: Chapter – R
- Exploring Different Plots: Chapter – R
- Writing Functions: Chapter – R
- Parallel Processing: Chapter – R
- Environments: Chapter – R
- Text Mining: Chapter – R – Corpus as tar.gz or zip
- Social Network Analysis: Chapter – R
- Genetic Programming: Chapter – R
- Time Series Analysis: Chapter – R
Part 5: Appendices
Other great resources for modular approaches to learning R include:
Other Togaware resources:
- Open Source Machine Learning with R – FOSSAsia 2017 – (PDF)
- Introducing Data Mining – Lecture
- Ensembles in the ATO – October 2014
- International Centre for Free and Open Source Software – May 2015 (PDF)
- CUNY NSF Workshop – March 2014 (PDF)
- AusDM-2013 Tutorial – November 2013
- IDEAL-2013 Tutorial – October 2013
Other resources include:
Local package archive:
install.packages("rattle", repos="http://rattle.togaware.com", type="source")
install.packages("wsrf", repos="http://rattle.togaware.com", type="source")
install.packages("wsrpart", repos="http://rattle.togaware.com", type="source")
install.packages("wskm", repos="http://rattle.togaware.com", type="source")
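The four install commands above repeat the same repository and type arguments, so they could be wrapped in a small helper that skips packages already present. The following is only a sketch: `install_missing` is a hypothetical name, not a function from rattle or base R, and it assumes the archive is reachable and that the tools needed to build source packages are installed.

```r
# Hypothetical helper (not part of rattle or base R): install from `repo`
# only those packages in `pkgs` that are not listed in `installed`.
install_missing <- function(pkgs, repo,
                            installed = rownames(installed.packages())) {
  to_install <- setdiff(pkgs, installed)
  for (p in to_install) {
    # Same call as in the archive listing: build from source from `repo`.
    install.packages(p, repos = repo, type = "source")
  }
  # Return (invisibly) the names that were requested for installation.
  invisible(to_install)
}

# Package names and repository URL taken from the listing above.
pkgs <- c("rattle", "wsrf", "wsrpart", "wskm")
repo <- "http://rattle.togaware.com"
```

Calling `install_missing(pkgs, repo)` would then fetch only the packages missing from the local library. The `installed` argument exists so the selection logic can be checked without touching the network.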