For our Machine Learning in R tutorial each participant is asked to install, or obtain access to, the free (as in libre) and open source software R and Rattle. Please complete this before the session, or let me know of any issues you have installing. There are three options. First, an Azure (cloud) Data Science Virtual Machine running Ubuntu and a full suite of open source software for the data scientist will be available. The only setup this requires is installing an application (X2Go) on your own computer to connect to the remote desktop. Second, to run R and Rattle on your own computer you can install the open source Docker software and, within a container, run an Ubuntu image that already has R and Rattle installed and configured. This requires minimal setup and ensures everyone has the same experience. Finally, you can install R and Rattle directly on your own computer, together with a collection of over 200 support packages that are used through Rattle. This requires an hour or so to download the required software packages.

We describe each scenario below. Before the tutorial session all participants are asked to aim for a standalone, self-contained environment running the pre-built Docker image. You are also asked to install the X2Go client locally (it needs an Internet connection during the tutorial to be useful). For a complete experience you can also install R locally before the session. This will ensure a smoother ramp-up at the tutorial itself.

For each scenario on Mac OS X, you will also need to install XQuartz to display the Rattle Graphical User Interface.

WiFi is available for free through the Wireless@SG network.

Azure Data Science Virtual Machine

This is perhaps the simplest approach, requiring only that you install the X2Go client on your own machine, whether under MS/Windows, Apple/OS X or GNU/Linux. It is also how many data scientists work today, allowing the cost-effective use of cloud-based servers of any size as required. This approach does need an Internet connection during the tutorial session, which can sometimes be problematic when relying on externally provided WiFi. Participants will receive a username and password to connect to an Ubuntu-based Data Science Virtual Machine running on Azure, together with the host name of the server to connect to. All participants will then use the same configured environment and no further setup is required on your part. Following the tutorial session you can sign up for a free trial subscription to Azure (or use your own company’s subscription) to deploy your own Data Science Virtual Machine in the cloud.

The steps are: install the X2Go client; start X2Go and configure it with the host name and user name to connect to the Data Science Virtual Machine desktop, choosing XFCE as the desktop type.

Docker Image

Docker is a lightweight alternative to a virtual machine with many of the same advantages. It can readily be installed on your own computer, and it then allows you to download a pre-configured Ubuntu image with R and Rattle already installed. You can run this image within a protected container on your own computer without any ongoing need for an Internet connection. Installation and deployment of the image is straightforward and is described on Docker Hub.

The steps are: install Docker; download the Rattle image from Docker Hub; run the Rattle image in a container.

Local Install

This is the trickiest option, as everyone’s environment is different and the install can sometimes be problematic. Its advantage is that you run R and Rattle locally in your computer’s own environment and do not require an Internet connection once everything is installed. Begin by installing R. Then start up R and install Rattle:

$ R
> install.packages("rattle", dependencies=c("Depends", "Imports", "Suggests"))

Further instructions are available from Togaware.
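
Because the local install pulls in a large number of packages, it can help to confirm that the key ones are actually available before the session. The following is a minimal sketch of such a check, not part of the official instructions. The variable names and the choice of packages to test (rattle, plus rpart as an example of a modelling package) are assumptions for illustration only.

```r
# Packages to check. This short list is an assumption for illustration;
# extend the vector with any other packages you expect to use.
required <- c("rattle", "rpart")

# requireNamespace() returns FALSE, rather than raising an error,
# when a package is not installed.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

# Report anything that is still missing.
missing <- required[!installed]
if (length(missing) > 0) {
  message("Still to install: ", paste(missing, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

If anything is reported as missing, re-running the install.packages() command above for the named package is a reasonable first step.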

Getting Started with Rattle

Whichever option you choose, once the software is installed and you are connected to the appropriate server, image or machine, start up R and load the Rattle package:

$ R
> library(rattle)
> rattle()

A GUI should pop up. Click on Execute, then on OK in the “load weather dataset” dialogue, then select the Model tab and click on Execute again. You will have built your first machine learning model. Click on Draw to visualise the model.
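
For those curious about what the GUI steps correspond to in R itself, the following is a rough sketch of building and drawing a decision tree on the same weather dataset directly from the R prompt. It is an approximation under stated assumptions, not the exact code Rattle runs: the 70% training split, the seed and the variable names ds, train and model are illustrative choices.

```r
library(rattle)  # provides the weather dataset and fancyRpartPlot()
library(rpart)   # decision tree implementation

# Drop the identifier columns (Date, Location) and RISK_MM, which is
# derived from the outcome and would leak the answer into the model.
ds <- weather[, setdiff(names(weather), c("Date", "Location", "RISK_MM"))]

# Hold out 30% of the observations. The split and the seed are
# assumptions for this sketch, not necessarily what the GUI does.
set.seed(42)
train <- sample(nrow(ds), round(0.7 * nrow(ds)))

# Fit a classification tree predicting whether it rains tomorrow.
model <- rpart(RainTomorrow ~ ., data = ds[train, ], method = "class")
print(model)

# Draw the tree, similar in spirit to the Draw button.
# fancyRpartPlot() relies on the rpart.plot package being installed.
fancyRpartPlot(model)
```

Working from a script like this makes the analysis repeatable, and it is a natural next step once you are comfortable with the GUI.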