{"id":515,"date":"2017-05-28T12:37:46","date_gmt":"2017-05-28T02:37:46","guid":{"rendered":"http:\/\/togaware.com\/?p=515"},"modified":"2021-03-05T16:57:06","modified_gmt":"2021-03-05T05:57:06","slug":"running-an-r-workshop-on-azure-linux-data-science-virtual-machine","status":"publish","type":"post","link":"https:\/\/togaware.com\/running-an-r-workshop-on-azure-linux-data-science-virtual-machine\/","title":{"rendered":"Running an R Workshop on Azure with the Ubuntu Data Science Virtual Machine"},"content":{"rendered":"

The fully open source software stack of the Ubuntu Data Science Virtual Machine<\/a> (DSVM) hosted on Azure is a great place to host an R workshop, laboratory session or R training. I record here the simple steps to set up a Linux Data Science Virtual Machine (mainly so I can remember how to do it each time). Workshop attendees will have their own laptop computers and can certainly install R themselves, but with the Ubuntu Data Science Virtual Machine we have a shared and uniformly configured platform which avoids the traditional idiosyncrasies and frustrations that plague a large class installing software on multiple platforms themselves. Instead of spending the first trouble-filled hour of a class setting up everyone’s computer, we can use a local browser to access either Jupyter Notebooks<\/a> or RStudio<\/a> Server running on the DSVM.<\/p>\n

Jupyter Notebooks on JupyterHub<\/strong><\/p>\n

We illustrate the session with Jupyter<\/a> Notebook, supporting multiple users under JupyterHub<\/a>, and with RStudio<\/a> Server as a backup. Both are accessed through a browser. JupyterHub uses https (encrypted), which may be blocked by firewalls within some organisations. In that case RStudio<\/a> Server over http is available as the backup.<\/p>\n

WARNING:<\/strong> Jupyter Notebook has on occasion rendered my laptop computer (under both Linux and Windows, with Firefox and IE) unusable: after a period of extensive use the browser freezes and the machine becomes completely unresponsive.<\/p>\n

Jupyter Notebook<\/a> provides a browser interface with basic literate programming<\/a> capability. I’ve been a fan of literate programming since my early days as a programmer in the 1980s, when I first came across the concept from Donald Knuth. I now encourage literate data science and it is a delight to see others engaged in urging this approach to data science. Jupyter Notebooks are great for self-paced learning, intermixing a narrative with actual R code. The R code can be executed in place with results displayed in place as the student works through the material. Jupyter Notebooks are not such a great development environment though. Other environments excel there.<\/p>\n

JupyterHub<\/a> supports multiple users on the one platform, each with their own R\/Jupyter process. The Linux Data Science Virtual Machine running on Azure provides these open source environments out of the box. Access to JupyterHub is through port 8000.<\/p>\n
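Because firewalls may block the JupyterHub port, it can be worth checking from the workshop venue that port 8000 is reachable before the class begins. The following is a minimal sketch only: the helper names jupyterhub_url and port_is_open are my own for illustration and are not part of JupyterHub or the DSVM, and the host name in the usage comment is just a placeholder for your own server.

```python
import socket

# Port 8000 is the JupyterHub port mentioned in the post.
JUPYTERHUB_PORT = 8000


def jupyterhub_url(host, port=JUPYTERHUB_PORT):
    """Build the https URL that attendees would open in their browser."""
    return f'https://{host}:{port}/'


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only tests network reachability (for example through a firewall).
    It does not confirm that JupyterHub itself is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False


# Example usage (placeholder host name, substitute your own DSVM):
#   host = 'dsvmxyz01.southeastasia.cloudapp.azure.com'
#   print(jupyterhub_url(host), port_is_open(host, JUPYTERHUB_PORT))
```

If the check returns False from inside an organisation's network but True from elsewhere, a firewall is the likely culprit and the RStudio Server backup is the option to fall back on.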

Getting Started – Create an Ubuntu Data Science Virtual Machine<\/strong><\/p>\n

To begin we need to deploy an Ubuntu Data Science Virtual Machine. See the first two steps on my blog post<\/a>. A DS14 server (or D14_V2 for an SSD-based server) having 16 cores and 112 GB of RAM seems a good size (about $40 per day).<\/p>\n

We may want to add a disk for user home folders as they can sometimes get quite large during training. To do so follow the Azure<\/a> instructions:<\/p>\n

    \n
  1. In the Portal click on the virtual machine.<\/li>\n
  2. Click on Disks and Attach New.<\/li>\n
  3. Choose the Size. 1000GB is probably okay for a class of 100.<\/li>\n
  4. Click OK (takes about 2 minutes).<\/li>\n
  5. Now log in to the server through ssh:\n
    ssh xyz@dsvmxyz01.southeastasia.cloudapp.azure.com<\/pre>\n<\/li>\n
  6. The disk is visible as \/dev\/sdd\n