{"id":515,"date":"2017-05-28T12:37:46","date_gmt":"2017-05-28T02:37:46","guid":{"rendered":"http:\/\/togaware.com\/?p=515"},"modified":"2021-03-05T16:57:06","modified_gmt":"2021-03-05T05:57:06","slug":"running-an-r-workshop-on-azure-linux-data-science-virtual-machine","status":"publish","type":"post","link":"https:\/\/togaware.com\/running-an-r-workshop-on-azure-linux-data-science-virtual-machine\/","title":{"rendered":"Running an R Workshop on Azure with the Ubuntu Data Science Virtual Machine"},"content":{"rendered":"
The fully open source software stack of the Ubuntu Data Science Virtual Machine<\/a> (DSVM) hosted on Azure is a great place to support an R workshop or laboratory session or R training. I record\u00a0 here the simple steps to set up a Linux Data Science Virtual Machine (in the main so I can remember how to do it each time).\u00a0 Workshop attendees will have their own laptop computers and can certainly install R themselves, but with the Ubuntu Data Science Virtual Machine we have a shared and uniformly configured platform which avoids the traditional idiosyncrasies and frustrations that plague a large class installing software on multiple platforms themselves. Instead of spending the first trouble-filled hour of a class setting up everyone’s computer we can use a local browser to access either Jupyter Notebooks<\/a> or RStudio<\/a> Server running on the DSVM.<\/p>\n Jupyter Notebooks on JupyterHub<\/strong><\/p>\n We illustrate the session with both Jupyter<\/a> Notebook supporting multiple users under JupyterHub<\/a> and as a backup running RStudio<\/a> Server (for those environments where a secure connection through https is not permitted). Both can be accessed via browsers. JupyterHub uses https (encrypted) which may be blocked by firewalls within organisations. In that case an RStudio<\/a> Server over http is presented as a backup.<\/p>\n WARNING:<\/strong> Jupyter Notebook has rendered my laptop computer (under both Linux and Windows, Firefox and IE) unusable after a period of extensive usage: the browser freezes and the machine becomes completely unresponsive.<\/p>\n Jupyter Notebook<\/a> provides a browser interface with basic literate programming<\/a> capability. I’ve been a fan of literate programming since my early days as a programmer in the 1980s when I first came across the concept from Donald Knuth. I now encourage literate data science and it is a delight to see others engaged in urging this approach to data science. 
Jupyter Notebooks are great for self-paced learning, intermixing a narrative with actual R code. The R code can be executed in place with results displayed in place as the student works through the material. Jupyter Notebooks are not such a great development environment though. Other environments excel there.<\/p>\n JupyterHub<\/a> supports multiple users on the one platform, each with their own R\/Jupyter process. The Linux Data Science Virtual Machine running on Azure provides these open source environments out of the box.\u00a0 Access to JupyterHub is through port 8000.<\/p>\n Getting Started – Create an Ubuntu Data Science Virtual Machine<\/strong><\/p>\n To begin we need to deploy an Ubuntu Data Science Virtual Machine. See the first two steps on my blog post<\/a>. A DS14 server (or D14_V2 for an SSD-based server) having 16 cores and 112 GB of RAM seems a good size (about $40 per day).<\/p>\n We may want to add a disk for user home folders as they can sometimes get quite large during training. To do so follow the Azure<\/a> instructions:<\/p>\n Connecting to JupyterHub<\/strong><\/p>\n If you set up a DNS name label dsvmxyz01<\/em> and the location is southeastasia <\/em>then visit:<\/p>\n https:\/\/dsvmxyz01.southeastasia.cloudapp.azure.com:8000\/<\/a><\/p>\n The first time you connect to the site you will be presented with a warning from the browser that the connection is insecure. The server is using a self-signed certificate to encrypt the traffic between your browser and the server. That is fine though a little disconcerting. As the user you could simply click through to allow the connection and add an exception. This often involves clicking on Advanced and then Add Exception… and then Confirm Security Exception. It is safe to provide an exception for now. 
However, it is best to install a proper certificate!<\/p>\n Install a Let’s Encrypt Certificate<\/strong><\/p>\n We can instead install a free Let’s Encrypt certificate from letsencrypt<\/a> to have a valid non-self-signed certificate. To do so we first need to open the https port (443) for the DSVM in the Azure portal. Then log on to the server and do the following:<\/p>\n You should be able to connect now without the certificate warning.<\/p>\n You are presented with a JupyterHub Sign in page.<\/p>\n <\/a><\/p>\n Creating User Accounts<\/strong><\/p>\n Log in to the server. How you do this will depend on whether you set up an ssh key or a username and password. We assume the latter for this post. On a terminal (or using PuTTY on Windows), connect as:<\/p>\n You will be prompted for a password.<\/p>\n We can then create user accounts for each user in our workshop. The user accounts are created on the Linux DSVM. Here we create 40 user accounts and record their random usernames and passwords into the file usersinfo.csv<\/em> on the server:<\/p>\n If the process has issues and you need to start the account creation again then delete the users:<\/p>\n Provide a username\/password pair to each participant of the workshop, one line only to each user. The file will begin something like:<\/p>\n Now go back to https:\/\/dsvmxyz01.southeastasia.cloudapp.azure.com:8000\/<\/a> and Sign in<\/em> with the Username<\/em> userce81 and Password<\/em> d0dfac5a30 (using the username and password from your own usersinfo.csv<\/em> file).<\/p>\n Once logged in Jupyter will display a file browser.<\/p>\n <\/a><\/p>\n Notice a number of notebooks are available. Click the IntroTurorialInR.ipynb<\/em> for a basic introduction to R.<\/p>\n <\/a><\/p>\n Backup Option – RStudio<\/strong><\/p>\n JupyterHub\u00a0requires https and so won’t\u00a0run\u00a0internally within a customer site if they have\u00a0a\u00a0firewall blocking all SSL (encrypted) communications. 
In this case RStudio Server is a backup option. It is pre-installed on the server and if you followed my instructions above for deploying a DSVM you will have updated to the latest version too.<\/p>\n Connect to the RStudio server:<\/p>\n http:\/\/dsvmxyz01.southeastasia.cloudapp.azure.com:8787<\/a><\/p>\n Sign in to RStudio<\/em> with the same Username<\/em> and Password<\/em> as above.<\/p>\n <\/a><\/p>\n Running Rattle through an X2Go Desktop<\/strong><\/p>\n If you followed my DSVM deployment guide then you will have also set up X2Go on your local computer to support a desktop connection across to the DSVM. This is very convenient for running desktop applications, like Rattle,\u00a0 on the DSVM. Every student in the class gets the same environment.<\/p>\n <\/a><\/p>\n Shortcuts to the Services<\/strong><\/p>\n The URLs are rather long and so we can set up either bit.ly<\/a> or aka.ms<\/a> shortcuts. Visiting the latter we set up two short URLs:<\/p>\n We can now use the short URLs to refer to the long URLs.<\/p>\n REMEMBER<\/strong>: Deploy-Compute-Destroy for a cost-effective hardware platform for Data Science. Deallocate (Stop) your server when it is not required.<\/p>\n Graham @ Microsoft<\/p>\n","protected":false},"excerpt":{"rendered":" The fully open source software stack of the Ubuntu Data Science Virtual Machine (DSVM) hosted on Azure is a great place to support an R workshop or laboratory session or R training. 
I record\u00a0 here the simple steps to set up a Linux Data Science Virtual Machine (in the main so I can remember how […]<\/p>\n","protected":false},"author":2,"featured_media":522,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"image","meta":[],"categories":[48,2,1],"tags":[29,18,35],"_links":{"self":[{"href":"https:\/\/togaware.com\/wp-json\/wp\/v2\/posts\/515"}],"collection":[{"href":"https:\/\/togaware.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/togaware.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/togaware.com\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/togaware.com\/wp-json\/wp\/v2\/comments?post=515"}],"version-history":[{"count":42,"href":"https:\/\/togaware.com\/wp-json\/wp\/v2\/posts\/515\/revisions"}],"predecessor-version":[{"id":1021,"href":"https:\/\/togaware.com\/wp-json\/wp\/v2\/posts\/515\/revisions\/1021"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/togaware.com\/wp-json\/wp\/v2\/media\/522"}],"wp:attachment":[{"href":"https:\/\/togaware.com\/wp-json\/wp\/v2\/media?parent=515"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/togaware.com\/wp-json\/wp\/v2\/categories?post=515"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/togaware.com\/wp-json\/wp\/v2\/tags?post=515"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}\n
ssh xyz@dsvmxyz01.southeastasia.cloudapp.azure.com<\/pre>\n<\/li>\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
UUID=f395b783-31da-4916-a3a9-8fb56fd7a068 \/home ext4 defaults,nofail,discard 1 2<\/pre>\n<\/li>\n<\/ul>\n<\/li>\n
\n
\n
\n
** TO BE UPDATED TO THE EQUIVALENT IN UBUNTU<\/strong> **\r\n$ ssh xyz@dsvmxyz01.southeastasia.cloudapp.azure.com\r\n$ sudo yum install epel-release\r\n$ sudo yum install httpd mod_ssl python-certbot-apache\r\n$ sudo emacs \/etc\/httpd\/conf.d\/ssl.conf\r\n Within the Virtual Host entry add\r\n ServerName dsvmxyz01.southeastasia.cloudapp.azure.com\r\n #<\/span> SSLProtocol all -SSLv2\r\n #<\/span> SSLCipherSuite HIGH:MEDIUM:!aNULL:!MD5:!SEED:!IDEA\r\n$ sudo systemctl restart httpd\r\n$ sudo systemctl status httpd\r\n$ sudo certbot --apache -d dsvmxyz01.southeastasia.cloudapp.azure.com\r\n$ sudo systemctl start httpd\r\n<\/code><\/pre>\n
$ ssh xyz@dsvmxyz01.southeastasia.cloudapp.azure.com<\/pre>\n
for i in {1..40}; do\r\n  u=$(openssl rand -hex 2)\r\n  sudo adduser \"user$u\" --gecos \"\" --disabled-password\r\n  p=$(openssl rand -hex 5)\r\n  echo \"user$u:$p\" | sudo chpasswd\r\n  echo \"user$u:$p\" >> usersinfo.csv\r\ndone<\/pre>\n
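A two-byte hex suffix gives only 65,536 possible usernames, so with 40 accounts there is a small chance that two draws collide and the second adduser fails. The following is a minimal sketch, not part of the original recipe, that generates the whole credentials list up front and redraws on a repeat. The function names rand_hex and gen_credentials are hypothetical, and falling back to od on /dev/urandom when openssl is missing is an assumption of this sketch.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print N random bytes as lowercase hex.
# Uses openssl as in the loop above; falls back to od (an assumption).
rand_hex() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex "$1"
  else
    od -An -N"$1" -tx1 /dev/urandom | tr -d ' \n'
    echo
  fi
}

# Hypothetical helper: print COUNT unique "userXXXX:password" lines.
gen_credentials() {
  local count=$1 seen=" " made=0 u p
  while [ "$made" -lt "$count" ]; do
    u="user$(rand_hex 2)"
    case "$seen" in
      *" $u "*) continue ;;   # suffix already used, draw again
    esac
    seen="$seen$u "
    p=$(rand_hex 5)
    printf '%s:%s\n' "$u" "$p"
    made=$((made + 1))
  done
}

# Possible usage on the DSVM (creates the accounts from the generated file):
#   gen_credentials 40 > usersinfo.csv
#   while IFS=: read -r u p; do
#     sudo adduser "$u" --gecos "" --disabled-password
#   done < usersinfo.csv
#   sudo chpasswd < usersinfo.csv
```

Generating the file first also means a failed run can be repeated from the same usersinfo.csv rather than producing a second set of random names.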
for i in $(cut -d \":\" -f1 usersinfo.csv); do \r\n sudo deluser --remove-home $i; \r\ndone\r\n\r\n# Check it has been done\r\n\r\ntail \/etc\/passwd\r\nls \/home\/<\/pre>\n
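Because deluser --remove-home is destructive, it can be worth previewing what the loop above would do before running it. This is a minimal sketch under stated assumptions: preview_deletions is a hypothetical helper, and it assumes the usernames follow the user plus four hex digits pattern produced by the creation loop. It only prints commands and never deletes anything.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print (do not run) one deluser command per
# workshop account listed in the given credentials file.
preview_deletions() {
  local file=${1:-usersinfo.csv} name
  cut -d ":" -f1 "$file" | while read -r name; do
    case "$name" in
      user[0-9a-f][0-9a-f][0-9a-f][0-9a-f])
        printf 'sudo deluser --remove-home %s\n' "$name" ;;
      *)
        # Anything that does not look like a workshop account is reported, not deleted.
        printf 'skipping unexpected entry: %s\n' "$name" >&2 ;;
    esac
  done
}

# Possible usage: review the output, then run the real loop.
#   preview_deletions usersinfo.csv
```

The pattern check is a guard against a stray line in the file (for example a real account name) being passed to deluser.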
userce81:d0dfac5a30\r\nuserd2ec:a4f142c342\r\nuser6309:0f13aeb27a\r\nuser0774:e334399343<\/pre>\n
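Handing one line of the file to each participant can be sketched as printing a small slip per account. The helper name print_handouts and the slip layout are hypothetical; the URL is passed in as an argument so nothing about the server is assumed.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print one handout slip per "username:password" line.
#   $1 = credentials file, $2 = JupyterHub URL to show on each slip
print_handouts() {
  local file=$1 url=$2 user pass
  while IFS=: read -r user pass; do
    [ -n "$user" ] || continue   # ignore blank lines
    printf 'JupyterHub: %s\nUsername:   %s\nPassword:   %s\n----\n' \
      "$url" "$user" "$pass"
  done < "$file"
}

# Possible usage:
#   print_handouts usersinfo.csv https://dsvmxyz01.southeastasia.cloudapp.azure.com:8000/
```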
https:\/\/aka.ms\/xyz_hub as https:\/\/dsvmxyz01.southeastasia.cloudapp.azure.com:8000\r\nhttp:\/\/aka.ms\/xyz_rstudio as http:\/\/dsvmxyz01.southeastasia.cloudapp.azure.com:8787<\/pre>\n