
Short Description:

In this tutorial we will install RStudio Server, the browser-based version of RStudio, on the HDP Sandbox.

Article

Installing RStudio on HDP Sandbox

Introduction

RStudio is an integrated development environment (IDE) for the R language. It includes a console for direct code execution as well as tools for plotting and debugging. You can find more information about the RStudio features here.

RStudio is the primary tool in the Hortonworks tutorial Predicting Airline Delays using SparkR, in which you will learn to train and analyze machine learning models that predict airline delays.

Prerequisites

1. SSH on to the Sandbox

Use the following command to SSH on to the Sandbox as root user:

ssh root@sandbox-hdp.hortonworks.com -p 2222

NOTE: If this is your first time signing in, the default password is hadoop.


2. Begin installation

On CentOS 7, the base OS for the Sandbox, R is available through the Extra Packages for Enterprise Linux (EPEL) repository, so we will enable that repository first by installing the epel-release package:

yum install -y epel-release

Next, update the installed packages:

yum update -y
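Before installing R, it can help to confirm that the EPEL repository is actually enabled. The following is a minimal sketch, not part of the original tutorial: has_epel is a hypothetical helper name, and it assumes that the repo id column printed by yum repolist starts with epel (optionally prefixed by ! or *).

```shell
# has_epel: hypothetical helper that reads `yum repolist enabled` output on
# stdin and succeeds if a repo id beginning with "epel" is listed.
# Assumption: repo ids appear at the start of a line, optionally prefixed
# by "!" or "*".
has_epel() {
  grep -Eq '^[!*]?epel'
}

# Usage on the Sandbox:
#   if yum repolist enabled | has_epel; then
#     echo "EPEL is enabled"
#   else
#     echo "EPEL is not enabled; re-run: yum install -y epel-release"
#   fi
```

If the check fails, yum install R will most likely report that no package named R is available.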

3. Install R and RStudio

Let us begin by installing R:

yum install R -y

Now we may install RStudio Server:

wget https://download2.rstudio.org/rstudio-server-rhel-1.1.456-x86_64.rpm
sudo yum install -y rstudio-server-rhel-1.1.456-x86_64.rpm

NOTE: You can find the newest RStudio release here under Red Hat/CentOS 64-bit.
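If you want to try a different release, keeping the version number in one place avoids typing it twice. This is only a sketch: rstudio_rpm_name and rstudio_rpm_url are hypothetical helper names, and the URL pattern is inferred from the single download link used in this article. Other releases may be published under a different path, so check the download page before relying on it.

```shell
# Hypothetical helpers that build the RPM file name and download URL from a
# version string. The pattern is copied from the 1.1.456 link in this article
# and is an assumption for any other version.
rstudio_rpm_name() {
  echo "rstudio-server-rhel-$1-x86_64.rpm"
}

rstudio_rpm_url() {
  echo "https://download2.rstudio.org/$(rstudio_rpm_name "$1")"
}

# Usage on the Sandbox:
#   version=1.1.456
#   wget "$(rstudio_rpm_url "$version")"
#   yum install -y "$(rstudio_rpm_name "$version")"
```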

Finally, verify that the server is up and running:

systemctl status rstudio-server.service

You should see a message stating that the server is active.

4. Assigning a Different Port for RStudio

Install dpkg, then use dpkg-divert to move the original /sbin/initctl aside and replace it with a link to /bin/true:

yum install -y dpkg
dpkg-divert --local --rename --add /sbin/initctl
ln -s /bin/true /sbin/initctl

By default, RStudio Server accepts connections on port 8787. The Sandbox already uses this port for another service, so we must assign the server a different port (in our case, port 60000):

echo "www-port=60000" | sudo tee -a /etc/rstudio/rserver.conf
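Because tee -a appends, running the command above more than once leaves several www-port lines in the file. A minimal sketch of an idempotent alternative follows. set_www_port is a hypothetical helper name, not an RStudio tool, and the sketch assumes GNU sed, which is what CentOS ships.

```shell
# set_www_port: hypothetical helper that sets www-port in an
# rserver.conf-style file without creating duplicate entries.
# Assumes GNU sed (for `sed -i` without a backup suffix).
set_www_port() {
  local conf="$1" port="$2"
  touch "$conf"   # make sure the file exists before grepping it
  if grep -q '^www-port=' "$conf"; then
    # A setting already exists: rewrite it in place.
    sed -i "s/^www-port=.*/www-port=${port}/" "$conf"
  else
    # No setting yet: append one, as the original command does.
    echo "www-port=${port}" >> "$conf"
  fi
}

# Usage on the Sandbox (as root):
#   set_www_port /etc/rstudio/rserver.conf 60000
```

Re-running the helper with a different port replaces the existing value instead of adding a second line.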

The next command will restart the server:

NOTE: This command will end your SSH connection to the Sandbox. Do not panic; this is expected.

exec /usr/lib/rstudio-server/bin/rserver

5. Begin using RStudio

Open a web browser and navigate to:

http://sandbox-hdp.hortonworks.com:60000

You should see the RStudio sign-in screen.

Your username is amy_ds and your password is amy_ds.
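If the sign-in page does not load, a quick check from your host machine can tell you whether anything is listening on the new port. The following is a rough sketch: port_open is a hypothetical helper name, and it assumes that bash (for its /dev/tcp feature) and the coreutils timeout command are available on the machine where you run it.

```shell
# port_open: hypothetical helper that succeeds if a TCP connection to
# HOST:PORT can be opened within 3 seconds.
# Assumes bash (for /dev/tcp) and coreutils `timeout` are installed.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Usage from the host machine:
#   if port_open sandbox-hdp.hortonworks.com 60000; then
#     echo "RStudio Server port is reachable"
#   else
#     echo "Port 60000 is not reachable; check rserver.conf and the server process"
#   fi
```

A successful connection only shows that some service is listening on the port. It does not confirm that the service is RStudio Server.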


Congratulations! You may now start using RStudio along with the tools included in the HDP Sandbox for an enhanced Data Science experience.

Summary

In this tutorial we learned how to install RStudio Server and how to edit its configuration file so that it listens on a port other than the default, avoiding a conflict with another service on the Sandbox.

Further Reading

You can go to the following links to explore tutorials using RStudio:

Last update: 07-18-2018 05:58 PM