Community Articles

Find and share helpful community-sourced technical articles.
avatar
Rising Star

My blood recently turned from green to blue (after the Hortonworks-Cloudera merger) and I couldn't be more excited to play with new toys. What I am particularly excited about is Cloudera Data Science Workbench. But, like in everything I do, I am very lazy. So here is a quick tutorial to install Altus Director, and use it to deploy a CDH 5.15 + CDSW cluster.

Step 1: Install Altus Director

Many ways to do that, but the one I chose was the AWS install, detailed here: https://www.cloudera.com/documentation/director/latest/topics/director_aws_setup_client.html

The installation documentation is very well done, but here are the important excerpts

Create a VPC for your Altus instance

Follow the documentation.

Few important points:

  • In the name of laziness I also recommend to add a 0-65535 rule from your personal IP.
  • Your VPC should have an internet gateway associated with it (you could do it without, but would require you manually pulling the CM/CDH software down and make internal repositories within your subnet)
  • Do not forget to open all traffic to your security group as described here. Your deployment will not work otherwise.

Launch a Redhat 7.3 instance

You can either search communities AMIs, or use this one: ami-6871a115

Install Altus

Connect to your ec2 instance:

ssh -i your_file.pem ec2-user@your_instance_ip

Install JDK and wget

sudo yum install java-1.8.0-openjdk
sudo yum install wget

Install/Start Altus server and client:

cd /etc/yum.repos.d/
sudo wget "http://archive.cloudera.com/director6/6.1/redhat7/cloudera-director.repo"
sudo yum install cloudera-director-server cloudera-director-client
sudo service cloudera-director-server start
sudo systemctl disable firewalld
sudo systemctl stop firewalld

Connect to Altus Director

Go to http://your_instance_ip:7189/ and connect with admin/admin

Step 2: Modify the Director configuration file

CDSW cluster configuration can be found here https://github.com/cloudera/director-scripts/blob/master/configs/aws.cdsw.conf

Modify the configuration file to use:

  • Your AWS accessKeyId/secretAccessKey
  • Your AWS region
  • Your AWS subnetId (same as the one you created for your Director instance)
  • Your AWS securityGroupsIds (same as the one you created for your Director instance)
  • Your private key path (e.g. /home/ec2-user/field.pem)
  • Your AWS image (e.g. ami-6871a115)

Step 3: Launch the cluster via director client

Go to your EC2 instance where Director is installed, and load your modified configuration file as well as the appropriate key.

Finally, run the following:

cloudera-director bootstrap-remote your_configuration_file.conf \
--lp.remote.username=admin \
--lp.remote.password=admin

Step 4: Access Cloudera Manager

You can follow the bootstrapping of the cluster both on command line or in the Director interface; once done, you can connect to Cloudera Manager using: http://your_manager_instance_ip:7180/

Step 5: Configure CDSW domain with your IP

Cloudera Data Science Workbench uses DNS. The correct approach is to setup a wildcard DNS record is required, as described here.

However, for testing purposes I used nip.io. The only parameter to change is the Cloudera Data Science Workbench Domain, from cdsw.my-domain.com as the conf file sets it up to, to cdsw.[YOUR_AWS_PUBLIC_IP].nip.io, as depicted below:

Restart the CDSW service, then you should be able to access CDSW by clicking on the CDSW Web UI link. Register for a new account and you will have access to CDSW:

1,954 Views
0 Kudos
Comments

The above was originally posted in the Community Help track. On Sun May 19 16:49 UTC 2019, the HCC moderation staff moved it to the Cloud & Operations Track. The Community Help Track is intended for questions about using the HCC site itself.