Automated deployment of a fresh HDP cluster that includes Zeppelin (via blueprints)

Background:

The Zeppelin Ambari service has been updated and now supports installing the latest Zeppelin version (0.5.5) on HDP using blueprints, to automate creation of a 'data science' cluster. The SequenceIQ team has a datascientist blueprint that installs Zeppelin but, based on my conversations with @lpapp, it is based on an older version of the Ambari service, so it neither installs the latest version nor supports as many options.

Below is a write-up of how to deploy Zeppelin via blueprints. Note that if you already have a cluster running, you should just use the Add Service wizard in Ambari to deploy Zeppelin, using the steps on the github

Purpose:

  • Sample steps below for installing a 4-node HDP cluster that includes Zeppelin, using Ambari blueprints and Ambari bootstrap scripts (by @Sean Roberts)

Pre-reqs:

Bring up 4 VMs imaged with RHEL/CentOS 6.5 or later (called node1 through node4 in this example).

Note that the VMs should not already have any HDP-related software installed on them at this point.

Steps:

  • On the non-Ambari nodes (e.g. node2-4), use the Ambari bootstrap script to run the pre-reqs, install the ambari-agents and point them to the Ambari node (e.g. node1 in this case)
export ambari_server=node1
curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh
  • On the Ambari node (e.g. node1), use the bootstrap script to run the pre-reqs and install ambari-server, then clone the Zeppelin Ambari service into the HDP 2.3 stack directory
export install_ambari_server=true
curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh
yum install -y git
git clone https://github.com/hortonworks-gallery/ambari-zeppelin-service.git /var/lib/ambari-server/resources/stacks/HDP/2.3/services/ZEPPELIN
  • Edit the /var/lib/ambari-server/resources/stacks/HDP/2.3/role_command_order.json file to include the line below:
  "ZEPPELIN_MASTER-START": ["NAMENODE-START", "DATANODE-START"],
  • Note the comma at the end. If you insert the above as the last entry in its block, remove the trailing comma (and make sure the preceding entry ends with one)
  • Restart Ambari
service ambari-server restart
service ambari-agent restart    
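If ambari-server does not come back up after the restart, one thing worth ruling out is a stray or missing comma in the edited role_command_order.json. Below is a minimal sketch of a syntax check, assuming python3 or python is on the PATH and that the file is otherwise plain JSON. validate_json is a hypothetical helper name, not part of Ambari:

```shell
# Hypothetical helper (not part of Ambari): succeeds only if the file parses as JSON.
validate_json() {
  py=$(command -v python3 || command -v python) || return 2
  "$py" -c 'import json, sys; json.load(open(sys.argv[1]))' "$1" 2>/dev/null
}

rco=/var/lib/ambari-server/resources/stacks/HDP/2.3/role_command_order.json
# Only run the check where the stack file actually exists (i.e. on the Ambari node).
if [ -f "$rco" ]; then
  if validate_json "$rco"; then
    echo "role_command_order.json parses as JSON"
  else
    echo "role_command_order.json is not valid JSON; check the commas around the ZEPPELIN_MASTER-START line"
  fi
fi
```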
  • Confirm that all 4 agents registered and that the agent on this node is still running
curl -u admin:admin -H  X-Requested-By:ambari http://localhost:8080/api/v1/hosts
service ambari-agent status
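Rather than reading the hosts response by eye, the number of registered agents can be counted. This is a rough sketch, assuming each registered host appears exactly once under a "host_name" key in the response. count_hosts is a hypothetical helper name:

```shell
# Hypothetical helper: counts "host_name" entries in an Ambari /api/v1/hosts response read from stdin.
count_hosts() {
  grep -o '"host_name"' | wc -l | tr -d ' '
}

# Usage on the Ambari node (same credentials and URL as above); for this walkthrough the expected count is 4:
#   curl -s -u admin:admin -H X-Requested-By:ambari http://localhost:8080/api/v1/hosts | count_hosts
```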
  • (Optional) In general, you can generate a blueprint and cluster file for your cluster via the Ambari recommendations API using the steps below. However, this example provides sample blueprints which you can edit, so this is not needed; these steps are for reference only. For more details on the bootstrap scripts, see the bootstrap script git
yum install -y python-argparse
git clone https://github.com/seanorama/ambari-bootstrap.git

#optional - limit the services for faster deployment

#for minimal services
export ambari_services="HDFS MAPREDUCE2 YARN ZOOKEEPER HIVE ZEPPELIN"

#for most services
#export ambari_services="ACCUMULO FALCON FLUME HBASE HDFS HIVE KAFKA KNOX MAHOUT OOZIE PIG SLIDER SPARK SQOOP MAPREDUCE2 STORM TEZ YARN ZOOKEEPER ZEPPELIN"

export deploy=false
cd ambari-bootstrap/deploy
bash ./deploy-recommended-cluster.bash

cd tmpdir*

#edit the blueprint to customize as needed. You can use sample blueprints provided below to see how to add the custom services.
vi blueprint.json

#edit cluster file if needed
vi cluster.json
  • Download either the minimal or the full blueprint for a 4-node setup
#Pick one of the below blueprints
#for minimal services download this one
wget https://raw.githubusercontent.com/hortonworks-gallery/ambari-zeppelin-service/master/blueprint-4node... -O blueprint-zeppelin.json

#for most services download this one
wget https://raw.githubusercontent.com/hortonworks-gallery/ambari-zeppelin-service/master/blueprint-4node... -O blueprint-zeppelin.json
  • (Optional) If running on a single node, download the minimal blueprint for a 1-node setup
#Pick one of the below blueprints
#for minimal services download this one
wget https://raw.githubusercontent.com/hortonworks-gallery/ambari-zeppelin-service/master/blueprint-1node... -O blueprint-zeppelin.json
  • (Optional) If needed, change the Zeppelin configs to match your setup by modifying these lines
vi blueprint-zeppelin.json
  • If deploying on a public cloud, you will want to add "zeppelin.host.publicname":"<public IP or hostname of zeppelin node>" so that the Zeppelin Ambari view points to the external hostname (instead of the internal name, which is the default)
  • Upload the selected blueprint and download a sample cluster.json that lists your host FQDNs. Modify the host FQDNs in the cluster.json file to match your own environment. Finally, deploy the cluster and call it zeppelinCluster
#upload the blueprint to Ambari
curl -u admin:admin -H  X-Requested-By:ambari http://localhost:8080/api/v1/blueprints/zeppelinBP -d @blueprint-zeppelin.json
  • Download the sample cluster.json
#for 4 node setup
wget https://raw.githubusercontent.com/hortonworks-gallery/ambari-zeppelin-service/master/cluster-4node.j... -O cluster.json

#for single node setup
wget https://raw.githubusercontent.com/hortonworks-gallery/ambari-zeppelin-service/master/cluster-1node.j... -O cluster.json
  • Modify the host FQDNs in the cluster.json file to match your own. Also change default_password to set the password for Hive
vi cluster.json
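Before deploying, it may help to list the hosts that cluster.json now names and compare them with the output of `hostname -f` on each node. This sketch assumes the file uses the usual "fqdn" key for each host. list_fqdns is a hypothetical helper name:

```shell
# Hypothetical helper: prints every "fqdn" value found in a cluster creation file.
list_fqdns() {
  grep -o '"fqdn" *: *"[^"]*"' "$1" | sed 's/.*: *"\(.*\)"/\1/'
}

# Review the hosts in the edited file, if it is present in the current directory.
if [ -f cluster.json ]; then
  list_fqdns cluster.json
fi
```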
  • Deploy the cluster
curl -u admin:admin -H  X-Requested-By:ambari http://localhost:8080/api/v1/clusters/zeppelinCluster -d @cluster.json
  • You can monitor the progress of the deployment via Ambari (e.g. http://node1:8080).
  • Once the install completes, you will have a 4-node HDP cluster including Zeppelin, along with some starter demo Zeppelin notebooks from the gallery github
  • More details are available on the github README here
  • Similar steps are available here to deploy a 'security ready' cluster including demo KDC, OpenLDAP, NSLCD services.
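
The deployment progress mentioned above can also be followed from the command line instead of the Ambari UI. This is a minimal sketch, assuming the deploy call created request 1 under zeppelinCluster (the response to the deploy call normally contains an href pointing at the actual request). request_status is a hypothetical helper name:

```shell
# Hypothetical helper: extracts the request_status field from an Ambari request resource read from stdin.
request_status() {
  sed -n 's/.*"request_status" *: *"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage on the Ambari node (request id 1 is an assumption; use the href returned by the deploy call):
#   curl -s -u admin:admin -H X-Requested-By:ambari \
#     http://localhost:8080/api/v1/clusters/zeppelinCluster/requests/1 | request_status
# Typical values include IN_PROGRESS, COMPLETED and FAILED.
```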

Comments

@Ali Bajwa A simplified approach. On the Ambari server:

yum -y install git
git clone https://github.com/seanorama/ambari-bootstrap

cd ambari-bootstrap
export ambari_server_custom_script=${ambari_server_custom_script:-~/ambari-bootstrap/ambari-extras.sh}
export install_ambari_server=true
./ambari-bootstrap.sh
Then deploy the cluster. The "extras" script above takes care of all the tedious stuff automatically (cloning Zeppelin, the blueprint defaults, the role command order, ...).
yum -y install python-argparse
cd deploy
export ambari_services="HDFS MAPREDUCE2 YARN ZOOKEEPER HIVE SPARK ZEPPELIN"

bash ./deploy-recommended-cluster.bash