Created on 09-08-2018 05:52 PM - edited 08-17-2019 06:27 AM
HDF 3.3 brings a significant number of improvements. This article shows how you can use the ambari-bootstrap project to generate a blueprint and deploy either HDF-only clusters or combined HDP/HDF clusters in 5 easy steps. To quickly set up a single-node cluster, prebuilt AMIs are available for AWS, as well as a script that automates these steps, so you can deploy the cluster with a few commands.
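This article describes the steps for each option: launching a prebuilt AMI on AWS, running the automated script on a fresh instance, and installing manually with ambari-bootstrap on one or more nodes.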
1. Open the Amazon AWS console in your browser and sign in with your credentials. Once signed in, you can close this browser tab.
2. Select the AMI from the ‘N. California’ region by clicking one of the options below
Now choose the instance type: select ‘m4.2xlarge’ and click Next
Note: if you choose a smaller instance type than the one recommended above, some services may not come up
3. Configure Instance Details: leave the defaults and click ‘Next’
4. Add storage: keep at least the default of 800 GB and click ‘Next’
5. Optionally, add a name or any other tags you like. Then click ‘Next’
6. Configure security group: create a new security group and select ‘All traffic’ to open all ports. For production usage, a more restrictive security group policy is strongly encouraged. For instance, only allow traffic from your company’s IP range. Then click ‘Review and Launch’
7. Review your settings and click Launch
8. Create and download a new key pair (or choose an existing one). Then click ‘Launch instances’
9. Click the shown link under ‘Your instances are now launching’
10. This opens the EC2 dashboard that shows the details of your launched instance
11. Make note of your instance’s ‘Public IP’ (which will be used to access your cluster). If it is blank, wait 1-2 minutes for this to be populated.
12. After 5-10 minutes, open the URL below in your browser to access Ambari’s console: http://<PUBLIC IP>:8080 (using the public IP from the previous step). Log in as user admin with password StrongPassword.
13. At this point, Ambari may still be in the process of starting all the services. You can tell by the presence of the blue ‘op’ notification near the top left of the page. If so, just wait until it is done.
(Optional) You can also monitor the startup using the log as below:
Open SSH session into the VM using your key and the public IP e.g. from OSX:
ssh -i ~/.ssh/mykey.pem centos@<publicIP>
Tail the startup log:
tail -f /var/log/hdp_startup.log
Once you see “cluster is ready!” you can proceed
14. Once the blue ‘op’ notification disappears and all the services show a green check mark, the cluster is fully up.
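The optional log check above can also be scripted. The following is a minimal sketch, assuming the log path and the “cluster is ready!” marker mentioned above; the function name wait_for_ready and its retry defaults are hypothetical and are not part of the AMI.

```shell
# wait_for_ready LOG_FILE [MAX_CHECKS] [INTERVAL_SECONDS]
# Hypothetical helper: polls LOG_FILE until it contains the marker line
# "cluster is ready!" or until MAX_CHECKS attempts have been made.
wait_for_ready() {
  log_file=$1
  max_checks=${2:-120}   # assumed default: 120 checks
  interval=${3:-10}      # assumed default: 10 seconds between checks
  n=0
  while [ "$n" -lt "$max_checks" ]; do
    # A missing or unreadable log is treated as "not ready yet"
    if grep -q "cluster is ready!" "$log_file" 2>/dev/null; then
      return 0
    fi
    n=$((n + 1))
    sleep "$interval"
  done
  return 1
}
```

On the instance, it could be called as: wait_for_ready /var/log/hdp_startup.log && echo "cluster is up"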
Launch a fresh CentOS/RHEL 7 instance with at least 4 CPUs and 16 GB of RAM and run the commands below. Do not try to install HDF in an environment where Ambari or HDP is already installed (e.g. the HDP sandbox or an HDP cluster)
To deploy an HDF 3.3-only cluster, run:
export host_count=1
curl -sSL https://gist.github.com/abajwa-hw/b5565d7e7f9beffd8dd57a970dc54266/raw | sudo -E sh
To deploy a combined HDF 3.3/HDP 3.1 cluster, run:
export host_count=1
curl -sSL https://gist.github.com/abajwa-hw/d7cd1c0232c1af46ee2c465e4871ddc6/raw | sudo -E sh
Once launched, the script will install Ambari and use it to deploy the HDF cluster
Note: this script can also be used to install multi-node clusters after step #1 below is complete (i.e. after the agents on the non-Ambari-server nodes are installed and registered). Just change the value of the host_count variable
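Before running the script, it may help to confirm that the instance meets the sizing mentioned above. The sketch below is only an illustration: meets_minimum is a hypothetical helper, and the 15 GiB memory threshold is an assumption chosen because a nominal 16 GB host reports slightly less than that as MemTotal.

```shell
# meets_minimum CPU_COUNT MEM_TOTAL_KB
# Hypothetical pre-flight helper: succeeds only if both values meet the
# minimums stated above (4 CPUs, 16 GB RAM).
meets_minimum() {
  cpus=$1
  mem_kb=$2
  min_cpus=4
  # Assumption: accept 15 GiB or more, since a nominal 16 GB host
  # reports slightly less than 16 GiB as MemTotal.
  min_mem_kb=$((15 * 1024 * 1024))
  [ "$cpus" -ge "$min_cpus" ] && [ "$mem_kb" -ge "$min_mem_kb" ]
}
```

On the target host, it could be fed real values, for example: meets_minimum "$(nproc)" "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)" || echo "host is below the recommended size"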
0. Launch your RHEL/CentOS 7 instances where you wish to install HDF. In this example, we will use 4 m4.xlarge instances. Select an instance where ambari-server should run (e.g. node1)
1. After choosing a host where you would like Ambari-server to run, first let's prepare the other hosts. Run below on all hosts where Ambari-server will not be running (e.g. node2-4). This will run pre-requisite steps, install Ambari-agents and point them to Ambari-server host:
export ambari_server=<FQDN of host where ambari-server will be installed>;#replace this
export install_ambari_server=false
export ambari_version=2.7.3.0
curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh ;
2. Run remaining steps on host where Ambari-server is to be installed (e.g. node1). The below commands run pre-reqs and install Ambari-server
export db_password="StrongPassword" # MySQL password
export nifi_password="StrongPassword" # NiFi password must be at least ten chars
export hdf_ambari_mpack_url="http://public-repo-1.hortonworks.com/HDF/amazonlinux2/3.x/updates/3.3.0.0/tars/hdf_ambari_mp/hdf-ambari-mpack-3.3.0.0-165.tar.gz"
export ambari_version=2.7.3.0

#install bootstrap
yum install -y git python-argparse
cd /tmp
git clone https://github.com/seanorama/ambari-bootstrap.git

#Runs pre-reqs and install ambari-server
export install_ambari_server=true
curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh ;
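Because the comment above says the NiFi password must be at least ten characters, a quick length check before starting the install can catch a value that is too short. This is a minimal sketch; check_nifi_password is a hypothetical helper and only tests length, not any other rule NiFi may apply.

```shell
# check_nifi_password PASSWORD
# Hypothetical helper: succeeds if PASSWORD has at least ten characters,
# the minimum stated in the comment next to nifi_password above.
check_nifi_password() {
  [ "${#1}" -ge 10 ]
}
```

For example: check_nifi_password "$nifi_password" || echo "nifi_password is too short"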
3. On the same node, install MySQL and create databases and users for Schema Registry and SAM
sudo yum localinstall -y https://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm
sudo yum install -y epel-release mysql-connector-java* mysql-community-server

# MySQL Setup
sudo systemctl enable mysqld.service
sudo systemctl start mysqld.service

#extract system generated Mysql password
oldpass=$( grep 'temporary.*root@localhost' /var/log/mysqld.log | tail -n 1 | sed 's/.*root@localhost: //')

#create sql file that
# 1. resets Mysql root password to a temp value and creates registry/streamline schemas and users
# 2. sets passwords for registry/streamline users to ${db_password}
cat << EOF > mysql-setup.sql
ALTER USER 'root'@'localhost' IDENTIFIED BY 'Secur1ty!';
uninstall plugin validate_password;
CREATE DATABASE registry DEFAULT CHARACTER SET utf8;
CREATE DATABASE streamline DEFAULT CHARACTER SET utf8;
CREATE USER 'registry'@'%' IDENTIFIED BY '${db_password}';
CREATE USER 'streamline'@'%' IDENTIFIED BY '${db_password}';
GRANT ALL PRIVILEGES ON registry.* TO 'registry'@'%' WITH GRANT OPTION ;
GRANT ALL PRIVILEGES ON streamline.* TO 'streamline'@'%' WITH GRANT OPTION ;
commit;
EOF

#execute sqlfile
mysql -h localhost -u root -p"$oldpass" --connect-expired-password < mysql-setup.sql

#change Mysql password to StrongPassword
mysqladmin -u root -p'Secur1ty!' password StrongPassword

#test password and confirm dbs created
mysql -u root -pStrongPassword -e 'show databases;'
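To make the password extraction step easier to follow, the sketch below wraps the same grep/tail/sed pipeline in a function that reads log text from standard input. The function name extract_temp_password is hypothetical; the pipeline itself is the one used above.

```shell
# extract_temp_password
# Hypothetical wrapper around the pipeline used above: reads MySQL log
# text on stdin and prints the most recent temporary root password.
extract_temp_password() {
  grep 'temporary.*root@localhost' | tail -n 1 | sed 's/.*root@localhost: //'
}
```

Given a log line of the form "... A temporary password is generated for root@localhost: <password>", the function prints only the part after the colon; if several such lines exist, the last one wins. On the server it could be used as: oldpass=$(sudo cat /var/log/mysqld.log | extract_temp_password)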
4. On the same node, register the MySQL connector JAR with Ambari and then install the HDF mpack. Then restart Ambari so it recognizes the HDF stack:
sudo ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
sudo ambari-server install-mpack --mpack=${hdf_ambari_mpack_url} --verbose
sudo ambari-server restart
At this point, you could also use the Ambari install wizard to install HDF: open http://<Ambari host IP>:8080, log in, and follow the steps in the doc. Otherwise, to deploy via blueprints, follow the remaining steps.
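4. On the same node, provide the minimum configurations required for the install by creating configuration-custom.json. You can add to this file to customize any component property that Ambari exposes. Note that the snippet references ${ambari_password} for the Grafana password; this variable is not exported in the earlier steps, so set it first, otherwise it expands to an empty string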
cd /tmp/ambari-bootstrap/deploy

cat << EOF > configuration-custom.json
{
  "configurations": {
    "ams-grafana-env": {
      "metrics_grafana_password": "${ambari_password}"
    },
    "kafka-broker": {
      "offsets.topic.replication.factor": "1"
    },
    "streamline-common": {
      "jar.storage.type": "local",
      "streamline.storage.type": "mysql",
      "streamline.storage.connector.connectURI": "jdbc:mysql://$(hostname -f):3306/streamline",
      "registry.url": "http://localhost:7788/api/v1",
      "streamline.dashboard.url": "http://localhost:9089",
      "streamline.storage.connector.password": "${db_password}"
    },
    "registry-common": {
      "jar.storage.type": "local",
      "registry.storage.connector.connectURI": "jdbc:mysql://$(hostname -f):3306/registry",
      "registry.storage.type": "mysql",
      "registry.storage.connector.password": "${db_password}"
    },
    "nifi-registry-ambari-config": {
      "nifi.registry.security.encrypt.configuration.password": "${nifi_password}"
    },
    "nifi-registry-properties": {
      "nifi.registry.db.password": "${nifi_password}"
    },
    "nifi-ambari-config": {
      "nifi.security.encrypt.configuration.password": "${nifi_password}"
    }
  }
}
EOF
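Since the file is generated through a heredoc with shell substitutions, it can be worth confirming that the result is still well-formed JSON before deploying. The sketch below is one way to do that, assuming a python3 or python interpreter is on the PATH; validate_json is a hypothetical helper that relies on Python's standard json.tool module.

```shell
# validate_json FILE
# Hypothetical helper: returns 0 if FILE parses as JSON, non-zero otherwise.
# Uses Python's standard json.tool module; returns 2 if no Python is found.
validate_json() {
  for py in python3 python; do
    if command -v "$py" >/dev/null 2>&1; then
      "$py" -m json.tool "$1" >/dev/null 2>&1
      return $?
    fi
  done
  echo "no python interpreter found" >&2
  return 2
}
```

For example: validate_json configuration-custom.json || echo "configuration-custom.json is not valid JSON"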
5. Then run below as root to generate a recommended blueprint and deploy the cluster install. Make sure to set host_count to the total number of hosts in your cluster (including Ambari server)
sudo su
cd /tmp/ambari-bootstrap/deploy/
export host_count=<Number of total nodes>
export ambari_stack_name=HDF
export ambari_stack_version=3.3
export cluster_name="HDF"
export ambari_services="ZOOKEEPER STREAMLINE NIFI KAFKA STORM REGISTRY NIFI_REGISTRY AMBARI_METRICS KNOX"
./deploy-recommended-cluster.bash
You can now log in to Ambari at http://<Ambari host IP>:8080 and sit back and watch your HDF cluster get installed!
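Besides the web UI, the install progress can be polled from the command line through Ambari's REST API. The sketch below only builds the URL for a request resource; ambari_request_url is a hypothetical helper, and the request id (typically 1 for the first operation on a fresh Ambari server) is an assumption you should verify on your own cluster.

```shell
# ambari_request_url AMBARI_HOST CLUSTER_NAME REQUEST_ID
# Hypothetical helper: prints the Ambari REST URL for one request,
# limited to its status and progress fields.
ambari_request_url() {
  printf 'http://%s:8080/api/v1/clusters/%s/requests/%s?fields=Requests/request_status,Requests/progress_percent\n' "$1" "$2" "$3"
}

# Example call, left commented out because it needs a running Ambari server
# and your own admin credentials:
# curl -s -u admin:<password> "$(ambari_request_url <Ambari host IP> HDF 1)"
```

The cluster name HDF matches the cluster_name value exported in the previous step.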
Notes:
a) This will only install Nifi on a single node of the cluster by default
b) The NiFi Certificate Authority (CA) component will be installed by default. This means that, if you want, you can enable SSL for NiFi out of the box by including a "nifi-ambari-ssl-config" section in the above configuration-custom.json:
"nifi-ambari-ssl-config": {
  "nifi.toolkit.tls.token": "hadoop",
  "nifi.node.ssl.isenabled": "true",
  "nifi.security.needClientAuth": "true",
  "nifi.toolkit.dn.suffix": ", OU=HORTONWORKS",
  "nifi.initial.admin.identity": "CN=nifiadmin, OU=HORTONWORKS",
  "content": "<property name='Node Identity 1'>CN=node-1.fqdn, OU=HORTONWORKS</property><property name='Node Identity 2'>CN=node-2.fqdn, OU=HORTONWORKS</property><property name='Node Identity 3'>CN=node-3.fqdn, OU=HORTONWORKS</property>"
},
Make sure to replace node-x.fqdn with the FQDN of each node running Nifi
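Writing the "content" value by hand is error-prone when there are several nodes. The sketch below generates the Node Identity properties from a list of FQDNs; nifi_node_identities is a hypothetical helper, and it assumes the same ", OU=HORTONWORKS" suffix used in the section above.

```shell
# nifi_node_identities FQDN...
# Hypothetical helper: prints one 'Node Identity N' property per FQDN,
# concatenated on a single line, using the DN suffix from the section above.
nifi_node_identities() {
  suffix=", OU=HORTONWORKS"
  i=1
  out=""
  for fqdn in "$@"; do
    out="${out}<property name='Node Identity ${i}'>CN=${fqdn}${suffix}</property>"
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}
```

The output can be stored in a variable, e.g. node_ids=$(nifi_node_identities node1.example.com node2.example.com), and referenced as ${node_ids} inside the heredoc that writes configuration-custom.json. The host names here are placeholders.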
c) As part of the install, you can also have an existing NiFi flow deployed by Ambari. First, read in a flow.xml file from an existing NiFi system (you can find this in flow.xml.gz). For example, run the command below to read the flow for the Twitter demo into an environment variable
twitter_flow=$(curl -L https://gist.githubusercontent.com/abajwa-hw/3a3e2b2d9fb239043a38d204c94e609f/raw)
Then include a "nifi-flow-env" section in the above configuration-custom.json when you create it, so that ambari-bootstrap includes the whole flow XML in the generated blueprint:
"nifi-flow-env":{ "properties_attributes":{}, "properties":{"content":"${twitter_flow}"} }
d) In case you would like to review the generated blueprint before it gets deployed, just set the below variable as well:
export deploy=false
The blueprint will be created under /tmp/ambari-bootstrap*/deploy/tempdir*/blueprint.json
Sample blueprints
Sample generated blueprint for 4 node HDF 3.3 only cluster is provided for reference here:
{
  "Blueprints": {
    "stack_name": "HDF",
    "stack_version": "3.3"
  },
  "host_groups": [
    {
      "name": "host-group-1",
      "components": [
        { "name": "METRICS_MONITOR" },
        { "name": "SUPERVISOR" },
        { "name": "NIFI_CA" },
        { "name": "STREAMLINE_SERVER" }
      ]
    },
    {
      "name": "host-group-4",
      "components": [
        { "name": "METRICS_MONITOR" },
        { "name": "SUPERVISOR" },
        { "name": "METRICS_COLLECTOR" },
        { "name": "ZOOKEEPER_SERVER" },
        { "name": "STREAMLINE_SERVER" }
      ]
    },
    {
      "name": "host-group-2",
      "components": [
        { "name": "NIFI_MASTER" },
        { "name": "DRPC_SERVER" },
        { "name": "METRICS_GRAFANA" },
        { "name": "KAFKA_BROKER" },
        { "name": "ZOOKEEPER_SERVER" },
        { "name": "STREAMLINE_SERVER" },
        { "name": "METRICS_MONITOR" },
        { "name": "SUPERVISOR" },
        { "name": "NIMBUS" },
        { "name": "ZOOKEEPER_CLIENT" },
        { "name": "KNOX_GATEWAY" },
        { "name": "NIFI_REGISTRY_MASTER" },
        { "name": "REGISTRY_SERVER" },
        { "name": "STORM_UI_SERVER" }
      ]
    },
    {
      "name": "host-group-3",
      "components": [
        { "name": "METRICS_MONITOR" },
        { "name": "SUPERVISOR" },
        { "name": "ZOOKEEPER_SERVER" },
        { "name": "STREAMLINE_SERVER" }
      ]
    }
  ],
  "configurations": [
    { "nifi-ambari-config": {
        "nifi.security.encrypt.configuration.password": "StrongPassword"
    } },
    { "nifi-registry-ambari-config": {
        "nifi.registry.security.encrypt.configuration.password": "StrongPassword"
    } },
    { "ams-hbase-env": {
        "hbase_master_heapsize": "512",
        "hbase_regionserver_heapsize": "768",
        "hbase_master_xmn_size": "192"
    } },
    { "nifi-logsearch-conf": {} },
    { "storm-site": {
        "metrics.reporter.register": "org.apache.hadoop.metrics2.sink.storm.StormTimelineMetricsReporter",
        "topology.metrics.consumer.register": "[{\"class\": \"org.apache.hadoop.metrics2.sink.storm.StormTimelineMetricsSink\", \"parallelism.hint\": 1, \"whitelist\": [\"kafkaOffset\\..+/\", \"__complete-latency\", \"__process-latency\", \"__execute-latency\", \"__receive\\.population$\", \"__sendqueue\\.population$\", \"__execute-count\", \"__emit-count\", \"__ack-count\", \"__fail-count\", \"memory/heap\\.usedBytes$\", \"memory/nonHeap\\.usedBytes$\", \"GC/.+\\.count$\", \"GC/.+\\.timeMs$\"]}]",
        "storm.local.dir": "/hadoop/storm",
        "storm.cluster.metrics.consumer.register": "[{\"class\": \"org.apache.hadoop.metrics2.sink.storm.StormTimelineMetricsReporter\"}]"
    } },
    { "registry-common": {
        "registry.storage.connector.connectURI": "jdbc:mysql://ip-xxx-xx-xx-xx9.us-west-1.compute.internal:3306/registry",
        "registry.storage.type": "mysql",
        "jar.storage.type": "local",
        "registry.storage.connector.password": "StrongPassword"
    } },
    { "registry-env": {} },
    { "registry-logsearch-conf": {} },
    { "streamline-common": {
        "streamline.storage.type": "mysql",
        "streamline.storage.connector.connectURI": "jdbc:mysql://ip-xxx-xx-xx-xx9.us-west-1.compute.internal:3306/streamline",
        "streamline.dashboard.url": "http://localhost:9089",
        "registry.url": "http://localhost:7788/api/v1",
        "jar.storage.type": "local",
        "streamline.storage.connector.password": "StrongPassword"
    } },
    { "nifi-registry-properties": {
        "nifi.registry.db.password": "StrongPassword"
    } },
    { "ams-hbase-site": {
        "hbase.regionserver.global.memstore.upperLimit": "0.35",
        "hbase.regionserver.global.memstore.lowerLimit": "0.3",
        "hbase.tmp.dir": "/var/lib/ambari-metrics-collector/hbase-tmp",
        "hbase.hregion.memstore.flush.size": "134217728",
        "hfile.block.cache.size": "0.3",
        "hbase.rootdir": "file:///var/lib/ambari-metrics-collector/hbase",
        "hbase.cluster.distributed": "false",
        "phoenix.coprocessor.maxMetaDataCacheSize": "20480000",
        "hbase.zookeeper.property.clientPort": "61181"
    } },
    { "storm-env": {} },
    { "streamline-env": {} },
    { "ams-site": {
        "timeline.metrics.service.webapp.address": "localhost:6188",
        "timeline.metrics.cluster.aggregate.splitpoints": "kafka.network.RequestMetrics.ResponseQueueTimeMs.request.OffsetFetch.98percentile",
        "timeline.metrics.downsampler.event.metric.patterns": "topology\\.%",
        "timeline.metrics.host.aggregate.splitpoints": "kafka.network.RequestMetrics.ResponseQueueTimeMs.request.OffsetFetch.98percentile",
        "timeline.metrics.service.handler.thread.count": "20",
        "timeline.metrics.service.watcher.disabled": "false",
        "timeline.metrics.host.aggregator.ttl": "86400"
    } },
    { "kafka-broker": {
        "log.dirs": "/kafka-logs",
        "offsets.topic.replication.factor": "1"
    } },
    { "ams-grafana-env": {
        "metrics_grafana_password": "StrongPassword"
    } },
    { "streamline-logsearch-conf": {} },
    { "zoo.cfg": {
        "dataDir": "/hadoop/zookeeper"
    } },
    { "ams-env": {
        "metrics_collector_heapsize": "512"
    } }
  ]
}
Sample cluster.json for this 4 node cluster:
{
  "blueprint": "recommended",
  "default_password": "hadoop",
  "host_groups": [
    {
      "hosts": [ { "fqdn": "ip-XX-XX-XX-XXX.us-west-1.compute.internal" } ],
      "name": "host-group-1"
    },
    {
      "hosts": [ { "fqdn": "ip-XX-XX-XX-XXX.us-west-1.compute.internal" } ],
      "name": "host-group-3"
    },
    {
      "hosts": [ { "fqdn": "ip-xxx-xxx-xxx-xxx.us-west-1.compute.internal" } ],
      "name": "host-group-4"
    },
    {
      "hosts": [ { "fqdn": "ip-xx-xx-xx-xxx.us-west-1.compute.internal" } ],
      "name": "host-group-2"
    }
  ]
}