Created on 09-16-2016 07:21 AM
Update Feb 2018 - Updated article for HDF 3.1: https://community.hortonworks.com/articles/173816/automate-deployment-of-hdf-31-clusters-using-ambar...
Summary:
Ambari blueprints can be used to automate setting up clusters. With Ambari support added in HDF 2.0, the same can be done for HDF clusters running Nifi, Storm and Kafka. This article shows how you can use the ambari-bootstrap project to easily generate a blueprint and deploy HDF clusters, either single-node or multi-node development/demo environments, in 5 easy steps. If you prefer, a script is also provided at the bottom of the article that automates these steps, so you can deploy the cluster in a few commands. Tested with HDF 2.x and 3.0.
There is also a single node HDF 2.1 demo cluster available on AWS as an AMI which can be brought up in 10 min. Details here
Prerequisite:
A number of freshly installed hosts running CentOS/RHEL 6 or 7 where HDF is to be installed
Reminder:
Do not try to install HDF on an env where Ambari or HDP are already installed (e.g. HDP sandbox or HDP cluster)
Steps:
1. After choosing a host where you would like Ambari-server to run, first let's prepare the other hosts. Run the below on all hosts where Ambari-server will not be running; it runs the pre-requisite steps, installs Ambari-agents, and points them to the Ambari-server host:
export ambari_server=<FQDN of host where ambari-server will be installed>  #replace this
export install_ambari_server=false
export ambari_version=2.5.1.0  ##don't use 2.5.2 for HDF, there is a bug
curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh
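For multi-node setups, the agent step above can be pushed to every host from one terminal over ssh. This is only a sketch, assuming passwordless root ssh and hypothetical hostnames (ambari1/agent1/agent2.example.com); the `echo` makes it a dry run that prints the commands instead of executing them.

```shell
# Hypothetical hostnames -- replace with your own. Drop the 'echo' to
# actually run the bootstrap over ssh on each agent host.
export ambari_server=ambari1.example.com
for host in agent1.example.com agent2.example.com; do
  echo ssh "root@${host}" \
    "export ambari_server=${ambari_server} install_ambari_server=false ambari_version=2.5.1.0; curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh"
done
```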
2. Run the remaining steps on the host where Ambari-server is to be installed. These run the pre-reqs and install Ambari-server:
export ambari_password="admin"   # customize password
export cluster_name="HDF"        # customize cluster name
export ambari_services="ZOOKEEPER NIFI KAFKA STORM LOGSEARCH AMBARI_METRICS AMBARI_INFRA"
export hdf_ambari_mpack_url="http://public-repo-1.hortonworks.com/HDF/centos7/3.x/updates/3.0.0.0/tars/hdf_ambari_mp/hdf-ambari-mpack-3.0.0.0-453.tar.gz"  #replace with the mpack url you want to install
export ambari_version=2.5.1.0    ##don't use 2.5.2 for HDF, there is a bug

#install bootstrap
yum install -y git python-argparse
git clone https://github.com/seanorama/ambari-bootstrap.git

#runs pre-reqs and installs ambari-server
export install_ambari_server=true
~/ambari-bootstrap/ambari-bootstrap.sh
3. Install mpack and restart Ambari so it forgets HDP and recognizes only HDF stack:
ambari-server install-mpack --mpack=${hdf_ambari_mpack_url} --verbose
ambari-server restart
At this point, if you wanted to, you could use the Ambari install wizard to install HDF. Just open http://<Ambari host IP>:8080, login, and follow the steps in the doc. Otherwise, to proceed with deploying via blueprints, follow the remaining steps.
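Whichever route you take, you can sanity-check that the mpack install and restart registered the HDF stack via Ambari's REST API. A sketch assuming localhost and the default admin credentials; shown as an echo dry run so nothing is called until you remove the variable indirection.

```shell
# /api/v1/stacks lists the stacks Ambari knows about; after the mpack install
# and restart, HDF should appear there. Dry run: run the printed command
# yourself (hypothetical host/credentials -- adjust to your setup).
stack_check="curl -s -u admin:admin http://localhost:8080/api/v1/stacks"
echo "$stack_check"
```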
4. (Optional) Modify any configurations you like for any of the components by creating configuration-custom.json. The example below shows how to customize the Nifi dirs, ports, and the user/group the service runs as. Basically, you create sections in the JSON corresponding to the name of the relevant config file and include the property name and desired value. For a complete listing of the available Nifi property files and corresponding properties that Ambari recognizes, check the Nifi service code.
cd ~/ambari-bootstrap/deploy/
tee configuration-custom.json > /dev/null << EOF
{
  "configurations" : {
    "nifi-ambari-config": {
      "nifi.security.encrypt.configuration.password": "changemeplease",
      "nifi.content.repository.dir.default": "/nifi/content_repository",
      "nifi.database.dir": "/nifi/database_repository",
      "nifi.flowfile.repository.dir": "/nifi/flowfile_repository",
      "nifi.internal.dir": "/nifi",
      "nifi.provenance.repository.dir.default": "/nifi/provenance_repository",
      "nifi.max_mem": "1g",
      "nifi.node.port": "9092",
      "nifi.node.protocol.port": "9089",
      "nifi.node.ssl.port": "9093"
    },
    "nifi-env": {
      "nifi_user": "mynifiuser",
      "nifi_group": "mynifigroup"
    }
  }
}
EOF
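A stray comma or unquoted value in configuration-custom.json will surface later as a confusing blueprint error, so it is worth validating the file before deploying. A minimal self-contained sketch (it writes a small stand-in file so the check runs on its own; point the same command at your real file):

```shell
cd "$(mktemp -d)"
# Stand-in for the configuration-custom.json created above.
cat > configuration-custom.json <<'EOF'
{ "configurations" : { "nifi-ambari-config": { "nifi.max_mem": "1g" },
                       "nifi-env": { "nifi_user": "mynifiuser" } } }
EOF
# json.tool exits non-zero on malformed JSON, catching typos before
# ambari-bootstrap ever consumes the file.
python3 -m json.tool configuration-custom.json > /dev/null && echo "valid JSON"
```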
5. If you chose to skip the previous step, run the below to generate a basic configuration-custom.json file. Change the password, but make sure it's at least 12 characters or the deployment will fail.
echo '{ "configurations" : { "nifi-ambari-config": { "nifi.security.encrypt.configuration.password": "changemeplease" }}}' > ~/ambari-bootstrap/deploy/configuration-custom.json
Then generate a recommended blueprint and kick off the cluster install. Make sure to set host_count to the total number of hosts in your cluster (including the Ambari server):
export host_count=<Number of total nodes>
export ambari_stack_name=HDF
export ambari_stack_version=3.0  #replace this with HDF stack version
export ambari_services="NIFI KAFKA STORM AMBARI_METRICS ZOOKEEPER LOGSEARCH AMBARI_INFRA"
./deploy-recommended-cluster.bash
You can now login into Ambari at http://<Ambari host IP>:8080 and sit back and watch your HDF cluster get installed!
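Instead of watching the UI, install progress can also be polled from Ambari's request API. A sketch with hypothetical credentials and the cluster name "HDF" from step 2; shown as an echo dry run.

```shell
# Requests/progress_percent reports how far each install request has gotten.
# Dry run -- run the printed command against your Ambari host (hypothetical
# localhost/admin values shown).
poll_cmd="curl -s -u admin:admin http://localhost:8080/api/v1/clusters/HDF/requests?fields=Requests/progress_percent"
echo "$poll_cmd"
```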
Notes:
a) This will only install Nifi on a single node of the cluster by default
b) The Nifi Certificate Authority (CA) component will be installed by default. This means that, if you wanted to, you could enable SSL for Nifi out of the box by including a "nifi-ambari-ssl-config" section in the above configuration-custom.json:
"nifi-ambari-ssl-config": { "nifi.toolkit.tls.token": "hadoop", "nifi.node.ssl.isenabled": "true", "nifi.security.needClientAuth": "true", "nifi.toolkit.dn.suffix": ", OU=HORTONWORKS", "nifi.initial.admin.identity": "CN=nifiadmin, OU=HORTONWORKS", "content":"<property name='Node Identity 1'>CN=node-1.fqdn, OU=HORTONWORKS</property><property name='Node Identity 2'>CN=node-2.fqdn, OU=HORTONWORKS</property><property name='Node Identity 3'>node-3.fqdn, OU=HORTONWORKS</property>" },
Make sure to replace node-x.fqdn with the FQDN of each node running Nifi
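Hand-writing one Node Identity entry per node gets error-prone on larger clusters. A small sketch that generates the "content" value from a host list (the FQDNs here are hypothetical placeholders):

```shell
# Replace with the FQDNs of your Nifi nodes.
nodes="node-1.fqdn node-2.fqdn node-3.fqdn"
i=1; content=""
for n in $nodes; do
  # One <property> per node, numbered sequentially, matching the format
  # expected by the nifi-ambari-ssl-config "content" field.
  content="${content}<property name='Node Identity ${i}'>CN=${n}, OU=HORTONWORKS</property>"
  i=$((i+1))
done
echo "$content"
```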
c) As part of the install, you can also have an existing Nifi flow deployed by Ambari. First, read in a flow.xml file from an existing Nifi system (you can find this in flow.xml.gz). For example, run the below to read the flow for the Twitter demo into an env var:
twitter_flow=$(curl -L https://gist.githubusercontent.com/abajwa-hw/3a3e2b2d9fb239043a38d204c94e609f/raw)
Then include a "nifi-flow-env" section in the above configuration-custom.json when you run the tee command, to have ambari-bootstrap include the whole flow xml in the generated blueprint:
"nifi-flow-env" : { "properties_attributes" : { }, "properties" : { "content" : "${twitter_flow}" } }
d) In case you would like to review the generated blueprint before it gets deployed, just set the below variable as well:
export deploy=false
The blueprint will be created under ~/ambari-bootstrap/deploy/tempdir*/blueprint.json
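With deploy=false set, a quick way to eyeball component placement before deploying is to summarize the blueprint's host groups. A sketch that creates a small stand-in blueprint.json so it runs on its own; in practice, run it from ~/ambari-bootstrap/deploy against the real tempdir*/blueprint.json:

```shell
cd "$(mktemp -d)" && mkdir tempdir.demo
# Stand-in for the blueprint ambari-bootstrap generates.
cat > tempdir.demo/blueprint.json <<'EOF'
{ "Blueprints": { "stack_name": "HDF", "stack_version": "3.0" },
  "host_groups": [ { "name": "host-group-1",
                     "components": [ { "name": "NIFI_MASTER" }, { "name": "ZOOKEEPER_SERVER" } ] } ] }
EOF
# Print each host group with its component list.
python3 -c '
import glob, json
bp = json.load(open(glob.glob("tempdir*/blueprint.json")[0]))
for hg in bp["host_groups"]:
    print("%s: %s" % (hg["name"], ",".join(c["name"] for c in hg["components"])))
'
```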
Sample script
A sample script based on this logic is available here. In addition to the steps above, it supports further optional customizations via exported environment variables.
For example, to deploy a single node HDF sandbox, you can just run the below on a freshly installed CentOS 6 VM (don't run this on the sandbox or a VM where Ambari is already installed). You can customize the behaviour by exporting environment variables as shown.
#run below as root
export host_count=1
curl -sSL https://gist.github.com/abajwa-hw/ae4125c5154deac6713cdd25d2b83620/raw | sudo -E sh
What next?
Sample blueprint
A sample generated blueprint for a 3-node cluster is provided for reference here:
{
  "Blueprints": { "stack_name": "HDF", "stack_version": "2.0" },
  "host_groups": [
    {
      "name": "host-group-1",
      "components": [
        { "name": "METRICS_MONITOR" },
        { "name": "SUPERVISOR" },
        { "name": "LOGSEARCH_LOGFEEDER" },
        { "name": "NIFI_CA" },
        { "name": "NIMBUS" },
        { "name": "DRPC_SERVER" },
        { "name": "ZOOKEEPER_SERVER" },
        { "name": "STORM_UI_SERVER" }
      ]
    },
    {
      "name": "host-group-2",
      "components": [
        { "name": "NIFI_MASTER" },
        { "name": "METRICS_MONITOR" },
        { "name": "SUPERVISOR" },
        { "name": "INFRA_SOLR" },
        { "name": "INFRA_SOLR_CLIENT" },
        { "name": "LOGSEARCH_LOGFEEDER" },
        { "name": "LOGSEARCH_SERVER" },
        { "name": "ZOOKEEPER_CLIENT" },
        { "name": "METRICS_GRAFANA" },
        { "name": "KAFKA_BROKER" },
        { "name": "ZOOKEEPER_SERVER" }
      ]
    },
    {
      "name": "host-group-3",
      "components": [
        { "name": "METRICS_MONITOR" },
        { "name": "SUPERVISOR" },
        { "name": "LOGSEARCH_LOGFEEDER" },
        { "name": "METRICS_COLLECTOR" },
        { "name": "ZOOKEEPER_SERVER" }
      ]
    }
  ],
  "configurations": [
    { "nifi-ambari-config": {
        "nifi.node.protocol.port": "9089",
        "nifi.internal.dir": "/nifi",
        "nifi.node.port": "9092",
        "nifi.provenance.repository.dir.default": "/nifi/provenance_repository",
        "nifi.content.repository.dir.default": "/nifi/content_repository",
        "nifi.flowfile.repository.dir": "/nifi/flowfile_repository",
        "nifi.max_mem": "1g",
        "nifi.database.dir": "/nifi/database_repository",
        "nifi.node.ssl.port": "9093"
    } },
    { "ams-env": { "metrics_collector_heapsize": "512" } },
    { "ams-hbase-env": {
        "hbase_master_heapsize": "512",
        "hbase_regionserver_heapsize": "768",
        "hbase_master_xmn_size": "192"
    } },
    { "storm-site": { "metrics.reporter.register": "org.apache.hadoop.metrics2.sink.storm.StormTimelineMetricsReporter" } },
    { "nifi-env": { "nifi_group": "mynifigroup", "nifi_user": "mynifiuser" } },
    { "ams-hbase-site": {
        "hbase.regionserver.global.memstore.upperLimit": "0.35",
        "hbase.regionserver.global.memstore.lowerLimit": "0.3",
        "hbase.tmp.dir": "/var/lib/ambari-metrics-collector/hbase-tmp",
        "hbase.hregion.memstore.flush.size": "134217728",
        "hfile.block.cache.size": "0.3",
        "hbase.rootdir": "file:///var/lib/ambari-metrics-collector/hbase",
        "hbase.cluster.distributed": "false",
        "phoenix.coprocessor.maxMetaDataCacheSize": "20480000",
        "hbase.zookeeper.property.clientPort": "61181"
    } },
    { "logsearch-properties": {} },
    { "kafka-log4j": {} },
    { "ams-site": {
        "timeline.metrics.service.webapp.address": "localhost:6188",
        "timeline.metrics.cluster.aggregate.splitpoints": "kafka.network.SocketServer.IdlePercent.networkProcessor.0.5MinuteRate",
        "timeline.metrics.host.aggregate.splitpoints": "kafka.network.SocketServer.IdlePercent.networkProcessor.0.5MinuteRate",
        "timeline.metrics.host.aggregator.ttl": "86400",
        "timeline.metrics.service.handler.thread.count": "20",
        "timeline.metrics.service.watcher.disabled": "false"
    } },
    { "kafka-broker": { "kafka.metrics.reporters": "org.apache.hadoop.metrics2.sink.kafka.KafkaTimelineMetricsReporter" } },
    { "ams-grafana-env": {} }
  ]
}
Created on 11-16-2016 03:32 PM
Ali, the scripts work. But I want to deploy NIFI to all the HDF nodes (e.g., 4). Currently, using the above scripts, I only see NIFI on one of the nodes. I know NiFi 1.0.x is "master-less", but I don't see the rest of the nodes having the NIFI component installed.
Created on 11-21-2016 03:06 PM
Actually, I found out that the script by Ali (abajwa-hw) already shows how to deploy NiFi to each node in a multi-node cluster. Specifically, it is the environment variable: export install_nifi_on_all_nodes="${install_nifi_on_all_nodes:-true}"
Created on 10-20-2017 05:31 PM
At Step 2, (with new ambari-bootstrap.sh)
We need to add an additional line to the blueprint steps:
export install_ambari_agent=false