Created on 09-16-2016 07:21 AM
Update Feb 2018 - Updated article for HDF 3.1: https://community.hortonworks.com/articles/173816/automate-deployment-of-hdf-31-clusters-using-ambar...
Summary:
Ambari blueprints can be used to automate setting up clusters. With Ambari support added in HDF 2.0, the same can be done for HDF clusters running Nifi, Storm and Kafka. This article shows how you can use the ambari-bootstrap project to easily generate a blueprint and deploy HDF clusters, either single-node or multi-node development/demo environments, in 5 easy steps. If you prefer, a script is also provided at the bottom of the article that automates these steps, so you can deploy the cluster in a few commands. Tested with HDF 2.x and 3.0.
There is also a single node HDF 2.1 demo cluster available on AWS as an AMI which can be brought up in 10 min. Details here
Prerequisite:
A number of freshly installed hosts running CentOS/RHEL 6 or 7 where HDF is to be installed
Reminder:
Do not try to install HDF on an env where Ambari or HDP are already installed (e.g. HDP sandbox or HDP cluster)
Steps:
1. After choosing a host where you would like Ambari-server to run, first let's prepare the other hosts. Run the below on all hosts where Ambari-server will not be running; it runs the pre-requisite steps, installs Ambari-agents, and points them to the Ambari-server host:
export ambari_server=<FQDN of host where ambari-server will be installed>  #replace this
export install_ambari_server=false
export ambari_version=2.5.1.0  ##don't use 2.5.2 for HDF, there is a bug
curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh
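For multi-node setups, the agent step above can be pushed to every host from one terminal over ssh. This is only a sketch, assuming passwordless root ssh and hypothetical hostnames (ambari1/agent1/agent2.example.com); the `echo` makes it a dry run that prints the commands instead of executing them.

```shell
# Hypothetical hostnames -- replace with your own. Drop the 'echo' to
# actually run the bootstrap over ssh on each agent host.
export ambari_server=ambari1.example.com
for host in agent1.example.com agent2.example.com; do
  echo ssh "root@${host}" \
    "export ambari_server=${ambari_server} install_ambari_server=false ambari_version=2.5.1.0; curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh"
done
```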
2. Run the remaining steps on the host where Ambari-server is to be installed. These run the pre-reqs and install Ambari-server:
export ambari_password="admin"   # customize password
export cluster_name="HDF"        # customize cluster name
export ambari_services="ZOOKEEPER NIFI KAFKA STORM LOGSEARCH AMBARI_METRICS AMBARI_INFRA"
export hdf_ambari_mpack_url="http://public-repo-1.hortonworks.com/HDF/centos7/3.x/updates/3.0.0.0/tars/hdf_ambari_mp/hdf-ambari-mpack-3.0.0.0-453.tar.gz"  #replace with the mpack url you want to install
export ambari_version=2.5.1.0    ##don't use 2.5.2 for HDF, there is a bug

#install bootstrap
yum install -y git python-argparse
git clone https://github.com/seanorama/ambari-bootstrap.git

#runs pre-reqs and installs ambari-server
export install_ambari_server=true
~/ambari-bootstrap/ambari-bootstrap.sh
3. Install mpack and restart Ambari so it forgets HDP and recognizes only HDF stack:
ambari-server install-mpack --mpack=${hdf_ambari_mpack_url} --verbose
ambari-server restart
At this point, if you wanted to, you could use the Ambari install wizard to install HDF. Just open http://<Ambari host IP>:8080, login, and follow the steps in the doc. Otherwise, to proceed with deploying via blueprints, follow the remaining steps.
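Whichever route you take, you can sanity-check that the mpack install and restart registered the HDF stack via Ambari's REST API. A sketch assuming localhost and the default admin credentials; shown as an echo dry run so nothing is called until you remove the variable indirection.

```shell
# /api/v1/stacks lists the stacks Ambari knows about; after the mpack install
# and restart, HDF should appear there. Dry run: run the printed command
# yourself (hypothetical host/credentials -- adjust to your setup).
stack_check="curl -s -u admin:admin http://localhost:8080/api/v1/stacks"
echo "$stack_check"
```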
4. (Optional) Modify any configurations you like for any of the components by creating configuration-custom.json. The example below shows how to customize the Nifi dirs, ports, and the user/group the service runs as. Basically, you create sections in the JSON corresponding to the name of the relevant config file and include the property name and desired value. For a complete listing of the available Nifi property files and corresponding properties that Ambari recognizes, check the Nifi service code.
cd ~/ambari-bootstrap/deploy/
tee configuration-custom.json > /dev/null << EOF
{
  "configurations" : {
    "nifi-ambari-config": {
      "nifi.security.encrypt.configuration.password": "changemeplease",
      "nifi.content.repository.dir.default": "/nifi/content_repository",
      "nifi.database.dir": "/nifi/database_repository",
      "nifi.flowfile.repository.dir": "/nifi/flowfile_repository",
      "nifi.internal.dir": "/nifi",
      "nifi.provenance.repository.dir.default": "/nifi/provenance_repository",
      "nifi.max_mem": "1g",
      "nifi.node.port": "9092",
      "nifi.node.protocol.port": "9089",
      "nifi.node.ssl.port": "9093"
    },
    "nifi-env": {
      "nifi_user": "mynifiuser",
      "nifi_group": "mynifigroup"
    }
  }
}
EOF
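A stray comma or unquoted value in configuration-custom.json will surface later as a confusing blueprint error, so it is worth validating the file before deploying. A minimal self-contained sketch (it writes a small stand-in file so the check runs on its own; point the same command at your real file):

```shell
cd "$(mktemp -d)"
# Stand-in for the configuration-custom.json created above.
cat > configuration-custom.json <<'EOF'
{ "configurations" : { "nifi-ambari-config": { "nifi.max_mem": "1g" },
                       "nifi-env": { "nifi_user": "mynifiuser" } } }
EOF
# json.tool exits non-zero on malformed JSON, catching typos before
# ambari-bootstrap ever consumes the file.
python3 -m json.tool configuration-custom.json > /dev/null && echo "valid JSON"
```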
5. If you chose to skip the previous step, run the below to generate a basic configuration-custom.json file. Change the password, but make sure it's at least 12 characters or the deployment will fail.
echo '{ "configurations" : { "nifi-ambari-config": { "nifi.security.encrypt.configuration.password": "changemeplease" }}}' > ~/ambari-bootstrap/deploy/configuration-custom.json
Then generate a recommended blueprint and kick off the cluster install. Make sure to set host_count to the total number of hosts in your cluster (including the Ambari server):
export host_count=<Number of total nodes>
export ambari_stack_name=HDF
export ambari_stack_version=3.0  #replace this with HDF stack version
export ambari_services="NIFI KAFKA STORM AMBARI_METRICS ZOOKEEPER LOGSEARCH AMBARI_INFRA"
./deploy-recommended-cluster.bash
You can now login into Ambari at http://<Ambari host IP>:8080 and sit back and watch your HDF cluster get installed!
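Instead of watching the UI, install progress can also be polled from Ambari's request API. A sketch with hypothetical credentials and the cluster name "HDF" from step 2; shown as an echo dry run.

```shell
# Requests/progress_percent reports how far each install request has gotten.
# Dry run -- run the printed command against your Ambari host (hypothetical
# localhost/admin values shown).
poll_cmd="curl -s -u admin:admin http://localhost:8080/api/v1/clusters/HDF/requests?fields=Requests/progress_percent"
echo "$poll_cmd"
```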
Notes:
a) This will only install Nifi on a single node of the cluster by default
b) The Nifi Certificate Authority (CA) component will be installed by default. This means that, if you wanted to, you could enable SSL for Nifi out of the box by including a "nifi-ambari-ssl-config" section in the above configuration-custom.json:
"nifi-ambari-ssl-config": { "nifi.toolkit.tls.token": "hadoop", "nifi.node.ssl.isenabled": "true", "nifi.security.needClientAuth": "true", "nifi.toolkit.dn.suffix": ", OU=HORTONWORKS", "nifi.initial.admin.identity": "CN=nifiadmin, OU=HORTONWORKS", "content":"<property name='Node Identity 1'>CN=node-1.fqdn, OU=HORTONWORKS</property><property name='Node Identity 2'>CN=node-2.fqdn, OU=HORTONWORKS</property><property name='Node Identity 3'>node-3.fqdn, OU=HORTONWORKS</property>" },
Make sure to replace node-x.fqdn with the FQDN of each node running Nifi
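Hand-writing one Node Identity entry per node gets error-prone on larger clusters. A small sketch that generates the "content" value from a host list (the FQDNs here are hypothetical placeholders):

```shell
# Replace with the FQDNs of your Nifi nodes.
nodes="node-1.fqdn node-2.fqdn node-3.fqdn"
i=1; content=""
for n in $nodes; do
  # One <property> per node, numbered sequentially, matching the format
  # expected by the nifi-ambari-ssl-config "content" field.
  content="${content}<property name='Node Identity ${i}'>CN=${n}, OU=HORTONWORKS</property>"
  i=$((i+1))
done
echo "$content"
```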
c) As part of the install, you can also have an existing Nifi flow deployed by Ambari. First, read in a flow.xml file from an existing Nifi system (you can find this in flow.xml.gz). For example, run the below to read the flow for the Twitter demo into an env var:
twitter_flow=$(curl -L https://gist.githubusercontent.com/abajwa-hw/3a3e2b2d9fb239043a38d204c94e609f/raw)
Then include a "nifi-flow-env" section in the above configuration-custom.json when you run the tee command, to have ambari-bootstrap include the whole flow xml in the generated blueprint:
"nifi-flow-env" : { "properties_attributes" : { }, "properties" : { "content" : "${twitter_flow}" } }
d) In case you would like to review the generated blueprint before it gets deployed, just set the below variable as well:
export deploy=false
The blueprint will be created under ~/ambari-bootstrap/deploy/tempdir*/blueprint.json
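With deploy=false set, a quick way to eyeball component placement before deploying is to summarize the blueprint's host groups. A sketch that creates a small stand-in blueprint.json so it runs on its own; in practice, run it from ~/ambari-bootstrap/deploy against the real tempdir*/blueprint.json:

```shell
cd "$(mktemp -d)" && mkdir tempdir.demo
# Stand-in for the blueprint ambari-bootstrap generates.
cat > tempdir.demo/blueprint.json <<'EOF'
{ "Blueprints": { "stack_name": "HDF", "stack_version": "3.0" },
  "host_groups": [ { "name": "host-group-1",
                     "components": [ { "name": "NIFI_MASTER" }, { "name": "ZOOKEEPER_SERVER" } ] } ] }
EOF
# Print each host group with its component list.
python3 -c '
import glob, json
bp = json.load(open(glob.glob("tempdir*/blueprint.json")[0]))
for hg in bp["host_groups"]:
    print("%s: %s" % (hg["name"], ",".join(c["name"] for c in hg["components"])))
'
```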
Sample script
A sample script based on this logic is available here. In addition to the steps above, it supports further optional customizations via exported environment variables.
For example, to deploy a single node HDF sandbox, you can just run the below on a freshly installed CentOS 6 VM (don't run this on the sandbox or a VM where Ambari is already installed). You can customize the behaviour by exporting environment variables as shown.
#run below as root
export host_count=1
curl -sSL https://gist.github.com/abajwa-hw/ae4125c5154deac6713cdd25d2b83620/raw | sudo -E sh
What next?
Sample blueprint
A sample generated blueprint for a 3-node cluster is provided for reference here:
{
  "Blueprints": { "stack_name": "HDF", "stack_version": "2.0" },
  "host_groups": [
    {
      "name": "host-group-1",
      "components": [
        { "name": "METRICS_MONITOR" },
        { "name": "SUPERVISOR" },
        { "name": "LOGSEARCH_LOGFEEDER" },
        { "name": "NIFI_CA" },
        { "name": "NIMBUS" },
        { "name": "DRPC_SERVER" },
        { "name": "ZOOKEEPER_SERVER" },
        { "name": "STORM_UI_SERVER" }
      ]
    },
    {
      "name": "host-group-2",
      "components": [
        { "name": "NIFI_MASTER" },
        { "name": "METRICS_MONITOR" },
        { "name": "SUPERVISOR" },
        { "name": "INFRA_SOLR" },
        { "name": "INFRA_SOLR_CLIENT" },
        { "name": "LOGSEARCH_LOGFEEDER" },
        { "name": "LOGSEARCH_SERVER" },
        { "name": "ZOOKEEPER_CLIENT" },
        { "name": "METRICS_GRAFANA" },
        { "name": "KAFKA_BROKER" },
        { "name": "ZOOKEEPER_SERVER" }
      ]
    },
    {
      "name": "host-group-3",
      "components": [
        { "name": "METRICS_MONITOR" },
        { "name": "SUPERVISOR" },
        { "name": "LOGSEARCH_LOGFEEDER" },
        { "name": "METRICS_COLLECTOR" },
        { "name": "ZOOKEEPER_SERVER" }
      ]
    }
  ],
  "configurations": [
    { "nifi-ambari-config": {
        "nifi.node.protocol.port": "9089",
        "nifi.internal.dir": "/nifi",
        "nifi.node.port": "9092",
        "nifi.provenance.repository.dir.default": "/nifi/provenance_repository",
        "nifi.content.repository.dir.default": "/nifi/content_repository",
        "nifi.flowfile.repository.dir": "/nifi/flowfile_repository",
        "nifi.max_mem": "1g",
        "nifi.database.dir": "/nifi/database_repository",
        "nifi.node.ssl.port": "9093"
    } },
    { "ams-env": { "metrics_collector_heapsize": "512" } },
    { "ams-hbase-env": {
        "hbase_master_heapsize": "512",
        "hbase_regionserver_heapsize": "768",
        "hbase_master_xmn_size": "192"
    } },
    { "storm-site": { "metrics.reporter.register": "org.apache.hadoop.metrics2.sink.storm.StormTimelineMetricsReporter" } },
    { "nifi-env": { "nifi_group": "mynifigroup", "nifi_user": "mynifiuser" } },
    { "ams-hbase-site": {
        "hbase.regionserver.global.memstore.upperLimit": "0.35",
        "hbase.regionserver.global.memstore.lowerLimit": "0.3",
        "hbase.tmp.dir": "/var/lib/ambari-metrics-collector/hbase-tmp",
        "hbase.hregion.memstore.flush.size": "134217728",
        "hfile.block.cache.size": "0.3",
        "hbase.rootdir": "file:///var/lib/ambari-metrics-collector/hbase",
        "hbase.cluster.distributed": "false",
        "phoenix.coprocessor.maxMetaDataCacheSize": "20480000",
        "hbase.zookeeper.property.clientPort": "61181"
    } },
    { "logsearch-properties": {} },
    { "kafka-log4j": {} },
    { "ams-site": {
        "timeline.metrics.service.webapp.address": "localhost:6188",
        "timeline.metrics.cluster.aggregate.splitpoints": "kafka.network.SocketServer.IdlePercent.networkProcessor.0.5MinuteRate",
        "timeline.metrics.host.aggregate.splitpoints": "kafka.network.SocketServer.IdlePercent.networkProcessor.0.5MinuteRate",
        "timeline.metrics.host.aggregator.ttl": "86400",
        "timeline.metrics.service.handler.thread.count": "20",
        "timeline.metrics.service.watcher.disabled": "false"
    } },
    { "kafka-broker": { "kafka.metrics.reporters": "org.apache.hadoop.metrics2.sink.kafka.KafkaTimelineMetricsReporter" } },
    { "ams-grafana-env": {} }
  ]
}
Created on 11-16-2016 03:32 PM
Ali, the scripts work. But I want to deploy NIFI to all the HDF nodes (e.g., 4). Currently, using the above scripts, I only see NIFI on one of the nodes. I know NiFi 1.0.x is "master-less", but I don't see the rest of the nodes having the NIFI component installed.
Created on 11-21-2016 03:06 PM
Actually, I found out that the script by Ali (abajwa-hw) already shows how to deploy NiFi to each node in a multi-node cluster. Specifically, it is the environment variable: export install_nifi_on_all_nodes="${install_nifi_on_all_nodes:-true}"
Created on 10-20-2017 05:31 PM
At Step 2, (with new ambari-bootstrap.sh)
We need to add an additional line to the blueprint steps:
export install_ambari_agent=false