05-05-2018
12:03 AM
3 Kudos
Summary: While automating the setup of the Hortoniabank demo, we needed to automate associating Atlas tags with HDP entities like HDFS, Hive, HBase and Kafka using the names of the entities (rather than their guids in Atlas). One option is to use the Atlas APIs to find the entity you are looking for via its qualifiedName attribute and then use the guid to associate a tag with it. For components like Hive that already have an Atlas hook, the Atlas entities for Hive tables are created automatically when the table is created; for these, we have provided just the API calls to associate the tags with the entity. For others like Kafka, HDFS and HBase that do not have an Atlas hook (as of HDP 2.6.x), you will need to create the entity first; for these, we have provided both the API call to create the entity and the call to associate the tags with it.

Code samples: The below code examples assume the tags have already been created. These can be created either manually via the Atlas UI or using the API. Here is a sample Atlas API call to create a basic tag called TEST that does not have any attributes:

${atlas_curl} ${atlas_url}/types \
-X POST -H 'Content-Type: application/json' \
--data-binary '{"enumTypes":[],"structTypes":[],"traitTypes":[{"superTypes":[],"hierarchicalMetaTypeName":"org.apache.atlas.typesystem.types.TraitType","typeName":"TEST","typeDescription":"TEST","typeVersion":"1.0","attributeDefinitions":[]}],"classTypes":[]}'
All the examples operate the same way: find the guid of the entity you are looking for using its qualifiedName attribute, then use the guid to associate the tag with it. First we set up common variables:

atlas_host="atlas.domain.com"
cluster_name="datalake"
atlas_curl="curl -u admin:admin"
atlas_url="http://${atlas_host}:21000/api/atlas"
Example 1: Associate tag REFERENCE_DATA (without attributes) to Hive table hortoniabank.eu_countries

#fetch guid for table hortoniabank.eu_countries@${cluster_name}
guid=$(${atlas_curl} ${atlas_url}/v2/entity/uniqueAttribute/type/hive_table?attr:qualifiedName=hortoniabank.eu_countries@${cluster_name} | jq '.entity.guid' | tr -d '"')
#add REFERENCE_DATA tag
${atlas_curl} ${atlas_url}/entities/${guid}/traits \
-X POST -H 'Content-Type: application/json' \
--data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"REFERENCE_DATA","values":{}}' Example 2: Associate tag DATA_QUALITY (with attribute: score and value: 0.51) to Hive table cost_savings.claim_savings #fetch guid for table cost_savings.claim_savings@${cluster_name}
guid=$(${atlas_curl} ${atlas_url}/v2/entity/uniqueAttribute/type/hive_table?attr:qualifiedName=cost_savings.claim_savings@${cluster_name} | jq '.entity.guid' | tr -d '"')
#add DATA_QUALITY tag with score=0.51
${atlas_curl} ${atlas_url}/entities/${guid}/traits \
-X POST -H 'Content-Type: application/json' \
--data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"DATA_QUALITY", "values":{"score": "0.51"}}'
Example 3: Associate tag FINANCE_PII (with attribute type set to value finance) to Hive column finance.tax_2015.ssn

#fetch guid for column finance.tax_2015.ssn@${cluster_name}
guid=$(${atlas_curl} ${atlas_url}/v2/entity/uniqueAttribute/type/hive_column?attr:qualifiedName=finance.tax_2015.ssn@${cluster_name} | jq '.entity.guid' | tr -d '"')
#add FINANCE_PII tag with type=finance
${atlas_curl} ${atlas_url}/entities/${guid}/traits \
-X POST -H 'Content-Type: application/json' \
--data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"FINANCE_PII", "values":{"type": "finance"}}' Example 4: Create entity for kafka topic PRIVATE and associate with tag SENSITIVE #create entities for kafka topics PRIVATE and associate with SENSITIVE tag
${atlas_curl} ${atlas_url}/v2/entity -X POST -H 'Content-Type: application/json' -d @- <<EOF
{ "entity":{ "typeName":"kafka_topic", "attributes":{ "description":null, "name":"PRIVATE", "owner":null, "qualifiedName":"PRIVATE@${cluster_name}", "topic":"PRIVATE", "uri":"none" }, "guid":-1 }, "referredEntities":{ }}
EOF
guid=$(${atlas_curl} ${atlas_url}/v2/entity/uniqueAttribute/type/kafka_topic?attr:qualifiedName=PRIVATE@${cluster_name} | jq '.entity.guid' | tr -d '"')
${atlas_curl} ${atlas_url}/entities/${guid}/traits \
-X POST -H 'Content-Type: application/json' \
--data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"SENSITIVE","values":{}}' Example 5: create entities for Hbase table T_PRIVATE and associate with SENSITIVE tag #create entities for Hbase table T_PRIVATE and associate with SENSITIVE tag
${atlas_curl} ${atlas_url}/v2/entity -X POST -H 'Content-Type: application/json' -d @- <<EOF
{ "entity":{ "typeName":"hbase_table", "attributes":{ "description":"T_PRIVATE table", "name":"T_PRIVATE", "owner":"hbase", "qualifiedName":"T_PRIVATE@${cluster_name}", "column_families":[ ], "uri":"none" }, "guid":-1 }, "referredEntities":{ }}
EOF
guid=$(${atlas_curl} ${atlas_url}/v2/entity/uniqueAttribute/type/hbase_table?attr:qualifiedName=T_PRIVATE@${cluster_name} | jq '.entity.guid' | tr -d '"')
${atlas_curl} ${atlas_url}/entities/${guid}/traits \
-X POST -H 'Content-Type: application/json' \
--data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"SENSITIVE","values":{}}' Example 6: create entities for HDFS path /banking and associate with BANKING tag #create entities for HDFS path /banking and associate with BANKING tag
hdfs_prefix="hdfs://$(hostname -f):8020"
hdfs_path="/banking"
${atlas_curl} ${atlas_url}/v2/entity -X POST -H 'Content-Type: application/json' -d @- <<EOF
{ "entity":{ "typeName":"hdfs_path", "attributes":{ "description":null, "name":"${hdfs_path}", "owner":null, "qualifiedName":"${hdfs_prefix}${hdfs_path}", "clusterName":"${cluster_name}", "path":"${hdfs_prefix}${hdfs_path}" }, "guid":-1 }, "referredEntities":{ }}
EOF
guid=$(${atlas_curl} ${atlas_url}/v2/entity/uniqueAttribute/type/hdfs_path?attr:qualifiedName=${hdfs_prefix}${hdfs_path} | jq '.entity.guid' | tr -d '"')
${atlas_curl} ${atlas_url}/entities/${guid}/traits \
-X POST -H 'Content-Type: application/json' \
--data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"BANKING","values":{}}'
02-17-2018
10:28 AM
8 Kudos
Summary:
The release of HDF 3.1 brings a significant number of improvements to HDF: Apache NiFi 1.5, Kafka 1.0, plus the new NiFi Registry. In addition, there were improvements to the Storm, Streaming Analytics Manager and Schema Registry components.
This article shows how you can use the ambari-bootstrap project to easily generate a blueprint and deploy HDF clusters, either single-node or multi-node, for development/demo environments in 5 easy steps. To quickly set up a single-node environment, a prebuilt AMI is available for AWS, as well as a script that automates these steps, so you can deploy the cluster in a few commands.
Steps for each of the below options are described in this article:
A. Single-node prebuilt AMI on AWS
B. Single-node fresh install
C. Multi-node fresh install
A. Single-node prebuilt AMI on AWS
Steps to launch the AMI
1. Launch Amazon AWS console page in your browser by clicking here and sign in with your credentials. Once signed in, you can close this browser tab.
2. Select the AMI from ‘N. California’ region by clicking here. Now choose instance type: select ‘m4.2xlarge’ and click Next
Note: if you choose a smaller instance type than recommended above, not all services may come up
3. Configure Instance Details: leave the defaults and click ‘Next’
4. Add storage: keep at least the default of 100 GB and click ‘Next’
5. Optionally, add a name or any other tags you like. Then click ‘Next’
6. Configure security group: create a new security group and select ‘All traffic’ to open all ports. For production usage, a more restrictive security group policy is strongly encouraged: for instance, only allow traffic from your company’s IP range. Then click ‘Review and Launch’
7. Review your settings and click Launch
8. Create and download a new key pair (or choose an existing one). Then click ‘Launch instances’
9. Click the shown link under ‘Your instances are now launching’
10. This opens the EC2 dashboard that shows the details of your launched instance
11. Make note of your instance’s ‘Public IP’ (which will be used to access your cluster). If it is blank, wait 1-2 minutes for this to be populated. Also make note of your AWS Owner Id (which will be the initial password to login)
12. After 5-10 minutes, open the below URL in your browser to access Ambari’s console: http://<PUBLIC IP>:8080. Log in as user admin with your AWS Owner Id as the password (see previous step)
13. At this point, Ambari may still be in the process of starting all the services. You can tell by the presence of the blue ‘op’ notification near the top left of the page. If so, just wait until it is done.
(Optional) You can also monitor the startup using the log as below:
Open SSH session into the VM using your key and the public IP e.g. from OSX:
ssh -i ~/.ssh/mykey.pem centos@<publicIP>
Tail the startup log:
tail -f /var/log/hdp_startup.log
Once you see “cluster is ready!” you can proceed
14. Once the blue ‘op’ notification disappears and all the services show a green check mark, the cluster is fully up.
Other related AMIs
HDP 2.6.4 vanilla AMI (ami-764d4516): Hortonworks HDP 2.6.4 single node cluster running Hive/Spark/Druid/Superset installed via Ambari. Built Feb 18 2018 using HDP 2.6.4.0-91 / Ambari 2.6.1.3-3. Ambari password is your AWS ownerid
HDP 2.6.4 including NiFi and NiFi Registry from HDF 3.1 (ami-e1a0a981): HDP 2.6.4 plus NiFi 1.5 and NiFi Registry. Ambari admin password is StrongPassword. Built Feb 17 2018
HDP 2.6 plus HDF 3.0 and IOT trucking demo reference app. Details here
Note: the above AMIs are available in the US West (N. California) region of AWS
B. Single-node HDF install:
Launch a fresh CentOS/RHEL 7 instance with 4+ CPUs and 16GB+ RAM and run below.
Do not try to install HDF on an environment where Ambari or HDP is already installed (e.g. HDP sandbox or an HDP cluster)
export host_count=1
curl -sSL https://gist.github.com/abajwa-hw/b7c027d9eea9fbd2a2319a21a955df1f/raw | sudo -E sh
Once launched, the script will install Ambari and use it to deploy an HDF cluster
Note: this script can also be used to install multi-node clusters after step #1 below is complete, i.e. after the agents on the non-Ambari-server nodes are installed and registered
Other related scripts
1. Automation to set up HDP 2.6.x plus NiFi from HDF 3.1:
export host_count=1
curl -sSL https://gist.github.com/abajwa-hw/bbe2bdd1ed6a0f738a90dd4e07480e3b/raw | sudo -E sh
C. Multi-node HDF install:
0. Launch your RHEL/CentOS 7 instances where you wish to install HDF. In this example, we will use four m4.xlarge instances. Select an instance where ambari-server should run (e.g. node1)
1. After choosing the host where Ambari-server will run, first let's prepare the other hosts. Run below on all hosts where Ambari-server will not be running (e.g. node2-4). This will run pre-requisite steps, install Ambari-agents and point them at the Ambari-server host:
export ambari_server=<FQDN of host where ambari-server will be installed>; #replace this
export install_ambari_server=false
export ambari_version=2.6.1.0
curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh ;
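If you want to confirm each agent registered before moving on, a quick hedged sanity check on any of these hosts (paths are the CentOS defaults):
#verify the agent is running and inspect its log for registration messages
sudo ambari-agent status
sudo tail /var/log/ambari-agent/ambari-agent.log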
2. Run the remaining steps on the host where Ambari-server is to be installed (e.g. node1). The below commands run pre-reqs and install Ambari-server:
export db_password="StrongPassword" # MySQL password
export nifi_password="StrongPassword" # NiFi password - must be at least 10 chars
export cluster_name="HDF" # cluster name
export ambari_services="ZOOKEEPER STREAMLINE NIFI KAFKA STORM REGISTRY NIFI_REGISTRY AMBARI_METRICS" #choose services
export hdf_ambari_mpack_url="http://public-repo-1.hortonworks.com/HDF/centos7/3.x/updates/3.1.0.0/tars/hdf_ambari_mp/hdf-ambari-mpack-3.1.0.0-564.tar.gz"
export ambari_version=2.6.1.0
#install bootstrap
yum install -y git python-argparse
cd /tmp
git clone https://github.com/seanorama/ambari-bootstrap.git
#Runs pre-reqs and install ambari-server
export install_ambari_server=true
curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh ;
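Once the bootstrap completes, you can confirm the server came up before continuing (a hedged check using the standard command shipped with the Ambari packages):
sudo ambari-server status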
3. On the same node, install MySQL and create databases and users for Schema Registry and SAM
sudo yum localinstall -y https://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm
sudo yum install -y epel-release mysql-connector-java* mysql-community-server
# MySQL Setup to keep the new services separate from the originals
echo Database setup...
sudo systemctl enable mysqld.service
sudo systemctl start mysqld.service
#extract the system-generated MySQL password
oldpass=$( grep 'temporary.*root@localhost' /var/log/mysqld.log | tail -n 1 | sed 's/.*root@localhost: //' )
#create sql file that:
# 1. resets the MySQL root password to a temporary value and creates registry/streamline schemas and users
# 2. sets passwords for the registry/streamline users to ${db_password}
cat > mysql-setup.sql << EOF
ALTER USER 'root'@'localhost' IDENTIFIED BY 'Secur1ty!';
uninstall plugin validate_password;
CREATE DATABASE registry DEFAULT CHARACTER SET utf8; CREATE DATABASE streamline DEFAULT CHARACTER SET utf8;
CREATE USER 'registry'@'%' IDENTIFIED BY '${db_password}'; CREATE USER 'streamline'@'%' IDENTIFIED BY '${db_password}';
GRANT ALL PRIVILEGES ON registry.* TO 'registry'@'%' WITH GRANT OPTION ; GRANT ALL PRIVILEGES ON streamline.* TO 'streamline'@'%' WITH GRANT OPTION ;
commit;
EOF
#execute sql file
mysql -h localhost -u root -p"$oldpass" --connect-expired-password < mysql-setup.sql
#change MySQL root password from the temporary value to StrongPassword
mysqladmin -u root -p'Secur1ty!' password StrongPassword
#test password and confirm dbs created
mysql -u root -pStrongPassword -e 'show databases;'
4. On the same node, install the MySQL connector jar and then the HDF mpack. Then restart Ambari so it recognizes the HDF stack:
sudo ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
sudo ambari-server install-mpack --mpack=${hdf_ambari_mpack_url} --verbose
sudo ambari-server restart
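To confirm Ambari registered the HDF stack after the restart, you can query its REST API (a hedged check assuming default admin credentials and port):
curl -u admin:admin http://localhost:8080/api/v1/stacks/HDF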
At this point, if you want to use the Ambari install wizard to install HDF, you can do that as well: just open http://<Ambari host IP>:8080, log in and follow the steps in the doc. Otherwise, to proceed with deploying via blueprints, follow the remaining steps.
5. On the same node, provide the minimum configurations required for install by creating configuration-custom.json. You can add to this to customize any component property that is exposed by Ambari:
cd /tmp/ambari-bootstrap/deploy/
tee configuration-custom.json > /dev/null << EOF
{
"configurations": {
"ams-grafana-env": {
"metrics_grafana_password": "${db_password}"
},
"streamline-common": {
"jar.storage.type": "local",
"streamline.storage.type": "mysql",
"streamline.storage.connector.connectURI": "jdbc:mysql://$(hostname -f):3306/streamline",
"registry.url" : "http://localhost:7788/api/v1",
"streamline.dashboard.url" : "http://localhost:9089",
"streamline.storage.connector.password": "${db_password}"
},
"registry-common": {
"jar.storage.type": "local",
"registry.storage.connector.connectURI": "jdbc:mysql://$(hostname -f):3306/registry",
"registry.storage.type": "mysql",
"registry.storage.connector.password": "${db_password}"
},
"nifi-registry-ambari-config": {
"nifi.registry.security.encrypt.configuration.password": "${nifi_password}"
},
"nifi-ambari-config": {
"nifi.security.encrypt.configuration.password": "${nifi_password}"
}
}
}
EOF
6. Then run below as root to generate a recommended blueprint and deploy the cluster install. Make sure to set host_count to the total number of hosts in your cluster (including the Ambari server):
sudo su
cd /tmp/ambari-bootstrap/deploy/
export host_count=<Number of total nodes>
export ambari_stack_name=HDF
export ambari_stack_version=3.1
export ambari_services="ZOOKEEPER STREAMLINE NIFI KAFKA STORM REGISTRY NIFI_REGISTRY AMBARI_METRICS"
./deploy-recommended-cluster.bash
You can now log in to Ambari at http://<Ambari host IP>:8080 and sit back and watch your HDF cluster get installed!
Notes:
a) This will only install NiFi on a single node of the cluster by default
b) The NiFi Certificate Authority (CA) component will be installed by default. This means that, if you wanted to, you could enable SSL for NiFi out of the box by including a "nifi-ambari-ssl-config" section in the above configuration-custom.json:
"nifi-ambari-ssl-config": {
"nifi.toolkit.tls.token": "hadoop",
"nifi.node.ssl.isenabled": "true",
"nifi.security.needClientAuth": "true",
"nifi.toolkit.dn.suffix": ", OU=HORTONWORKS",
"nifi.initial.admin.identity": "CN=nifiadmin, OU=HORTONWORKS",
"content":"<property name='Node Identity 1'>CN=node-1.fqdn, OU=HORTONWORKS</property><property name='Node Identity 2'>CN=node-2.fqdn, OU=HORTONWORKS</property><property name='Node Identity 3'>node-3.fqdn, OU=HORTONWORKS</property>"
},
Make sure to replace node-x.fqdn with the FQDN of each node running NiFi
c) As part of the install, you can also have an existing NiFi flow deployed by Ambari. First, read in a flow.xml file from an existing NiFi system (you can find this in flow.xml.gz). For example, run below to read the flow for the Twitter demo into an env var:
twitter_flow=$(curl -L https://gist.githubusercontent.com/abajwa-hw/3a3e2b2d9fb239043a38d204c94e609f/raw)
Then include a "nifi-flow-env" section in the above configuration-custom.json when you run the tee command, to have ambari-bootstrap include the whole flow XML in the generated blueprint:
"nifi-flow-env" : {
"properties_attributes" : { },
"properties" : {
"content" : "${twitter_flow}"
}
}
d) In case you would like to review the generated blueprint before it gets deployed, just set the below variable as well:
export deploy=false
The blueprint will be created under /tmp/ambari-bootstrap*/deploy/tempdir*/blueprint.json
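For a quick look at the host-group layout of the generated blueprint before deploying (a hedged one-liner assuming jq is installed):
#list the host groups and the components assigned to each
jq '.host_groups[] | {name, components: [.components[].name]}' /tmp/ambari-bootstrap*/deploy/tempdir*/blueprint.json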
Sample blueprint
A sample generated blueprint for a 4-node cluster is provided for reference here:
{
"Blueprints": {
"stack_name": "HDF",
"stack_version": "3.1"
},
"host_groups": [
{
"name": "host-group-3",
"components": [
{
"name": "NIFI_MASTER"
},
{
"name": "DRPC_SERVER"
},
{
"name": "METRICS_GRAFANA"
},
{
"name": "KAFKA_BROKER"
},
{
"name": "ZOOKEEPER_SERVER"
},
{
"name": "STREAMLINE_SERVER"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "SUPERVISOR"
},
{
"name": "NIMBUS"
},
{
"name": "ZOOKEEPER_CLIENT"
},
{
"name": "NIFI_REGISTRY_MASTER"
},
{
"name": "REGISTRY_SERVER"
},
{
"name": "STORM_UI_SERVER"
}
]
},
{
"name": "host-group-2",
"components": [
{
"name": "METRICS_MONITOR"
},
{
"name": "SUPERVISOR"
},
{
"name": "ZOOKEEPER_SERVER"
}
]
},
{
"name": "host-group-1",
"components": [
{
"name": "METRICS_MONITOR"
},
{
"name": "SUPERVISOR"
},
{
"name": "NIFI_CA"
}
]
},
{
"name": "host-group-4",
"components": [
{
"name": "METRICS_MONITOR"
},
{
"name": "SUPERVISOR"
},
{
"name": "METRICS_COLLECTOR"
},
{
"name": "ZOOKEEPER_SERVER"
}
]
}
],
"configurations": [
{
"nifi-ambari-config": {
"nifi.security.encrypt.configuration.password": "StrongPassword"
}
},
{
"nifi-registry-ambari-config": {
"nifi.registry.security.encrypt.configuration.password": "StrongPassword"
}
},
{
"ams-hbase-env": {
"hbase_master_heapsize": "512",
"hbase_regionserver_heapsize": "768",
"hbase_master_xmn_size": "192"
}
},
{
"nifi-logsearch-conf": {}
},
{
"storm-site": {
"topology.metrics.consumer.register": "[{\"class\": \"org.apache.hadoop.metrics2.sink.storm.StormTimelineMetricsSink\", \"parallelism.hint\": 1, \"whitelist\": [\"kafkaOffset\\..+/\", \"__complete-latency\", \"__process-latency\", \"__execute-latency\", \"__receive\\.population$\", \"__sendqueue\\.population$\", \"__execute-count\", \"__emit-count\", \"__ack-count\", \"__fail-count\", \"memory/heap\\.usedBytes$\", \"memory/nonHeap\\.usedBytes$\", \"GC/.+\\.count$\", \"GC/.+\\.timeMs$\"]}]",
"metrics.reporter.register": "org.apache.hadoop.metrics2.sink.storm.StormTimelineMetricsReporter",
"storm.cluster.metrics.consumer.register": "[{\"class\": \"org.apache.hadoop.metrics2.sink.storm.StormTimelineMetricsReporter\"}]"
}
},
{
"registry-common": {
"registry.storage.connector.connectURI": "jdbc:mysql://ip-172-31-21-233.us-west-1.compute.internal:3306/registry",
"registry.storage.type": "mysql",
"jar.storage.type": "local",
"registry.storage.connector.password": "StrongPassword"
}
},
{
"registry-logsearch-conf": {}
},
{
"streamline-common": {
"streamline.storage.type": "mysql",
"jar.storage.type": "local",
"streamline.storage.connector.connectURI": "jdbc:mysql://ip-172-31-21-233.us-west-1.compute.internal:3306/streamline",
"streamline.dashboard.url": "http://localhost:9089",
"registry.url": "http://localhost:7788/api/v1",
"streamline.storage.connector.password": "StrongPassword"
}
},
{
"ams-hbase-site": {
"hbase.regionserver.global.memstore.upperLimit": "0.35",
"hbase.regionserver.global.memstore.lowerLimit": "0.3",
"hbase.tmp.dir": "/var/lib/ambari-metrics-collector/hbase-tmp",
"hbase.hregion.memstore.flush.size": "134217728",
"hfile.block.cache.size": "0.3",
"hbase.rootdir": "file:///var/lib/ambari-metrics-collector/hbase",
"hbase.cluster.distributed": "false",
"phoenix.coprocessor.maxMetaDataCacheSize": "20480000",
"hbase.zookeeper.property.clientPort": "61181"
}
},
{
"ams-env": {
"metrics_collector_heapsize": "512"
}
},
{
"kafka-log4j": {}
},
{
"ams-site": {
"timeline.metrics.service.webapp.address": "localhost:6188",
"timeline.metrics.cluster.aggregate.splitpoints": "kafka.network.RequestMetrics.ResponseQueueTimeMs.request.OffsetFetch.98percentile",
"timeline.metrics.host.aggregate.splitpoints": "kafka.network.RequestMetrics.ResponseQueueTimeMs.request.OffsetFetch.98percentile",
"timeline.metrics.host.aggregator.ttl": "86400",
"timeline.metrics.service.handler.thread.count": "20",
"timeline.metrics.service.watcher.disabled": "false"
}
},
{
"kafka-broker": {
"kafka.metrics.reporters": "org.apache.hadoop.metrics2.sink.kafka.KafkaTimelineMetricsReporter"
}
},
{
"ams-grafana-env": {
"metrics_grafana_password": "StrongPassword"
}
},
{
"streamline-logsearch-conf": {}
}
]
}
Sample cluster.json for a 4-node cluster:
{
"blueprint": "recommended",
"default_password": "hadoop",
"host_groups": [
{
"hosts": [
{
"fqdn": "ip-172-xx-xx-x3.us-west-1.compute.internal"
}
],
"name": "host-group-3"
},
{
"hosts": [
{
"fqdn": "ip-172-xx-xx-x2.us-west-1.compute.internal"
}
],
"name": "host-group-2"
},
{
"hosts": [
{
"fqdn": "ip-172-xx-xx-x4.us-west-1.compute.internal"
}
],
"name": "host-group-4"
},
{
"hosts": [
{
"fqdn": "ip-172-xx-xx-x1.us-west-1.compute.internal"
}
],
"name": "host-group-1"
}
]
}
What next?
Now that your cluster is up, you can explore what NiFi's Ambari integration means: https://community.hortonworks.com/articles/57980/hdf-20-apache-nifi-integration-with-apache-ambarir.html
Next, you can enable SSL for NiFi: https://community.hortonworks.com/articles/58009/hdf-20-enable-ssl-for-apache-nifi-from-ambari.html
11-17-2017
04:10 AM
4 Kudos
Overview
The Partner demo kit is built and maintained by the Hortonworks Partner Solutions team. The purpose of the demo kit is to enable partners to:
Quickly bring up HDP environments with pre-built demos
Leverage available demos to understand the capabilities of the platform
Use the demos as part of business conversations to demonstrate the art of the possible
The remainder of this article provides a short description of the 3 demos packaged within the demo kit and step-by-step instructions on:
How to launch the demo kit on AWS or on a private cloud
How to execute the demos provided with the demo kit
Other Versions
The Security/Governance demo kit for HDP 2.6 can be found here
The previous version of the demo kit (for HDP 2.5) can be found here
Pre-requisites
When using AWS, you must already have created your Amazon Web Services account. Sample steps for doing this can be found here. If you have an AWS promo code, you can apply it to your account using the steps here.
For running the sentiment demo, you must have created a Twitter application using your Twitter account and generated consumer keys/secrets. If you do not have these, you can generate a new set using your Twitter account by following this section of the Hortonworks tutorial.
Notes
Note that the partner demo kit is not a formally supported offering. In case of questions, see the ‘Questions?’ section at the end of this article.
Slides
Slides for the demo kit are available here
Packaged Demos
The demo kit comes with 3 demos:
1. IOT demo
Purpose: The IOT demo showcases how a logistics company uses the Hortonworks Connected Data Platform to monitor its fleet in real time to mitigate driving infractions
Use case setup: Sensor devices on trucks capture events from the trucks and the actions of the drivers. Some of these driver events are dangerous “events”, such as: lane departure, unsafe following distance, unsafe tail distance
The business requirement is to stream these events in, filter on violations and do real-time alerting when “lots” of erratic behavior is detected for a given driver over a short period of time. Over time, users would like to do advanced analytics on the full archive of historical events generated by the trucks to:
Determine what factors have an impact on driving violations (e.g. weather, driver fatigue etc)
Build an AI model to make predictions of when violations will occur
Technologies used: Apache NiFi, Kafka, Storm, Streaming Analytics Manager, Schema Registry, HBase, Spark, Zeppelin
More details available here and here
2. Sentiment demo
Purpose: The sentiment demo showcases how a retail company can use the Hortonworks Connected Data Platform to visualize and analyze social media data related to its products
Use case setup: The business requirement is to capture, process and analyze the flow of tweets to understand the social sentiment for its products
Technologies used: Apache NiFi, Solr, HDFS
More details available here and here
3. Advanced analytics demo
Purpose: The advanced analytics demo showcases how an insurance company can use the Hortonworks Connected Data Platform to visualize and make predictions on earthquake data using Apache Spark’s machine learning libraries
Use case setup: The business requirement is to be able to perform advanced analytics on worldwide earthquake data to predict where large earthquakes will happen, so the business can modify insurance premiums accordingly
Technologies used: Apache Spark, Zeppelin
More details here
Option #1: Installing the Demo Kit on your own setup
You can install the demo kit on other public or private clouds using the provided automated script. With this option you would launch a CentOS/RHEL 7 VM of the right size on any cloud of your choice (as long as it has access to the public internet), and use the provided script to install single-node HDP and install the demo. For more details see the README here. Setup ETA is 1 hour
Option #2: Launching the Demo Kit AMI on AWS
You can use this option to launch a prebuilt image of single-node HDP (including the demo) on the AWS cloud. Setup ETA is 15 min
Steps to launch the AMI
1. Launch the Amazon AWS console page in your browser by clicking here and sign in with your credentials. Once signed in, you can close this browser tab.
2. Select the AMI from the ‘N. California’ region by clicking here. Now choose instance type: select ‘m4.2xlarge’ and click Next
Note: if you choose a smaller instance type than recommended above, not all services may come up
3. Configure Instance Details: leave the defaults and click ‘Next’
4. Add storage: keep the default of 500 GB and click ‘Next’
5. Optionally, add a name or any other tags you like. Then click ‘Next’
6. Configure security group: create a new security group and select ‘All traffic’ to open all ports. For long running instances (i.e. anything beyond an hour), a more restrictive security group policy is strongly encouraged (for example: only allow traffic from your company’s IP range). Then click ‘Review and Launch’
7. Review your settings and click Launch
8. Create and download a new key pair (or choose an existing one). Then click ‘Launch instances’
9. Click the shown link under ‘Your instances are now launching’
10. This opens the EC2 dashboard that shows the details of your launched instance
11. Make note of your instance’s ‘Public IP’ (which will be used to access your cluster). If it is blank, wait 1-2 minutes for this to be populated
12. After 5-10 minutes, open the below URL in your browser to access Ambari’s console: http://<PUBLIC IP>:8080. Log in as the admin user using StrongPassword as the password
13. At this point, Ambari may still be in the process of starting all the services. You can tell by the presence of the blue ‘op’ notification near the top left of the page. If so, just wait until it is done.
(Optional) You can also monitor the startup using the log as below:
Open an SSH session into the VM using your key and the public IP e.g. from OSX:
ssh -i ~/.ssh/mykey.pem centos@<publicIP>
Tail the startup log:
tail -f /var/log/hdp_startup.log
Once you see “cluster is ready!” you can proceed
14. Once the blue ‘op’ notification disappears and all the services show a green check mark, the cluster is fully up. If any services fail to start, use the Actions > Start All button to start them
15. At this point you can follow the demo instructions.
Troubleshooting
If any service does not come up for some reason, you can use Ambari to retry by clicking: ‘Actions’ > ‘Start all’.
In case of multiple failures when starting services, use the EC2 dashboard to double check that the correct instance type was used. Insufficient resources can cause services to not start up successfully
It is not required to connect via SSH to your instance, but you can do this using the key pair you created/selected earlier by following the standard instructions on the AWS website. Make sure the user you log in as is centos
A log file of the automated startup of HDP services is available under: /var/log/hdp_startup.log
Stopping/Terminating the demo kit
Once you are done with the demo kit, we recommend bringing it down to avoid incurring any unnecessary charges. To do this, follow below:
First, stop the cluster services using Ambari by clicking: ‘Actions’ > ‘Stop all’. Then pick one of the two options:
a) Terminate the instance: If you do not want to incur any further charges from AWS, terminate the VM instance from the same ‘EC2 dashboard’ that displayed the instance details. Note that this will destroy the VM, so the next time you wish to use the demo kit, you will need to follow the same steps outlined in the above section ‘Launching the Demo Kit’
b) Stop the instance: if you want to bring down your VM instance but keep it around so you can start it back up in the future, stop the VM instance from the EC2 dashboard. Note that this option will preserve any customizations you make to the VM, but you will incur AWS charges by choosing this option. More details on stop vs terminate operations can be found on the AWS website here and here
Demo Execution Steps
IOT Demo
Video recording of the IOT demo
Recordings of the demo are provided here (high level) and here (deeper level). PPT and PDF versions of the slides are also available
IOT Demo setup instructions
Sequence to walk through the IOT trucking demo:
Events simulator
Schema Registry UI
NiFi flow
SAM Application view
Storm Monitoring view
Superset Dashboard
Superset Slice creation
Zeppelin notebook
Detailed steps for the IOT trucking demo walk-through:
(Optional) Check that events are being simulated. This step is optional because we will also check this from the NiFi UI
Open an SSH session into the VM using your key and the public IP e.g. from OSX:
ssh -i ~/.ssh/mykey.pem centos@<publicIP>
sudo su -
To check events being simulated you can either verify the simulator process is running or monitor the simulator log:
ps -ef | grep stream-simulator
tail -f /tmp/whoville/data_simulator/simulator.log
If the simulator is not running, you can invoke it by running below from the SSH session:
cd /tmp/whoville/data_simulator/
sudo ./runDataLoader.sh
In case you need to kill the simulator, use the ps command above to find the process id and then kill it
Next, we will open the web UIs of a number of components that are part of the demo using the Ambari Quicklinks. For example, for Schema Registry here is how to access the Quicklink:
Open Schema Registry using the Quicklink in Ambari and check the 4 schemas below are listed
Open NiFi using the Quicklink in Ambari, check that the “IOT trucking demo” process group is started
Double-click on the “IOT trucking demo” box to see the details of the flow. The counters should show that simulated events are flowing through the NiFi flow. You can refresh the UI to see this
Open the Storm Monitoring view (under Ambari views), and check the topology is live
Open SAM using the Quicklink in Ambari, check the application is deployed
Double-click on the application to see more details. You should see that the Emitted and Transferred fields are non-zero (assuming the simulator has been running for a few min)
Open the Druid Console using the Quicklink in Ambari, check the two datasets are present
Open Druid Superset using the Quicklink in Ambari and log in using admin/StrongPassword
There should be one entry under Dashboards. Click it to open the prebuilt dashboard.
The prebuilt dashboard will open. You can periodically click the refresh button to see new data arriving. It can take 2-6 mins for new events to appear in Druid
The first few slices (i.e. graphs) provide monitoring related information (e.g. how many violations? Who are the violators? etc). The last 3 slices provide information about the predictions made by the model (i.e. which routes are predicted to have the most violations? Which drivers are predicted to have violations?)
You can also create other slices and add them to the dashboard using the steps here
Optionally you can also demonstrate how a data scientist would use archived truck events to build a model to predict violations. Note: to limit the amount of resources needed to run the AMI, Spark/Hive has not been installed, so you will not be able to actually run the notebook. The previous version of the demo kit or the HDP sandbox has these set up, so those can be used if you want to actually execute the steps in the notebook.
To walk through the trucking events analysis notebook, first open the Zeppelin UI using the Quicklink from Ambari:
Log in as admin/admin
Under the Notebook section, use the search text field to search for the “Trucking data analysis” notebook using Zeppelin search
Click Save on the interpreter binding
Walk through the notebook to show how a data scientist can use SparkSQL to visualize data to help understand what features should be included in the model
Finally you can show that once the important features are known, a model can be built to predict violations (in this case, using Logistic Regression)
Stopping/Starting the simulator
To stop the simulator, use the below command to find its process id and then use the kill command to kill it:
ps -ef | grep stream-simulator
kill <process_id>
To start it back up, run below:
cd /tmp/whoville/data_simulator/
sudo ./runDataLoader.sh
Sentiment Demo
Video recording of the Sentiment demo
A recording of the setup instructions for the demo is provided here
Sentiment Demo setup instructions
Open the NiFi UI using Quicklinks in Ambari
Double-click "Twitter Dashboard" to open this process group
Right click "Grab Garden Hose" > Properties and enter your Twitter Consumer key/secret and Access token/secret. If you do not have these, you can generate a new set using your Twitter account by following this section of the Hortonworks tutorial. Optionally change the 'Terms to filter on' as desired. Once complete, start the flow.
Use the Banana UI quicklink from Ambari to open the Twitter dashboard
An empty dashboard will initially appear. After a minute, you should start seeing charts appear
Advanced Analytics Demo
Video recording of the Advanced Analytics demo
Video recording provided here
Advanced Analytics Demo setup instructions
Open the Zeppelin UI via the Quicklink
Log in as admin. The password is the same as the Ambari password
A directory structure containing a number of demo notebooks will appear. Find the earthquake demo notebook by filtering for ‘earthquake’
On first launch of a notebook, you will see that the "Interpreter Binding" settings will be displayed. You will need to click "Save" under the interpreter order to accept the defaults.
Now you can walk through the notebook and show the visualizations and the process of building the model. Note: to limit the amount of resources needed to run the AMI, Spark/Hive has not been installed, so you will not be able to actually run the notebook. The previous version of the demo kit or the HDP sandbox has the notebook set up, so those can be used if you want to actually execute the steps in the notebook.
This concludes this article on how to launch the demo kit and access the provided demonstrations
Questions?
In case of questions or issues:
1. Search on our Hortonworks Community Connection forum. For example, to find all demo kit related posts access this url
2. If you were not able to find the solution, please post a new question using the tag “partner-demo-kit” here. Please try to be as descriptive as possible when asking questions by providing:
Detailed description of the problem
Steps to reproduce the problem
Environment details, e.g. instance type used was m4.2xlarge, storage used was 500gb, etc
Relevant log file snippets
10-06-2016
09:58 AM
2 Kudos
In the previous articles, we showed how to deploy an HDF 2.x/3.0 cluster, enable SSL for NiFi and set up the Ranger NiFi plugin. Here we will build on the same cluster and show how to enable Kerberos using Active Directory.
Summary
To achieve this, the high-level steps we will follow are:
Set up certificate trust for HDF nodes
Run Ambari security wizard
Create Ranger policy for nifiadmin user
Delete certificate from browser
Log in to NiFi using AD principal credentials
Pre-requisites
You have correctly set up AD as described here:
Active Directory set up with domain: CLOUD.HORTONWORKS.COM
AD already preconfigured with LDAPS
Certificate (.crt) used to enable LDAPS is available
OU created where HDF principals will be created
hadoop user has permission to write principals to the above OU
nifiadmin user created in AD (optionally synced over to Ranger)
Test to ensure you can access AD over LDAPS as the hadoopadmin user:
ldapsearch -H ldaps://sme-security-ad03.cloud.hortonworks.com:636 -D hadoopadmin@cloud.hortonworks.com -w BadPass#1
Steps
1. Set up trust for all HDF nodes using the AD certificate
#run on all HDF nodes before running security wizard using AD
ad_ip=xx.xx.xx.xx ##replace with IP of your AD
cert_url=http://someurl/mycertificate.crt ## replace with location of exported AD certificate
echo "${ad_ip} ad01.lab.hortonworks.net ad01" | sudo tee -a /etc/hosts
sudo yum -y install openldap-clients ca-certificates
#instead of downloading the cert, you could also manually transfer the .cert file to below location
sudo curl -sSL "${cert_url}" -o /etc/pki/ca-trust/source/anchors/hortonworks-net.crt
sudo update-ca-trust force-enable
sudo update-ca-trust extract
sudo update-ca-trust check
# edit /etc/openldap/ldap.conf to include LDAP url and base
sudo tee -a /etc/openldap/ldap.conf > /dev/null << EOF
TLS_CACERT /etc/pki/tls/cert.pem
URI ldaps://ad01.lab.hortonworks.net ldap://ad01.lab.hortonworks.net
BASE dc=cloud,dc=hortonworks,dc=com
EOF
#test using openssl - should return 0
openssl s_client -connect ad01:636 </dev/null
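To check just the certificate verification result from the s_client output (a hedged variant of the test above):
openssl s_client -connect ad01:636 </dev/null 2>/dev/null | grep 'Verify return code'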
#test using ldapsearch
ldapsearch -H ldaps://sme-security-ad03.cloud.hortonworks.com:636 -D nifiadmin@cloud.hortonworks.com -w BadPass#1
2. Run Ambari Security Wizard
Launch the security wizard via Ambari (under Admin > Kerberos) and enter below. The ‘Configure Kerberos’ page is the only one you will need to update; enter the below, then click Next on all remaining screens.
KDC host: FQDN of AD
Realm name: CLOUD.HORTONWORKS.COM
Kadmin host: FQDN of AD node
Admin principal: hadoopadmin@cloud.hortonworks.com
Password: BadPass#1
On the ‘Configure Identities’ page, users will be shown the option to customize the keytabs/principals for all components. The NiFi ones are under the Advanced tab. Click Next to proceed using the default keytab/principal names, then click Next through all remaining steps of the wizard.
What’s happening to NiFi under the covers when the security wizard runs?
a) NiFi principals and keytabs will automatically be created/distributed across the cluster where needed by Ambari
b) Kerberos-related nifi.properties fields will automatically be updated:
nifi.kerberos.service.principal
nifi.kerberos.keytab.location
nifi.kerberos.krb5.file
nifi.kerberos.authentication.expiration
c) The login provider will also be switched to kerberos under the covers
d) As part of the process, other HDF components are also kerberized, including the ‘Ambari Infra’ service. This means that Ranger audits are now being written to kerberized Solr
After the security wizard completes, NiFi’s Kerberos details will appear alongside other components (under Admin > Kerberos). At this point, Kerberos security will be enabled for all components running on the cluster.
On a node running NiFi, you can verify the keytab was generated and list its principal:
# klist -kt /etc/security/keytabs/nifi.service.keytab
Keytab name: FILE:/etc/security/keytabs/nifi.service.keytab
KVNO Timestamp Principal
---- ------------------- ------------------------------------------------------
1 09/28/2016 04:55:08 nifi/abajwa-hdf-qe-hdfsecured-1.openstacklocal@CLOUD.HORTONWORKS.COM
1 09/28/2016 04:55:08 nifi/abajwa-hdf-qe-hdfsecured-1.openstacklocal@CLOUD.HORTONWORKS.COM
1 09/28/2016 04:55:08 nifi/abajwa-hdf-qe-hdfsecured-1.openstacklocal@CLOUD.HORTONWORKS.COM
1 09/28/2016 04:55:08 nifi/abajwa-hdf-qe-hdfsecured-1.openstacklocal@CLOUD.HORTONWORKS.COM
1 09/28/2016 04:55:08 nifi/abajwa-hdf-qe-hdfsecured-1.openstacklocal@CLOUD.HORTONWORKS.COM
You can also verify the NiFi configs for Kerberos were automatically populated:
# cat /etc/nifi/conf/nifi.properties | grep kerberos
nifi.kerberos.krb5.file=/etc/krb5.conf
nifi.kerberos.service.keytab.location=/etc/security/keytabs/nifi.service.keytab
nifi.kerberos.service.principal=nifi/abajwa-hdf-qe-hdfsecured-1.openstacklocal@CLOUD.HORTONWORKS.COM
nifi.kerberos.spnego.authentication.expiration=12 hours
nifi.kerberos.spnego.keytab.location=/etc/security/keytabs/spnego.service.keytab
nifi.kerberos.spnego.principal=HTTP/abajwa-hdf-qe-hdfsecured-1.openstacklocal@CLOUD.HORTONWORKS.COM
nifi.security.user.login.identity.provider=kerberos-provider
You can also verify that the login-identity-provider for NiFi has now been switched to kerberos:
# tail /etc/nifi/conf/login-identity-providers.xml
<provider>
<identifier>kerberos-provider</identifier>
<class>org.apache.nifi.kerberos.KerberosProvider</class>
<property name="Default Realm">HORTONWORKS.COM</property>
<property name="Authentication Expiration">12 hours</property>
</provider>
3. Log in to NiFi UI without certificate
Now that Kerberos is enabled, let's try to log in without using a certificate.
Make sure the nifiadmin user exists in Ranger (if you ran Ranger sync earlier, this should have been imported already).
If not, create the user in Ranger by navigating to the below URL:
http://<Ranger_node>:6080/index.html#!/user/create
Create Ranger policy for new user
In Ranger, under ‘Access Manager’, click ‘HDF-nifi’
Click the Edit button on the /* policy we previously added nifiadmin@CLOUD.HORTONWORKS.COM to
Add the newly created nifiadmin user to the policy, and click Save
Delete previously imported .p12 certificates from your browser
e.g. if using Chrome on OSX you can delete previously imported certificates using ‘Keychain Access’ application
Restart Chrome and open Nifi UI. It should now display a login page
If not, try opening “Incognito Window”
Enter username as nifiadmin and the password you set
The Nifi UI should open now and you will be logged in as that user
You can see who you are logged in as by checking the top-right corner of the NiFi UI.
This completes the tutorial. If you made it this far in the series, congratulations! You have successfully:
Deployed HDF 2.0
Enabled SSL for NiFi and explored file-based authorization for NiFi
Installed Ranger and switched to Ranger-based authorization for NiFi
Enabled Kerberos for your HDF cluster using Active Directory
Logged into NiFi using AD credentials
09-28-2016
07:14 AM
3 Kudos
In the previous articles, we showed how to deploy an HDF 2.0 cluster, enable SSL for NiFi and set up the Ranger NiFi plugin. Now we will build on the same cluster and show how to enable Kerberos from Ambari using an MIT KDC.
Summary
To achieve this, the high-level steps we will follow are:
Set up MIT KDC
Run Ambari security wizard
Create principal for nifiadmin user in KDC
Create Ranger policy for nifiadmin user
Delete certificate from browser
Log in to NiFi using KDC principal credentials
Steps
1. Set up MIT KDC
High-level steps to set up the KDC:
Install KDC rpms
Configure KDC (krb5.conf)
Create KDC database
Start krb5kdc/kadmin services
Create admin principal
Make the user an administrator by adding it to kadm5.acl
Restart krb5kdc/kadmin services
Script to automate KDC setup (run below on the Ambari node):
export realm=HORTONWORKS.COM
export domain=hortonworks.com
export kdcpassword="BadPass#1"
curl -sSL https://gist.github.com/abajwa-hw/f8b83e1c12abb1564531e00836b098fa/raw | sudo -E sh
Test the KDC is up by running below on the Ambari node:
kadmin -p admin/admin -w BadPass#1 -r HORTONWORKS.COM -q "get_principal admin/admin"
2. Run Ambari Security Wizard
Launch the security wizard via Ambari (under Admin > Kerberos) and enter below. The ‘Configure Kerberos’ page is the only one you will need to update; enter the below, then click Next on all remaining screens.
KDC host: FQDN of KDC (Ambari) node
Realm name: HORTONWORKS.COM
Kadmin host: FQDN of KDC (Ambari) node
Admin principal: admin/admin
Password: BadPass#1
On the ‘Configure Identities’ page, users will be shown the option to customize the keytabs/principals for all components. The NiFi ones are under the Advanced tab. Click Next to proceed using the default keytab/principal names, then click Next through all remaining steps of the wizard.
What’s happening to NiFi under the covers when the security wizard runs?
a) NiFi principals and keytabs will automatically be created/distributed across the cluster where needed by Ambari
b) Kerberos-related nifi.properties fields will automatically be updated:
nifi.kerberos.service.principal
nifi.kerberos.keytab.location
nifi.kerberos.krb5.file
nifi.kerberos.authentication.expiration
c) The login provider will also be switched to kerberos under the covers
d) As part of the process, other HDF components are also kerberized, including the ‘Ambari Infra’ service. This means that Ranger audits are now being written to kerberized Solr
After the security wizard completes, NiFi’s Kerberos details will appear alongside other components (under Admin > Kerberos). At this point, Kerberos security will be enabled for all components running on the cluster.
On a node running NiFi, you can run the below commands to:
...verify the keytab was generated and list its principal:
# klist -kt /etc/security/keytabs/nifi.service.keytab
Keytab name: FILE:/etc/security/keytabs/nifi.service.keytab
KVNO Timestamp Principal
---- ------------------- ------------------------------------------------------
1 09/28/2016 04:55:08 nifi/abajwa-hdf-qe-hdfsecured-1.openstacklocal@HORTONWORKS.COM
1 09/28/2016 04:55:08 nifi/abajwa-hdf-qe-hdfsecured-1.openstacklocal@HORTONWORKS.COM
1 09/28/2016 04:55:08 nifi/abajwa-hdf-qe-hdfsecured-1.openstacklocal@HORTONWORKS.COM
1 09/28/2016 04:55:08 nifi/abajwa-hdf-qe-hdfsecured-1.openstacklocal@HORTONWORKS.COM
1 09/28/2016 04:55:08 nifi/abajwa-hdf-qe-hdfsecured-1.openstacklocal@HORTONWORKS.COM
...verify the NiFi configs for Kerberos were automatically populated:
# cat /etc/nifi/conf/nifi.properties | grep kerberos
nifi.kerberos.krb5.file=/etc/krb5.conf
nifi.kerberos.service.keytab.location=/etc/security/keytabs/nifi.service.keytab
nifi.kerberos.service.principal=nifi/abajwa-hdf-qe-hdfsecured-1.openstacklocal@HORTONWORKS.COM
nifi.kerberos.spnego.authentication.expiration=12 hours
nifi.kerberos.spnego.keytab.location=/etc/security/keytabs/spnego.service.keytab
nifi.kerberos.spnego.principal=HTTP/abajwa-hdf-qe-hdfsecured-1.openstacklocal@HORTONWORKS.COM
nifi.security.user.login.identity.provider=kerberos-provider
...verify that the login-identity-provider for NiFi has now been switched to kerberos:
# tail /etc/nifi/conf/login-identity-providers.xml
<provider>
<identifier>kerberos-provider</identifier>
<class>org.apache.nifi.kerberos.KerberosProvider</class>
<property name="Default Realm">HORTONWORKS.COM</property>
<property name="Authentication Expiration">12 hours</property>
</provider>
3. Log in to NiFi UI without certificate
Now that Kerberos is enabled, let's try to log in without using a certificate.
First create a principal in the KDC for nifiadmin. From the node running the KDC (same one as Ambari), run below and enter your desired password (e.g. BadPass#1):
kadmin.local -q "addprinc nifiadmin"
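Before wiring the user into Ranger, you can confirm the new principal authenticates (a hedged check using standard MIT Kerberos client tools):
kinit nifiadmin@HORTONWORKS.COM
klist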
Create the user in Ranger by navigating to the below URL:
http://<Ranger_node>:6080/index.html#!/user/create
The username should be in the format userprinc@KDC_REALM (e.g. nifiadmin@HORTONWORKS.COM)
Create Ranger policy for the new user:
In Ranger, under ‘Access Manager’, click ‘HDF-nifi’
Click the Edit button on the policy we previously added nifiadmin@HORTONWORKS.COM to
Add the newly created nifiadmin@HORTONWORKS.COM user to the policy, and click Save
Delete previously imported .p12 certificates from your browser
e.g. if using Chrome on OSX, you can delete previously imported certificates using the ‘Keychain Access’ application
Restart Chrome and open the NiFi UI. It should now display a login page
If not, try opening an “Incognito Window”
Enter the username nifiadmin@HORTONWORKS.COM and the password you set
The NiFi UI should open now and you will be logged in as that user
You can see who you are logged in as by checking the top-right corner of the NiFi UI
This completes the tutorial. If you made it this far in the series, congratulations! You have successfully:
Deployed HDF 2.0
Enabled SSL for NiFi and explored file-based authorization for NiFi
Installed Ranger and switched to Ranger-based authorization for NiFi
Enabled Kerberos for your HDF cluster
Logged into NiFi using KDC credentials
09-28-2016
03:55 AM
@Sunile Manjee thanks!
09-28-2016
02:49 AM
8 Kudos
In the previous article, we showed how to enable SSL and set up identity mappings for Apache NiFi on the previously installed HDF 2.x or 3.0 cluster. Here, we will build on the same cluster and show how to install Apache Ranger and set up the Ranger NiFi plugin. For simplicity, we will assume this is a demo environment where there is no requirement to enable SSL for Ranger. If instead you would like to use secured Ranger with NiFi, follow the steps here.
Summary
At a high level, Apache Ranger provides a centralized platform to define, administer and manage security policies consistently across Hadoop components. In the case of HDF, it enables the administrator to create/manage authorization policies for Kafka, Storm and NiFi from the same web interface (or REST APIs). To achieve this, the high-level steps we will follow are:
Ranger install prerequisites
Ranger install
Update NiFi Ranger repo
Test Ranger plugin
Create Ranger users and policies
Test NiFi access as nifiadmin user
The official documentation for this can be found here. Tested with HDF 2.x and 3.0.
Step Details
1. Ranger install prerequisites:
a) Make sure Logsearch or an external Solr is installed/running before installing Ranger (used to store audits). In our case, we had deployed the cluster with Logsearch, so we will use that option
b) Configure an RDBMS for Ranger (used to store policies). In our case we will use the same Postgres used by Ambari. So from the Ambari node, run below:
ranger_user=rangeradmin #set this to DB user you wish to own Ranger schema in RDBMS
ranger_pass=BadPass#1 #set this to password you wish to use
yum install -y postgresql-jdbc*
chmod 644 /usr/share/java/postgresql-jdbc.jar
echo "CREATE DATABASE ranger;" | sudo -u postgres psql -U postgres
echo "CREATE USER ${ranger_user} WITH PASSWORD '${ranger_pass}';" | sudo -u postgres psql -U postgres
echo "ALTER DATABASE ranger OWNER TO ${ranger_user};" | sudo -u postgres psql -U postgres
echo "GRANT ALL PRIVILEGES ON DATABASE ranger TO ${ranger_user};" | sudo -u postgres psql -U postgres
sed -i.bak s/ambari,mapred/${ranger_user},ambari,mapred/g /var/lib/pgsql/data/pg_hba.conf
cat /var/lib/pgsql/data/postgresql.conf | grep listen_addresses
#make sure listen_addresses='*'
ambari-server setup --jdbc-db=postgres --jdbc-driver=/usr/share/java/postgresql-jdbc.jar
service ambari-server stop
service postgresql restart
service ambari-server start
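Before moving on, you can confirm the ranger database was created and is owned by the new user (a hedged check run via the postgres superuser):
echo "SELECT datname FROM pg_database WHERE datname='ranger';" | sudo -u postgres psql -U postgres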
2. Ranger install:
Start the Ambari ‘Add service’ wizard, select Ranger and choose any host to install the Ranger components on.
a) On the configuration screen there are a few things to set. On the ‘Ranger Admin’ tab, set below and run ‘Test connection’:
Db flavor: POSTGRES
Host: FQDN of Ambari node
Database Administrator (DBA) username: rangeradmin
Passwords: BadPass#1
b) The ‘Ranger User Info’ tab is where you would optionally configure Ranger to pull users from Active Directory or LDAP (see here for sample steps on how we set up our AD):
‘Common configs’ sub-tab
‘User configs’ sub-tab
c) On the ‘Ranger Plugin’ tab, enable the plugins for NiFi, Storm and Kafka. (Note the plugins for Storm/Kafka will not be enabled until Kerberos is enabled on the cluster)
d) On the ‘Ranger Audit’ tab, provide the Solr details. In our case, since the Logsearch/Ambari_infra components were installed, just turn on SolrCloud - Ambari will autodetect the Zookeeper string
e) On the ‘Ranger Tagsync’ tab, no changes needed
f) On ‘Advanced’, no changes needed. If you wanted to set up the ability to use AD/LDAP credentials to log into Ranger, you can configure this (and other advanced features) here
g) Click Next > Proceed Anyway > Deploy to start the Ranger install and wait for it to complete
h) Once installed, Ambari will show that Storm, Kafka and NiFi need to be restarted. Use the “Restart All Required” button (new in Ambari 2.4) to do this
3. Update NiFi Ranger repo:
This is needed to enable auto-completion when creating policies in Ranger for NiFi. Note that if this step is skipped, the Ranger plugin will still work as usual - it just impacts lookups when creating NiFi policies from the Ranger web interface. If SSL for Ranger will not be set up, you should consider just skipping this step.
To access the NiFi repo in Ranger:
a) Open Ranger using the Quicklink in Ambari
b) In Ranger > Access Manager > Nifi, click the Edit icon
c) Notice most of the configs are empty. If you try a test connect, it will fail
d) On the Ranger host, run below to find the keystore/truststore details (like path, type and password):
cat /usr/hdf/current/nifi/conf/nifi.properties | grep nifi.security
e) Ensure the ranger user can access the key/truststores by running below on the Ranger host:
chmod o+r /usr/hdf/current/nifi/conf/keystore.jks /usr/hdf/current/nifi/conf/truststore.jks
Note: in secure environments Ranger should not access the NiFi keystore/truststore: there should be a separate keystore/truststore for Ranger to use as part of enabling SSL for it. Also note that these files could be re-generated by the NiFi CA, resetting the permissions
f) Update as shown:
Keystore path, type, password
Truststore path, type, password
g) Now the test connection should return a 403 error. This is an authorization error from Ranger (since we have not yet created any policies). Click Save to commit the changes you made
4. Test Ranger plugin
Attempting to open the NiFi UI results in "Access denied" due to insufficient permissions. Navigate to the ‘Audit’ tab in the Ranger UI and notice that the requesting user showed up in the Ranger audit. This shows the Ranger NiFi plugin is working.
Notice how Ranger shows details such as below for multiple HDF components:
what time the access attempt occurred
user/IP who attempted the access
resource that was attempted to be accessed
whether access was allowed or denied
Also notice that NiFi now shows up as one of the registered plugins (under the ‘Plugins’ tab)
5. Create Ranger users and policies
To be able to access the NiFi UI, we will need to create a number of objects. The details below assume you have already set up identity mappings for NiFi (as described in the previous article), but you should be able to follow similar steps even if you have not.
i) Ranger users for admin and node identities (in a real customer env, you would not be manually creating these: they would be synced over from Active Directory/LDAP):
nifiadmin@CLOUD.HORTONWORKS.COM
node1.fqdn@CLOUD.HORTONWORKS.COM
node2.fqdn@CLOUD.HORTONWORKS.COM
node3.fqdn@CLOUD.HORTONWORKS.COM
ii) Read policy on /flow for node1-3 identities
iii) Read/write policy on /proxy for node1-3 identities
iv) Read/write policy on /data/* for node1-3 identities (needed to list/delete queue)
v) Read/write policy on /* for nifiadmin identity (needed to make nifiadmin an admin)
More details on what Ranger policies to create can be found here
Option 1: Run script below from Ranger node to create the above using Ranger's REST APIs
export hosts="node1.fqdn node2.fqdn node3.fqdn" #set hostnames of nodes running Nifi
export admin="nifiadmin" #set your desired Nifi admin user
export realm="CLOUD.HORTONWORKS.COM" #set domain of certificate
export cluster="HDF" #set cluster name
#download/run script
curl -sSL https://gist.github.com/abajwa-hw/2b59db1a850406616d4583f44bad0a78/raw | sudo -E sh
End result:
Option 2: Manually create users and policies
Create local users in Ranger for all requesting users from Ranger UI under Settings > Users/Groups. Assuming you setup identity mapping earlier, create the users appropriately e.g. node-1.fqdn@CLOUD.HORTONWORKS.COM, node-2.fqdn@CLOUD.HORTONWORKS.COM, node-3.fqdn@CLOUD.HORTONWORKS.COM
Alternatively, if you do not wish to use node identities, you would enter the long form of the identity as the username (e.g. CN=nifiadmin, OU=CLOUD.HORTONWORKS.COM; CN=node-1.fqdn, OU=CLOUD.HORTONWORKS.COM; CN=node-2.fqdn, OU=CLOUD.HORTONWORKS.COM; CN=node-3.fqdn, OU=CLOUD.HORTONWORKS.COM)
Now create Ranger policies for node identities for each host (for scripting, a sample REST call is sketched after the Tips section below):
/flow - read
/proxy - read/write
/data/* - read/write
To do this, access Nifi policies in Ranger by navigating to Ranger > Access Manager > Nifi > HDF_nifi. Then click 'Add New Policy' to display below form:
Create a new READ policy for node identities on /flow:
Similarly, create a new READ/WRITE policy for node identities on /proxy:
Similarly, create a new READ/WRITE policy for node identities on /data/*:
We still need to manually add the nifiadmin user to the global policy. To do this, click the 'HDF_nifi' link highlighted:
Then click the Edit icon on the "all-nifi-resource" policy:
Under 'Select User' add nifiadmin@CLOUD.HORTONWORKS.COM and provide Read/Write access, then Save.
6. Test Nifi access as nifiadmin user
Whether you created the users/policies via the script or manually, at this point the Nifi policy page should appear as below:
Note that it may take up to 30s after creating the policies in Ranger UI for them to take effect. How can you confirm that the new policies were downloaded by the Nifi Ranger plugin after we created them? You can do this by checking the first 'Export Date' for the nifi service under Audit > Plugins tab in Ranger: when this timestamp shows a time after the changes were made, it means the new policies have been downloaded and should be in effect.
Open the Nifi UI via Quicklink and confirm it now opens. Confirm via Ranger audits that Ranger now allows access.
With this you have successfully installed Ranger and configured Nifi to use the Ranger authorizer
Other things to try: Try disabling the policies created one by one, waiting 30s and refreshing the Nifi UI to see what breaks.
Tips:
1. To authorize separate users/groups access to different parts of a flow, implement multiple process groups and then:
Grant user/group access to modify the NiFi flow with a policy for /process-groups/<root-group-id> with RW
Create a separate policy for /provenance/process-groups/<root-group-id> (with each of the cluster node DNs) for read access
2. Troubleshooting tip: When the Ranger plugin is enabled and you are encountering permission errors trying to login to Nifi or performing a certain action within Nifi, check Ranger audits for any 'Denied' requests. In the event that you encounter these, Ranger will tell you exactly what user was trying to access what resource, which will help you create the right policy to avoid the issue
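For reference (as mentioned above), here is roughly what creating the read policy on /flow would look like via Ranger's public v2 REST API. A sketch only: the HDF_nifi service name and admin password follow this article's setup, while the nifi-resource key and READ access type are assumptions based on the Nifi service definition in Ranger:
#create a READ policy on /flow for the node identities via Ranger's v2 REST API
curl -u admin:BadPass#1 -H 'Content-Type: application/json' -X POST \
  http://<ranger-host>:6080/service/public/v2/api/policy \
  --data-binary '{"service":"HDF_nifi","name":"flow-read","resources":{"nifi-resource":{"values":["/flow"],"isExcludes":false,"isRecursive":false}},"policyItems":[{"users":["node1.fqdn@CLOUD.HORTONWORKS.COM","node2.fqdn@CLOUD.HORTONWORKS.COM","node3.fqdn@CLOUD.HORTONWORKS.COM"],"accesses":[{"type":"READ","isAllowed":true}]}]}'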
What next?
If you haven't already, review what Ranger policies you can create for Nifi here: https://community.hortonworks.com/articles/60842/hdf-20-defining-nifi-policies-in-ranger.html
Next, we will enable kerberos on the cluster and show how users can then login to Nifi without certificates (using AD/KDC credentials):
Steps for enabling security on HDF using Active directory:
https://community.hortonworks.com/articles/60186/hdf-20-use-ambari-to-enable-kerberos-for-hdf-clust-1.html
https://community.hortonworks.com/articles/58793/hdf-20-use-ambari-to-enable-kerberos-for-hdf-clust.html
09-26-2016
06:47 AM
8 Kudos
Summary:
Automation/AMI to install HDP 2.5.x with Nifi 1.1.0 on any cloud and deploy commonly used demos via Ambari blueprints
Currently supported demos:
Nifi-Twitter
IoT (trucking) demo
Zeppelin notebooks
Vanilla HDF 2.1 (w/o any demos)
Option 1: Deploy single node instances using AMIs
1. For deploying the above on single node setups on Amazon, AMI images are also available. To launch an instance using one of the AMIs, refer to steps below. A video that shows using these steps to launch the HDP 2.5.3 AMI is available here.
Login into EC2 dashboard using your credentials
Change your region to "N. California"
Click 'Launch instance'
Choose AMI: search for 081339556850 under Community AMIs (as shown in screenshot), select the desired AMI. For the HDP 2.5.x version of the AMI that has the demos pre-installed, select "HDP 2.5 Demo kit cluster"
Choose instance type: select m4.2xlarge for HDP AMIs or m4.xlarge for HDF
Configure instance: leave defaults
Add storage: 100gb or larger (500gb preferred)
Tag: name your instance and add any tags you like
Configure Security Group: choose security group that opens all the ports (e.g. sg-1c53d279summit2015) or create new
While deploying choose an SSH key you have the .pem file for or create new
2. Once the instance comes up and Ambari server/agent are fully up, it will automatically start the services. You can monitor this by connecting to your instance via SSH as ec2-user and tailing /var/log/hdp_startup.log
3. Once the service start call was made, you can login to Ambari UI (port 8080) to monitor progress. Note: if Ambari is not accessible make sure a) the security group you used has a policy for 8080 b) you waited enough time for Ambari to come up.
The password for 'admin' user of Ambari and Zeppelin is defaulted to your AWS account number. You can look this up using your EC2 dashboard as below
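Once you have looked up the password, a quick way to confirm Ambari is up and responding is to hit its REST API (the hostname is a placeholder; the admin password is your AWS account number as described above):
#sanity check: list clusters via Ambari's REST API (returns JSON once Ambari is up)
curl -u admin:<your-aws-account-number> http://<instance-public-dns>:8080/api/v1/clusters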
4. So 15-20 min after AWS shows the instance came up, you should see a fully started cluster. Note: in case any service does not come up, you can bring it up using the 'Service Actions' menu in Ambari
Notes:
Once the cluster is up, it is recommended that you change the Ambari and Zeppelin admin passwords
The instance launched is EBS backed - so the VM can be stopped when not in use and restarted when needed. Just make sure to stop all HDP/HDF services via Ambari before stopping the instance via EC2 dashboard.
What gets installed?
HDP 2.5.x with below vanilla components
IotDemo demo service - allows users to stop/start Iot Demo, open webUI and generate events
Demo Ambari service for Solr
This service will pre-configure Solr/Banana for Twitter demo
Demo Ambari service for Nifi 1.1
The script auto-deploys the specified flow - by default, it deploys the Twitter flow but this is overridable
Even though the flow is deployed, you will need to configure processors that contain env-specific details e.g. you will need to enter your Twitter key/secret in the GetTwitter processor
IoT Trucking demo steps
Once the instance is up, you can follow the below steps to start the trucking demo. Video here
- In Ambari, open 'IotDemo UI' using quicklink:
- In IotDemo UI, click "Deploy the Storm Topology"
- After 30-60 seconds, the topology will be deployed. Confirm using the Storm View in Ambari:
- Click "Truck Monitoring Application" link in 'IotDemo UI' to open the monitoring app showing an empty map.
- Click 'Nifi Data Flow' in IotDemo UI to launch Nifi and then double click on the 'Iot Trucking demo' processor group. Then right click on both PublishKafka_0_10 processors > Configure > Properties. Confirm that the 'Kafka Broker' hostname/port is correctly populated. The flow should already be started so no other action is needed.
- In Ambari, click "Generate Events" to simulate 50 events (this can be configured)
- Switch back to "Truck Monitoring Application" in IotDemo UI and after 30s the trucking events will appear on screen
- Explore Storm topology using Storm View in Ambari
Nifi Sentiment demo
Next you can follow the below steps to start the Nifi sentiment demo. Video of these steps available here
- Open Nifi UI using Quicklinks in Ambari
- Double click "Twitter Dashboard" to open this process group:
- Right click "Grab Garden Hose" > Properties and enter your Twitter Consumer key/secret and Access token/secret. Optionally change the 'Terms to filter on' as desired. Once complete, start the flow.
- Use Banana UI quicklink from Ambari to open Twitter dashboard
- An empty dashboard will initially appear. After a minute, you should start seeing charts appear
Zeppelin demos
- Open Zeppelin UI via Quicklink
- Login as admin. Password is same as Ambari password
- Demo notebooks will appear. Open the first notebook and walk through each cell.
Option 2: To install HDP (including demos) or HDF using scripts
Pre-reqs:
One or more freshly installed CentOS/RHEL 6 or 7 VMs on your cloud of choice
Do not run this script on VMs running an existing HDP cluster or sandbox
If planning to install ‘IoT Demo’ make sure you allocate enough memory - especially if also deploying other demos
16GB or more of RAM is recommended if using single node setup
The sample script should only be used to create test/demo clusters
Default password for Ambari and Zeppelin admin users is BadPass#1
Override by exporting ambari_password prior to running the script
Steps:
1. This step is only needed if installing a multi-node cluster. After choosing a host where you would like Ambari-server to run, first prepare the other hosts. Run this on all hosts where Ambari-server will not be running; it runs pre-requisite steps, installs Ambari-agents and points them to the Ambari-server host:
export ambari_server=<FQDN of ambari-server host>
curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh ;
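Optionally, once the agents come up, you can confirm they registered with Ambari-server (a sketch: admin/admin is Ambari's default login before any password change):
#list hosts registered with Ambari to verify all agents checked in
curl -u admin:admin http://<ambari-server-fqdn>:8080/api/v1/hosts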
2. Run the remaining steps on the host where Ambari-server is to be installed. These run pre-reqs, install Ambari-server and deploy the requested demos
a)
To install HDP 2.5.x (Ambari 2.4.1/Java 8) - including Solr/Nifi 1.0.0 via Ambari - and deploy a Nifi flow:
export host_count=1 #set to number of nodes in your cluster (including Ambari-server node)
export hdp_ver=2.5
export install_nifidemo=true
export install_iotdemo=true
curl -sSL https://gist.github.com/abajwa-hw/3f2e211d252bba6cad6a6735f78a4a93/raw | sudo -E sh
After 5-10 min, you should get a message saying the blueprint was deployed. At this point you can open Ambari UI (port 8080) and monitor the cluster install
Note: if you installed iotdemo on a multi-node cluster, there may be some manual steps required (e.g. moving storm jars or setting up the latest Storm view). See here for more info: https://github.com/hortonworks-gallery/iotdemo-service/tree/hdp25#post-install-manual-steps
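If you prefer to follow the install from the shell rather than the Ambari UI, the blueprint deployment can also be monitored via Ambari's requests API (a sketch: the cluster name is a placeholder, the password follows this article's default, and request 1 is typically the cluster install):
#watch overall progress of the blueprint-driven install
curl -u admin:BadPass#1 http://localhost:8080/api/v1/clusters/<cluster-name>/requests/1 | grep -E 'request_status|progress_percent'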
b)
To install HDP 2.4 (Ambari 2.4.1/Java 8) - including IoTDemo, plus Solr/Nifi 1.0.0 via Ambari - and deploy the Nifi Twitter flow, run below:
export host_count=1 #set to number of nodes in your cluster (including Ambari-server node)
export hdp_ver=2.4
export install_iotdemo=true
export install_nifidemo=true
curl -sSL https://gist.github.com/abajwa-hw/3f2e211d252bba6cad6a6735f78a4a93/raw | sudo -E sh
c)
To install vanilla HDF 2.1 cluster, you can use the script/steps below:
https://community.hortonworks.com/articles/56849/automate-deployment-of-hdf-20-clusters-using-ambar.html
Note this does not install any of the demos, just a vanilla HDF 2.1 cluster
Deployment
After 5-10 min, you should get a message saying the blueprint was deployed. At this point you can open the Ambari UI (port 8080) and monitor the cluster install (note: make sure the port was opened). Default password is BadPass#1
What gets installed?
Refer to the previous 'What gets installed' section
09-23-2016
06:52 AM
14 Kudos
In the previous article, we showed how to deploy a cluster running HDF 2.x or 3.x. Here we will look into enabling SSL for Apache Nifi on the cluster setup previously and optionally setting up identity mappings. This approach also sets up users/authorizations using Nifi's file-based authorizer (as opposed to the Ranger based authorizer). Tested with HDF 2.x, 3.0, 3.2
1. Configure Nifi for SSL
There are 2 options for configuring SSL for Apache Nifi via Ambari:
i). Use Nifi CA to generate self-signed certificates (good for quick start/demos)
ii). Use existing certificates (used in production envs)
Option i) - Use Nifi Certificate Authority (CA) to generate self-signed certificates:
Assuming Nifi CA is already installed (via Ambari when you installed NiFi), you can make the below config changes in Ambari under Nifi > Configs > "Advanced nifi-ambari-ssl-config" and click Save to commit the changes:
a) Enable SSL? Check box
b) Clients need to authenticate? Check box
c) NiFi CA Token - set this to a long, random value (at least 16 chars) but make sure you remember what it is set to
d) Initial Admin Identity - set this to the long form (full DN) of the identity of who your nifi admin user should be e.g. CN=nifiadmin, OU=CLOUD.HORTONWORKS.COM (note the space after the comma)
e) Node Identities - set this to the long form (full DN) of the identity of each node running Nifi (replace CN entries below with FQDNs of nodes running Nifi...also note the space after the comma) e.g.
<property name="Node Identity 1">CN=node1.fqdn, OU=CLOUD.HORTONWORKS.COM</property>
<property name="Node Identity 2">CN=node2.fqdn, OU=CLOUD.HORTONWORKS.COM</property>
<property name="Node Identity 3">CN=node3.fqdn, OU=CLOUD.HORTONWORKS.COM</property>
Tip: By default the node identities are commented out using <!-- and --> tags. As you are updating this field, make sure you remove these or your changes will not take effect.
f) NiFi CA DN suffix - in case you are not using OU=NIFI then you need to change this too (note the space after the comma) e.g. , OU=CLOUD.HORTONWORKS.COM
g) (Optional) You may also choose to set Identity Mapping properties at this time. These are used to normalize identities for better integration with LDAP/AD (e.g. if you wish to login as nifiadmin@CLOUD.HORTONWORKS.COM instead of CN=nifiadmin, OU=CLOUD.HORTONWORKS.COM). Let's skip this for now...step #6 (see end of this article) is provided to show how we can switch to using these later on in the process.
Summary of above changes:
Note on identity fields above: These do not need to be set if you plan to use the Ranger authorizer. But if you plan on logging into the Nifi UI before enabling Ranger, you will need to set these. When setting these, you must make sure that on all the nodes, authorizations.xml does not contain any policies. On initial install they should already have no policies, but, for example, if you made a mistake setting these the first time around and want to modify the values, for the new values to take effect you will need to delete authorizations.xml on all the nodes before restarting Nifi. You can find authorizations.xml under /var/lib/nifi/conf by default (this location can be configured by 'Nifi internal config dir').
Troubleshooting node identities: How will you know you made a mistake while setting node identities? Usually, if the node identities field was not correctly set, when you attempt to open the Nifi UI you will see an 'Untrusted proxy' error similar to below in /var/log/nifi/nifi-user.log:
[NiFi Web Server-172] o.a.n.w.s.NiFiAuthenticationFilter Rejecting access to web api: Untrusted proxy CN=tsys-nifi0.field.hortonworks.com, OU=NIFI
In the above case, you would need to double check that the 'Node identity' values you provided in Ambari match the one from the log file (e.g. CN=tsys-nifi0.field.hortonworks.com, OU=NIFI) and ensure the values are not commented out. Next, you would manually delete /var/lib/nifi/conf/authorizations.xml from all nodes running Nifi and then restart the Nifi service via Ambari.
Notes on Nifi CA:
If you already enabled SSL and wanted to change the OU (or wanted to move the CA to a different node), you can force regeneration of the certificates by either checking the "NiFi CA Force Regenerate" checkbox or changing the passwords
If you previously were not using the CA and had set the passwords, but now wanted to start using the CA, you can clear the passwords and check the "NiFi CA Force Regenerate" checkbox
Option ii) - Use existing certificates:
First manually copy certificates to all nodes running Nifi (e.g. under /usr/hdf/current/nifi/conf), then make the below config changes in Ambari under Nifi > Configs > "Advanced nifi-ambari-ssl-config" and click Save to commit the changes:
a) Enable SSL? Check box
b) Clients need to authenticate? Check box
c) Set Keystore and Truststore path e.g. {{nifi_config_dir}}/keystore.jks
d) Set Keystore and Truststore type e.g. JKS
e) Set Keystore and Truststore passwords
f) Initial Admin Identity - set this to the long form (full DN) of the identity of who your nifi admin user should be e.g. CN=nifiadmin, OU=CLOUD.HORTONWORKS.COM
g) Node Identities - set this to the long form (full DN) of the identity of each node running Nifi (replace nodeX.fqdn with FQDNs of nodes running Nifi) e.g.
<property name="Node Identity 1">CN=node1.fqdn, OU=CLOUD.HORTONWORKS.COM</property>
<property name="Node Identity 2">CN=node2.fqdn, OU=CLOUD.HORTONWORKS.COM</property>
<property name="Node Identity 3">CN=node3.fqdn, OU=CLOUD.HORTONWORKS.COM</property>
h) (Optional) You may also choose to set Identity Mapping properties at this time. Step #6 (see end of this article) is provided to show how we can switch to using these later on in the process. 2. Enable SSL for Nifi For both options, once the above changes have been made, Ambari will prompt you to restart Nifi. After restarting, it may take a minute for Nifi UI to come up. You can track the progress by monitoring nifi-app.log. You can do this by either tailing the log via SSH or using Logsearch: tail -f /var/log/nifi/nifi-app.log Another option is to run Nifi service check from Ambari. It will keep checking if the UI came up until it does: 3. Generate client certificate In order to login to SSL-enabled Nifi, you will need to generate a client certificate and import into your browser. If you used the CA, you can use tls-toolkit that comes with Nifi CA: First run below from Ambari node to install the toolkit: wget http://localhost:8080/resources/common-services/NIFI/1.0.0/package/archive.zip
unzip archive.zip
Then run below to generate the keystore. You will need to pass in your values for:
-D: pass in your "Initial Admin Identity" value
-t: pass in your "CA token" value
-c: pass in the hostname of the node where Nifi CA is running
export JAVA_HOME=/usr/java/default
./files/nifi-toolkit-*/bin/tls-toolkit.sh client -c <nifi_CA_host.fqdn> -D 'CN=nifiadmin, OU=CLOUD.HORTONWORKS.COM' -p 10443 -t hadoop -T pkcs12
If you pass in the wrong password, you will see an error like:
Service client error: Received response code 403 with payload {"hmac":null,"pemEncodedCertificate":null,"error":"forbidden"}
Before we can import the certificate, we will need to find the password to import. To do this, run below:
cat config.json | grep keyStorePassword
(Optional) - The password generated above will be a long randomly generated string. If you want to change this password to one of your choosing instead, first run the below to remove the keystore/truststore:
rm -f keystore.pkcs12 truststore.pkcs12
Then edit config.json by modifying the value of "keyStorePassword" to your desired password:
vi config.json
Then re-run tls-toolkit.sh as below:
./files/nifi-toolkit-*/bin/tls-toolkit.sh client -F
At this point the keystore.pkcs12 has been generated. Rename it to keystore.p12 and transfer it (e.g. via scp) over to your local laptop.
mv keystore.pkcs12 keystore.p12
4. Import certificate to your browser
The exact steps depend on your OS and browser. For example, if using Chrome on Mac, use the "Keychain Access" app: File > Import items > Enter password from above (you will need to type it out)
For Firefox example see here
5. Check Nifi access
Now you can open the Nifi UI using the Quicklink in Ambari. After selecting the certificate you imported earlier, follow the below screens to get through Chrome warnings and access the Nifi UI:
a) Select the certificate you just imported
b) Choose "Always Allow"
c) Since the certificate was self-signed, Chrome will warn you that the connection is not private. Click "Show Advanced" and click the "Proceed to <hostname>" link
d) At this point, the Nifi UI should come up. On the left, it shows 3/3, meaning all three of the Nifi nodes have joined the cluster. Note that on the top right, it shows you are logged in as "CN=nifiadmin, OU=CLOUD.HORTONWORKS.COM"
e) The /var/log/nifi/nifi-user.log log file will also confirm the user you are getting logged in as:
o.a.n.w.s.NiFiAuthenticationFilter Authentication success for CN=nifiadmin, OU=CLOUD.HORTONWORKS.COM
f) Notice also that users.xml and authorizations.xml were created. Checking their content reveals that Nifi auto-created users and access policies for the 'Initial Admin Identity' and 'Node Identities'. More details on these files can be found here
cat /var/lib/nifi/conf/users.xml
cat /var/lib/nifi/conf/authorizations.xml
With this you have successfully enabled SSL for Apache Nifi on your HDF cluster
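As an optional check, you can also hit Nifi's REST API with the generated client certificate from the command line. A minimal sketch, assuming HDF's default HTTPS port (9091) and Nifi's /nifi-api/flow/current-user endpoint; replace the hostname and use the keyStorePassword value from config.json when prompted:
#convert the pkcs12 keystore to PEM so any curl build can use it (prompts for the keystore password)
openssl pkcs12 -in keystore.p12 -out nifiadmin.pem -nodes
#query Nifi as the client identity; -k skips validation of the self-signed server cert
curl -k --cert nifiadmin.pem https://<nifi-node-fqdn>:9091/nifi-api/flow/current-user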
6. (Optional) Setup Identity mappings
If desired, we can also setup the Identity mappings to try that option as well.
First let's remove the authorizations.xml on all nifi nodes to force Nifi to re-generate them. Without doing this, you will encounter an error at login saying: "Unable to perform the desired action due to insufficient permissions"
rm /var/lib/nifi/conf/authorizations.xml
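If you are running multiple Nifi nodes, a quick way to do this from one host is a loop like below (a sketch: hostnames are placeholders and passwordless SSH is assumed):
#remove authorizations.xml on every Nifi node so it gets re-generated on restart
for h in node1.fqdn node2.fqdn node3.fqdn; do
  ssh $h "rm -f /var/lib/nifi/conf/authorizations.xml"
done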
Now make the below changes in Ambari under Nifi > Configs and click Save. (Tip: type .dn in the filter textbox to easily find these fields)
nifi.security.identity.mapping.pattern.dn = ^CN=(.*?), OU=(.*?)$
nifi.security.identity.mapping.value.dn = $1@$2
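To see what this mapping does, here is a quick local illustration of the same pattern/value substitution (sed's regex dialect differs slightly from the Java regex Nifi uses, so this is only to build intuition):
#simulate the identity mapping: DN in, normalized identity out
echo "CN=nifiadmin, OU=CLOUD.HORTONWORKS.COM" | sed -E 's/^CN=(.*), OU=(.*)$/\1@\2/'
#prints: nifiadmin@CLOUD.HORTONWORKS.COM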
From Ambari, restart Nifi and wait for the Nifi nodes to rejoin the cluster. After about a minute, refresh the Nifi UI and notice you are now logged in as nifiadmin@CLOUD.HORTONWORKS.COM instead
Opening /var/log/nifi/nifi-user.log confirms this:
o.a.n.w.s.NiFiAuthenticationFilter Authentication success for nifiadmin@CLOUD.HORTONWORKS.COM
Opening users.xml, authorizations.xml shows that this time Nifi auto-created users and access policies for the 'Initial Admin Identity' and 'Node Identities' in both unmapped (e.g. CN=nifiadmin, OU=CLOUD.HORTONWORKS.COM) and mapped (e.g. nifiadmin@CLOUD.HORTONWORKS.COM) formats: # cat /var/lib/nifi/conf/users.xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<tenants>
<groups/>
<users>
<user identifier="60911b91-233d-33fe-abe9-b832d8fb06fc" identity="nifiadmin@CLOUD.HORTONWORKS.COM"/>
<user identifier="dbbad79e-b7d8-30a7-963a-d152f1953343" identity="abajwa-hdf-dev-rhel7-1.openstacklocal@CLOUD.HORTONWORKS.COM"/>
<user identifier="7ee92ccf-b548-3de3-b74e-27c1e6e280ab" identity="abajwa-hdf-dev-rhel7-2.openstacklocal@CLOUD.HORTONWORKS.COM"/>
<user identifier="6ce282f4-9da7-31d4-8733-138364d88261" identity="abajwa-hdf-dev-rhel7-3.openstacklocal@CLOUD.HORTONWORKS.COM"/>
<user identifier="e3c6593b-8ab7-3e50-9778-dd662635aa8f" identity="CN=nifiadmin, OU=CLOUD.HORTONWORKS.COM"/>
<user identifier="bb834fc7-7232-3c4d-821e-3e07731100e4" identity="CN=abajwa-hdf-dev-rhel7-3.openstacklocal, OU=CLOUD.HORTONWORKS.COM"/>
<user identifier="6c6c1c9c-90b9-3fc1-9c4f-83db69f9d2b6" identity="CN=abajwa-hdf-dev-rhel7-2.openstacklocal, OU=CLOUD.HORTONWORKS.COM"/>
<user identifier="c5bb13c1-79bc-3c9e-98bc-1593985f7fd1" identity="CN=abajwa-hdf-dev-rhel7-1.openstacklocal, OU=CLOUD.HORTONWORKS.COM"/>
</users>
</tenants>
With this we have completed the setup of Identity mappings with SSL enabled Nifi
What to try next?
Configuring other users and access policies (i.e. continuing to use Nifi's file based authorizer):
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#config-users-access-policies Setup Ranger and configure Nifi Ranger plugin (i.e. switching to using Nifi's Ranger authorizer): Using unsecure Ranger:
https://community.hortonworks.com/articles/58769/hdf-20-enable-ranger-authorization-for-hdf-compone.html Or Using secure Ranger:
https://community.hortonworks.com/articles/60001/hdf-20-integrating-secured-nifi-with-secured-range.html
09-23-2016
02:53 AM
15 Kudos
Highlights of integrating Apache NiFi with Apache Ambari/Ranger
Article credits: @Ali Bajwa, @Bryan Bende, @jluniya, @Yolanda M. Davis, @brosander
With the recently announced HDF 2.0, users are able to deploy an HDF cluster comprised of Apache NiFi, Apache Storm, Apache Kafka and other components. The mechanics of setting this up using Apache Ambari’s Install Wizard are outlined in the official documentation here and sample steps to automate the setup via Ambari blueprints are provided here. The goal of this article is to highlight some features NiFi administrators can leverage when using Ambari managed HDF 2.0 clusters vs using NiFi standalone
The article is divided into sections on how the integration helps administrators with HDF:
Deployment
Configuration
Monitoring
Security
Ease of Deployment
Users have the choice of deploying NiFi through the Ambari install wizard or operationalizing it via blueprints automation
(For detailed steps, see links provided on above line)
Using the wizard, users can choose which nodes NiFi should be installed on. So users can:
Either choose NiFi hosts at time of cluster install
...OR Add NiFi to existing host after the cluster is already installed and then start it. Note that in this case, ‘Zookeeper client’ must be installed on a host first before NiFi can be added to it
Ambari also allows users to configure which user/group NiFi runs as. This is done via the Misc tab which is editable either when cluster installed or when NiFi service is added to existing cluster for the first time.
Starting Ambari 2.4, users can also remove NiFi service from Ambari, but note that this does not remove the bits from the cluster.
NiFi can be stopped/started/configured across the cluster via both the Ambari UI and Ambari's REST APIs (a sample call is sketched after this list)
The same can be done on individual hosts:
For easy access to the NiFi UI, quick links are available. The benefit of using these is that the url is dynamically determined based on the user's settings (e.g. what ports were specified and whether SSL is enabled)
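For example, stopping NiFi cluster-wide via the Ambari REST API might look like below (a sketch: cluster name, credentials and the NIFI service name are assumptions based on a default HDF install; PUT state STARTED to start it again):
#stop the NiFi service across the cluster via Ambari's REST API
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Stop NiFi via REST"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
  http://<ambari-host>:8080/api/v1/clusters/<cluster-name>/services/NIFI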
Ease of Configuration
Ambari allows configurations to be done once across the cluster. This is time saving because when setting up NiFi standalone, users need to manage configuration files on each node NiFi is running on
Most important NiFi config files are exposed via Ambari and are managed there (e.g. nifi.properties, bootstrap.conf etc)
When going through the configuration process, there are a number of ways Ambari provides assistance for the admin:
Help text displayed, on hover, with property descriptions
Checkboxes instead of true/false values
User friendly labels and default values
‘Computed’ values can be automatically handled (e.g. node address)
NiFi benefits from other standard Ambari config features like:
Update configs via the Ambari REST API (a sample call is sketched at the end of this section)
Configuration history is available meaning that users can diff versions and revert to older version etc
Host-specific configurations can be managed using ‘Config groups’ feature where users can:
‘override’ a value (e.g. max mem in the screenshot) and
create a subset group of hosts that will use that value
‘Common’ configs are grouped together and exposed in the first config section (‘Advanced NiFi-ambari-config’) to allow configuration of commonly used properties:
Ports (nonSSL, SSL, protocol)
Initial and max memory (Xms, Xmx)
Repo default dir locations (provenance, content, db, flow file)
‘Internal’ dir location - contains files NiFi will write to
‘conf’ subdir for flow/tar.gz, authorizations.xml
‘state’ subdir for internal state
Can change subdir names by prefixing the desired subdir name with ‘{NiFi_internal_dir}/’
Sensitive property key (used to encrypt sensitive property values)
Zookeeper znode for NiFi
Contents of nifi.properties are exposed under 'Advanced NiFi-properties' as key/value pairs with helptext
Values replaced by Ambari are shown surrounded by double braces e.g. {{ }} but can be overridden by the end user
Properties can be updated or added to nifi.properties via 'Custom NiFi-properties' and will get written to all nodes
It also handles properties whose values need to be ‘computed’ e.g.
‘Node address’ fields are populated with each hosts own FQDN
Conditional logic handled:
When SSL enabled, populates nifi.web.https.host/port
When SSL disabled, populates nifi.web.http.host/port
Other property-based configuration files exposed as jinja templates (large text box)
Values that will be replaced by Ambari shown surrounded by double braces e.g. {{ }} but can be overridden by end user
Properties can be added/updated in the template and will get written to all nodes
Other xml based config files also exposed as jinja templates
Values replaced by Ambari shown surrounded by double braces e.g. {{ }} but can be overridden
Elements can be updated/added and will get written to all nodes
Note that config files are written out with either 0400 or 0600 permissions
Why? Because some property files contain plaintext passwords
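As referenced earlier in this section, configs can also be updated from the command line via Ambari's REST API. One common shortcut is the configs.sh helper shipped with Ambari-server (a sketch: the config type and property shown are illustrative; verify the arguments on your Ambari version):
#update a single NiFi property via Ambari's config helper (run on the Ambari-server host)
/var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin \
  set localhost <cluster-name> nifi-ambari-config nifi.node.port 9090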
Ease of Debugging
Logsearch integration is included for ease of visualizing/debugging NiFi logs w/o connecting to the system e.g. nifi-app.log, nifi-user.log, nifi-bootstrap.log
Note: Logsearch component is Tech Preview in HDF 2.0
By default, monitors FATAL,ERROR,WARN messages (for all HDF services)
Can view/drill into errors at component level or host level
Can filter errors based on severity (fatal, error, warn, info, debug, trace)
Can exclude ‘noisy’ messages to find the needle in the haystack
Can ‘tail’ log from Logsearch UI
By clicking the ‘refresh’ button or ‘play’ button (to auto refresh every 10s)
Ease of Monitoring
NiFi Service check: Used to ensure that the NiFi UI has come up after restart. It can also be invoked via the REST API for automation (a sample call is sketched at the end of this section)
NiFi alerts are host-level alerts that let admins know when a NiFi process goes down
Can temporarily be disabled by turning on maintenance mode
Alerts tab in Ambari allows users to disable or configure alerts (e.g. changing polling intervals)
Admins can choose to send notifications via email or SNMP through the alerts framework
AMS (Ambari Metrics) integration
When NiFi is installed via Ambari, an Ambari reporting task is auto-created in NiFi, pointing to the cluster’s AMS collector host/port (autodetected)
How is the task autocreated? By providing a configurable initial flow.xml (which can also be used to deploy any flows you like when NiFi is deployed)...
...and passing arguments (like AMS url) via bootstrap.conf. Advantage of doing it this way: if the collector is ever moved to a different host in the cluster, Ambari will let NiFi know (next time NiFi is restarted after the move)
As a result of the metrics integration, users get a dashboard for NiFi metrics in Ambari, such as:
Flowfiles sent/received
MBs read/written
JVM usage/thread counts
Dashboard widgets can:
be drilled into to see results from last 1,2,4 hours, day, week etc
export metrics data to csv or JSON
These same metrics can be viewed in Grafana dashboard:
Grafana can be accessed via quick link under ‘Ambari metrics’ service in Ambari
Pre-configured dashboards are available for each service but users can easily create custom dashboards for each component too
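As mentioned at the start of this section, the NiFi service check can be invoked programmatically. A sketch of the REST call (cluster name and credentials are assumptions; the command name follows Ambari's <SERVICE>_SERVICE_CHECK convention):
#trigger the NiFi service check via Ambari's REST API
curl -u admin:admin -H 'X-Requested-By: ambari' -X POST \
  -d '{"RequestInfo":{"context":"NiFi Service Check","command":"NIFI_SERVICE_CHECK"},"Requests/resource_filters":[{"service_name":"NIFI"}]}' \
  http://<ambari-host>:8080/api/v1/clusters/<cluster-name>/requests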
Ease of Security Setup
NiFi Identity mappings
These are used to map identities in DN pattern format (e.g. CN=Tom, OU=NiFi) into common identity strings (e.g. Tom@NiFi)
The patterns can be configured via ‘Advanced NiFi-properties’ section of Ambari configs. Sample values are provided via helptext
ActiveDirectory/LDAP integration
To enable users to login to NiFi using AD/LDAP credentials the ‘Advanced NiFi-login-identity-providers.xml’ section can be used to setup an ldap-provider for NiFi. Commented out sample xml fields are provided for the relevant settings e.g.
AD/LDAP url, search base, search filter, manager credentials
SSL for NiFi
Detailed steps for enabling SSL/identity mappings for Nifi available here
Options for SSL for NiFi:
1. Use NiFi CA to generate self-signed certificates
good for quick start/demos
2. Use your existing certificates
Usually done for production envs
SSL related configs are combined together in ‘Advanced NiFi-ambari-ssl-config’ config panel
Checkbox for whether SSL is enabled
NiFi CA fields - to configure certificate to be generated:
NiFi CA token(required)
NiFi CA DN prefix/suffix
NiFi CA Cert duration
NiFi CA host port
Checkbox for ‘NiFi CA Force Regenerate’
Keystore/truststore related fields - location/type of certificates:
Paths
Passwords
Types
Node identity fields:
Initial Admin Identity: long form of identity of Nifi admin user
Node Identities: long form of identities of nodes running Nifi
SSL Option 1 - using NiFi CA to generate new certificates through Ambari:
Just check “Enable SSL?” box and make sure CA token is set
Optionally update below as needed:
NiFi CA DN prefix/suffix
NiFi CA Cert duration
NiFi CA port
Check ‘NiFi CA Force Regenerate’ box
For changing certs after SSL already enabled
You can force regeneration of the certificates by either:
checking “NiFi CA Force Regenerate” checkbox
Or changing the passwords
You can also manually use tls-toolkit in standalone mode to generate new certificates outside of Ambari
SSL Option 2 - using your existing certificates:
Manually copy certificates to nodes
Populate keystore/truststore path/password/type fields
For keystore/trust paths that contain FQDN that need resolving:
use {NiFi_node_ssl_host} (This is useful for certs generated by NiFi-toolkit as they have the host’s FQDN in their name/path)
In both cases while enabling SSL, you will also need to populate the identity fields. This is to be able to login to NiFi after enabling SSL (assuming Ranger authorizer will not be used)
When setting these, first make sure that on all the nodes, authorizations.xml does not contain any policies. If it does, delete authorizations.xml from all nodes running NiFi. Otherwise, the identity related changes will not take effect.
On initial install there will not be any policies, but they will get created the first time the Identity fields are updated and NiFi restarted (i.e. if you entered incorrect values the first time, you will need to delete policies before re-entering the values)
Then save config changes and restart NiFi from Ambari to enable SSL
If NiFi CA option was used, this is the point at which certificates will get generated
Ranger integration with NiFi
Before installing Ranger there are some manual prerequisite steps:
Setup an RDBMS to store Ranger policies
Install/setup Solr to store audits. In test/development environments, Ranger can re-use the Solr that comes with Logsearch/Ambari Infra services
Detailed steps for integrating Nifi with Ranger here
During Ranger install…
The backend RDBMS details are provided first via ‘Ranger Admin’ tab
The NiFi Ranger plugin can be enabled to manage NiFi authorization policies in Ranger via 'Ranger Plugin' tab
Users/Groups can be synced from Active Directory/LDAP via ‘Ranger User Info’ tab
Ranger audits can be configured via ‘Ranger audit’ tab
After enabling Ranger and restarting NiFi, new config tabs appear under NiFi configs. NiFi/Ranger related configs can be accessed/updated here:
Ranger can be configured to communicate and retrieve resources from NiFi using a keystore (that has been imported into NiFi’s truststore)
Using a NiFi REST Client, Ranger is able to retrieve NiFi’s API endpoint information that can be secured via authorization
This list of resources is made available as auto-complete options when users are attempting to configure policies in Ranger
To communicate with NiFi over SSL a keystore and truststore should be available (with Ranger’s certificate imported into NiFi node truststores) for identification. The Owner for Certificate should be populated as well.
Once Ranger is identified, NiFi will authorize Ranger to perform its resource lookup
Ranger policies can be created for NiFi (either via Ranger UI or API; a sample API call is sketched after this list)
Create users in Ranger for NiFi users (either from certificate DNs, or import using AD/LDAP sync)
Decide which user has what type of access on what identifier
Default policy automatically created on first setup
Policy updates will be picked up by Nifi after 30 seconds (by default)
Recommended approach:
Grant user access to modify the NiFi flow with a policy for /process-groups/<root-group-id> with RW
Create a separate policy for /provenance/process-groups/<root-group-id> (with each of the cluster node DNs) for read access
Ranger now tracks audits for NiFi (stored in standalone Solr or Logsearch Solr)
For example: What user attempted what kind of NiFi access from what IP at what time?
Ranger also audits user actions related to NiFi in Ranger
For example: Which user created/updated NiFi policy at what time?
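To script against these policies, Ranger's public v2 REST API can be used. For example, listing the policies defined for the NiFi service (a sketch: the HDF_nifi service name and credentials are assumptions based on a default install):
#list all Ranger policies defined for the NiFi service (useful for verification/automation)
curl -u admin:<ranger-admin-password> http://<ranger-host>:6080/service/public/v2/api/service/HDF_nifi/policy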
Kerberos for NiFi
HDF cluster with NiFi can be kerberized via standard Ambari security wizard (via MIT KDC or AD)
Also supported: NiFi installation on already kerberized HDF cluster
Detailed steps for enabling kerberos for HDF available here
Wizard will allow configuration of principal name and keytab path
NiFi principal and keytabs will automatically be created/distributed across the cluster where needed by Ambari
During the security wizard, nifi.properties will automatically be updated:
nifi.kerberos.service.principal
nifi.kerberos.keytab.location
nifi.kerberos.krb5.file
nifi.kerberos.authentication.expiration
After enabling kerberos, login provider will also be switched to kerberos under the covers
Allows users to login via KDC credentials instead of importing certificates into the browser
Writing audits to kerberized Solr supported
After security wizard completes, NiFi’s kerberos details will appear alongside other components (under Admin > Kerberos)
Try it out yourself!
Installation using official documentation: link
Automation to deploy clusters using Ambari blueprints: link
Enable SSL/Identity mappings for Nifi via Ambari: link
Enable Ranger authorization for Nifi: link
Enable Kerberos for HDF via Ambari: link