Member since
09-17-2015
436
Posts
736
Kudos Received
81
Solutions
11-17-2021
01:50 PM
1 Kudo
This article will show you how to interact with Atlas APIs in CDP-public to create tags and associate tags with entities (in preparation for use with Ranger's tag-based policies).
In the Cloudera CDP-public offering, Apache Atlas is part of the SDX Data Lake cluster that is created when you create your first Environment:
Introduction to Data Lakes
Prerequisites
A. First, you will need to find the Atlas endpoint using the Cloudera CDP management console:
Accessing Data Lake services
Sample Atlas endpoint: https://pse-722-cdp-xxxxx.cloudera.site/pse-722-cdp-dl/cdp-proxy-api/atlas/api/atlas/
B. Next, you will need to set your user's workload password
Setting the workload password
Now you can use the following sample bash code to interact with Atlas APIs from a CentOS instance outside CDP:
From the Atlas endpoint, you can extract the first two parameters below. You will also need to set your username and workload password:
export datalake_name='pse-722-cdp-dl'
export lake_ip='pse-722-cdp-xxxxx.cloudera.site'
export user='abajwa'
export password='nicepassword'
export atlas_curl="curl -k -u ${user}:${password}"
export atlas_url="https://${lake_ip}:443/${datalake_name}/cdp-proxy-api/atlas/api/atlas"
After forming the above variables, you can use them to run some basic GET and POST commands to import tags and glossary into Atlas.
#test API by fetching Atlas typedefs
${atlas_curl} ${atlas_url}/v2/types/typedefs
#download sample Glossary
wget https://github.com/abajwa-hw/masterclass/blob/master/ranger-atlas/HortoniaMunichSetup/data/export-glossary.zip
#import sample Glossary into Atlas
curl -v -k -X POST -u ${user}:${password} -H "Accept: application/json" -H "Content-Type: multipart/form-data" -H "Cache-Control: no-cache" -F data=@export-glossary.zip ${atlas_url}/import
#import sample tags
wget https://github.com/abajwa-hw/masterclass/raw/master/ranger-atlas/HortoniaMunichSetup/data/classifications.json
#import sample tags into Atlas
curl -v -k -X POST -u ${user}:${password} -H "Accept: application/json" -H "Content-Type: application/json" ${atlas_url}/v2/types/typedefs -d @classifications.json
At this point, you should be able to see the newly imported tags and glossary entities in your Atlas UI.
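To confirm the import from the command line as well, you can list the classification typedefs via the API (a quick check; the jq filter and the REFERENCE_DATA tag name are assumptions based on the sample files):
#list imported classification (tag) typedefs and check that a sample tag is present
${atlas_curl} "${atlas_url}/v2/types/typedefs?type=classification" | jq '.classificationDefs[].name' | grep -i reference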
Next, you can search for any Hive entity (these are created automatically in Atlas when a Hive table is created) and associate it with a tag.
#find airlines_new_orc.airports entity in Atlas
${atlas_curl} ${atlas_url}/v2/entity/uniqueAttribute/type/hive_table?attr:qualifiedName=airlines_new_orc.airports@cm
#fetch guid for airlines_new_orc.airports
guid=$(${atlas_curl} ${atlas_url}/v2/entity/uniqueAttribute/type/hive_table?attr:qualifiedName=airlines_new_orc.airports@cm | jq '.entity.guid' | tr -d '"')
#use guid to associate a tag REFERENCE_DATA to airlines_new_orc.airports entity
${atlas_curl} ${atlas_url}/entities/${guid}/traits \
-X POST -H 'Content-Type: application/json' \
--data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"REFERENCE_DATA","values":{}}'
#confirm the entity now shows the REFERENCE_DATA tag (also visible via the UI)
${atlas_curl} ${atlas_url}/v2/entity/uniqueAttribute/type/hive_table?attr:qualifiedName=airlines_new_orc.airports@cm | grep REFERENCE_DATA
Now that you have an entity tagged, you can use Ranger to create a "tag-based policy".
Tag-based Services and Policies
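For reference, a tag-based policy can also be created programmatically through Ranger's REST API. The following is only a rough sketch: the Ranger proxy path, the tag service name (cm_tag), and the user joe_analyst are assumptions you would adjust for your environment:
#assumed Ranger endpoint behind the Data Lake Knox proxy
export ranger_url="https://${lake_ip}:443/${datalake_name}/cdp-proxy-api/ranger"
#create a tag-based policy allowing hive select on objects tagged REFERENCE_DATA
curl -k -u ${user}:${password} -X POST -H 'Content-Type: application/json' \
  ${ranger_url}/service/public/v2/api/policy \
  -d '{"service":"cm_tag","name":"REFERENCE_DATA_allow_select","description":"Allow select on REFERENCE_DATA tagged objects","isEnabled":true,"resources":{"tag":{"values":["REFERENCE_DATA"],"isExcludes":false,"isRecursive":false}},"policyItems":[{"users":["joe_analyst"],"groups":[],"accesses":[{"type":"hive:select","isAllowed":true}],"delegateAdmin":false}]}'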
Other sample code for associating tags in Atlas: How to automate associating tags/classifications to HDFS/Hive/HBase/Kafka entities using REST APIs
02-23-2021
07:53 AM
@Rajesh2622 you can find the CSV files containing the data under https://github.com/abajwa-hw/masterclass/tree/master/ranger-atlas/HortoniaMunichSetup/data (e.g. ww_customers_data.csv)
02-04-2021
09:13 PM
@antonio_r Glad to hear it! Thanks for providing the details to help the next person. Have updated the article with the link to the networking prereqs. Enjoy your new cluster!
02-03-2021
01:27 PM
@antonio_r After running the script, I noticed some services show a weird state (even though they are up). You can restart "Cloudera Management Service" (scroll down to the bottom of the list of services, under ZooKeeper)...that usually fixes it for me. A vanilla VM should work the same way...I have installed using the script on our internal OpenStack env without issues. You should not require both internal and public IPs. Just make sure the networking is set up as required by Hadoop: https://docs.cloudera.com/cloudera-manager/7.1.1/installation/topics/cdpdc-configure-network-names.html
02-03-2021
01:06 PM
@boulder The AMI comes with a trial license of CM which expires after 90 days. At that point, the services are all up, but in order to open CM you would need to add a license. Note: We just updated the article with an updated 7.1.4 AMI which has a fresh trial. You can also use the script option to spin up a fresh cluster, which gives a new trial license each time. @antonio_r Thanks - yes, you'd need a VM with roughly the same specs as an m4.4xlarge. I have updated the article to include the specs for option #1 as well.
08-19-2020
06:29 PM
@AkhilTech thanks for your question. We just updated the article to include a link to a new AMI based on CDP 7.1.3. Alternatively, you can also use the script to deploy instead, which will give you a new trial license each time. To request a permanent license you can contact our sales team: https://www.cloudera.com/contact-sales.html
06-03-2020
07:40 PM
9 Kudos
Security/Governance/GDPR Demo on CDP-Private Cloud Base 7.x
Summary
This article explains how to quickly set up the Cloudera Security/Governance/GDPR (Worldwide Bank) demo using Cloudera Data Platform - Private Cloud Base (formerly known as CDP Data Center). It can be deployed either on AWS using an AMI or on your own setup via the provided script.
What's included
Single node CDP 7.1.7 including:
Cloudera Manager (60-day trial license included) - for managing the services
Kerberos - for authentication (via local MIT KDC)
Ranger - for authorization (via both resource/tag-based policies for access and masking)
Atlas - for governance (classification/lineage/search)
Zeppelin - for running/visualizing Hive queries
Impala/Hive 3 - for SQL access and ACID capabilities
Spark/HiveWarehouseConnector - for running secure SparkSQL queries
Worldwide Bank artifacts
Demo hive tables
Demo tags/attributes and lineage in Atlas
Demo Zeppelin notebooks to walk through a demo scenario
Ranger policies across HDFS, Hive/Impala, Hbase, Kafka, SparkSQL to showcase:
Tag-based policies across HDP components
Row-level filtering in Hive columns
Dynamic tag-based masking in Hive/Impala columns
Hive UDF execution authorization
Atlas capabilities like
Classifications (tags) and attributes
Tag propagation
Data lineage
Business glossary: categories and terms
GDPR Scenarios around consent and data erasure via Hive ACID
Hive ACID / MERGE labs
Option 1: Steps to deploy on your own setup
Launch a vanilla CentOS 7 VM (8 cores / 64GB RAM / 100GB storage) and perform the documented network prereqs. Then set up a single node CDP cluster using this GitHub repo (instead of the "base" CM template, choose the "wwbank_krb.json" template) as follows:
yum install -y git
#setup KDC
curl -sSL https://gist.github.com/abajwa-hw/bca3d23fe146c3ebd59a9b5fd19480a3/raw | sudo -E sh
git clone https://github.com/fabiog1901/SingleNodeCDPCluster.git
cd SingleNodeCDPCluster
./setup_krb.sh gcp templates/wwbank_krb.json
#Setup worldwide bank demo using script
curl -sSL https://raw.githubusercontent.com/abajwa-hw/masterclass/master/ranger-atlas/setup-dc-703.sh | sudo -E bash
Once the script completes, restart Zeppelin once (via CM) so that it picks up the demo notebooks.
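If you want to script that restart, something like the following should work against the Cloudera Manager API (a sketch only; the API version, the cluster/service names, and the admin/admin credentials are assumptions for this single-node setup):
#restart Zeppelin via the CM API so it picks up the demo notebooks
curl -u admin:admin -X POST "http://$(hostname -f):7180/api/v41/clusters/SingleNodeCluster/services/zeppelin/commands/restart"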
Option 2: Steps to launch prebuilt AMI on AWS
Log in to the AWS EC2 console using your credentials
Select the AMI from ‘N. California’ region by clicking one of the links below:
CDP 7.1.7 here
CDP 7.1.4 here
CDP 7.1.3 here
CDP 7.1.1 here
CDP 7.0.3 here
Now choose the instance type: select m4.4xlarge and click Next. Note: If you choose a smaller instance type than recommended above, not all services may come up.
In Configure Instance Details, ensure Auto-assign Public IP is enabled and click Next:
In Add storage, use at least 100 GB and click Next:
In Add Tags, add tags needed to prevent instances from being terminated. Click Next:
In Configure Security Group, create a new security group, select All traffic, and open all ports to only your IP address:
In Review Instance Launch, review your settings and click Launch:
Create and download a new key pair (or choose an existing one). Then click Launch instances:
Click the link shown under Your instances are now launching:
This opens the EC2 dashboard that shows the details of your launched instance:
Make note of your instance’s Public IP (which will be used to access your cluster). If the Public IP is blank, wait for a couple of minutes for this to be populated.
After five to ten minutes, open the below URL in your browser to access Cloudera Manager (CM) console: http://<PUBLIC IP>:7180.
Login as admin/admin:
At this point, CM may still be in the process of starting all the services. You can tell by the presence of the blue operation notification near the bottom left of the page. If so, just wait until it is done. (Optional) You can also monitor the startup using the log as below:
Open SSH session into the VM using your key and the public IP e.g. from OSX: ssh -i ~/.ssh/mykey.pem centos@<publicIP>
Tail the startup log: tail -f /var/log/cdp_startup.log
Once you see “cluster is ready!”, you can proceed.
Once the blue operation notification disappears and all the services show a green checkmark, the cluster is fully up.
If any services fail to start, use the hamburger icon next to SingleNodeCluster and click the Start button.
Accessing cluster resources
CDP URLs
Access CM at <public IP>:7180 as admin/admin
Access Ranger at <public IP>:6080. Ranger login is admin/BadPass#1
Access Atlas at <public IP>:31000. Atlas login is admin/BadPass#1
Access Zeppelin at <public IP>:8885. Zeppelin user logins are:
joe_analyst = BadPass#1
ivanna_eu_hr = BadPass#1
etl_user = BadPass#1
Demo walkthrough
Run queries as joe_analyst
Open Zeppelin and login as joe_analyst. Find his notebook by searching for "worldwide" using the text field under the Notebook section. Select the notebook called Worldwide Bank - Joe Analyst:
On the first launch of the notebook, you will be prompted to choose interpreters. You can keep the defaults; be sure to click the Save button:
Run through the notebook. This notebook shows the following:
MRN/password masked via tag policy. The following shows the Ranger policy that enables this:
Under Dynamic Column Level Masking, the address, nationalID, and credit card number columns are masked using Hive column policies specified in Ranger. Notice that the birthday and age columns are masked using a custom mask:
It also shows a prohibition policy where zipcode, insuranceID, and blood type cannot be combined in a query:
It shows tag-based policies.
Attempts to access an object tagged with EXPIRES_ON after the expiry date will be denied. As we will show later, the fed_tax column of the tax_2015 table is tagged in Atlas with EXPIRES_ON and an expiry date in 2016. Hence, it should not be allowed to be queried:
Similarly, attempts to access objects tagged with PII will be denied per policy; only HR is allowed. As we will show later, the SSN column of the tax_2015 table is tagged as PII in Atlas:
Attempts to access the cost_savings.claim_savings table as an analyst will fail because there is a policy requiring a minimum data quality score of 60% for analysts. As we will see, this table is tagged in Atlas as having a score of 51%:
The same queries can also be run via SparkSQL using spark-shell (as described in the HWC section below). The following are sample queries for joe_analyst:
hive.execute("SELECT surname, streetaddress, country, age, password, nationalid, ccnumber, mrn, birthday FROM worldwidebank.us_customers").show(10)
hive.execute("select zipcode, insuranceid, bloodtype from worldwidebank.ww_customers").show(10)
hive.execute("select * from cost_savings.claim_savings").show(10)
Confirm using Ranger audits that the queries ran as joe_analyst. Also notice that column names, masking types, IPs, and policy IDs were captured. Notice that tags (such as DATA_QUALITY or PII) are captured along with their attributes, and that these audits were captured for operations across Hive, HBase, Kafka, and HDFS:
Run queries as ivanna_eu_hr
Once services are up, open the Ranger UI and also log in to Zeppelin as ivanna_eu_hr.
Find her notebook by searching for "hortonia" using the text field under the Notebook section.
Select the notebook called Worldwide Bank - Ivana EU HR:
On the first launch of the notebook, you may be prompted to choose interpreters. You can keep the defaults; be sure to click the Save button:
Run through the notebook cells using the Play button at the top right of each cell (or Shift-Enter):
This notebook highlights the following:
Row-level filtering: Ivana can only see data for European customers who have given consent (even though she is querying the ww_customers table, which contains both US and EU customers). The following is the Ranger Hive policy that enables this feature:
It also shows that since Ivana is part of the HR group, there are no policies that limit her access. Hence, she can see raw passwords, nationalIDs, credit card numbers, MRNs, birthdays, etc.
The last cells show the tag-based policies in action.
Once you successfully run the notebook, you can open Ranger Audits to show which policies were applied, that the queries ran as her, and that row filtering occurred (notice the ROW_FILTER access type):
Run queries as etl_user
Similarly, you can log in to Zeppelin as etl_user and run his notebook as well
This notebook shows how an admin would handle GDPR scenarios like the following using Hive ACID capabilities:
When a customer withdraws consent (so they no longer appear in searches)
When a customer requests their data to be erased
Run Hive/Impala queries from Hue
Alternatively, you can log in to Hue as joe_analyst and select Query > Editor > Hive, and click Saved queries to run Joe's sample queries via Hive:
You can also switch the editor to Impala to run Joe's sample queries via Impala to show tag-based access policy working for Impala:
In CDP 7.1.1, Impala also supports column-based masking:
Alternatively, you can log in to Hue as ivanna_eu_hr and click Saved queries to run Ivanna's sample queries via Hive:
Run SparkSQL queries via Hive Warehouse Connector (HWC)
To run secure SparkSQL queries (using Hive Warehouse Connector):
Connect to instance via SSH using your keypair
Authenticate as the user you want to run queries as via keytabs: kinit -kt /etc/security/keytabs/joe_analyst.keytab joe_analyst/$(hostname -f)@CLOUDERA.COM
Start SparkSql using HiveWarehouseConnector: spark-shell --jars /opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly*.jar --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://$(hostname -f):10000/default;" --conf "spark.sql.hive.hiveserver2.jdbc.url.principal=hive/$(hostname -f)@CLOUDERA.COM" --conf spark.security.credentials.hiveserver2.enabled=false
Import HWC classes and start the session: import com.hortonworks.hwc.HiveWarehouseSession
import com.hortonworks.hwc.HiveWarehouseSession._
val hive = HiveWarehouseSession.session(spark).build()
Run queries using hive.execute(). Example: hive.execute("select * from cost_savings.claim_savings").show(10)
A sample script that automates the above steps for joe_analyst is available at: /tmp/masterclass/ranger-atlas/HortoniaMunichSetup/run_spark_sql.sh
Troubleshooting Zeppelin
In case you encounter a Thrift Exception like the following, it is likely that the session has expired:
Just scroll to the top and click the Gears icon (near the top right) to display the interpreters, and restart the JDBC one:
Atlas walkthrough
Log in to Atlas and show the Hive columns tagged as EXPIRES_ON:
To see the table name, you can select Table in the Column dropdown:
Now, notice the table name is also displayed:
Select the fed_tax column and open the Classifications tab to view the attributes of the tag (expiry_date) and value:
To save this search, click the Save As button near the bottom left. Provide a Name and click Create to save:
Similarly, you can query for Hive tables tagged with DATA_QUALITY:
Click on claim_savings to see that the quality score associated with this table is less than 60%:
Click back, and select the claims_view table instead.
Click the Lineage tab. This shows that this table was derived from the claims_savings table:
Click the Classifications tab and notice that because the claims_view table was derived from claims_savings, which had a DATA_QUALITY tag, the tag was automatically propagated to claims_view itself (i.e. no one had to manually tag it):
Use Atlas to query for hive_tables and pick provider_summary to show lineage and impact:
You can use the Audits tab to see audits on this table:
You can use the Schema tab to inspect the table schema:
Navigate to the Classification tab to see how you can easily see all entities tagged with a certain classification (across Hive, Hbase, Kafka, HDFS etc):
Navigate to the Glossary tab to see how you can define Glossary categories and terms, as well as search for any entities associated with those terms:
Navigate to Atlas home page and notice the option to create a new entity:
The following are some of the out-of-the-box entity types that you can create:
Selecting an entity type (e.g. hdfs_path) displays the required and optional fields needed to manually create the new entity:
Hive ACID/Merge walkthrough
In Zeppelin, there are two Hive-related notebooks provided to demonstrate Hive ACID and MERGE capabilities. Log in to Zeppelin as etl_user to be able to run these:
The notebooks contain tutorials that walk through some of the theory and concepts before going through some basic examples:
Appendix:
The following are some older AMI links (for HDP releases):
For HDP 3.1.4: click here
For HDP 3.1 with Knox SSO: click here
For HDP 2.6.5 with Knox SSO: click here
For HDP 2.6.5: click here
10-30-2018
12:35 AM
CLOUD.HORTONWORKS.COM was just an example...you can change this to whatever you like. If you are using AD, you would probably want to set it to your AD domain
10-30-2018
12:33 AM
You can easily install it via Ambari > Hosts > choose which host you want to install on > Add > "NiFi Certificate Authority"
09-08-2018
05:52 PM
6 Kudos
Summary: The release of HDF 3.3 brings a significant number of improvements to HDF. This article shows how you can use the ambari-bootstrap project to easily generate a blueprint and deploy either HDF-only clusters or combined HDP/HDF clusters in 5 easy steps. To quickly set up a single-node cluster, prebuilt AMIs are available for AWS, as well as a script that automates these steps, so you can deploy the cluster in a few commands.
Steps for each of the options below are described in this article:
A. Single-node prebuilt AMIs on AWS
B. Single-node fresh installs
C. Multi-node fresh installs
A. Single-node prebuilt AMI on AWS: Steps to launch the AMI
1. Launch the Amazon AWS console page in your browser by clicking here and sign in with your credentials. Once signed in, you can close this browser tab.
2. Select the AMI from the ‘N. California’ region by clicking one of the below options:
To spin up HDP 3.1/HDF 3.3, click here
To spin up an HDF 3.3 only cluster, click here
Now choose the instance type: select ‘m4.2xlarge’ and click Next. Note: if you choose a smaller instance type than recommended above, not all services may come up.
3. Configure Instance Details: leave the defaults and click ‘Next’
4. Add storage: keep at least the default of 800 GB and click ‘Next’
5. Optionally, add a name or any other tags you like. Then click ‘Next’
6. Configure security group: create a new security group and select ‘All traffic’ to open all ports. For production usage, a more restrictive security group policy is strongly encouraged; for instance, only allow traffic from your company’s IP range. Then click ‘Review and Launch’
7. Review your settings and click Launch
8. Create and download a new key pair (or choose an existing one). Then click ‘Launch instances’
9. Click the shown link under ‘Your instances are now launching’
10. This opens the EC2 dashboard that shows the details of your launched instance
11. Make note of your instance’s ‘Public IP’ (which will be used to access your cluster). If it is blank, wait 1-2 minutes for this to be populated.
12. After 5-10 minutes, open the below URL in your browser to access Ambari’s console: http://<PUBLIC IP>:8080. Login as user:admin and pass:StrongPassword (see previous step)
13. At this point, Ambari may still be in the process of starting all the services. You can tell by the presence of the blue ‘op’ notification near the top left of the page. If so, just wait until it is done.
(Optional) You can also monitor the startup using the log as below:
Open an SSH session into the VM using your key and the public IP e.g. from OSX: ssh -i ~/.ssh/mykey.pem centos@<publicIP>
Tail the startup log: tail -f /var/log/hdp_startup.log
Once you see “cluster is ready!” you can proceed
14. Once the blue ‘op’ notification disappears and all the services show a green check mark, the cluster is fully up.
B. Single-node install:
Launch a fresh CentOS/RHEL 7 instance with 4+ CPUs and 16GB+ RAM and run the commands below. Do not try to install HDF on an env where Ambari or HDP are already installed (e.g. HDP sandbox or an HDP cluster).
To deploy an HDF 3.3 only cluster, run below:
export host_count=1
curl -sSL https://gist.github.com/abajwa-hw/b5565d7e7f9beffd8dd57a970dc54266/raw | sudo -E sh
To deploy an HDF 3.3/HDP 3.1 combined cluster, run below:
export host_count=1
curl -sSL https://gist.github.com/abajwa-hw/d7cd1c0232c1af46ee2c465e4871ddc6/raw | sudo -E sh
Once launched, the script will install Ambari and use it to deploy the HDF cluster.
Note: this script can also be used to install multi-node clusters after step #1 below is complete (i.e. after the agents on the non-Ambari-server nodes are installed and registered). Just change the value of the host_count variable.
C. Multi-node HDF 3.3 install:
0. Launch your RHEL/CentOS 7 instances where you wish to install HDF. In this example, we will use 4 m4.xlarge instances. Select an instance where ambari-server should run (e.g. node1)
1. After choosing a host where you would like Ambari-server to run, first let's prepare the other hosts. Run below on all hosts where Ambari-server will not be running (e.g. node2-4). This will run pre-requisite steps, install Ambari-agents and point them to the Ambari-server host:
export ambari_server=<FQDN of host where ambari-server will be installed>; #replace this
export install_ambari_server=false
export ambari_version=2.7.3.0
curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh ;
2. Run remaining steps on the host where Ambari-server is to be installed (e.g. node1). The below commands run pre-reqs and install Ambari-server:
export db_password="StrongPassword" # MySQL password
export nifi_password="StrongPassword" # NiFi password must be at least ten chars
export hdf_ambari_mpack_url="http://public-repo-1.hortonworks.com/HDF/amazonlinux2/3.x/updates/3.3.0.0/tars/hdf_ambari_mp/hdf-ambari-mpack-3.3.0.0-165.tar.gz"
export ambari_version=2.7.3.0
#install bootstrap
yum install -y git python-argparse
cd /tmp
git clone https://github.com/seanorama/ambari-bootstrap.git
#Runs pre-reqs and install ambari-server
export install_ambari_server=true
curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh ;
3. On the same node, install MySQL and create databases and users for Schema Registry and SAM:
sudo yum localinstall -y https://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm
sudo yum install -y epel-release mysql-connector-java* mysql-community-server
# MySQL Setup
sudo systemctl enable mysqld.service
sudo systemctl start mysqld.service
#extract system generated Mysql password
oldpass=$( grep 'temporary.*root@localhost' /var/log/mysqld.log | tail -n 1| sed 's/.*root@localhost: //')
#create sql file that
# 1. reset Mysql password to temp value and create druid/superset/registry/streamline schemas and users
# 2. sets passwords for druid/superset/registry/streamline users to ${db_password}
cat << EOF > mysql-setup.sql
ALTER USER 'root'@'localhost' IDENTIFIED BY 'Secur1ty!';
uninstall plugin validate_password;
CREATE DATABASE registry DEFAULT CHARACTER SET utf8; CREATE DATABASE streamline DEFAULT CHARACTER SET utf8;
CREATE USER 'registry'@'%' IDENTIFIED BY '${db_password}'; CREATE USER 'streamline'@'%' IDENTIFIED BY '${db_password}';
GRANT ALL PRIVILEGES ON registry.* TO 'registry'@'%' WITH GRANT OPTION ; GRANT ALL PRIVILEGES ON streamline.* TO 'streamline'@'%' WITH GRANT OPTION ;
commit;
EOF
#execute sqlfile
mysql -h localhost -u root -p"$oldpass" --connect-expired-password < mysql-setup.sql
#change Mysql password to StrongPassword
mysqladmin -u root -p'Secur1ty!' password StrongPassword
#test password and confirm dbs created
mysql -u root -pStrongPassword -e 'show databases;'
4. On the same node, install the MySQL connector jar and then the HDF mpack. Then restart Ambari so it recognizes the HDF stack:
sudo ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
sudo ambari-server install-mpack --mpack=${hdf_ambari_mpack_url} --verbose
sudo ambari-server restart
At this point, if you wanted to use the Ambari install wizard to install HDF, you could do that as well: just open http://<Ambari host IP>:8080, log in, and follow the steps in the doc. Otherwise, to proceed with deploying via blueprints, follow the remaining steps.
5. On the same node, provide the minimum configurations required for the install by creating configuration-custom.json. You can add to this to customize any component's property that is exposed by Ambari:
cd /tmp/ambari-bootstrap/deploy
cat << EOF > configuration-custom.json
{
"configurations": {
"ams-grafana-env": {
"metrics_grafana_password": "${ambari_password}"
},
"kafka-broker": {
"offsets.topic.replication.factor": "1"
},
"streamline-common": {
"jar.storage.type": "local",
"streamline.storage.type": "mysql",
"streamline.storage.connector.connectURI": "jdbc:mysql://$(hostname -f):3306/streamline",
"registry.url" : "http://localhost:7788/api/v1",
"streamline.dashboard.url" : "http://localhost:9089",
"streamline.storage.connector.password": "${db_password}"
},
"registry-common": {
"jar.storage.type": "local",
"registry.storage.connector.connectURI": "jdbc:mysql://$(hostname -f):3306/registry",
"registry.storage.type": "mysql",
"registry.storage.connector.password": "${db_password}"
},
"nifi-registry-ambari-config": {
"nifi.registry.security.encrypt.configuration.password": "${nifi_password}"
},
"nifi-registry-properties": {
"nifi.registry.db.password": "${nifi_password}"
},
"nifi-ambari-config": {
"nifi.security.encrypt.configuration.password": "${nifi_password}"
}
}
}
EOF
6. Then run below as root to generate a recommended blueprint and deploy the cluster install. Make sure to set host_count to the total number of hosts in your cluster (including the Ambari server):
sudo su
cd /tmp/ambari-bootstrap/deploy/
export host_count=<Number of total nodes>
export ambari_stack_name=HDF
export ambari_stack_version=3.3
export cluster_name="HDF"
export ambari_services="ZOOKEEPER STREAMLINE NIFI KAFKA STORM REGISTRY NIFI_REGISTRY AMBARI_METRICS KNOX"
./deploy-recommended-cluster.bash
You can now log in to Ambari at http://<Ambari host IP>:8080 and sit back and watch your HDF cluster get installed!
Notes:
a) This will only install NiFi on a single node of the cluster by default
b) The NiFi Certificate Authority (CA) component will be installed by default. This means that, if you wanted to, you could enable SSL for NiFi out of the box by including a "nifi-ambari-ssl-config" section in the above configuration-custom.json:
"nifi-ambari-ssl-config":{
"nifi.toolkit.tls.token":"hadoop",
"nifi.node.ssl.isenabled":"true",
"nifi.security.needClientAuth":"true",
"nifi.toolkit.dn.suffix":", OU=HORTONWORKS",
"nifi.initial.admin.identity":"CN=nifiadmin, OU=HORTONWORKS",
"content":"<property name='Node Identity 1'>CN=node-1.fqdn, OU=HORTONWORKS</property><property name='Node Identity 2'>CN=node-2.fqdn, OU=HORTONWORKS</property><property name='Node Identity 3'>node-3.fqdn, OU=HORTONWORKS</property>"
},
Make sure to replace node-x.fqdn with the FQDN of each node running NiFi
c) As part of the install, you can also have an existing NiFi flow deployed by Ambari. First, read in a flow.xml file from an existing NiFi system (you can find this in flow.xml.gz). For example, run below to read the flow for the Twitter demo into an env var:
twitter_flow=$(curl -L https://gist.githubusercontent.com/abajwa-hw/3a3e2b2d9fb239043a38d204c94e609f/raw)
Then include a "nifi-flow-env" section in the above configuration-custom.json when you create it - to have ambari-bootstrap include the whole flow xml in the generated blueprint:
"nifi-flow-env":{
"properties_attributes":{},
"properties":{"content":"${twitter_flow}"}
}
d) In case you would like to review the generated blueprint before it gets deployed, just set the below variable as well:
export deploy=false
The blueprint will be created under /tmp/ambari-bootstrap*/deploy/tempdir*/blueprint.json
Sample blueprints
Sample generated blueprint for a 4 node HDF 3.3 only cluster is provided for reference here:
{
"Blueprints": {
"stack_name": "HDF",
"stack_version": "3.3"
},
"host_groups": [
{
"name": "host-group-1",
"components": [
{
"name": "METRICS_MONITOR"
},
{
"name": "SUPERVISOR"
},
{
"name": "NIFI_CA"
},
{
"name": "STREAMLINE_SERVER"
}
]
},
{
"name": "host-group-4",
"components": [
{
"name": "METRICS_MONITOR"
},
{
"name": "SUPERVISOR"
},
{
"name": "METRICS_COLLECTOR"
},
{
"name": "ZOOKEEPER_SERVER"
},
{
"name": "STREAMLINE_SERVER"
}
]
},
{
"name": "host-group-2",
"components": [
{
"name": "NIFI_MASTER"
},
{
"name": "DRPC_SERVER"
},
{
"name": "METRICS_GRAFANA"
},
{
"name": "KAFKA_BROKER"
},
{
"name": "ZOOKEEPER_SERVER"
},
{
"name": "STREAMLINE_SERVER"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "SUPERVISOR"
},
{
"name": "NIMBUS"
},
{
"name": "ZOOKEEPER_CLIENT"
},
{
"name": "KNOX_GATEWAY"
},
{
"name": "NIFI_REGISTRY_MASTER"
},
{
"name": "REGISTRY_SERVER"
},
{
"name": "STORM_UI_SERVER"
}
]
},
{
"name": "host-group-3",
"components": [
{
"name": "METRICS_MONITOR"
},
{
"name": "SUPERVISOR"
},
{
"name": "ZOOKEEPER_SERVER"
},
{
"name": "STREAMLINE_SERVER"
}
]
}
],
"configurations": [
{
"nifi-ambari-config": {
"nifi.security.encrypt.configuration.password": "StrongPassword"
}
},
{
"nifi-registry-ambari-config": {
"nifi.registry.security.encrypt.configuration.password": "StrongPassword"
}
},
{
"ams-hbase-env": {
"hbase_master_heapsize": "512",
"hbase_regionserver_heapsize": "768",
"hbase_master_xmn_size": "192"
}
},
{
"nifi-logsearch-conf": {}
},
{
"storm-site": {
"metrics.reporter.register": "org.apache.hadoop.metrics2.sink.storm.StormTimelineMetricsReporter",
"topology.metrics.consumer.register": "[{\"class\": \"org.apache.hadoop.metrics2.sink.storm.StormTimelineMetricsSink\", \"parallelism.hint\": 1, \"whitelist\": [\"kafkaOffset\\..+/\", \"__complete-latency\", \"__process-latency\", \"__execute-latency\", \"__receive\\.population$\", \"__sendqueue\\.population$\", \"__execute-count\", \"__emit-count\", \"__ack-count\", \"__fail-count\", \"memory/heap\\.usedBytes$\", \"memory/nonHeap\\.usedBytes$\", \"GC/.+\\.count$\", \"GC/.+\\.timeMs$\"]}]",
"storm.local.dir": "/hadoop/storm",
"storm.cluster.metrics.consumer.register": "[{\"class\": \"org.apache.hadoop.metrics2.sink.storm.StormTimelineMetricsReporter\"}]"
}
},
{
"registry-common": {
"registry.storage.connector.connectURI": "jdbc:mysql://ip-xxx-xx-xx-xx9.us-west-1.compute.internal:3306/registry",
"registry.storage.type": "mysql",
"jar.storage.type": "local",
"registry.storage.connector.password": "StrongPassword"
}
},
{
"registry-env": {}
},
{
"registry-logsearch-conf": {}
},
{
"streamline-common": {
"streamline.storage.type": "mysql",
"streamline.storage.connector.connectURI": "jdbc:mysql://ip-xxx-xx-xx-xx9.us-west-1.compute.internal:3306/streamline",
"streamline.dashboard.url": "http://localhost:9089",
"registry.url": "http://localhost:7788/api/v1",
"jar.storage.type": "local",
"streamline.storage.connector.password": "StrongPassword"
}
},
{
"nifi-registry-properties": {
"nifi.registry.db.password": "StrongPassword"
}
},
{
"ams-hbase-site": {
"hbase.regionserver.global.memstore.upperLimit": "0.35",
"hbase.regionserver.global.memstore.lowerLimit": "0.3",
"hbase.tmp.dir": "/var/lib/ambari-metrics-collector/hbase-tmp",
"hbase.hregion.memstore.flush.size": "134217728",
"hfile.block.cache.size": "0.3",
"hbase.rootdir": "file:///var/lib/ambari-metrics-collector/hbase",
"hbase.cluster.distributed": "false",
"phoenix.coprocessor.maxMetaDataCacheSize": "20480000",
"hbase.zookeeper.property.clientPort": "61181"
}
},
{
"storm-env": {}
},
{
"streamline-env": {}
},
{
"ams-site": {
"timeline.metrics.service.webapp.address": "localhost:6188",
"timeline.metrics.cluster.aggregate.splitpoints": "kafka.network.RequestMetrics.ResponseQueueTimeMs.request.OffsetFetch.98percentile",
"timeline.metrics.downsampler.event.metric.patterns": "topology\.%",
"timeline.metrics.host.aggregate.splitpoints": "kafka.network.RequestMetrics.ResponseQueueTimeMs.request.OffsetFetch.98percentile",
"timeline.metrics.service.handler.thread.count": "20",
"timeline.metrics.service.watcher.disabled": "false",
"timeline.metrics.host.aggregator.ttl": "86400"
}
},
{
"kafka-broker": {
"log.dirs": "/kafka-logs",
"offsets.topic.replication.factor": "1"
}
},
{
"ams-grafana-env": {
"metrics_grafana_password": "StrongPassword"
}
},
{
"streamline-logsearch-conf": {}
},
{
"zoo.cfg": {
"dataDir": "/hadoop/zookeeper"
}
},
{
"ams-env": {
"metrics_collector_heapsize": "512"
}
}
]
}
Sample cluster.json for this 4 node cluster:
{
"blueprint": "recommended",
"default_password": "hadoop",
"host_groups": [
{
"hosts": [
{
"fqdn": "ip-XX-XX-XX-XXX.us-west-1.compute.internal"
}
],
"name": "host-group-1"
},
{
"hosts": [
{
"fqdn": "ip-XX-XX-XX-XXX.us-west-1.compute.internal"
}
],
"name": "host-group-3"
},
{
"hosts": [
{
"fqdn": "ip-xxx-xxx-xxx-xxx.us-west-1.compute.internal"
}
],
"name": "host-group-4"
},
{
"hosts": [
{
"fqdn": "ip-xx-xx-xx-xxx.us-west-1.compute.internal"
}
],
"name": "host-group-2"
}
]
}
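If you prefer to submit a blueprint and cluster template like the above yourself (instead of letting deploy-recommended-cluster.bash do it), the standard Ambari REST calls look roughly like this (a sketch; the file names, blueprint name, and admin/admin credentials are assumptions):
#register the blueprint under the name "recommended"
curl -u admin:admin -H 'X-Requested-By: ambari' -X POST -d @blueprint.json http://$(hostname -f):8080/api/v1/blueprints/recommended
#create the HDF cluster by mapping hosts to host groups via cluster.json
curl -u admin:admin -H 'X-Requested-By: ambari' -X POST -d @cluster.json http://$(hostname -f):8080/api/v1/clusters/HDF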
05-05-2018
12:03 AM
3 Kudos
Summary: While automating the setup of the Hortoniabank demo, we needed to automate the task of associating Atlas tags with HDP entities like HDFS, Hive, HBase, and Kafka using the names of the entities (rather than their guids in Atlas). One option is to use the Atlas APIs to find the entity you are looking for using the qualifiedName attribute and then use the guid to associate the tag with it. For components like Hive that already have an Atlas hook, the Atlas entities for Hive tables will automatically be created when the table is created. For these, we have just provided the API calls to associate the tags with the entity. For others like Kafka, HDFS, HBase, etc. that do not have an Atlas hook (as of HDP 2.6.x), you will need to create the entity first. For these, we have provided both the API call to create the entity and the call to associate the tags with the entity.
Code samples: The code examples below assume the tags have already been created. These can be created either manually via the Atlas UI or using the API. Here is a sample Atlas API call to create a basic tag called TEST that does not have any attributes:
${atlas_curl} ${atlas_url}/types \
-X POST -H 'Content-Type: application/json' \
--data-binary '{"enumTypes":[],"structTypes":[],"traitTypes":[{"superTypes":[],"hierarchicalMetaTypeName":"org.apache.atlas.typesystem.types.TraitType","typeName":"TEST","typeDescription":"TEST","typeVersion":"1.0","attributeDefinitions":[]}],"classTypes":[]}'
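The same endpoint can also create a tag that carries attributes (like the DATA_QUALITY tag used below). The following is a sketch based on the same v1 types API payload shape; the attribute definition fields shown are assumptions you may need to adjust for your Atlas version:
#create a DATA_QUALITY tag with a single optional string attribute called "score"
${atlas_curl} ${atlas_url}/types \
-X POST -H 'Content-Type: application/json' \
--data-binary '{"enumTypes":[],"structTypes":[],"traitTypes":[{"superTypes":[],"hierarchicalMetaTypeName":"org.apache.atlas.typesystem.types.TraitType","typeName":"DATA_QUALITY","typeDescription":"DATA_QUALITY","typeVersion":"1.0","attributeDefinitions":[{"name":"score","dataTypeName":"string","multiplicity":"optional","isComposite":false,"isUnique":false,"isIndexable":true,"reverseAttributeName":null}]}],"classTypes":[]}'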
All the examples operate the same way: find the guid of the entity you are looking for using the qualifiedName attribute and then use the guid to associate the tag with it. First we set up common vars:
atlas_host="atlas.domain.com"
cluster_name="datalake"
atlas_curl="curl -u admin:admin"
atlas_url="http://${atlas_host}:21000/api/atlas"
Example 1: Associate tag REFERENCE_DATA (w/o attributes) to Hive table hortoniabank.eu_countries
#fetch guid for table hortoniabank.eu_countries@${cluster_name}
guid=$(${atlas_curl} ${atlas_url}/v2/entity/uniqueAttribute/type/hive_table?attr:qualifiedName=hortoniabank.eu_countries@${cluster_name} | jq '.entity.guid' | tr -d '"')
#add REFERENCE_DATA tag
${atlas_curl} ${atlas_url}/entities/${guid}/traits \
-X POST -H 'Content-Type: application/json' \
--data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"REFERENCE_DATA","values":{}}'
Example 2: Associate tag DATA_QUALITY (with attribute: score and value: 0.51) to Hive table cost_savings.claim_savings
#fetch guid for table cost_savings.claim_savings@${cluster_name}
guid=$(${atlas_curl} ${atlas_url}/v2/entity/uniqueAttribute/type/hive_table?attr:qualifiedName=cost_savings.claim_savings@${cluster_name} | jq '.entity.guid' | tr -d '"')
#add DATA_QUALITY tag with score=0.51
${atlas_curl} ${atlas_url}/entities/${guid}/traits \
-X POST -H 'Content-Type: application/json' \
--data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"DATA_QUALITY", "values":{"score": "0.51"}}'
Example 3: Associate tag FINANCE_PII (with attribute: type and value: finance) to Hive column finance.tax_2015.ssn
#fetch guid for finance.tax_2015.ssn
guid=$(${atlas_curl} ${atlas_url}/v2/entity/uniqueAttribute/type/hive_column?attr:qualifiedName=finance.tax_2015.ssn@${cluster_name} | jq '.entity.guid' | tr -d '"')
#add FINANCE_PII tag with type=finance
${atlas_curl} ${atlas_url}/entities/${guid}/traits \
-X POST -H 'Content-Type: application/json' \
--data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"FINANCE_PII", "values":{"type": "finance"}}'
Example 4: Create entity for Kafka topic PRIVATE and associate with tag SENSITIVE
#create entities for kafka topics PRIVATE and associate with SENSITIVE tag
${atlas_curl} ${atlas_url}/v2/entity -X POST -H 'Content-Type: application/json' -d @- <<EOF
{ "entity":{ "typeName":"kafka_topic", "attributes":{ "description":null, "name":"PRIVATE", "owner":null, "qualifiedName":"PRIVATE@${cluster_name}", "topic":"PRIVATE", "uri":"none" }, "guid":-1 }, "referredEntities":{ }}
EOF
guid=$(${atlas_curl} ${atlas_url}/v2/entity/uniqueAttribute/type/kafka_topic?attr:qualifiedName=PRIVATE@${cluster_name} | jq '.entity.guid' | tr -d '"')
${atlas_curl} ${atlas_url}/entities/${guid}/traits \
-X POST -H 'Content-Type: application/json' \
--data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"SENSITIVE","values":{}}'
Example 5: Create entities for HBase table T_PRIVATE and associate with SENSITIVE tag
#create entities for Hbase table T_PRIVATE and associate with SENSITIVE tag
${atlas_curl} ${atlas_url}/v2/entity -X POST -H 'Content-Type: application/json' -d @- <<EOF
{ "entity":{ "typeName":"hbase_table", "attributes":{ "description":"T_PRIVATE table", "name":"T_PRIVATE", "owner":"hbase", "qualifiedName":"T_PRIVATE@${cluster_name}", "column_families":[ ], "uri":"none" }, "guid":-1 }, "referredEntities":{ }}
EOF
guid=$(${atlas_curl} ${atlas_url}/v2/entity/uniqueAttribute/type/hbase_table?attr:qualifiedName=T_PRIVATE@${cluster_name} | jq '.entity.guid' | tr -d '"')
${atlas_curl} ${atlas_url}/entities/${guid}/traits \
-X POST -H 'Content-Type: application/json' \
--data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"SENSITIVE","values":{}}'
Example 6: Create entities for HDFS path /banking and associate with BANKING tag
#create entities for HDFS path /banking and associate with BANKING tag
hdfs_prefix="hdfs://$(hostname -f):8020"
hdfs_path="/banking"
${atlas_curl} ${atlas_url}/v2/entity -X POST -H 'Content-Type: application/json' -d @- <<EOF
{ "entity":{ "typeName":"hdfs_path", "attributes":{ "description":null, "name":"${hdfs_path}", "owner":null, "qualifiedName":"${hdfs_prefix}${hdfs_path}", "clusterName":"${cluster_name}", "path":"${hdfs_prefix}${hdfs_path}" }, "guid":-1 }, "referredEntities":{ }}
EOF
guid=$(${atlas_curl} ${atlas_url}/v2/entity/uniqueAttribute/type/hdfs_path?attr:qualifiedName=${hdfs_prefix}${hdfs_path} | jq '.entity.guid' | tr -d '"')
${atlas_curl} ${atlas_url}/entities/${guid}/traits \
-X POST -H 'Content-Type: application/json' \
--data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"BANKING","values":{}}'
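To double-check any of the associations above, you can list the traits currently attached to an entity using the same guid variable (a quick verification step):
#list the traits (tags) currently associated with the entity
${atlas_curl} ${atlas_url}/entities/${guid}/traits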
03-12-2018
11:02 PM
Recently ran into the same issue where we were getting alerts for all the UIs. The root cause ended up being that the times on the nodes were not correct. After fixing the time and restarting the Ambari server/agents and all services, the alerts went away.
02-17-2018
10:28 AM
8 Kudos
Summary:
The release of HDF 3.1 brings a significant number of improvements to HDF: Apache NiFi 1.5, Kafka 1.0, plus the new NiFi Registry. In addition, there were improvements to the Storm, Streaming Analytics Manager, and Schema Registry components.
This article shows how you can use the ambari-bootstrap project to easily generate a blueprint and deploy HDF clusters, for either single-node or multi-node development/demo environments, in 5 easy steps. To quickly set up a single-node cluster, a prebuilt AMI is available for AWS, as well as a script that automates these steps, so you can deploy the cluster in a few commands.
Steps for each of the options below are described in this article: A. Single-node prebuilt AMI on AWS B. Single-node fresh install C. Multi-node fresh install
A. Single-node prebuilt AMI on AWS: Steps to launch the AMI
1. Launch Amazon AWS console page in your browser by clicking here and sign in with your credentials. Once signed in, you can close this browser tab.
2. Select the AMI from ‘N. California’ region by clicking here. Now choose instance type: select ‘m4.2xlarge’ and click Next
Note: if you choose a smaller instance type from the above recommendation, not all services may come up
3. Configure Instance Details: leave the defaults and click ‘Next’
4. Add storage: keep at least the default of 100 GB and click ‘Next’
5. Optionally, add a name or any other tags you like. Then click ‘Next’
6. Configure security group: create a new security group and select ‘All traffic’ to open all ports. For production usage, a more restrictive security group policy is strongly encouraged. As an instance only allow traffic from your company’s IP range. Then click ‘Review and Launch’
7. Review your settings and click Launch
8. Create and download a new key pair (or choose an existing one). Then click ‘Launch instances’
9. Click the shown link under ‘Your instances are now launching’
10. This opens the EC2 dashboard that shows the details of your launched instance
11. Make note of your instance’s ‘Public IP’ (which will be used to access your cluster). If it is blank, wait 1-2 minutes for this to be populated. Also make note of your AWS Owner Id (which will be the initial password to login)
12. After 5-10 minutes, open the below URL in your browser to access Ambari’s console: http://<PUBLIC IP>:8080. Login as user:admin and pass:your AWS Owner Id (see previous step)
13. At this point, Ambari may still be in the process of starting all the services. You can tell by the presence of the blue ‘op’ notification near the top left of the page. If so, just wait until it is done.
(Optional) You can also monitor the startup using the log as below:
Open SSH session into the VM using your key and the public IP e.g. from OSX:
ssh -i ~/.ssh/mykey.pem centos@<publicIP>
Tail the startup log:
tail -f /var/log/hdp_startup.log
Once you see “cluster is ready!” you can proceed
14. Once the blue ‘op’ notification disappears and all the services show a green check mark, the cluster is fully up.
Other related AMIs
HDP 2.6.4 vanilla AMI (ami-764d4516): Hortonworks HDP 2.6.4 single node cluster running Hive/Spark/Druid/Superset installed via Ambari. Built Feb 18 2018 using HDP 2.6.4.0-91 / Ambari 2.6.1.3-3. Ambari password is your AWS ownerid
HDP 2.6.4 including NiFi and NiFi Registry from HDF 3.1 (ami-e1a0a981): HDP 2.6.4 plus NiFi 1.5 and NiFi Registry - Ambari admin password is StrongPassword. Built Feb 17 2018
HDP 2.6 plus HDF 3.0 and IOT trucking demo reference app. Details here
Note: The above AMIs are available in the US West (N. California) region of AWS
B. Single-node HDF install:
Launch a fresh CentOS/RHEL 7 instance with 4+cpu and 16GB+ RAM and run below.
Do not try to install HDF on a env where Ambari or HDP are already installed (e.g. HDP sandbox or HDP cluster)
export host_count=1
curl -sSL https://gist.github.com/abajwa-hw/b7c027d9eea9fbd2a2319a21a955df1f/raw | sudo -E sh
Once launched, the script will install Ambari and use it to deploy HDF cluster
Note: this script can also be used to install multi-node clusters after step #1 below is complete (i.e. after the agents on the non-Ambari-server nodes are installed and registered).
Other related scripts
1. Automation to set up HDP 2.6.x plus NiFi from HDF 3.1:
export host_count=1
curl -sSL https://gist.github.com/abajwa-hw/bbe2bdd1ed6a0f738a90dd4e07480e3b/raw | sudo -E sh
C. Multi-node HDF install:
0. Launch your RHEL/CentOS 7 instances where you wish to install HDF. In this example, we will use 4 m4.xlarge instances. Select an instance where ambari-server should run (e.g. node1)
1. After choosing a host where you would like Ambari-server to run, first let's prepare the other hosts. Run below on all hosts where Ambari-server will not be running (e.g. node2-4). This will run pre-requisite steps, install Ambari-agents and point them to the Ambari-server host:
export ambari_server=<FQDN of host where ambari-server will be installed>; #replace this
export install_ambari_server=false
export ambari_version=2.6.1.0
curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh ;
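Once the agents have bootstrapped, you can optionally confirm that they registered with the Ambari server before moving on (assuming the default admin/admin credentials):
#list the hosts currently registered with Ambari
curl -u admin:admin http://${ambari_server}:8080/api/v1/hosts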
2. Run remaining steps on the host where Ambari-server is to be installed (e.g. node1). The below commands run pre-reqs and install Ambari-server:
export db_password="StrongPassword" # MySQL password
export nifi_password="StrongPassword" # NiFi password - must be at least 10 chars
export cluster_name="HDF" # cluster name
export ambari_services="ZOOKEEPER STREAMLINE NIFI KAFKA STORM REGISTRY NIFI_REGISTRY AMBARI_METRICS" #choose services
export hdf_ambari_mpack_url="http://public-repo-1.hortonworks.com/HDF/centos7/3.x/updates/3.1.0.0/tars/hdf_ambari_mp/hdf-ambari-mpack-3.1.0.0-564.tar.gz"
export ambari_version=2.6.1.0
#install bootstrap
yum install -y git python-argparse
cd /tmp
git clone https://github.com/seanorama/ambari-bootstrap.git
#Runs pre-reqs and install ambari-server
export install_ambari_server=true
curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh ;
3. On the same node, install MySQL and create databases and users for Schema Registry and SAM
sudo yum localinstall -y https://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm
sudo yum install -y epel-release mysql-connector-java* mysql-community-server
# MySQL Setup to keep the new services separate from the originals
echo Database setup...
sudo systemctl enable mysqld.service
sudo systemctl start mysqld.service
#extract system generated Mysql password
oldpass=$( grep 'temporary.*root@localhost' /var/log/mysqld.log | tail -n 1 | sed 's/.*root@localhost: //' )
#create sql file that
# 1. reset Mysql password to temp value and create druid/superset/registry/streamline schemas and users
# 2. sets passwords for druid/superset/registry/streamline users to ${db_password}
cat << EOF > mysql-setup.sql
ALTER USER 'root'@'localhost' IDENTIFIED BY 'Secur1ty!';
uninstall plugin validate_password;
CREATE DATABASE registry DEFAULT CHARACTER SET utf8; CREATE DATABASE streamline DEFAULT CHARACTER SET utf8;
CREATE USER 'registry'@'%' IDENTIFIED BY '${db_password}'; CREATE USER 'streamline'@'%' IDENTIFIED BY '${db_password}';
GRANT ALL PRIVILEGES ON registry.* TO 'registry'@'%' WITH GRANT OPTION ; GRANT ALL PRIVILEGES ON streamline.* TO 'streamline'@'%' WITH GRANT OPTION ;
commit;
EOF
#execute sql file
mysql -h localhost -u root -p"$oldpass" --connect-expired-password < mysql-setup.sql
#change Mysql password to StrongPassword
mysqladmin -u root -p'Secur1ty!' password StrongPassword
#test password and confirm dbs created
mysql -u root -pStrongPassword -e 'show databases;'
4. On the same node, install Mysql connector jar and then HDF mpack. Then restart Ambari so it recognizes HDF stack:
sudo ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
sudo ambari-server install-mpack --mpack=${hdf_ambari_mpack_url} --verbose
sudo ambari-server restart
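To confirm that Ambari picked up the HDF stack after the restart, you can query the stacks endpoint (assuming the default admin/admin credentials):
#verify the HDF stack definition is now available in Ambari
curl -u admin:admin http://$(hostname -f):8080/api/v1/stacks/HDF/versions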
At this point, if you wanted to use the Ambari install wizard to install HDF, you could do that as well: just open http://<Ambari host IP>:8080, log in, and follow the steps in the doc. Otherwise, to proceed with deploying via blueprints, follow the remaining steps.
5. On the same node, provide the minimum configurations required for the install by creating configuration-custom.json. You can add to this to customize any component's property that is exposed by Ambari
cd /tmp/ambari-bootstrap/deploy/
tee configuration-custom.json > /dev/null << EOF
{
"configurations": {
"ams-grafana-env": {
"metrics_grafana_password": "${db_password}"
},
"streamline-common": {
"jar.storage.type": "local",
"streamline.storage.type": "mysql",
"streamline.storage.connector.connectURI": "jdbc:mysql://$(hostname -f):3306/streamline",
"registry.url" : "http://localhost:7788/api/v1",
"streamline.dashboard.url" : "http://localhost:9089",
"streamline.storage.connector.password": "${db_password}"
},
"registry-common": {
"jar.storage.type": "local",
"registry.storage.connector.connectURI": "jdbc:mysql://$(hostname -f):3306/registry",
"registry.storage.type": "mysql",
"registry.storage.connector.password": "${db_password}"
},
"nifi-registry-ambari-config": {
"nifi.registry.security.encrypt.configuration.password": "${nifi_password}"
},
"nifi-ambari-config": {
"nifi.security.encrypt.configuration.password": "${nifi_password}"
}
}
}
EOF
6. Then run below as root to generate a recommended blueprint and deploy the cluster install. Make sure to set host_count to the total number of hosts in your cluster (including the Ambari server)
sudo su
cd /tmp/ambari-bootstrap/deploy/
export host_count=<Number of total nodes>
export ambari_stack_name=HDF
export ambari_stack_version=3.1
export ambari_services="ZOOKEEPER STREAMLINE NIFI KAFKA STORM REGISTRY NIFI_REGISTRY AMBARI_METRICS"
./deploy-recommended-cluster.bash
You can now log in to Ambari at http://<Ambari host IP>:8080 and sit back and watch your HDF cluster get installed!
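If you would rather watch the progress from the shell than the UI, the install shows up as an Ambari request that you can poll (a sketch, assuming the cluster name HDF and the default admin/admin credentials):
#poll the status and progress of the cluster install request
curl -u admin:admin "http://$(hostname -f):8080/api/v1/clusters/HDF/requests?fields=Requests/request_status,Requests/progress_percent"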
Notes:
a) This will only install Nifi on a single node of the cluster by default
b) The NiFi Certificate Authority (CA) component will be installed by default. This means that, if you wanted to, you could enable SSL for NiFi out of the box by including a "nifi-ambari-ssl-config" section in the above configuration-custom.json:
"nifi-ambari-ssl-config": {
"nifi.toolkit.tls.token": "hadoop",
"nifi.node.ssl.isenabled": "true",
"nifi.security.needClientAuth": "true",
"nifi.toolkit.dn.suffix": ", OU=HORTONWORKS",
"nifi.initial.admin.identity": "CN=nifiadmin, OU=HORTONWORKS",
"content":"<property name='Node Identity 1'>CN=node-1.fqdn, OU=HORTONWORKS</property><property name='Node Identity 2'>CN=node-2.fqdn, OU=HORTONWORKS</property><property name='Node Identity 3'>node-3.fqdn, OU=HORTONWORKS</property>"
},
Make sure to replace node-x.fqdn with the FQDN of each node running Nifi
c) As part of the install, you can also have an existing NiFi flow deployed by Ambari. First, read in a flow.xml file from an existing NiFi system (you can find this in flow.xml.gz). For example, run below to read the flow for the Twitter demo into an env var:
twitter_flow=$(curl -L https://gist.githubusercontent.com/abajwa-hw/3a3e2b2d9fb239043a38d204c94e609f/raw)
Then include a "nifi-flow-env" section in the above configuration-custom.json when you run the tee command - to have ambari-bootstrap include the whole flow xml in the generated blueprint:
"nifi-flow-env" : {
"properties_attributes" : { },
"properties" : {
"content" : "${twitter_flow}"
}
}
d) In case you would like to review the generated blueprint before it gets deployed, just set the below variable as well:
export deploy=false
The blueprint will be created under /tmp/ambari-bootstrap*/deploy/tempdir*/blueprint.json
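To quickly review what the generated blueprint will place on each host group, a jq one-liner like the following can help (assuming jq is installed on the node):
#summarize the components assigned to each host group in the generated blueprint
jq '.host_groups[] | {name: .name, components: [.components[].name]}' /tmp/ambari-bootstrap*/deploy/tempdir*/blueprint.json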
Sample blueprint
Sample generated blueprint for 4 node cluster is provided for reference here:
{
"Blueprints": {
"stack_name": "HDF",
"stack_version": "3.1"
},
"host_groups": [
{
"name": "host-group-3",
"components": [
{
"name": "NIFI_MASTER"
},
{
"name": "DRPC_SERVER"
},
{
"name": "METRICS_GRAFANA"
},
{
"name": "KAFKA_BROKER"
},
{
"name": "ZOOKEEPER_SERVER"
},
{
"name": "STREAMLINE_SERVER"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "SUPERVISOR"
},
{
"name": "NIMBUS"
},
{
"name": "ZOOKEEPER_CLIENT"
},
{
"name": "NIFI_REGISTRY_MASTER"
},
{
"name": "REGISTRY_SERVER"
},
{
"name": "STORM_UI_SERVER"
}
]
},
{
"name": "host-group-2",
"components": [
{
"name": "METRICS_MONITOR"
},
{
"name": "SUPERVISOR"
},
{
"name": "ZOOKEEPER_SERVER"
}
]
},
{
"name": "host-group-1",
"components": [
{
"name": "METRICS_MONITOR"
},
{
"name": "SUPERVISOR"
},
{
"name": "NIFI_CA"
}
]
},
{
"name": "host-group-4",
"components": [
{
"name": "METRICS_MONITOR"
},
{
"name": "SUPERVISOR"
},
{
"name": "METRICS_COLLECTOR"
},
{
"name": "ZOOKEEPER_SERVER"
}
]
}
],
"configurations": [
{
"nifi-ambari-config": {
"nifi.security.encrypt.configuration.password": "StrongPassword"
}
},
{
"nifi-registry-ambari-config": {
"nifi.registry.security.encrypt.configuration.password": "StrongPassword"
}
},
{
"ams-hbase-env": {
"hbase_master_heapsize": "512",
"hbase_regionserver_heapsize": "768",
"hbase_master_xmn_size": "192"
}
},
{
"nifi-logsearch-conf": {}
},
{
"storm-site": {
"topology.metrics.consumer.register": "[{\"class\": \"org.apache.hadoop.metrics2.sink.storm.StormTimelineMetricsSink\", \"parallelism.hint\": 1, \"whitelist\": [\"kafkaOffset\\..+/\", \"__complete-latency\", \"__process-latency\", \"__execute-latency\", \"__receive\\.population$\", \"__sendqueue\\.population$\", \"__execute-count\", \"__emit-count\", \"__ack-count\", \"__fail-count\", \"memory/heap\\.usedBytes$\", \"memory/nonHeap\\.usedBytes$\", \"GC/.+\\.count$\", \"GC/.+\\.timeMs$\"]}]",
"metrics.reporter.register": "org.apache.hadoop.metrics2.sink.storm.StormTimelineMetricsReporter",
"storm.cluster.metrics.consumer.register": "[{\"class\": \"org.apache.hadoop.metrics2.sink.storm.StormTimelineMetricsReporter\"}]"
}
},
{
"registry-common": {
"registry.storage.connector.connectURI": "jdbc:mysql://ip-172-31-21-233.us-west-1.compute.internal:3306/registry",
"registry.storage.type": "mysql",
"jar.storage.type": "local",
"registry.storage.connector.password": "StrongPassword"
}
},
{
"registry-logsearch-conf": {}
},
{
"streamline-common": {
"streamline.storage.type": "mysql",
"jar.storage.type": "local",
"streamline.storage.connector.connectURI": "jdbc:mysql://ip-172-31-21-233.us-west-1.compute.internal:3306/streamline",
"streamline.dashboard.url": "http://localhost:9089",
"registry.url": "http://localhost:7788/api/v1",
"streamline.storage.connector.password": "StrongPassword"
}
},
{
"ams-hbase-site": {
"hbase.regionserver.global.memstore.upperLimit": "0.35",
"hbase.regionserver.global.memstore.lowerLimit": "0.3",
"hbase.tmp.dir": "/var/lib/ambari-metrics-collector/hbase-tmp",
"hbase.hregion.memstore.flush.size": "134217728",
"hfile.block.cache.size": "0.3",
"hbase.rootdir": "file:///var/lib/ambari-metrics-collector/hbase",
"hbase.cluster.distributed": "false",
"phoenix.coprocessor.maxMetaDataCacheSize": "20480000",
"hbase.zookeeper.property.clientPort": "61181"
}
},
{
"ams-env": {
"metrics_collector_heapsize": "512"
}
},
{
"kafka-log4j": {}
},
{
"ams-site": {
"timeline.metrics.service.webapp.address": "localhost:6188",
"timeline.metrics.cluster.aggregate.splitpoints": "kafka.network.RequestMetrics.ResponseQueueTimeMs.request.OffsetFetch.98percentile",
"timeline.metrics.host.aggregate.splitpoints": "kafka.network.RequestMetrics.ResponseQueueTimeMs.request.OffsetFetch.98percentile",
"timeline.metrics.host.aggregator.ttl": "86400",
"timeline.metrics.service.handler.thread.count": "20",
"timeline.metrics.service.watcher.disabled": "false"
}
},
{
"kafka-broker": {
"kafka.metrics.reporters": "org.apache.hadoop.metrics2.sink.kafka.KafkaTimelineMetricsReporter"
}
},
{
"ams-grafana-env": {
"metrics_grafana_password": "StrongPassword"
}
},
{
"streamline-logsearch-conf": {}
}
]
}
Sample cluster.json for a 4-node cluster:
{
"blueprint": "recommended",
"default_password": "hadoop",
"host_groups": [
{
"hosts": [
{
"fqdn": "ip-172-xx-xx-x3.us-west-1.compute.internal"
}
],
"name": "host-group-3"
},
{
"hosts": [
{
"fqdn": "ip-172-xx-xx-x2.us-west-1.compute.internal"
}
],
"name": "host-group-2"
},
{
"hosts": [
{
"fqdn": "ip-172-xx-xx-x4.us-west-1.compute.internal"
}
],
"name": "host-group-4"
},
{
"hosts": [
{
"fqdn": "ip-172-xx-xx-x1.us-west-1.compute.internal"
}
],
"name": "host-group-1"
}
]
}
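If you prefer to register the blueprint and cluster template with Ambari yourself (instead of letting ambari-bootstrap submit them), the standard Ambari blueprint REST calls look roughly like the below; the Ambari host, credentials and cluster name are placeholders you would replace for your environment:
#register the blueprint under the name referenced by cluster.json ("recommended")
curl -u admin:admin -H "X-Requested-By: ambari" -X POST -d @blueprint.json http://<ambari-host>:8080/api/v1/blueprints/recommended
#create the cluster from the blueprint and host mapping
curl -u admin:admin -H "X-Requested-By: ambari" -X POST -d @cluster.json http://<ambari-host>:8080/api/v1/clusters/<cluster-name>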
What next? Now that your cluster is up, you can explore NiFi's Ambari integration: https://community.hortonworks.com/articles/57980/hdf-20-apache-nifi-integration-with-apache-ambarir.html
Next, you can enable SSL for NiFi: https://community.hortonworks.com/articles/58009/hdf-20-enable-ssl-for-apache-nifi-from-ambari.html
11-17-2017
04:10 AM
4 Kudos
Overview
Partner demo kit is built and maintained by the Hortonworks Partner Solutions team. The purpose of the demo kit is to enable the partners to:
Quickly bring up HDP environments with pre-built demos
Leverage available demos to understand the capabilities of the platform
Use the demos as part of business conversation to demonstrate the art of possible
The remainder of this article provides a short description of the 3 demos packaged within the demo kit and step-by-step instructions on:
How to launch the demo kit on AWS or on a private cloud
How to execute the demos provided with the demo kit
Other Versions
The Security/Governance Demo kit for HDP 2.6 can be found here
The previous version of the demo kit (for HDP 2.5) can be found here
Pre-requisites
When using AWS, you must already have created your Amazon Web Services account. Sample steps for doing this can be found here. If you have an AWS promo code, you can apply it to your account using the steps here.
For running the sentiment demo, you must have created a Twitter application using your Twitter account and generated consumer keys/secrets. If you do not have these, you can generate a new set using your Twitter account by following this section of the Hortonworks tutorial.
Notes
Note that the partner demo kit is not a formally supported offering. In case of questions, see the 'Questions?' section at the end of this article.
Slides
Slides for the demo kit are available here
Packaged Demos
The demo kit comes with 3 demos:
1. IOT demo
Purpose: The IOT demo showcases how a logistics company uses the Hortonworks Connected Data Platform to monitor its fleet in real time to mitigate driving infractions
Use case setup:
Sensor devices from trucks capture events of the trucks and actions of the drivers.
Some of these driver events are dangerous "events" such as: Lane Departure, Unsafe following distance, Unsafe tail distance
The Business Requirement is to stream these events in, filter on violations and do real-time alerting when "lots" of erratic behavior is detected for a given driver over a short period of time.
Over time, users would like to do advanced analytics on the full archive of historical events generated by the trucks to:
Determine what factors have an impact on driving violations (e.g. weather, driver fatigue etc.)
Build an AI model to make predictions when violations will occur
Technologies used: Apache Nifi, Kafka, Storm, Streaming Analytics Manager, Schema Registry, HBase, Spark, Zeppelin
More details available here and here
2. Sentiment demo
Purpose: The Sentiment demo showcases how a retail company can use the Hortonworks Connected Data Platform to visualize and analyze social media data related to their products
Use case setup:
The Business Requirement is to capture, process and analyze the flow of tweets to understand the social sentiment for their products
Technologies used: Apache Nifi, Solr, HDFS
More details available here and here
3. Advanced analytics demo
Purpose: The Advanced analytics demo showcases how an insurance company can use the Hortonworks Connected Data Platform to visualize and make predictions on earthquake data using Apache Spark's machine learning libraries
Use case setup:
The Business Requirement is to be able to perform advanced analytics on worldwide earthquake data to predict where large earthquakes will happen so the business can accordingly modify insurance premiums
Technologies used: Apache Spark, Zeppelin
More details here
Option #1: Installing the Demo Kit on your own setup
You can install the Demo Kit on other public or private clouds using the provided automated script. With this option you would launch a CentOS/RHEL 7 VM of the right size on any cloud of your choice (as long as it has access to the public internet), and use the provided script to install single-node HDP and install the demo. For more details see the README here. Setup ETA is 1 hour
Option #2: Launching the Demo Kit AMI on AWS
You can use this option to launch a prebuilt image of single-node HDP (including the demo) on the AWS cloud. Setup ETA is 15 min
Steps to launch the AMI
1. Launch the Amazon AWS console page in your browser by clicking here and sign in with your credentials. Once signed in, you can close this browser tab.
2. Select the AMI from the 'N. California' region by clicking here. Now choose the instance type: select 'm4.2xlarge' and click Next
Note: if you choose a smaller instance type than the above recommendation, not all services may come up
3. Configure Instance Details: leave the defaults and click 'Next'
4. Add storage: keep the default of 500 GB and click 'Next'
5. Optionally, add a name or any other tags you like. Then click 'Next'
6. Configure security group: create a new security group and select 'All traffic' to open all ports. For long running instances (i.e. anything beyond an hour), a more restrictive security group policy is strongly encouraged (for example: only allow traffic from your company's IP range). Then click 'Review and Launch'
7. Review your settings and click Launch
8. Create and download a new key pair (or choose an existing one). Then click 'Launch instances'
9. Click the link shown under 'Your instances are now launching'
10. This opens the EC2 dashboard that shows the details of your launched instance
11. Make note of your instance's 'Public IP' (which will be used to access your cluster). If it is blank, wait 1-2 minutes for this to be populated
12. After 5-10 minutes, open the below URL in your browser to access Ambari's console: http://<PUBLIC IP>:8080. Login as the admin user using StrongPassword as the password
13. At this point, Ambari may still be in the process of starting all the services. You can tell by the presence of the blue 'op' notification near the top left of the page. If so, just wait until it is done.
(Optional) You can also monitor the startup using the log as below:
Open an SSH session into the VM using your key and the public IP e.g. from OSX: ssh -i ~/.ssh/mykey.pem centos@<publicIP>
Tail the startup log: tail -f /var/log/hdp_startup.log
Once you see "cluster is ready!" you can proceed
14. Once the blue 'op' notification disappears and all the services show a green check mark, the cluster is fully up. If any services fail to start, use the Actions > Start All button to start them
15. At this point you can follow the demo instructions.
Troubleshooting
If any service does not come up for some reason, you can use Ambari to retry by clicking: 'Actions' > 'Start all'.
In case of multiple failures when starting services, use the EC2 dashboard to double check that the correct instance type was used. Insufficient resources can cause services to not start up successfully
It is not required to connect via SSH to your instance. But you can do this using the key pair you created/selected earlier by following the standard instructions on the AWS website. Make sure the user you login as is centos
A log file of the automated startup of HDP services is available under: /var/log/hdp_startup.log
Stopping/Terminating demo kit
Once you are done with the demo kit, we recommend bringing it down to avoid incurring any unnecessary charges. To do this, follow below: First, stop the cluster services using Ambari by clicking: 'Actions' > 'Stop all'. Then pick from one of the two options:
a) Terminate the instance: If you do not want to incur any further charges from AWS, terminate the VM instance from the same 'EC2 dashboard' that displayed the instance details. Note that this will destroy the VM, so the next time you wish to use the demo kit, you will need to follow the same steps outlined in the above section 'Launching the Demo Kit'
b) Stop the instance: if you want to bring down your VM instance but keep it around so you can start it back up in the future, stop the VM instance from the EC2 dashboard. Note that this option will preserve any customizations you make to the VM but you will incur AWS charges by choosing this option.
More details on stop vs terminate operations can be found on the AWS website here and here
Demo Execution Steps
IOT Demo
Video recording of the IOT demo
Recording of demo provided here (high level) and here (deeper level)
PPT and PDF versions of the slides also available
IOT Demo setup instructions
Sequence to walk through the IOT trucking demo:
Events simulator
Schema Registry UI
NiFi flow
SAM Application view
Storm Monitoring view
Superset Dashboard
Superset Slice creation
Zeppelin notebook
Detailed steps for IOT trucking demo walk through
(Optional): Check that events are being simulated. This step is optional because we will also check this from the NiFi UI
Open an SSH session into the VM using your key and the public IP e.g. from OSX: ssh -i ~/.ssh/mykey.pem centos@<publicIP>
sudo su -
To check events being simulated you can either verify the simulator process is running or monitor the simulator log:
ps -ef | grep stream-simulator
tail -f /tmp/whoville/data_simulator/simulator.log
If the simulator is not running, you can invoke it by running the below from the SSH session:
cd /tmp/whoville/data_simulator/
sudo ./runDataLoader.sh
In case you need to kill the simulator, use the ps command above to find the process id and then kill it
Next, we will open the web UIs of a number of components that are part of the demo using the Ambari Quicklinks. For example, for Schema Registry here is how to access the Quicklink:
Open Schema Registry using the Quicklink in Ambari and check the 4 schemas below are listed
Open NiFi using the Quicklink in Ambari, check that the "IOT trucking demo" process group is started
Double click on the "IOT trucking demo" box to see the details of the flow. The counters should show that simulated events are flowing through the NiFi flow. You can refresh the UI to see this:
Open the Storm Monitoring view (under Ambari views), and check the topology is live
Open SAM using the Quicklink in Ambari, check the application is deployed
Double click on the application to see more details.
You should see that the Emitted and Transferred fields are non-zero (assuming the simulator has been running for a few min)
Open the Druid Console using the Quicklink in Ambari, check the two datasets are present
Open Druid Superset using the Quicklink in Ambari and login using admin/StrongPassword
There should be one entry under Dashboards. Click it to open the prebuilt dashboard.
The prebuilt dashboard will open. You can periodically click the refresh button to see new data arriving. Datasets can take 2-6 mins for new events to appear in Druid
The first few slices (i.e. graphs) provide monitoring related information (e.g. how many violations? Who are the violators? etc). The last 3 slices provide information about the predictions made by the model (i.e. which routes are predicted to have the most violations? Which drivers are predicted to have violations)
You can also create other slices and add them to the dashboard using the steps here
Optionally you can also demonstrate how a data scientist would use archived truck events to build a model to predict violations. Note, to limit the amount of resources needed to run the AMI, Spark/Hive has not been installed so you will not be able to actually run the notebook. The previous version of the demo kit or the HDP sandbox has these set up, so that can be used if you want to actually execute the steps in the notebook.
To walk through the trucking events analysis notebook, first open the Zeppelin UI using the Quicklink from Ambari:
Login as admin/admin
Under the Notebook section, use the search text field to search for the "Trucking data analysis" notebook using Zeppelin search:
Click Save on the interpreter binding
Walk through the notebook to show how a data scientist can use SparkSQL to visualize data to help understand what features should be included in the model
Finally you can show that once the important features are known, a model can be built to predict violations (in this case, using Logistic Regression)
Stopping/Starting the simulator
To stop the simulator, use the below command to find its process id and then use the kill command to kill it:
ps -ef | grep stream-simulator
kill <process_id>
To start it back up, run the below:
cd /tmp/whoville/data_simulator/
sudo ./runDataLoader.sh
Sentiment Demo
Video recording of the Sentiment demo
Recording of setup instructions for the demo provided here
Sentiment Demo setup instructions
Open the Nifi UI using Quicklinks in Ambari
Doubleclick "Twitter Dashboard" to open this process group:
Right click "Grab Garden Hose" > Properties and enter your Twitter Consumer key/secret and Access token/secret. If you do not have these, you can generate a new set using your Twitter account by following this section of the Hortonworks tutorial. Optionally change the 'Terms to filter on' as desired. Once complete, start the flow.
Use the Banana UI quicklink from Ambari to open the Twitter dashboard
An empty dashboard will initially appear. After a minute, you should start seeing charts appear
Advanced Analytics Demo
Video recording of Advanced Analytics demo
Video recording provided here
Advanced Analytics Demo setup instructions
Open the Zeppelin UI via Quicklink
Login as admin. Password is same as the Ambari password
A directory structure containing a number of demo notebooks will appear. Find the earthquake demo notebook by filtering for 'earthquake'
On first launch of a notebook, you will see that the "Interpreter Binding" settings will be displayed. You will need to click "Save" under the interpreter order to accept the defaults.
Now you can walk through the notebook and show the visualizations and the process of building the model. Note, to limit the amount of resources needed to run the AMI, Spark/Hive has not been installed so you will not be able to actually run the notebook. The previous version of the demo kit or the HDP sandbox has the notebook set up, so that can be used if you want to actually execute the steps in the notebook.
This concludes this article on how to launch the demo kit and access the provided demonstrations
Questions?
In case of questions or issues:
1. Search on our Hortonworks Community Connection forum. For example, to find all Demo Kit related posts access this url
2. If you were not able to find the solution, please post a new question using the tag "partner-demo-kit" here. Please try to be as descriptive as possible when asking questions by providing:
Detailed description of the problem
Steps to reproduce the problem
Environment details e.g.
Instance type used was m4.2xlarge
Storage used was 500 GB
Etc.
Relevant log file snippets
06-09-2017
12:20 AM
Thanks @Arti Wadhwani! Yes, it looks like hardcoding the hostname in the url will work. To hardcode only the hostname, i.e. to pick up the protocol (http vs https) and port automatically based on the Ambari configs, you can use something like the below and then restart ambari-server:
"url":"%@://nifi.server1.com:%@/nifi",
06-08-2017
07:14 PM
@Anishkumar Valsalam the quicklink functionality is defined in quicklinks.json in the Ambari service code for Nifi. For example, for Nifi 1.0.0 you can find the json file here (on your cluster it will be under /var/lib/ambari-server/resources/mpacks/hdf-ambari-mpack-*/common-services/NIFI/1.0.0/quicklinks/quicklinks.json). Based on the quicklinks.json, it is looking for the nifi.node.port (or nifi.node.ssl.port, if SSL enabled) property in the nifi-ambari-config config (which in Ambari > Nifi > Configs shows up as the 'Advanced nifi-ambari-config' config accordion) to figure out which port the link should reference on the host(s) where Nifi was installed. Looking at the below section of the json where the URL is formed, it does not appear that you can have the quicklink point to a different host, because it is using the Ambari API to figure out which host(s) have Nifi installed (when the user went through the install wizard):
"url":"%@://%@:%@/nifi",
I think the easiest way to achieve what you are looking for is probably to set up the hostname of the node where Nifi will run as nifi.server1.com instead of server1.com from the start, i.e. prior to installing Ambari (although it is also possible to rename the host post-install as well, but that is more involved)
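For context, a trimmed-down illustration of what a quicklinks.json link entry typically looks like (the exact fields are illustrative and may differ slightly between Ambari/mpack versions; check the actual file at the path above):
{
  "name": "default",
  "configuration": {
    "links": [
      {
        "name": "nifi_ui",
        "label": "NiFi UI",
        "url": "%@://%@:%@/nifi",
        "port": {
          "http_property": "nifi.node.port",
          "https_property": "nifi.node.ssl.port",
          "site": "nifi-ambari-config"
        }
      }
    ]
  }
}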
04-25-2017
06:16 PM
Atlas is where data stewards can define tags. Ranger is where security admins can set up authorization policies for resources and tags. I suggest you go through the below webinar and tutorials to understand this better:
https://www.brighttalk.com/webcast/9573/237093/partnerworks-office-hours-dynamic-security-data-governance-in-hdp-2-5
https://hortonworks.com/hadoop-tutorial/tag-based-policies-atlas-ranger/
http://hortonworks.com/hadoop-tutorial/cross-component-lineage-apache-atlas/
Answers:
1. Usually the Hive policies work as a whitelist (allow conditions), i.e. access is denied by default unless there is at least one policy allowing access. In newer versions of Ranger, you can also do a blacklist (i.e. deny conditions), which may be what you are looking for. See: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_security/content/ranger_tag_based_policy_manager.html
2. To enable users to login to the Atlas web UI using AD credentials
3. The Hive hook captures lineage info into Atlas (e.g. when a user runs a CTAS operation). More details here: http://atlas.incubator.apache.org/Bridge-Hive.html
4. Policy creation happens in Ranger, not Atlas. Check the Ranger docs.
01-23-2017
06:42 AM
@Karan Alang this is the error: User:root not allowed to do 'DECRYPT_EEK' on 'key1'
Sounds like you may need to login to Ranger as keyadmin/keyadmin and create a policy that allows DECRYPT_EEK access for user root on the key called key1.
Also, the guide above was written back in the HDP 2.3 timeframe. For HDP 2.5, you can refer to this guide: https://github.com/HortonworksUniversity/Security_Labs
For HDP 2.4, there is an archive of the above guide which can be downloaded here: https://github.com/HortonworksUniversity/Security_Labs/releases/tag/HDP-2.4
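Once the policy is in place, one rough way to re-test (a sketch only; it assumes an encryption zone backed by key1, e.g. a hypothetical /enczone directory, already exists and that you run this as the root OS user from the error message):
hdfs dfs -put /etc/hosts /enczone/hosts    #writing into the zone exercises the KMS (GENERATE_EEK / DECRYPT_EEK)
hdfs dfs -cat /enczone/hosts               #reading requires DECRYPT_EEK on key1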
01-18-2017
08:02 PM
@Roger Young Sounds like the root user is unable to create the dir /root/hdp. As the root user, can you manually try creating this dir and see what error you get? (e.g. mkdir /root/hdp). If you are running this on the HDP 2.5 sandbox, note that it is docker based and we have encountered these types of filesystem issues (esp. with older versions of the 2.5 sandbox)
01-16-2017
06:17 PM
@Roger Young The ambari service and instructions have been updated for HDP 2.5 (make sure to use the hdp25 branch):
https://github.com/hortonworks-gallery/iotdemo-service/tree/hdp25
but I think it requires a lot more memory than is available on the default sandbox. The easiest way to get the demo up might be to use the prebuilt Amazon AMI to spin it up on AWS: https://community.hortonworks.com/articles/58330/automation-to-deploy-hdp-25nifi-10-clusters-runnin.html
01-14-2017
01:52 AM
@Karan Alang I haven't tried this myself but there are some helpful tips below. You would need to create the OpenTSDB instances with different table names, but it looks like in recent OpenTSDB versions, instead of being able to configure this via a CLI argument, you would need to set this in the config properties file:
https://groups.google.com/forum/#!topic/opentsdb/nZ59_xMaRvo
http://stackoverflow.com/questions/18951195/configure-multiple-opentsdb-to-use-single-hbase-backend
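As an illustration only (based on the links above, not something I have verified myself), each OpenTSDB instance would get its own opentsdb.conf pointing at differently named HBase tables, along these lines (table names are placeholders):
tsd.storage.hbase.data_table = tsdb_instance2
tsd.storage.hbase.uid_table = tsdb_instance2-uid
tsd.storage.hbase.tree_table = tsdb_instance2-tree
tsd.storage.hbase.meta_table = tsdb_instance2-meta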
12-29-2016
10:22 PM
5 Kudos
Newer version available
This article describes how to deploy the HDP 2.5 version of the demo kit. While it can still be used, there is now a newer version of the demo kit available here that leverages a combination of HDP 2.6/HDF 3.0.
Overview
Partner demo kit is built and maintained by the Hortonworks Partner Solutions team. The purpose of the demo kit is to enable the partners to:
Quickly bring up HDP environments with pre-built demos
Leverage available demos to understand the capabilities of the platform
Use the demos as part of business conversation to demonstrate the art of possible
The remainder of this article provides a short description of the 3 demos packaged within the demo kit and step-by-step instructions on:
How to launch the demo kit on AWS or on private cloud
How to execute the demos provided with the demo kit
Pre-requisites
If using AWS, you must already have created your Amazon Web Services account. Sample steps for doing this can be found here. If you have an AWS promo code, you can apply it to your account using the steps here.
For running the sentiment demo, you must have created a Twitter application using your Twitter account and generated consumer keys/secrets. If you do not have these, you can generate a new set using your Twitter account by following this section of the Hortonworks tutorial.
Notes
Note that the partner demo kit is not a formally supported offering.
In case of questions, see the 'Questions?' section at the end of this article.
Slides
Slides for the demo kit are available here
Webinar
Webinar recording about the demo kit is available here
Packaged Demos
The demo kit comes with 3 demos. The slides for these are available here:
1. IOT demo
Purpose: IOT demo showcases how a logistic company uses the Hortonworks Connected Data Platform to monitor its fleet in real time to mitigate driving infractions
Use case setup:
Sensor devices from trucks capture events of the trucks and actions of the drivers.
Some of these driver events are dangerous "events” such as: Lane Departure, Unsafe following distance, Unsafe tail distance
The Business Requirement is to stream these events in, filter on violations and do real-time alerting when “lots” of erratic behavior is detected for a given driver over a short period of time.
Over time, users would like to do advanced analytics on the full archive of historical events generated by the trucks to:
Determine what factors have an impact on driving violations (e.g. weather, driver fatigue etc)
Build an AI model to make predictions when violations will occur
Technologies used: Apache Nifi, Kafka, Storm, HBase, Spark, Zeppelin
More details available here and here
2. Sentiment demo
Purpose: Sentiment demo showcases how a retail company can use the Hortonworks Connected Data Platform to visualize and analyze social media data related to their products
Use case setup:
The Business Requirement is to capture, process and analyze the flow of tweets to understand the social sentiment for their products
Technologies used: Apache Nifi, Solr, Hive, HDFS
More details available here and here
3. Advanced analytics demo
Purpose: Advanced analytics demo showcases how an insurance company can use the Hortonworks Connected Data Platform to visualize and make predictions on earthquake data using Apache Spark’s machine learning libraries
Use case setup:
The Business Requirement is to be able to perform advanced analytics on world wide earthquake data to predict where large earthquakes will happen so the business can accordingly modify insurance premiums
Technologies used: Apache Spark, Zeppelin
More details here
Launching the Demo Kit
Option 1: Launching the Demo Kit on AWS using AMI
1. Launch Amazon AWS console page in your browser by clicking here and sign in with your credentials. Once signed in, you can close this browser tab.
2. Select the Demo Kit AMI from ‘N. California’ region by clicking here. Now choose instance type: select ‘m4.2xlarge’ and click Next
Note: if you choose a smaller instance type from the above recommendation, not all services may come up
3. Configure Instance Details: leave the defaults and click ‘Next’
4. Add storage: keep the default of 500 GB and click ‘Next’
5. Optionally, add a name or any other tags you like. Then click ‘Next’
6. Configure security group: create a new security group and select 'All traffic' to open all ports. For production usage, a more restrictive security group policy is strongly encouraged; for instance, only allow traffic from your company's IP range. Then click 'Review and Launch'
7. Review your settings and click Launch
8. Create and download a new key pair (or choose an existing one). Then click ‘Launch instances’
9. Click the shown link under ‘Your instances are now launching’
10. This opens the EC2 dashboard that shows the details of your launched instance
11. Make note of your instance’s ‘Public IP’ (which will be used to access your cluster) and the ‘Owner’ id (which will be the default password). If the ‘Public IP’ is blank, wait 1-2 minutes for this to be populated
12. After about 20 minutes, open the below URL in your browser to access Ambari’s console: http://<PUBLIC IP>:8080. Login as admin user using your ‘Owner’ id as password (you can find your owner id in instance details page as highlighted above)
13. At this point, Ambari may still be in the process of starting all the services. You can tell by the presence of the blue ‘op’ notification near the top left of the page. If so, just wait until it is done.
14. Once the blue ‘op’ notification disappears and all the services show a green check mark, the cluster is fully up.
15. At this point you can follow the demo instructions.
Troubleshooting
If any service does not come up for some reason, you can use Ambari to retry by clicking: ‘Actions’ > ‘Start all’.
In case of multiple failures when starting services, use the EC2 dashboard to double check that the correct instance type was used. Insufficient resources can cause services to not start up successfully
It is not required to connect via SSH to your instance. But you can do this using the key pair you created/selected earlier by following the standard instructions on AWS website. Make sure the user you login as is ec2-user
A log file of the automated startup of HDP services is available under: /var/log/hdp_startup.log
Logs of individual HDP components can be found under /var/log/<component name> or can be accessed using Logsearch UI available at http://<PUBLIC IP>:61888 (login with same credentials as Ambari)
Stopping/Terminating demo kit
Once you are done with demo kit, we recommend bringing it down to avoid incurring any unnecessary charges. To do this, follow below: First, stop the cluster services using Ambari by clicking: ‘Actions’ > ‘Stop all’. Then pick from one of the two options:
a) Terminate the instance: If you do not want to incur any further charges from AWS, terminate the VM instance from the same 'EC2 dashboard' that displayed the instance details. Note that this will destroy the VM, so the next time you wish to use the demo kit, you will need to follow the same steps outlined in the above section 'Launching the Demo Kit'
b) Stop the instance: if you want to bring down your VM instance but keep it around so you can start it back up in the future, stop the VM instance from the EC2 dashboard. Note that this option will preserve any customizations you make to the VM but you will incur AWS charges by choosing this option.
More details on stop vs terminate operations can be found on the AWS website here and here
Option 2: Installing the Demo Kit on other setups
You can also install Demo Kit on other public or private clouds using the provided automated script.
1. Launch a CentOS/RHEL 6 or 7 instance on any cloud of your choice with at least 4 cores and 32 GB RAM. Make sure the instance has access to internet.
Warning: Do NOT run this script on an instance where Ambari or HDP has already been installed (including HDP sandbox)
2. SSH into the instance and run the below commands
export host_count=1
export stack=HDPDEMO
export ambari_password=BadPass#1 ## change password as needed
curl -sSL https://gist.github.com/abajwa-hw/3f2e211d252bba6cad6a6735f78a4a93/raw | sudo -E sh
3. This will install the Ambari server and agents. After 5-10 min, you should get a message saying the blueprint was deployed, which means the cluster install has started. At this point you can login to the Ambari UI on port 8080 (using user: admin and whatever password you specified above) and monitor the cluster install/startup.
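Alternatively, you can watch the install progress from the command line with the standard Ambari REST API (a sketch; substitute the admin password you exported above and the cluster name shown in Ambari, and note that the blueprint deployment is usually request id 1 on a fresh install):
curl -s -u admin:${ambari_password} http://localhost:8080/api/v1/clusters/<cluster-name>/requests/1?fields=Requests/progress_percent,Requests/request_status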
Demo Execution Steps
IOT Demo
Video recording of the IOT demo
Recording of demo provided here
PPT and PDF versions of the slides also available
Recording of setup instructions for demo provided here
IOT Demo setup instructions
Part 1: We will be using the IOT trucking web application to deploy a Storm topology. Then we will use Nifi to push simulated trucking events into Kafka where they will be pulled by Storm for windowing analysis, before being pushed out to a web app and HBase
In Ambari, open 'IotDemo UI' using quicklink:
In IotDemo UI, click "Deploy the Storm Topology"
After 30-60 seconds, the topology will be deployed. Confirm using the Storm View in Ambari:
Click "Truck Monitoring Application" link in 'IotDemo UI' to open the monitoring app showing an empty map.
Click 'Nifi Data Flow' in IotDemo UI to launch Nifi and then double click on the 'Iot Trucking demo' processor group. The flow should already be started so no action is needed.
In Ambari, click "Generate Events" to simulate 50 events (this can be configured)
Switch back to "Truck Monitoring Application" in IotDemo UI and after 30s the trucking events will appear on screen
Examine the Storm topology using Storm View in Ambari
Part 2: Next, you can run through the Trucking data Zeppelin notebook to do advanced analytics on the full archive of historical events generated by the trucks to
Determine what factors have an impact on driving violations (e.g. weather, driver fatigue etc)
Build an AI model to make predictions when violations will occur
Login to Zeppelin interface using the steps provided under the below section: ‘Advanced Analytics Demo setup instructions’
Find and open the ‘Trucking Data Analysis’ notebook by filtering for ‘truck’
On first launch of a notebook, you will see that the "Interpreter Binding" settings will be displayed. You will need to click "Save" under the interpreter order to accept the defaults.
Execute the code cells one by one, by clicking the 'Play' (triangular) button on top right of each cell. Alternatively you can just highlight a cell then press Shift-Enter
You can tell that the status of the cell is RUNNING by the label on the top right of the cell. Note that the first invocation of a cell that runs Spark takes 30-60 seconds as the Spark Application Master is launched on YARN. If desired, you can monitor this using YARN's Resource Manager UI and Spark UI (for detailed steps, see the below section 'Advanced Analytics Demo setup instructions')
Sentiment Demo
Video recording of the Sentiment demo
Recording of setup instructions for demo provided here
Slides available here
Sentiment Demo setup instructions
Open Nifi UI using Quicklinks in Ambari
Doubleclick "Twitter Dashboard" to open this process group:
Right click "Grab Garden Hose" > Properties and enter your Twitter Consumer key/secret and Access token/secret. If you do not have these, you can generate a new set using your Twitter account by following this section of the Hortonworks tutorial. Optionally change the 'Terms to filter on' as desired. Once complete, start the flow.
Use Banana UI quicklink from Ambari to open Twitter dashboard
An empty dashboard will initially appear. After a minute, you should start seeing charts appear
Use the Hive view in Ambari to run SQL queries on the tweet data
Advanced Analytics Demo
Video recording of Advanced Analytics demo
Video recording provided here
Slides provided here
Advanced Analytics Demo setup instructions
Open Zeppelin UI via Quicklink
Login as admin. Password is same as Ambari password
A directory structure containing a number of demo notebooks will appear.
Find the earthquake demo notebook by filtering for ‘earthquake’
On first launch of a notebook, you will see that the "Interpreter Binding" settings will be displayed. You will need to click "Save" under the interpreter order to accept the defaults.
Execute the code cells one by one, by clicking the 'Play' (triangular) button on top right of each cell. Alternatively you can just highlight a cell then press Shift-Enter
You can tell that the status of the cell is RUNNING by the label on the top right of the cell.
Note that the first invocation of a cell that runs Spark takes 30-60 seconds as the Spark context is launched. Under the covers it is launching a Spark Application Master on YARN. If desired, you can monitor this using Resource Manager UI which is available through Ambari under Yarn > Quicklink.
The Spark UI can also be accessed from the Resource Manager UI by clicking on the application ID for the Zeppelin app and then clicking on the 'Application Master' hyperlink under 'Tracking Url'
The Spark UI can be used to monitor the running Spark jobs
This concludes this article on how to launch the demo kit and access the provided demonstrations
Questions?
In case of questions or issues:
1. Search on our Hortonworks Community Connection forum. For example, to find all Demo Kit related posts access this url
2. If you were not able to find the solution, please post a new question using the tag “partner-demo-kit” here. Please try to be as descriptive as possible when asking questions by providing:
Detailed description of problem
Steps to reproduce problem
Environment details e.g.
Instance type used was m4.2xlarge
Storage used was 500gb
Etc.
Relevant log file snippets
12-07-2016
06:41 PM
1 Kudo
@Shashank Rai Not currently supported but planned for 3.0: See https://issues.apache.org/jira/browse/AMBARI-19109
11-11-2016
09:46 PM
Thanks @slachterman! This resolved the problem for me
11-07-2016
04:17 PM
Thanks for the feedback @Amod Gehlot. I have updated the original HCC article with this info for others as well
11-02-2016
06:56 PM
3 Kudos
One thing to point out: the "java.net.ConnectException: Connection refused" error is not related. It just means that the Ambari metrics service is probably not started - it's usually turned off by default on the sandbox to conserve resources.
From the screenshots it seems the flow was started and tweets are flowing. If you cannot query them via the Solr UI/APIs, I would check whether the PutSolrContentStream processor in Nifi is showing any errors - sometimes the zookeeper znode that Solr is using may not be correctly specified in its settings. In this example, I used the Solr ambari service which usually sets up Solr to use the /solr znode.
If you are able to query tweets in Solr, but not in Banana, try switching the "Time Window" or re-installing the .json file for the dashboard
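To quickly check from the command line whether documents are landing in Solr at all, a query along these lines can help (the collection name is whatever your PutSolrContentStream processor is configured to write to; host/port assume the default Solr settings on the sandbox):
curl "http://localhost:8983/solr/<your_collection>/select?q=*:*&wt=json&rows=5&indent=true"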
10-25-2016
04:38 PM
@Amod Gehlot sorry you weren't able to get past this. A couple of options:
1. If you are ok with the 2.4 version of the sandbox, it is available for download at http://hortonworks.com/downloads/ (search for "sandbox archive")
2. If you are ok with paying for an instance on AWS, you can spin up a single node of HDP 2.5 from an AMI. (This is not the official sandbox, but an image of a single node cluster that has some prebuilt demos). Details here: https://community.hortonworks.com/articles/58330/automation-to-deploy-hdp-25nifi-10-clusters-runnin.html
3. Install a Centos VM on your local machine and install HDP via Ambari. The ambari-bootstrap automation can do this for you pretty painlessly. Sample usage: https://gist.github.com/abajwa-hw/55cd937fc8c5e27b8f2ec8c506d86519
4. Wait for the HDP 2.5 sandbox to be refreshed (hopefully in a week or two)
10-19-2016
08:10 PM
1 Kudo
Not currently supported
10-19-2016
08:05 AM
6 Kudos
@Amod Gehlot: This is due to a docker issue in this 2.5 sandbox build. It will be fixed in the next revision of the sandbox. In the meantime, try running the below command and then re-try the 'Add service' wizard. If you get the error, close the wizard and then re-launch it (you may need to do this a few times):
sudo rm -rf /var/run/ambari-server/stack-recommendations/*
Usually by the 6th time, it will pick up a dir which doesn't exist and the error will not be seen
10-06-2016
09:58 AM
2 Kudos
In the previous articles, we showed how to deploy an HDF 2.x/3.0 cluster, enable SSL for Nifi and set up the Ranger Nifi plugin. Here we will build on the same cluster and show how to enable kerberos using Active Directory.
Summary
To achieve this, the high level steps we will follow are:
Setup certificate trust for HDF nodes
Run Ambari security wizard
Create Ranger policy for nifiadmin user
Delete certificate
Login to Nifi using AD principal credentials
Pre-requisites
You have correctly setup AD as described here:
Active Directory setup with domain: CLOUD.HORTONWORKS.COM
AD already preconfigured with LDAPS
Certificate (.crt) used to enable LDAPS is available
OU created where HDF principals will be created
hadoop user has permission to write principals to above OU
nifiadmin user created in AD (optionally synced over to Ranger)
Test to ensure you can access AD over LDAPS using the hadoopadmin user succeeds:
ldapsearch -H ldaps://sme-security-ad03.cloud.hortonworks.com:636 -D hadoopadmin@cloud.hortonworks.com -w BadPass#1
Steps
1. Setup trust for all HDF nodes using the AD certificate
#run on all HDF nodes before running security wizard using AD
ad_ip=xx.xx.xx.xx ##replace with IP of your AD
cert_url=http://someurl/mycertificate.crt ## replace with location of exported AD certificate
echo "${ad_ip} ad01.lab.hortonworks.net ad01" | sudo tee -a /etc/hosts
sudo yum -y install openldap-clients ca-certificates
#instead of downloading the cert, you could also manually transfer the .cert file to below location
sudo curl -sSL "${cert_url}" -o /etc/pki/ca-trust/source/anchors/hortonworks-net.crt
sudo update-ca-trust force-enable
sudo update-ca-trust extract
sudo update-ca-trust check
# edit /etc/openldap/ldap.conf to include LDAP url and base
sudo tee -a /etc/openldap/ldap.conf > /dev/null << EOF
TLS_CACERT /etc/pki/tls/cert.pem
URI ldaps://ad01.lab.hortonworks.net ldap://ad01.lab.hortonworks.net
BASE dc=cloud,dc=hortonworks,dc=com
EOF
#test using openssl - should return 0
openssl s_client -connect ad01:636 </dev/null
#test using ldapsearch
ldapsearch -H ldaps://sme-security-ad03.cloud.hortonworks.com:636 -D nifiadmin@cloud.hortonworks.com -w BadPass#1
2. Run Ambari Security Wizard
Launch the security wizard via Ambari (under Admin > Kerberos) and enter the below.
The 'Configure Kerberos' page is the only one you will need to update. Enter the below then click Next on all remaining screens.
KDC host: FQDN of AD
Realm name: CLOUD.HORTONWORKS.COM
Kadmin host: FQDN of AD node
Admin principal: hadoopadmin@cloud.hortonworks.com
Password: BadPass#1
On the 'Configure Identities' page, users will be shown the option to customize the keytabs/principals for all components. The Nifi ones are under the Advanced tab.
Click Next to proceed using the default keytab/principal names.
Click Next to proceed through all remaining steps of the wizard.
What's happening to Nifi under the covers when the security wizard runs?
a) NiFi principals and keytabs will automatically be created/distributed across the cluster where needed by Ambari
b) Kerberos-related nifi.properties fields will automatically be updated:
nifi.kerberos.service.principal
nifi.kerberos.keytab.location
nifi.kerberos.krb5.file
nifi.kerberos.authentication.expiration
c) The login provider will also be switched to kerberos under the covers
d) As part of the process, other HDF components were also kerberized, including the 'Ambari Infra' service. This means that Ranger audits are now being written to kerberized Solr.
After the security wizard completes, NiFi's kerberos details will appear alongside other components (under Admin > Kerberos). At this point, Kerberos security will be enabled for all components running on the cluster.
On a node running Nifi, you can verify the keytab was generated and list its principal:
# klist -kt /etc/security/keytabs/nifi.service.keytab
Keytab name: FILE:/etc/security/keytabs/nifi.service.keytab
KVNO Timestamp Principal
---- ------------------- ------------------------------------------------------
1 09/28/2016 04:55:08 nifi/abajwa-hdf-qe-hdfsecured-1.openstacklocal@CLOUD.HORTONWORKS.COM
1 09/28/2016 04:55:08 nifi/abajwa-hdf-qe-hdfsecured-1.openstacklocal@CLOUD.HORTONWORKS.COM
1 09/28/2016 04:55:08 nifi/abajwa-hdf-qe-hdfsecured-1.openstacklocal@CLOUD.HORTONWORKS.COM
1 09/28/2016 04:55:08 nifi/abajwa-hdf-qe-hdfsecured-1.openstacklocal@CLOUD.HORTONWORKS.COM
1 09/28/2016 04:55:08 nifi/abajwa-hdf-qe-hdfsecured-1.openstacklocal@CLOUD.HORTONWORKS.COM
You can also verify the nifi configs for kerberos were automatically populated:
# cat /etc/nifi/conf/nifi.properties | grep kerberos
nifi.kerberos.krb5.file=/etc/krb5.conf
nifi.kerberos.service.keytab.location=/etc/security/keytabs/nifi.service.keytab
nifi.kerberos.service.principal=nifi/abajwa-hdf-qe-hdfsecured-1.openstacklocal@CLOUD.HORTONWORKS.COM
nifi.kerberos.spnego.authentication.expiration=12 hours
nifi.kerberos.spnego.keytab.location=/etc/security/keytabs/spnego.service.keytab
nifi.kerberos.spnego.principal=HTTP/abajwa-hdf-qe-hdfsecured-1.openstacklocal@CLOUD.HORTONWORKS.COM
nifi.security.user.login.identity.provider=kerberos-provider
You can also verify that the login-identity-provider of Nifi has now been switched to kerberos:
# tail /etc/nifi/conf/login-identity-providers.xml
<provider>
<identifier>kerberos-provider</identifier>
<class>org.apache.nifi.kerberos.KerberosProvider</class>
<property name="Default Realm">HORTONWORKS.COM</property>
<property name="Authentication Expiration">12 hours</property>
</provider>
3. Login to Nifi UI without certificate
Now that kerberos is enabled, let's try to login without using a certificate.
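Before going to the UI, you can optionally sanity check the AD credentials themselves from any cluster node (a quick, optional check; the principal below is the nifiadmin user and realm from the pre-requisites):
kinit nifiadmin@CLOUD.HORTONWORKS.COM
klist
kdestroy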
Make sure nifiadmin user exists in Ranger (if you ran Ranger sync earlier this should have been imported already).
If not, create the user in Ranger by navigating to the below url and entering the user details:
http://<Ranger_node>:6080/index.html#!/user/create
Create Ranger policy for new user
In Ranger, under 'Access Manager', click 'HDF-nifi'
Click the Edit button on the /* policy to which we previously added nifiadmin@CLOUD.HORTONWORKS.COM
Add the newly created nifiadmin user to the policy, and click Save
Delete previously imported .p12 certificates from your browser
e.g. if using Chrome on OSX you can delete previously imported certificates using ‘Keychain Access’ application
Restart Chrome and open Nifi UI. It should now display a login page
If not, try opening “Incognito Window”
Enter username as nifiadmin and the password you set
The Nifi UI should open now and you will be logged in as that user
You can see who you are logged in as by checking the top-right corner of the Nifi UI.
This completes the tutorial. If you made it this far in the series, congratulations! You have successfully:
Deployed HDF 2.0
Enabled SSL for Nifi and explored file-based authorization for Nifi
Installed Ranger and switched to Ranger-based authorization for Nifi
Enabled kerberos for your HDF cluster using Active Directory
Logged into Nifi using AD credentials