Member since 09-17-2015
436 Posts
736 Kudos Received
81 Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 3605 | 01-14-2017 01:52 AM |
 | 5611 | 12-07-2016 06:41 PM |
 | 6423 | 11-02-2016 06:56 PM |
 | 2112 | 10-19-2016 08:10 PM |
 | 5548 | 10-19-2016 08:05 AM |
10-30-2023
02:24 AM
@nitishgoyal13 as this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post.
08-10-2023
12:34 AM
Hi Team, can anyone help resolve this issue? The ResourceManagers went down and we are not able to start them. Error: Service did not start successfully; not all of the required roles started: only 23/25 roles started. Reasons: Service has only 0 ResourceManager roles running instead of minimum required 1.
08-21-2022
10:31 PM
1 Kudo
@mike_bronson7, as this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post.
11-17-2021
01:50 PM
1 Kudo
This article shows how to interact with the Atlas APIs in CDP-public to create tags and associate them with entities (in preparation for use with Ranger's tag-based policies).
In Cloudera's CDP-public offering, Apache Atlas is part of the SDX Data Lake cluster that is created when you create your first environment:
Introduction to Data Lakes
Pre-requisites
A. First, you will need to find the Atlas endpoint using the Cloudera CDP management console:
Accessing Data Lake services
Sample Atlas endpoint: https://pse-722-cdp-xxxxx.cloudera.site/pse-722-cdp-dl/cdp-proxy-api/atlas/api/atlas/
B. Next, you will need to set your user's workload password
Setting the workload password
Now you can use the following sample bash code to interact with the Atlas APIs from a CentOS instance outside CDP.
From the Atlas endpoint, you can extract the first two parameters below. You will also need to set your username and password:
export datalake_name='pse-722-cdp-dl'
export lake_ip='pse-722-cdp-xxxxx.cloudera.site'
export user='abajwa'
export password='nicepassword'
export atlas_curl="curl -k -u ${user}:${password}"
export atlas_url="https://${lake_ip}:443/${datalake_name}/cdp-proxy-api/atlas/api/atlas"
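The first two values can also be derived from the endpoint string rather than copied by hand. Below is a minimal sketch using bash parameter expansion, assuming the endpoint has the same shape as the sample above (`https://<host>/<datalake name>/cdp-proxy-api/...`). The variable name `atlas_endpoint` is introduced here purely for illustration.

```shell
#!/usr/bin/env bash
# Sample endpoint as copied from the management console (placeholder host).
# atlas_endpoint is a hypothetical variable name, not part of the original script.
atlas_endpoint='https://pse-722-cdp-xxxxx.cloudera.site/pse-722-cdp-dl/cdp-proxy-api/atlas/api/atlas/'

# Strip the scheme, then keep everything before the first slash as the host.
no_scheme="${atlas_endpoint#https://}"
lake_ip="${no_scheme%%/*}"

# Drop the host; the first remaining path segment is the Data Lake name.
path="${no_scheme#*/}"
datalake_name="${path%%/*}"

export lake_ip datalake_name
echo "lake_ip=${lake_ip}"
echo "datalake_name=${datalake_name}"
```

If your endpoint uses a different layout, the two expansions would need adjusting, so it is worth echoing both values before using them.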
After forming the above variables, you can use them to run some basic GET and POST commands to import tags and glossary into Atlas.
#test API by fetching Atlas typedefs
${atlas_curl} ${atlas_url}/v2/types/typedefs
#download sample Glossary
wget https://github.com/abajwa-hw/masterclass/raw/master/ranger-atlas/HortoniaMunichSetup/data/export-glossary.zip
#import sample Glossary into Atlas
curl -v -k -X POST -u ${user}:${password} -H "Accept: application/json" -H "Content-Type: multipart/form-data" -H "Cache-Control: no-cache" -F data=@export-glossary.zip ${atlas_url}/import
#download sample tags
wget https://github.com/abajwa-hw/masterclass/raw/master/ranger-atlas/HortoniaMunichSetup/data/classifications.json
#import sample tags into Atlas
curl -v -k -X POST -u ${user}:${password} -H "Accept: application/json" -H "Content-Type: application/json" ${atlas_url}/v2/types/typedefs -d @classifications.json
At this point, you should be able to see the newly imported tags and glossary entities in your Atlas UI.
Next, you can search for any Hive entity (this should get automatically created in Atlas when the Hive table is created) and associate it with a tag.
#find airlines_new_orc.airports entity in Atlas (URL is quoted so the shell does not interpret the ?)
${atlas_curl} "${atlas_url}/v2/entity/uniqueAttribute/type/hive_table?attr:qualifiedName=airlines_new_orc.airports@cm"
#fetch guid for airlines_new_orc.airports (jq -r prints the raw string without quotes)
guid=$(${atlas_curl} "${atlas_url}/v2/entity/uniqueAttribute/type/hive_table?attr:qualifiedName=airlines_new_orc.airports@cm" | jq -r '.entity.guid')
#use guid to associate a tag REFERENCE_DATA to airlines_new_orc.airports entity
${atlas_curl} ${atlas_url}/entities/${guid}/traits \
-X POST -H 'Content-Type: application/json' \
--data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"REFERENCE_DATA","values":{}}'
#confirm the entity now shows the REFERENCE_DATA tag (it will also be visible via the UI)
${atlas_curl} "${atlas_url}/v2/entity/uniqueAttribute/type/hive_table?attr:qualifiedName=airlines_new_orc.airports@cm" | grep REFERENCE_DATA
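The tagging call above goes through the `/entities/{guid}/traits` path, while the other calls use `/v2`. As a hedged alternative, the sketch below builds the equivalent request for the v2 classification endpoint. It assumes your Atlas version accepts `POST /v2/entity/guid/<guid>/classifications` with a JSON array of classification objects. The two helper functions are hypothetical names made up for this example, and the dry run uses placeholder values so nothing is sent over the network.

```shell
#!/usr/bin/env bash
# Assumption: POST /v2/entity/guid/<guid>/classifications takes a JSON array
# of classification objects, as described in the Apache Atlas v2 REST API.

# Hypothetical helper: prints the JSON body for a single tag.
classification_payload() {
  local tag="$1"
  printf '[{"typeName":"%s"}]' "${tag}"
}

# Hypothetical helper: prints the classifications URL for an entity guid.
classification_url() {
  local base="$1" guid="$2"
  printf '%s/v2/entity/guid/%s/classifications' "${base%/}" "${guid}"
}

# Dry run with placeholder values; this only prints what would be sent.
demo_url="$(classification_url 'https://example.invalid/dl/cdp-proxy-api/atlas/api/atlas' 'guid-1234')"
demo_body="$(classification_payload 'REFERENCE_DATA')"
echo "POST ${demo_url}"
echo "${demo_body}"

# To send it for real, reuse the variables defined earlier in this article:
# ${atlas_curl} "$(classification_url "${atlas_url}" "${guid}")" \
#   -X POST -H 'Content-Type: application/json' \
#   --data-binary "$(classification_payload REFERENCE_DATA)"
```

Keeping the URL and body construction in small functions makes it easy to print and inspect the request before pointing it at a real Data Lake.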
Now that you have entities tagged with a tag, you can use Ranger to create a "tag-based policy".
Tag-based Services and Policies
Other sample code to associate tags in Atlas: How to automate associating tags/classifications to HDFS/Hive/HBase/Kafka entities using REST APIs
06-16-2021
07:30 AM
Hi abajwa, I tried to install VNC using the following link; however, I am getting 11 errors, which were posted here before.
stderr:
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/VNCSERVER/package/scripts/master.py", line 132, in <module>
    Master().execute()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 351, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/VNCSERVER/package/scripts/master.py", line 31, in install
    Execute('yum groupinstall -y Desktop >> '+params.log_location)
  File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
    self.env.run()
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/ambari-agent/lib/resource_management/core/providers/system.py", line 263, in action_run
    returns=self.resource.returns)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy, returns=returns)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 314, in _call
    raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'yum groupinstall -y Desktop >> /var/log/vnc-stack.log' returned 1. There is no installed groups file.
Maybe run: yum groups mark convert (see man yum) http://s3.amazonaws.com/dev.hortonworks.com/DAS/centos7/1.x/BUILDS/1.0.2.0-6/repodata/repomd.xml: [Errno 14] HTTP Error 403 - Forbidden Trying other mirror. To address this issue please refer to the below wiki article https://wiki.centos.org/yum-errors If above article doesn't help to resolve this issue please use https://bugs.centos.org/. http://public-repo-1.hortonworks.com/HDP/centos7/3.x/updates/3.0.1.0/repodata/repomd.xml: [Errno 14] HTTP Error 403 - Forbidden Trying other mirror. http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.22/repos/centos7/repodata/repomd.xml: [Errno 14] HTTP Error 403 - Forbidden Trying other mirror. http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.7.1.0/repodata/repomd.xml: [Errno 14] HTTP Error 403 - Forbidden Trying other mirror. Warning: group Desktop does not exist. Maybe run: yum groups mark install (see man yum) Error: No packages in any requested group available to install or update stdout: 2021-06-16 14:15:21,973 - Stack Feature Version Info: Cluster Stack=3.0, Command Stack=None, Command Version=None -> 3.0 2021-06-16 14:15:21,982 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf 2021-06-16 14:15:21,986 - Group['livy'] {} 2021-06-16 14:15:21,991 - Group['spark'] {} 2021-06-16 14:15:21,991 - Group['ranger'] {} 2021-06-16 14:15:21,992 - Group['hdfs'] {} 2021-06-16 14:15:21,992 - Group['zeppelin'] {} 2021-06-16 14:15:21,992 - Group['hadoop'] {} 2021-06-16 14:15:21,993 - Group['users'] {} 2021-06-16 14:15:21,993 - Group['knox'] {} 2021-06-16 14:15:21,994 - User['yarn-ats'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None} 2021-06-16 14:15:21,997 - User['hive'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None} 2021-06-16 14:15:21,999 - User['storm'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None} 2021-06-16 14:15:22,005 - User['infra-solr'] {'gid': 
'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None} 2021-06-16 14:15:22,007 - User['zookeeper'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None} 2021-06-16 14:15:22,010 - User['superset'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None} 2021-06-16 14:15:22,012 - User['oozie'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop', 'users'], 'uid': None} 2021-06-16 14:15:22,014 - User['atlas'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None} 2021-06-16 14:15:22,018 - User['ranger'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['ranger', 'hadoop'], 'uid': None} 2021-06-16 14:15:22,020 - User['tez'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop', 'users'], 'uid': None} 2021-06-16 14:15:22,021 - User['zeppelin'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['zeppelin', 'hadoop'], 'uid': None} 2021-06-16 14:15:22,022 - User['livy'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['livy', 'hadoop'], 'uid': None} 2021-06-16 14:15:22,026 - User['druid'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None} 2021-06-16 14:15:22,028 - User['spark'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['spark', 'hadoop'], 'uid': None} 2021-06-16 14:15:22,030 - User['ambari-qa'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop', 'users'], 'uid': None} 2021-06-16 14:15:22,033 - User['kafka'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None} 2021-06-16 14:15:22,035 - User['hdfs'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hdfs', 'hadoop'], 'uid': None} 2021-06-16 14:15:22,037 - User['sqoop'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None} 2021-06-16 14:15:22,038 - User['yarn'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': 
['hadoop'], 'uid': None} 2021-06-16 14:15:22,040 - User['mapred'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None} 2021-06-16 14:15:22,042 - User['hbase'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None} 2021-06-16 14:15:22,044 - User['knox'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop', 'knox'], 'uid': None} 2021-06-16 14:15:22,046 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555} 2021-06-16 14:15:22,058 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 0'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'} 2021-06-16 14:15:22,081 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 0'] due to not_if 2021-06-16 14:15:22,082 - Directory['/tmp/hbase-hbase'] {'owner': 'hbase', 'create_parents': True, 'mode': 0775, 'cd_access': 'a'} 2021-06-16 14:15:22,087 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555} 2021-06-16 14:15:22,090 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555} 2021-06-16 14:15:22,093 - call['/var/lib/ambari-agent/tmp/changeUid.sh hbase'] {} 2021-06-16 14:15:22,149 - call returned (0, '1015') 2021-06-16 14:15:22,151 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase 1015'] {'not_if': '(test $(id -u hbase) -gt 1000) || (false)'} 2021-06-16 14:15:22,169 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase 1015'] due to not_if 2021-06-16 14:15:22,170 - Group['hdfs'] {} 2021-06-16 14:15:22,170 - 
User['hdfs'] {'fetch_nonlocal_groups': True, 'groups': ['hdfs', 'hadoop', u'hdfs']} 2021-06-16 14:15:22,172 - FS Type: HDFS 2021-06-16 14:15:22,172 - Directory['/etc/hadoop'] {'mode': 0755} 2021-06-16 14:15:22,214 - File['/usr/hdp/current/hadoop-client/conf/hadoop-env.sh'] {'content': InlineTemplate(...), 'owner': 'hdfs', 'group': 'hadoop'} 2021-06-16 14:15:22,216 - Writing File['/usr/hdp/current/hadoop-client/conf/hadoop-env.sh'] because contents don't match 2021-06-16 14:15:22,218 - Directory['/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir'] {'owner': 'hdfs', 'group': 'hadoop', 'mode': 01777} 2021-06-16 14:15:22,241 - Repository['DAS-1.0.2.0-6-repo-1'] {'base_url': 'http://s3.amazonaws.com/dev.hortonworks.com/DAS/centos7/1.x/BUILDS/1.0.2.0-6', 'action': ['prepare'], 'components': [u'dasbn-repo', 'main'], 'repo_template': '[{{repo_id}}]\nname={{repo_id}}\n{% if mirror_list %}mirrorlist={{mirror_list}}{% else %}baseurl={{base_url}}{% endif %}\n\npath=/\nenabled=1\ngpgcheck=0', 'repo_file_name': 'ambari-hdp-1', 'mirror_list': None} 2021-06-16 14:15:22,258 - Repository['HDP-UTILS-1.1.0.22-repo-1'] {'base_url': 'http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.22/repos/centos7', 'action': ['prepare'], 'components': [u'HDP-UTILS', 'main'], 'repo_template': '[{{repo_id}}]\nname={{repo_id}}\n{% if mirror_list %}mirrorlist={{mirror_list}}{% else %}baseurl={{base_url}}{% endif %}\n\npath=/\nenabled=1\ngpgcheck=0', 'repo_file_name': 'ambari-hdp-1', 'mirror_list': None} 2021-06-16 14:15:22,262 - Repository with url http://public-repo-1.hortonworks.com/HDP-GPL/centos7/3.x/updates/3.0.1.0 is not created due to its tags: set([u'GPL']) 2021-06-16 14:15:22,262 - Repository['HDP-3.0-repo-1'] {'base_url': 'http://public-repo-1.hortonworks.com/HDP/centos7/3.x/updates/3.0.1.0', 'action': ['prepare'], 'components': [u'HDP', 'main'], 'repo_template': '[{{repo_id}}]\nname={{repo_id}}\n{% if mirror_list %}mirrorlist={{mirror_list}}{% else %}baseurl={{base_url}}{% endif 
%}\n\npath=/\nenabled=1\ngpgcheck=0', 'repo_file_name': 'ambari-hdp-1', 'mirror_list': None} 2021-06-16 14:15:22,267 - Repository[None] {'action': ['create']} 2021-06-16 14:15:22,268 - File['/tmp/tmpxKCqje'] {'content': '[DAS-1.0.2.0-6-repo-1]\nname=DAS-1.0.2.0-6-repo-1\nbaseurl=http://s3.amazonaws.com/dev.hortonworks.com/DAS/centos7/1.x/BUILDS/1.0.2.0-6\n\npath=/\nenabled=1\ngpgcheck=0\n[HDP-UTILS-1.1.0.22-repo-1]\nname=HDP-UTILS-1.1.0.22-repo-1\nbaseurl=http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.22/repos/centos7\n\npath=/\nenabled=1\ngpgcheck=0\n[HDP-3.0-repo-1]\nname=HDP-3.0-repo-1\nbaseurl=http://public-repo-1.hortonworks.com/HDP/centos7/3.x/updates/3.0.1.0\n\npath=/\nenabled=1\ngpgcheck=0'} 2021-06-16 14:15:22,271 - Writing File['/tmp/tmpxKCqje'] because contents don't match 2021-06-16 14:15:22,273 - File['/tmp/tmpMxNzOX'] {'content': StaticFile('/etc/yum.repos.d/ambari-hdp-1.repo')} 2021-06-16 14:15:22,281 - Writing File['/tmp/tmpMxNzOX'] because contents don't match 2021-06-16 14:15:22,283 - Package['unzip'] {'retry_on_repo_unavailability': False, 'retry_count': 5} 2021-06-16 14:15:22,665 - Skipping installation of existing package unzip 2021-06-16 14:15:22,665 - Package['curl'] {'retry_on_repo_unavailability': False, 'retry_count': 5} 2021-06-16 14:15:22,680 - Skipping installation of existing package curl 2021-06-16 14:15:22,681 - Package['hdp-select'] {'retry_on_repo_unavailability': False, 'retry_count': 5} 2021-06-16 14:15:22,695 - Skipping installation of existing package hdp-select 2021-06-16 14:15:22,702 - The repository with version 3.0.1.0-187 for this command has been marked as resolved. It will be used to report the version of the component which was installed 2021-06-16 14:15:22,708 - Skipping stack-select on VNC because it does not exist in the stack-select package structure. 
2021-06-16 14:15:22,966 - Execute['echo "installing Desktop" >> /var/log/vnc-stack.log'] {} 2021-06-16 14:15:22,972 - Execute['yum groupinstall -y Desktop >> /var/log/vnc-stack.log'] {} 2021-06-16 14:15:54,364 - The repository with version 3.0.1.0-187 for this command has been marked as resolved. It will be used to report the version of the component which was installed
04-22-2021
01:59 PM
@abajwa Hi, thanks for your help in the past. Now I have a new question: I want to try adding the Amundsen open source data catalog to the environment to see how it exposes all the datasets that you've populated. It depends on the availability of LDAP or similar to recognize the user who's viewing the data in the system. Is there some local LDAP or other identity service included in this demo environment? Thanks for any pointers, -Antonio
01-27-2021
12:00 AM
Hi @abajwa, does the Ambari Server host also need to present its own SSL certificate to the AD server? In case of multiple domain controllers, do we need to have separate SSL certificates from each of the domain controllers? Thanks, Megh
01-26-2020
11:18 AM
This worked! I already made these changes prior to running the last command.
hdp-select status hadoop-client
Set a couple of parameters:
export HADOOP_OPTS="-Dhdp.version=2.6.1.0-129"
export HADOOP_CONF_DIR=/etc/hadoop/conf
Source-in the environment:
source ~/get_env.sh
Added the last two lines to $SPARK_HOME/conf/spark-defaults.conf:
spark.driver.extraJavaOptions -Dhdp.version=2.6.1.0-129
spark.yarn.am.extraJavaOptions -Dhdp.version=2.6.1.0-129
Added Hadoop version under Ambari / Yarn / Advanced / Custom:
hdp.version=2.6.1.0-129
Ensure this runs okay:
yarn jar hadoop-mapreduce-examples.jar pi 5 5
Run spark pi example under yarn:
cd /home/spark/spark-2.4.4-bin-hadoop2.7
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --executor-memory 2G --num-executors 5 --executor-cores 2 --conf spark.authenticate.enableSaslEncryption=true --conf spark.network.sasl.serverAlwaysEncrypt=true --conf spark.authenticate=true examples/jars/spark-examples_2.11-2.4.4.jar 100
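For anyone scripting this, the version string could be pulled from the hdp-select output instead of being typed by hand. This is a rough sketch, assuming the command prints a line of the form `hadoop-client - <version>`. The helper name is made up for illustration, and the sample line stands in for the real command so the snippet runs anywhere.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the version from a line such as
# "hadoop-client - 2.6.1.0-129" (assumed hdp-select output format).
hdp_version_from_status() {
  local line="$1"
  # Remove everything up to and including the last " - " separator.
  printf '%s' "${line##* - }"
}

# Sample line standing in for: hdp-select status hadoop-client
sample_status='hadoop-client - 2.6.1.0-129'
hdp_version="$(hdp_version_from_status "${sample_status}")"

export HADOOP_OPTS="-Dhdp.version=${hdp_version}"
echo "${HADOOP_OPTS}"
```

On a real node, `sample_status` would be replaced by the output of `hdp-select status hadoop-client`, and the same `${hdp_version}` value could be reused for the spark-defaults.conf entries.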
09-08-2018
05:52 PM
6 Kudos
Summary: The release of HDF 3.3 brings a significant number of improvements to HDF. This article shows how you can use the ambari-bootstrap project to easily generate a blueprint and deploy either HDF-only clusters or combined HDP/HDF clusters in 5 easy steps. To quickly set up a single-node cluster, prebuilt AMIs are available for AWS, as well as a script that automates these steps, so you can deploy the cluster in a few commands.
Steps for each of the below options are described in this article:
A. Single-node prebuilt AMIs on AWS
B. Single-node fresh installs
C. Multi-node fresh installs
A. Single-node prebuilt AMI on AWS:
Steps to launch the AMI
1. Launch Amazon AWS console page in your browser by clicking here and sign in with your credentials. Once signed in, you can close this browser tab.
2. Select the AMI from ‘N. California’ region by clicking one of the below options
To spin up HDP 3.1/HDF 3.3, click here
To spin up HDF 3.3 only cluster, click here
Now choose instance type: select ‘m4.2xlarge’ and click Next
Note: if you choose a smaller instance type than the one recommended above, not all services may come up
3. Configure Instance Details: leave the defaults and click ‘Next’
4. Add storage: keep at least the default of 800 GB and click ‘Next’
5. Optionally, add a name or any other tags you like. Then click ‘Next’
6. Configure security group: create a new security group and select ‘All traffic’ to open all ports. For production usage, a more restrictive security group policy is strongly encouraged. For instance, only allow traffic from your company’s IP range. Then click ‘Review and Launch’
7. Review your settings and click Launch
8. Create and download a new key pair (or choose an existing one). Then click ‘Launch instances’
9. Click the shown link under ‘Your instances are now launching’
10. This opens the EC2 dashboard that shows the details of your launched instance
11. Make note of your instance’s ‘Public IP’ (which will be used to access your cluster). If it is blank, wait 1-2 minutes for this to be populated.
12. After 5-10 minutes, open the below URL in your browser to access Ambari’s console: http://<PUBLIC IP>:8080. Login as user:admin and pass:StrongPassword (see previous step)
13. At this point, Ambari may still be in the process of starting all the services. You can tell by the presence of the blue ‘op’ notification near the top left of the page. If so, just wait until it is done.
(Optional) You can also monitor the startup using the log as below:
Open SSH session into the VM using your key and the public IP e.g. from OSX: ssh -i ~/.ssh/mykey.pem centos@<publicIP>
Tail the startup log: tail -f /var/log/hdp_startup.log
Once you see “cluster is ready!” you can proceed
14. Once the blue ‘op’ notification disappears and all the services show a green check mark, the cluster is fully up.
B. Single-node install:
Launch a fresh CentOS/RHEL 7 instance with 4+ CPUs and 16GB+ RAM and run below. Do not try to install HDF on an env where Ambari or HDP are already installed (e.g. HDP sandbox or HDP cluster)
To deploy HDF 3.3 only cluster, run below
export host_count=1
curl -sSL https://gist.github.com/abajwa-hw/b5565d7e7f9beffd8dd57a970dc54266/raw | sudo -E sh
To deploy HDF 3.3/HDP 3.1 combined cluster, run below
export host_count=1
curl -sSL https://gist.github.com/abajwa-hw/d7cd1c0232c1af46ee2c465e4871ddc6/raw | sudo -E sh
Once launched, the script will install Ambari and use it to deploy the HDF cluster
Note: this script can also be used to install multi-node clusters after step #1 below is complete (i.e. after the agents on non-AmbariServer nodes are installed and registered). Just change the value of the host_count variable
C. Multi-node HDF 3.3 install:
0. Launch your RHEL/CentOS 7 instances where you wish to install HDF. In this example, we will use 4 m4.xlarge instances. Select an instance where ambari-server should run (e.g. node1)
1. After choosing a host where you would like Ambari-server to run, first let's prepare the other hosts. Run below on all hosts where Ambari-server will not be running (e.g. node2-4). This will run pre-requisite steps, install Ambari-agents and point them to Ambari-server host:
export ambari_server=<FQDN of host where ambari-server will be installed>;#replace this
export install_ambari_server=false
export ambari_version=2.7.3.0
curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh ;
2. Run remaining steps on host where Ambari-server is to be installed (e.g. node1). The below commands run pre-reqs and install Ambari-server
export db_password="StrongPassword" # MySQL password
export nifi_password="StrongPassword" # NiFi password must be at least ten chars
export hdf_ambari_mpack_url="http://public-repo-1.hortonworks.com/HDF/amazonlinux2/3.x/updates/3.3.0.0/tars/hdf_ambari_mp/hdf-ambari-mpack-3.3.0.0-165.tar.gz"
export ambari_version=2.7.3.0
#install bootstrap
yum install -y git python-argparse
cd /tmp
git clone https://github.com/seanorama/ambari-bootstrap.git
#Runs pre-reqs and install ambari-server
export install_ambari_server=true
curl -sSL https://raw.githubusercontent.com/seanorama/ambari-bootstrap/master/ambari-bootstrap.sh | sudo -E sh ;
3. On the same node, install MySQL and create databases and users for Schema Registry and SAM
sudo yum localinstall -y https://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm
sudo yum install -y epel-release mysql-connector-java* mysql-community-server # MySQL Setup
sudo systemctl enable mysqld.service
sudo systemctl start mysqld.service
#extract system generated Mysql password
oldpass=$( grep 'temporary.*root@localhost' /var/log/mysqld.log | tail -n 1| sed 's/.*root@localhost: //')
#create sql file that
# 1. resets the MySQL root password to a temp value and creates registry/streamline schemas and users
# 2. sets passwords for registry/streamline users to ${db_password}
cat << EOF > mysql-setup.sql
ALTER USER 'root'@'localhost' IDENTIFIED BY 'Secur1ty!';uninstall plugin validate_password;CREATE DATABASE registry DEFAULT CHARACTER SET utf8; CREATE DATABASE streamline DEFAULT CHARACTER SET utf8;CREATE USER 'registry'@'%' IDENTIFIED BY '${db_password}'; CREATE USER 'streamline'@'%' IDENTIFIED BY '${db_password}';GRANT ALL PRIVILEGES ON registry.* TO 'registry'@'%' WITH GRANT OPTION ; GRANT ALL PRIVILEGES ON streamline.* TO 'streamline'@'%' WITH GRANT OPTION ;commit;
EOF
#execute sqlfile
mysql -h localhost -u root -p"$oldpass" --connect-expired-password < mysql-setup.sql
#change Mysql password to StrongPassword
mysqladmin -u root -p'Secur1ty!' password StrongPassword
#test password and confirm dbs created
mysql -u root -pStrongPassword -e 'show databases;'
4. On the same node, install MySQL connector jar and then HDF mpack. Then restart Ambari so it recognizes HDF stack:
sudo ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
sudo ambari-server install-mpack --mpack=${hdf_ambari_mpack_url} --verbose
sudo ambari-server restart
At this point, if you prefer, you can use the Ambari install wizard to install HDF. Just open http://<Ambari host IP>:8080, log in, and follow the steps in the doc. Otherwise, to proceed with deploying via blueprints, follow the remaining steps.
4. On the same node, provide minimum configurations required for install by creating configuration-custom.json. You can add to this to customize any component's property that is exposed by Ambari
cd /tmp/ambari-bootstrap/deploy
cat << EOF > configuration-custom.json
{
"configurations": {
"ams-grafana-env": {
"metrics_grafana_password": "${ambari_password}"
},
"kafka-broker": {
"offsets.topic.replication.factor": "1"
},
"streamline-common": {
"jar.storage.type": "local",
"streamline.storage.type": "mysql",
"streamline.storage.connector.connectURI": "jdbc:mysql://$(hostname -f):3306/streamline",
"registry.url" : "http://localhost:7788/api/v1",
"streamline.dashboard.url" : "http://localhost:9089",
"streamline.storage.connector.password": "${db_password}"
},
"registry-common": {
"jar.storage.type": "local",
"registry.storage.connector.connectURI": "jdbc:mysql://$(hostname -f):3306/registry",
"registry.storage.type": "mysql",
"registry.storage.connector.password": "${db_password}"
},
"nifi-registry-ambari-config": {
"nifi.registry.security.encrypt.configuration.password": "${nifi_password}"
},
"nifi-registry-properties": {
"nifi.registry.db.password": "${nifi_password}"
},
"nifi-ambari-config": {
"nifi.security.encrypt.configuration.password": "${nifi_password}"
}
}
}
EOF
5. Then run below as root to generate a recommended blueprint and deploy the cluster install. Make sure to set host_count to the total number of hosts in your cluster (including Ambari server)
sudo su
cd /tmp/ambari-bootstrap/deploy/
export host_count=<Number of total nodes>
export ambari_stack_name=HDF
export ambari_stack_version=3.3
export cluster_name="HDF"
export ambari_services="ZOOKEEPER STREAMLINE NIFI KAFKA STORM REGISTRY NIFI_REGISTRY AMBARI_METRICS KNOX"
./deploy-recommended-cluster.bash
You can now log in to Ambari at http://<Ambari host IP>:8080 and sit back and watch your HDF cluster get installed!
Notes:
a) This will only install Nifi on a single node of the cluster by default
b) The Nifi Certificate Authority (CA) component will be installed by default. This means that, if you wanted to, you could enable SSL for Nifi out of the box by including a "nifi-ambari-ssl-config" section in the above configuration-custom.json:
"nifi-ambari-ssl-config":{
"nifi.toolkit.tls.token":"hadoop",
"nifi.node.ssl.isenabled":"true",
"nifi.security.needClientAuth":"true",
"nifi.toolkit.dn.suffix":", OU=HORTONWORKS",
"nifi.initial.admin.identity":"CN=nifiadmin, OU=HORTONWORKS",
"content":"<property name='Node Identity 1'>CN=node-1.fqdn, OU=HORTONWORKS</property><property name='Node Identity 2'>CN=node-2.fqdn, OU=HORTONWORKS</property><property name='Node Identity 3'>CN=node-3.fqdn, OU=HORTONWORKS</property>"
},
Make sure to replace node-x.fqdn with the FQDN of each node running Nifi
c) As part of the install, you can also have an existing Nifi flow deployed by Ambari. First, read in a flow.xml file from an existing Nifi system (you can find this in flow.xml.gz). For example, run below to read the flow for the Twitter demo into an env var
twitter_flow=$(curl -L https://gist.githubusercontent.com/abajwa-hw/3a3e2b2d9fb239043a38d204c94e609f/raw)
Then include a "nifi-flow-env" section in the above configuration-custom.json when you create the file, to have ambari-bootstrap include the whole flow xml in the generated blueprint:
"nifi-flow-env":{
"properties_attributes":{},
"properties":{"content":"${twitter_flow}"}
}
d) In case you would like to review the generated blueprint before it gets deployed, just set the below variable as well:
export deploy=false
.... The blueprint will be created under /tmp/ambari-bootstrap*/deploy/tempdir*/blueprint.json
Sample blueprints
Sample generated blueprint for 4 node HDF 3.3 only cluster is provided for reference here:
{
"Blueprints": {
"stack_name": "HDF",
"stack_version": "3.3"
},
"host_groups": [
{
"name": "host-group-1",
"components": [
{
"name": "METRICS_MONITOR"
},
{
"name": "SUPERVISOR"
},
{
"name": "NIFI_CA"
},
{
"name": "STREAMLINE_SERVER"
}
]
},
{
"name": "host-group-4",
"components": [
{
"name": "METRICS_MONITOR"
},
{
"name": "SUPERVISOR"
},
{
"name": "METRICS_COLLECTOR"
},
{
"name": "ZOOKEEPER_SERVER"
},
{
"name": "STREAMLINE_SERVER"
}
]
},
{
"name": "host-group-2",
"components": [
{
"name": "NIFI_MASTER"
},
{
"name": "DRPC_SERVER"
},
{
"name": "METRICS_GRAFANA"
},
{
"name": "KAFKA_BROKER"
},
{
"name": "ZOOKEEPER_SERVER"
},
{
"name": "STREAMLINE_SERVER"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "SUPERVISOR"
},
{
"name": "NIMBUS"
},
{
"name": "ZOOKEEPER_CLIENT"
},
{
"name": "KNOX_GATEWAY"
},
{
"name": "NIFI_REGISTRY_MASTER"
},
{
"name": "REGISTRY_SERVER"
},
{
"name": "STORM_UI_SERVER"
}
]
},
{
"name": "host-group-3",
"components": [
{
"name": "METRICS_MONITOR"
},
{
"name": "SUPERVISOR"
},
{
"name": "ZOOKEEPER_SERVER"
},
{
"name": "STREAMLINE_SERVER"
}
]
}
],
"configurations": [
{
"nifi-ambari-config": {
"nifi.security.encrypt.configuration.password": "StrongPassword"
}
},
{
"nifi-registry-ambari-config": {
"nifi.registry.security.encrypt.configuration.password": "StrongPassword"
}
},
{
"ams-hbase-env": {
"hbase_master_heapsize": "512",
"hbase_regionserver_heapsize": "768",
"hbase_master_xmn_size": "192"
}
},
{
"nifi-logsearch-conf": {}
},
{
"storm-site": {
"metrics.reporter.register": "org.apache.hadoop.metrics2.sink.storm.StormTimelineMetricsReporter",
"topology.metrics.consumer.register": "[{\"class\": \"org.apache.hadoop.metrics2.sink.storm.StormTimelineMetricsSink\", \"parallelism.hint\": 1, \"whitelist\": [\"kafkaOffset\\..+/\", \"__complete-latency\", \"__process-latency\", \"__execute-latency\", \"__receive\\.population$\", \"__sendqueue\\.population$\", \"__execute-count\", \"__emit-count\", \"__ack-count\", \"__fail-count\", \"memory/heap\\.usedBytes$\", \"memory/nonHeap\\.usedBytes$\", \"GC/.+\\.count$\", \"GC/.+\\.timeMs$\"]}]",
"storm.local.dir": "/hadoop/storm",
"storm.cluster.metrics.consumer.register": "[{\"class\": \"org.apache.hadoop.metrics2.sink.storm.StormTimelineMetricsReporter\"}]"
}
},
{
"registry-common": {
"registry.storage.connector.connectURI": "jdbc:mysql://ip-xxx-xx-xx-xx9.us-west-1.compute.internal:3306/registry",
"registry.storage.type": "mysql",
"jar.storage.type": "local",
"registry.storage.connector.password": "StrongPassword"
}
},
{
"registry-env": {}
},
{
"registry-logsearch-conf": {}
},
{
"streamline-common": {
"streamline.storage.type": "mysql",
"streamline.storage.connector.connectURI": "jdbc:mysql://ip-xxx-xx-xx-xx9.us-west-1.compute.internal:3306/streamline",
"streamline.dashboard.url": "http://localhost:9089",
"registry.url": "http://localhost:7788/api/v1",
"jar.storage.type": "local",
"streamline.storage.connector.password": "StrongPassword"
}
},
{
"nifi-registry-properties": {
"nifi.registry.db.password": "StrongPassword"
}
},
{
"ams-hbase-site": {
"hbase.regionserver.global.memstore.upperLimit": "0.35",
"hbase.regionserver.global.memstore.lowerLimit": "0.3",
"hbase.tmp.dir": "/var/lib/ambari-metrics-collector/hbase-tmp",
"hbase.hregion.memstore.flush.size": "134217728",
"hfile.block.cache.size": "0.3",
"hbase.rootdir": "file:///var/lib/ambari-metrics-collector/hbase",
"hbase.cluster.distributed": "false",
"phoenix.coprocessor.maxMetaDataCacheSize": "20480000",
"hbase.zookeeper.property.clientPort": "61181"
}
},
{
"storm-env": {}
},
{
"streamline-env": {}
},
{
"ams-site": {
"timeline.metrics.service.webapp.address": "localhost:6188",
"timeline.metrics.cluster.aggregate.splitpoints": "kafka.network.RequestMetrics.ResponseQueueTimeMs.request.OffsetFetch.98percentile",
"timeline.metrics.downsampler.event.metric.patterns": "topology\\.%",
"timeline.metrics.host.aggregate.splitpoints": "kafka.network.RequestMetrics.ResponseQueueTimeMs.request.OffsetFetch.98percentile",
"timeline.metrics.service.handler.thread.count": "20",
"timeline.metrics.service.watcher.disabled": "false",
"timeline.metrics.host.aggregator.ttl": "86400"
}
},
{
"kafka-broker": {
"log.dirs": "/kafka-logs",
"offsets.topic.replication.factor": "1"
}
},
{
"ams-grafana-env": {
"metrics_grafana_password": "StrongPassword"
}
},
{
"streamline-logsearch-conf": {}
},
{
"zoo.cfg": {
"dataDir": "/hadoop/zookeeper"
}
},
{
"ams-env": {
"metrics_collector_heapsize": "512"
}
}
]
}
Sample cluster.json for this 4 node cluster:
{
"blueprint": "recommended",
"default_password": "hadoop",
"host_groups": [
{
"hosts": [
{
"fqdn": "ip-XX-XX-XX-XXX.us-west-1.compute.internal"
}
],
"name": "host-group-1"
},
{
"hosts": [
{
"fqdn": "ip-XX-XX-XX-XXX.us-west-1.compute.internal"
}
],
"name": "host-group-3"
},
{
"hosts": [
{
"fqdn": "ip-xxx-xxx-xxx-xxx.us-west-1.compute.internal"
}
],
"name": "host-group-4"
},
{
"hosts": [
{
"fqdn": "ip-xx-xx-xx-xxx.us-west-1.compute.internal"
}
],
"name": "host-group-2"
}
]
}
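Every host group named in the cluster template has to exist in the blueprint, and vice versa. The sketch below is one way to check that before submitting anything to Ambari. It is only an illustration: the function names are hypothetical, the file names blueprint.json and cluster.json are assumptions, and the grep pattern relies on the "host-group-N" naming used in the samples above, so other naming schemes would need a different pattern.

```shell
# List the distinct host-group names found in a blueprint or cluster template.
# Assumes names follow the "host-group-N" convention used in the samples above.
list_host_groups() {
  grep -o '"name": *"host-group-[^"]*"' "$1" \
    | sed 's/.*"\(host-group-[^"]*\)"$/\1/' \
    | sort -u
}

# Succeed only when both files reference exactly the same set of host groups.
# $1 = blueprint file, $2 = cluster template file
host_groups_match() {
  [ "$(list_host_groups "$1")" = "$(list_host_groups "$2")" ]
}

# Example usage (file names are assumptions):
# host_groups_match blueprint.json cluster.json || echo "host group mismatch" >&2
```

Because the comparison works on sorted, de-duplicated names, the order in which host groups appear in each file does not matter.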
05-05-2018
12:03 AM
3 Kudos
Summary:
While automating the setup of the Hortoniabank demo, we needed to automate associating Atlas tags with HDP entities (HDFS, Hive, HBase, Kafka) using the names of the entities rather than their guids in Atlas. One option is to use the Atlas APIs to find the entity by its qualifiedName attribute and then use the returned guid to associate a tag with it.
For components like Hive that already have an Atlas hook, the Atlas entities for Hive tables are created automatically when the table is created. For these, we have provided only the API calls that associate the tags with the entity.
For others like Kafka, HDFS and HBase that do not have an Atlas hook (as of HDP 2.6.x), you need to create the entity first. For these, we have provided both the API call that creates the entity and the call that associates the tags with it.
Code samples:
The code examples below assume the tags have already been created. They can be created either manually via the Atlas UI or using the API. Here is a sample Atlas API call that creates a basic tag called TEST without any attributes. It relies on the atlas_curl and atlas_url variables, which are defined with the other common variables just before Example 1.
${atlas_curl} ${atlas_url}/types \
-X POST -H 'Content-Type: application/json' \
--data-binary '{"enumTypes":[],"structTypes":[],"traitTypes":[{"superTypes":[],"hierarchicalMetaTypeName":"org.apache.atlas.typesystem.types.TraitType","typeName":"TEST","typeDescription":"TEST","typeVersion":"1.0","attributeDefinitions":[]}],"classTypes":[]}'
All the examples work the same way: find the guid of the entity using its qualifiedName attribute, then use the guid to associate the tag with it. First we set up the common variables:
atlas_host="atlas.domain.com"
cluster_name="datalake"
atlas_curl="curl -u admin:admin"
atlas_url="http://${atlas_host}:21000/api/atlas"
Example 1: Associate tag REFERENCE_DATA (without attributes) with Hive table hortoniabank.eu_countries
#fetch guid for table hortoniabank.eu_countries@${cluster_name}
guid=$(${atlas_curl} ${atlas_url}/v2/entity/uniqueAttribute/type/hive_table?attr:qualifiedName=hortoniabank.eu_countries@${cluster_name} | jq '.entity.guid' | tr -d '"')
#add REFERENCE_DATA tag
${atlas_curl} ${atlas_url}/entities/${guid}/traits \
-X POST -H 'Content-Type: application/json' \
--data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"REFERENCE_DATA","values":{}}'
Example 2: Associate tag DATA_QUALITY (with attribute: score and value: 0.51) with Hive table cost_savings.claim_savings
#fetch guid for table cost_savings.claim_savings@${cluster_name}
guid=$(${atlas_curl} ${atlas_url}/v2/entity/uniqueAttribute/type/hive_table?attr:qualifiedName=cost_savings.claim_savings@${cluster_name} | jq '.entity.guid' | tr -d '"')
#add DATA_QUALITY tag with score=0.51
${atlas_curl} ${atlas_url}/entities/${guid}/traits \
-X POST -H 'Content-Type: application/json' \
--data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"DATA_QUALITY", "values":{"score": "0.51"}}'
Example 3: Associate tag FINANCE_PII (with attribute: type and value: finance) with Hive column finance.tax_2015.ssn
#fetch guid for column finance.tax_2015.ssn@${cluster_name}
guid=$(${atlas_curl} ${atlas_url}/v2/entity/uniqueAttribute/type/hive_column?attr:qualifiedName=finance.tax_2015.ssn@${cluster_name} | jq '.entity.guid' | tr -d '"')
#add FINANCE_PII tag with type=finance
${atlas_curl} ${atlas_url}/entities/${guid}/traits \
-X POST -H 'Content-Type: application/json' \
--data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"FINANCE_PII", "values":{"type": "finance"}}'
Example 4: Create an entity for Kafka topic PRIVATE and associate it with tag SENSITIVE
#create entity for Kafka topic PRIVATE and associate it with SENSITIVE tag
${atlas_curl} ${atlas_url}/v2/entity -X POST -H 'Content-Type: application/json' -d @- <<EOF
{ "entity":{ "typeName":"kafka_topic", "attributes":{ "description":null, "name":"PRIVATE", "owner":null, "qualifiedName":"PRIVATE@${cluster_name}", "topic":"PRIVATE", "uri":"none" }, "guid":-1 }, "referredEntities":{ }}
EOF
guid=$(${atlas_curl} ${atlas_url}/v2/entity/uniqueAttribute/type/kafka_topic?attr:qualifiedName=PRIVATE@${cluster_name} | jq '.entity.guid' | tr -d '"')
${atlas_curl} ${atlas_url}/entities/${guid}/traits \
-X POST -H 'Content-Type: application/json' \
--data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"SENSITIVE","values":{}}'
Example 5: Create an entity for HBase table T_PRIVATE and associate it with tag SENSITIVE
#create entity for HBase table T_PRIVATE and associate it with SENSITIVE tag
${atlas_curl} ${atlas_url}/v2/entity -X POST -H 'Content-Type: application/json' -d @- <<EOF
{ "entity":{ "typeName":"hbase_table", "attributes":{ "description":"T_PRIVATE table", "name":"T_PRIVATE", "owner":"hbase", "qualifiedName":"T_PRIVATE@${cluster_name}", "column_families":[ ], "uri":"none" }, "guid":-1 }, "referredEntities":{ }}
EOF
guid=$(${atlas_curl} ${atlas_url}/v2/entity/uniqueAttribute/type/hbase_table?attr:qualifiedName=T_PRIVATE@${cluster_name} | jq '.entity.guid' | tr -d '"')
${atlas_curl} ${atlas_url}/entities/${guid}/traits \
-X POST -H 'Content-Type: application/json' \
--data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"SENSITIVE","values":{}}'
Example 6: Create an entity for HDFS path /banking and associate it with tag BANKING
#create entity for HDFS path /banking and associate it with BANKING tag
hdfs_prefix="hdfs://$(hostname -f):8020"
hdfs_path="/banking"
${atlas_curl} ${atlas_url}/v2/entity -X POST -H 'Content-Type: application/json' -d @- <<EOF
{ "entity":{ "typeName":"hdfs_path", "attributes":{ "description":null, "name":"${hdfs_path}", "owner":null, "qualifiedName":"${hdfs_prefix}${hdfs_path}", "clusterName":"${cluster_name}", "path":"${hdfs_prefix}${hdfs_path}" }, "guid":-1 }, "referredEntities":{ }}
EOF
guid=$(${atlas_curl} ${atlas_url}/v2/entity/uniqueAttribute/type/hdfs_path?attr:qualifiedName=${hdfs_prefix}${hdfs_path} | jq '.entity.guid' | tr -d '"')
${atlas_curl} ${atlas_url}/entities/${guid}/traits \
-X POST -H 'Content-Type: application/json' \
--data-binary '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"BANKING","values":{}}'
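Since all six examples repeat the same two steps (look up the guid by qualifiedName, then POST the trait), they can be folded into a small helper. The following is a minimal sketch under a few assumptions: the function names trait_payload and tag_entity are hypothetical, the common variables are repeated so the block is self-contained, -s is added to curl to silence the progress meter, and jq -r is used in place of tr to strip the quotes. It reuses the same endpoints as the examples above and does not create entities, so for Kafka, HBase and HDFS the entity still has to be created first.

```shell
# Common variables, repeated here so the sketch is self-contained.
atlas_host="atlas.domain.com"
cluster_name="datalake"
atlas_curl="curl -s -u admin:admin"
atlas_url="http://${atlas_host}:21000/api/atlas"

# Build the trait payload used by the examples above.
# $1 = tag name, $2 = optional JSON object holding the attribute values
trait_payload() {
  local tag="$1"
  local values="$2"
  if [ -z "${values}" ]; then
    values='{}'
  fi
  printf '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"%s","values":%s}' "${tag}" "${values}"
}

# Look up an entity guid by type and qualifiedName, then attach the tag.
# $1 = Atlas type, $2 = qualifiedName, $3 = tag name, $4 = optional attribute values
tag_entity() {
  local type_name="$1"
  local qualified_name="$2"
  local tag="$3"
  local values="$4"
  local guid
  guid=$(${atlas_curl} "${atlas_url}/v2/entity/uniqueAttribute/type/${type_name}?attr:qualifiedName=${qualified_name}" | jq -r '.entity.guid')
  # jq prints "null" when the lookup response has no entity
  if [ -z "${guid}" ] || [ "${guid}" = "null" ]; then
    echo "No ${type_name} entity found for ${qualified_name}" >&2
    return 1
  fi
  ${atlas_curl} "${atlas_url}/entities/${guid}/traits" \
    -X POST -H 'Content-Type: application/json' \
    --data-binary "$(trait_payload "${tag}" "${values}")"
}

# Example usage, equivalent to Examples 1 and 2:
# tag_entity hive_table "hortoniabank.eu_countries@${cluster_name}" REFERENCE_DATA
# tag_entity hive_table "cost_savings.claim_savings@${cluster_name}" DATA_QUALITY '{"score": "0.51"}'
```

Failing early when no guid comes back avoids posting a trait to a malformed URL, which is the most likely failure when a qualifiedName is mistyped or the entity has not been created yet.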
Labels: