10-13-2016
08:02 AM
7 Kudos
In the previous post we saw how to install a multi-node HDP cluster using Ambari Blueprints. In this post we will see how to automate an HDP installation with NameNode HA using Ambari Blueprints.

Note - From Ambari 2.6.x onwards, you must register a VDF to use an internal repository; otherwise Ambari will pick up the latest HDP version and use the public repos. Please see the document below for more information. For Ambari versions below 2.6.x, this guide works without any modifications.
Document - https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.0.0/bk_ambari-release-notes/content/ambari_relnotes-2.6.0.0-behavioral-changes.html

Below are the steps to install an HDP multi-node cluster with NameNode HA using an internal repository via Ambari Blueprints.

Step 1: Install the Ambari server using the steps mentioned under the link below:
http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.2.1/bk_Installing_HDP_AMB/content/_download_the_ambari_repo_lnx6.html

Step 2: Register ambari-agent manually
Install the ambari-agent package on all the nodes in the cluster and set hostname to the Ambari server host (FQDN) in /etc/ambari-agent/conf/ambari-agent.ini.

Step 3: Configure blueprints
Please follow the steps below to create the Blueprints.

3.1 Create the hostmapping.json file as shown below:
Note – This file contains information about all the hosts that are part of your HDP cluster.
{
"blueprint" : "prod",
"default_password" : "hadoop",
"host_groups" :[
{
"name" : "prodnode1",
"hosts" : [
{
"fqdn" : "prodnode1.openstacklocal"
}
]
},
{
"name" : "prodnode2",
"hosts" : [
{
"fqdn" : "prodnode2.openstacklocal"
}
]
},
{
"name" : "prodnode3",
"hosts" : [
{
"fqdn" : "prodnode3.openstacklocal"
}
]
}
]
}

3.2 Create the cluster_configuration.json file; it contains the mapping of hosts to HDP components.
{
"configurations" : [
{ "core-site": {
"properties" : {
"fs.defaultFS" : "hdfs://prod",
"ha.zookeeper.quorum" : "%HOSTGROUP::prodnode1%:2181,%HOSTGROUP::prodnode2%:2181,%HOSTGROUP::prodnode3%:2181"
}}
},
{ "hdfs-site": {
"properties" : {
"dfs.client.failover.proxy.provider.prod" : "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
"dfs.ha.automatic-failover.enabled" : "true",
"dfs.ha.fencing.methods" : "shell(/bin/true)",
"dfs.ha.namenodes.prod" : "nn1,nn2",
"dfs.namenode.http-address" : "%HOSTGROUP::prodnode1%:50070",
"dfs.namenode.http-address.prod.nn1" : "%HOSTGROUP::prodnode1%:50070",
"dfs.namenode.http-address.prod.nn2" : "%HOSTGROUP::prodnode3%:50070",
"dfs.namenode.https-address" : "%HOSTGROUP::prodnode1%:50470",
"dfs.namenode.https-address.prod.nn1" : "%HOSTGROUP::prodnode1%:50470",
"dfs.namenode.https-address.prod.nn2" : "%HOSTGROUP::prodnode3%:50470",
"dfs.namenode.rpc-address.prod.nn1" : "%HOSTGROUP::prodnode1%:8020",
"dfs.namenode.rpc-address.prod.nn2" : "%HOSTGROUP::prodnode3%:8020",
"dfs.namenode.shared.edits.dir" : "qjournal://%HOSTGROUP::prodnode1%:8485;%HOSTGROUP::prodnode2%:8485;%HOSTGROUP::prodnode3%:8485/prod",
"dfs.nameservices" : "prod"
}}
}],
"host_groups" : [
{
"name" : "prodnode1",
"components" : [
{
"name" : "NAMENODE"
},
{
"name" : "JOURNALNODE"
},
{
"name" : "ZKFC"
},
{
"name" : "NODEMANAGER"
},
{
"name" : "DATANODE"
},
{
"name" : "ZOOKEEPER_CLIENT"
},
{
"name" : "HDFS_CLIENT"
},
{
"name" : "YARN_CLIENT"
},
{
"name" : "FALCON_CLIENT"
},
{
"name" : "OOZIE_CLIENT"
},
{
"name" : "HIVE_CLIENT"
},
{
"name" : "MAPREDUCE2_CLIENT"
},
{
"name" : "ZOOKEEPER_SERVER"
}
],
"cardinality" : 1
},
{
"name" : "prodnode2",
"components" : [
{
"name" : "JOURNALNODE"
},
{
"name" : "MYSQL_SERVER"
},
{
"name" : "HIVE_SERVER"
},
{
"name" : "HIVE_METASTORE"
},
{
"name" : "WEBHCAT_SERVER"
},
{
"name" : "NODEMANAGER"
},
{
"name" : "DATANODE"
},
{
"name" : "ZOOKEEPER_CLIENT"
},
{
"name" : "ZOOKEEPER_SERVER"
},
{
"name" : "HDFS_CLIENT"
},
{
"name" : "YARN_CLIENT"
},
{
"name" : "FALCON_SERVER"
},
{
"name" : "OOZIE_SERVER"
},
{
"name" : "FALCON_CLIENT"
},
{
"name" : "OOZIE_CLIENT"
},
{
"name" : "HIVE_CLIENT"
},
{
"name" : "MAPREDUCE2_CLIENT"
}
],
"cardinality" : 1
},
{
"name" : "prodnode3",
"components" : [
{
"name" : "RESOURCEMANAGER"
},
{
"name" : "JOURNALNODE"
},
{
"name" : "ZKFC"
},
{
"name" : "NAMENODE"
},
{
"name" : "APP_TIMELINE_SERVER"
},
{
"name" : "HISTORYSERVER"
},
{
"name" : "NODEMANAGER"
},
{
"name" : "DATANODE"
},
{
"name" : "ZOOKEEPER_CLIENT"
},
{
"name" : "ZOOKEEPER_SERVER"
},
{
"name" : "HDFS_CLIENT"
},
{
"name" : "YARN_CLIENT"
},
{
"name" : "HIVE_CLIENT"
},
{
"name" : "MAPREDUCE2_CLIENT"
}
],
"cardinality" : 1
}
],
"Blueprints" : {
"blueprint_name" : "prod",
"stack_name" : "HDP",
"stack_version" : "2.4"
}
}

Note - I have kept the NameNodes on prodnode1 and prodnode3; you can change this according to your requirements. I have also added a few more services such as Hive, Falcon and Oozie. You can remove them or add others as needed.

Step 4: Create an internal repository map

4.1: HDP repository – copy the contents below, modify base_url to the hostname/IP address of your internal repository server, and save it as repo.json.
{
"Repositories":{
"base_url":"http://<ip-address-of-repo-server>/hdp/centos6/HDP-2.4.2.0",
"verify_base_url":true
}
}

4.2: HDP-UTILS repository – copy the contents below, modify base_url to the hostname/IP address of your internal repository server, and save it as hdputils-repo.json.
{
"Repositories" : {
"base_url" : "http://<ip-address-of-repo-server>/hdp/centos6/HDP-UTILS-1.1.0.20",
"verify_base_url" : true
}
}

Step 5: Register the blueprint with the Ambari server by executing the command below:
curl -H "X-Requested-By: ambari" -X POST -u admin:admin http://<ambari-server-hostname>:8080/api/v1/blueprints/prod -d @cluster_configuration.json

Step 6: Set up the internal repositories via the REST API. Execute the curl calls below:
curl -H "X-Requested-By: ambari" -X PUT -u admin:admin http://<ambari-server-hostname>:8080/api/v1/stacks/HDP/versions/2.4/operating_systems/redhat6/repositories/HDP-2.4 -d @repo.json
curl -H "X-Requested-By: ambari"-X PUT -u admin:admin http://<ambari-server-hostname>:8080/api/v1/stacks/HDP/versions/2.4/operating_systems/redhat6/repositories/HDP-UTILS-1.1.0.20 -d @hdputils-repo.json . Step 7: Pull the trigger! Below command will start cluster installation. curl -H "X-Requested-By: ambari"-X POST -u admin:admin http://<ambari-server-hostname>:8080/api/v1/clusters/multinode-hdp -d @hostmap.json . Please refer Part-4 for setting up HDP with Kerberos authentication via Ambari blueprint. . Please feel free to comment if you need any further help on this. Happy Hadooping!!
08-22-2016
11:59 PM
3 Kudos
Below are the steps to run a Hive (Tez) query in a shell script using the Oozie shell action.

1. Configure job.properties
Example:
#*************************************************
# job.properties
#*************************************************
nameNode=hdfs://<namenode-fqdn>:8020
jobTracker=<resourcemanager-host-fqdn>:8050
queueName=default
examplesRoot=examples
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/

2. Configure workflow.xml
Example:
<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.4"
name="test-shell-with-kerberos-wf">
<global>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>tez.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
</global>
<credentials>
<credential name="hive_credentials" type="hcat">
<property>
<name>hcat.metastore.uri</name>
<value>thrift://<metastore-server>:9083</value>
</property>
<property>
<name>hcat.metastore.principal</name>
<value>hive/_HOST@REALM</value>
</property>
</credential>
</credentials>
<start to="run-shell-script"/>
<action name="run-shell-script" cred="hive_credentials">
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>tez.lib.uris</name>
<value>/hdp/apps/<VERSION>/tez/tez.tar.gz</value>
</property>
</configuration>
<exec>/user/<username>/hive.sh</exec>
<file>/user/<username>/hive.sh#hive.sh</file>
</shell>
<ok to="end"/>
<error to="killnode"/>
</action>
<kill name="killnode">
<message>Job failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>

3. Write the sample shell script
Example:
#!/bin/bash
hive --hiveconf mapreduce.job.credentials.binary=$HADOOP_TOKEN_FILE_LOCATION --hiveconf tez.credentials.path=$HADOOP_TOKEN_FILE_LOCATION -e 'select * from test_hive;'
4. Upload workflow.xml and the shell script to the "oozie.wf.application.path" defined in job.properties (see the sketch below).

5. Run the Oozie workflow with the command below:
oozie job -oozie http://<oozie-server-hostname>:11000/oozie -config /$PATH/job.properties -run

Please comment if you have any questions! Happy Hadooping!! 🙂
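As a minimal sketch of step 4 (assuming the workflow application path is /user/<username>/ as configured in job.properties), the files can be uploaded with the HDFS client:
# Copy the workflow definition and the shell script into the workflow application path
hdfs dfs -put -f workflow.xml /user/<username>/
hdfs dfs -put -f hive.sh /user/<username>/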
08-12-2016
05:17 AM
@Robert Levas - DEFAULT in the middle worked when I tried this setup. I checked the article you referenced and I agree that modifying dfs.namenode.kerberos.principal.pattern was somehow missed while writing this article. I will add the missing step now. Thank you! 🙂
08-11-2016
07:16 AM
7 Kudos
How to set up cross-realm trust between two MIT KDCs – In this post, we will see how to set up cross-realm trust between two MIT KDCs. Once the cross-realm trust is set up correctly, we can access and copy data from one cluster to another.

In our example, we have two clusters with the same HDP version (2.4.2.0) and Ambari version (2.2.2.0).

Cluster 1:
172.26.68.47 hwx-1.hwx.com hwx-1
172.26.68.46 hwx-2.hwx.com hwx-2
172.26.68.45 hwx-3.hwx.com hwx-3
Cluster 2:
172.26.68.48 support-1.support.com support-1
172.26.68.49 support-2.support.com support-2
172.26.68.50 support-3.support.com support-3

Below are the steps:

Step 1: Make sure both clusters are Kerberized with MIT KDC. You can use the automated script below for configuring Kerberos on HDP:
https://community.hortonworks.com/articles/29203/automated-kerberos-installation-and-configuration.html

Step 2: Configure the /etc/hosts file on both clusters to have IP <-> hostname mappings.
Example: On both clusters, the /etc/hosts file should look like below:
172.26.68.47 hwx-1.hwx.com hwx-1
172.26.68.46 hwx-2.hwx.com hwx-2
172.26.68.45 hwx-3.hwx.com hwx-3
172.26.68.48 support-1.support.com support-1
172.26.68.49 support-2.support.com support-2
172.26.68.50 support-3.support.com support-3

Step 3: Configure krb5.conf:

3.1 Configure the [realms] section to add the other cluster's KDC server details – this is required so a KDC can be found to authenticate users that belong to the other cluster.
Example on Cluster 1:
[realms]
HWX.COM = {
admin_server = hwx-1.hwx.com
kdc = hwx-1.hwx.com
}
SUPPORT.COM = {
admin_server = support-1.support.com
kdc = support-1.support.com
}
3.2 Configure the [domain_realm] section to add the other cluster's domain <-> realm mapping.
[domain_realm]
.hwx.com = HWX.COM
hwx.com = HWX.COM
.support.com = SUPPORT.COM
support.com = SUPPORT.COM

3.3 Configure [capaths] to add the other cluster's realm.
[capaths]
HWX.COM = {
SUPPORT.COM = .
}

On Cluster 1, the complete krb5.conf should look like below:
[libdefaults]
renew_lifetime = 7d
forwardable = true
default_realm = HWX.COM
ticket_lifetime = 24h
dns_lookup_realm = false
dns_lookup_kdc = false
#default_tgs_enctypes = aes des3-cbc-sha1 rc4 des-cbc-md5
#default_tkt_enctypes = aes des3-cbc-sha1 rc4 des-cbc-md5
[logging]
default = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log
kdc = FILE:/var/log/krb5kdc.log
[realms]
HWX.COM = {
admin_server = hwx-1.hwx.com
kdc = hwx-1.hwx.com
}
SUPPORT.COM = {
admin_server = support-1.support.com
kdc = support-1.support.com
}
[domain_realm]
.hwx.com = HWX.COM
hwx.com = HWX.COM
.support.com = SUPPORT.COM
support.com = SUPPORT.COM
[capaths]
HWX.COM = {
SUPPORT.COM = .
}

Note – Please copy the modified /etc/krb5.conf to all the nodes in Cluster 1.

Similarly, on Cluster 2 the krb5.conf should look like below:
[libdefaults]
renew_lifetime = 7d
forwardable = true
default_realm = SUPPORT.COM
ticket_lifetime = 24h
dns_lookup_realm = false
dns_lookup_kdc = false
#default_tgs_enctypes = aes des3-cbc-sha1 rc4 des-cbc-md5
#default_tkt_enctypes = aes des3-cbc-sha1 rc4 des-cbc-md5
[logging]
default = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log
kdc = FILE:/var/log/krb5kdc.log
[realms]
SUPPORT.COM = {
admin_server = support-1.support.com
kdc = support-1.support.com
}
HWX.COM = {
admin_server = hwx-1.hwx.com
kdc = hwx-1.hwx.com
}
[domain_realm]
.hwx.com = HWX.COM
hwx.com = HWX.COM
.support.com = SUPPORT.COM
support.com = SUPPORT.COM
[capaths]
SUPPORT.COM = {
HWX.COM = .
}

Note – Please copy the modified /etc/krb5.conf to all the nodes in Cluster 2.

Step 4: Modify the property below in hdfs-site.xml on the cluster from which you want to execute the distcp command (specifically speaking, the client side):
dfs.namenode.kerberos.principal.pattern=*

Step 5: Add a common trust principal in both KDCs. Please keep the same password for both principals.
On Cluster 1 and Cluster 2, execute the commands below in the kadmin utility:
addprinc krbtgt/HWX.COM@SUPPORT.COM
addprinc krbtgt/SUPPORT.COM@HWX.COM

Step 6: Configure the auth_to_local rules (hadoop.security.auth_to_local) on both clusters:

On Cluster 1, append the auth_to_local rules from Cluster 2.
Example on Cluster 1:
RULE:[1:$1@$0](ambari-qa-hadoop@HWX.COM)s/.*/ambari-qa/
RULE:[1:$1@$0](hdfs-hadoop@HWX.COM)s/.*/hdfs/
RULE:[1:$1@$0](spark-hadoop@HWX.COM)s/.*/spark/
RULE:[1:$1@$0](.*@HWX.COM)s/@.*//
RULE:[2:$1@$0](dn@HWX.COM)s/.*/hdfs/
RULE:[2:$1@$0](hive@HWX.COM)s/.*/hive/
RULE:[2:$1@$0](jhs@HWX.COM)s/.*/mapred/
RULE:[2:$1@$0](jn@HWX.COM)s/.*/hdfs/
RULE:[2:$1@$0](nm@HWX.COM)s/.*/yarn/
RULE:[2:$1@$0](nn@HWX.COM)s/.*/hdfs/
RULE:[2:$1@$0](rm@HWX.COM)s/.*/yarn/
RULE:[2:$1@$0](yarn@HWX.COM)s/.*/yarn/
DEFAULT
RULE:[1:$1@$0](ambari-qa-support@SUPPORT.COM)s/.*/ambari-qa/
RULE:[1:$1@$0](hdfs-support@SUPPORT.COM)s/.*/hdfs/
RULE:[1:$1@$0](spark-support@SUPPORT.COM)s/.*/spark/
RULE:[1:$1@$0](.*@SUPPORT.COM)s/@.*//
RULE:[2:$1@$0](dn@SUPPORT.COM)s/.*/hdfs/
RULE:[2:$1@$0](hive@SUPPORT.COM)s/.*/hive/
RULE:[2:$1@$0](jhs@SUPPORT.COM)s/.*/mapred/
RULE:[2:$1@$0](jn@SUPPORT.COM)s/.*/hdfs/
RULE:[2:$1@$0](nm@SUPPORT.COM)s/.*/yarn/
RULE:[2:$1@$0](nn@SUPPORT.COM)s/.*/hdfs/
RULE:[2:$1@$0](rm@SUPPORT.COM)s/.*/yarn/
RULE:[2:$1@$0](yarn@SUPPORT.COM)s/.*/yarn/
On Cluster 2, append the auth_to_local rules from Cluster 1.
Example on Cluster 2:
RULE:[1:$1@$0](ambari-qa-support@SUPPORT.COM)s/.*/ambari-qa/
RULE:[1:$1@$0](hdfs-support@SUPPORT.COM)s/.*/hdfs/
RULE:[1:$1@$0](spark-support@SUPPORT.COM)s/.*/spark/
RULE:[1:$1@$0](.*@SUPPORT.COM)s/@.*//
RULE:[2:$1@$0](dn@SUPPORT.COM)s/.*/hdfs/
RULE:[2:$1@$0](hive@SUPPORT.COM)s/.*/hive/
RULE:[2:$1@$0](jhs@SUPPORT.COM)s/.*/mapred/
RULE:[2:$1@$0](jn@SUPPORT.COM)s/.*/hdfs/
RULE:[2:$1@$0](nm@SUPPORT.COM)s/.*/yarn/
RULE:[2:$1@$0](nn@SUPPORT.COM)s/.*/hdfs/
RULE:[2:$1@$0](rm@SUPPORT.COM)s/.*/yarn/
RULE:[2:$1@$0](yarn@SUPPORT.COM)s/.*/yarn/
DEFAULT
RULE:[1:$1@$0](ambari-qa-hadoop@HWX.COM)s/.*/ambari-qa/
RULE:[1:$1@$0](hdfs-hadoop@HWX.COM)s/.*/hdfs/
RULE:[1:$1@$0](spark-hadoop@HWX.COM)s/.*/spark/
RULE:[1:$1@$0](.*@HWX.COM)s/@.*//
RULE:[2:$1@$0](dn@HWX.COM)s/.*/hdfs/
RULE:[2:$1@$0](hive@HWX.COM)s/.*/hive/
RULE:[2:$1@$0](jhs@HWX.COM)s/.*/mapred/
RULE:[2:$1@$0](jn@HWX.COM)s/.*/hdfs/
RULE:[2:$1@$0](nm@HWX.COM)s/.*/yarn/
RULE:[2:$1@$0](nn@HWX.COM)s/.*/hdfs/
RULE:[2:$1@$0](rm@HWX.COM)s/.*/yarn/
RULE:[2:$1@$0](yarn@HWX.COM)s/.*/yarn/

Step 7: Log in to Cluster 2, do a kinit as a local user, and try to access HDFS files on Cluster 1.
Example:
hdfs dfs -ls hdfs://hwx-2.hwx.com:8020/tmp
Found 8 items
drwx------ - ambari-qa hdfs 0 2016-07-29 23:24 hdfs://hwx-2.hwx.com:8020/tmp/ambari-qa
drwxr-xr-x - hdfs hdfs 0 2016-07-29 22:02 hdfs://hwx-2.hwx.com:8020/tmp/entity-file-history
drwx-wx-wx - ambari-qa hdfs 0 2016-07-29 23:25 hdfs://hwx-2.hwx.com:8020/tmp/hive
-rwxr-xr-x 3 hdfs hdfs 1414 2016-07-29 23:50 hdfs://hwx-2.hwx.com:8020/tmp/id1aac2d44_date502916
-rwxr-xr-x 3 ambari-qa hdfs 1414 2016-07-29 23:26 hdfs://hwx-2.hwx.com:8020/tmp/idtest.ambari-qa.1469834803.19.in
-rwxr-xr-x 3 ambari-qa hdfs 957 2016-07-29 23:26 hdfs://hwx-2.hwx.com:8020/tmp/idtest.ambari-qa.1469834803.19.pig
drwxr-xr-x - ambari-qa hdfs 0 2016-07-29 23:53 hdfs://hwx-2.hwx.com:8020/tmp/tezsmokeinput

Note – hwx-2.hwx.com is the Active NameNode of Cluster 1.

You can also try copying files from Cluster 1 to Cluster 2 using distcp.
Example:
[kuldeepk@support-1 root]$ hadoop distcp hdfs://hwx-1.hwx.com:8020/tmp/test.txt /tmp/
16/07/30 22:03:27 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[hdfs://hwx-1.hwx.com:8020/tmp/test.txt], targetPath=/tmp, targetPathExists=true, preserveRawXattrs=false}
16/07/30 22:03:27 INFO impl.TimelineClientImpl: Timeline service address: http://support-3.support.com:8188/ws/v1/timeline/
16/07/30 22:03:27 INFO client.RMProxy: Connecting to ResourceManager at support-3.support.com/172.26.68.50:8050
16/07/30 22:03:28 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 20 for kuldeepk on 172.26.68.47:8020
16/07/30 22:03:28 INFO security.TokenCache: Got dt for hdfs://hwx-1.hwx.com:8020; Kind: HDFS_DELEGATION_TOKEN, Service: 172.26.68.47:8020, Ident: (HDFS_DELEGATION_TOKEN token 20 for kuldeepk)
16/07/30 22:03:29 INFO impl.TimelineClientImpl: Timeline service address: http://support-3.support.com:8188/ws/v1/timeline/
16/07/30 22:03:29 INFO client.RMProxy: Connecting to ResourceManager at support-3.support.com/172.26.68.50:8050
16/07/30 22:03:29 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 24 for kuldeepk on ha-hdfs:support
16/07/30 22:03:29 INFO security.TokenCache: Got dt for hdfs://support; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:support, Ident: (HDFS_DELEGATION_TOKEN token 24 for kuldeepk)
16/07/30 22:03:29 INFO mapreduce.JobSubmitter: number of splits:1
16/07/30 22:03:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1469916118318_0003
16/07/30 22:03:29 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: 172.26.68.47:8020, Ident: (HDFS_DELEGATION_TOKEN token 20 for kuldeepk)
16/07/30 22:03:29 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:support, Ident: (HDFS_DELEGATION_TOKEN token 24 for kuldeepk)
16/07/30 22:03:30 INFO impl.YarnClientImpl: Submitted application application_1469916118318_0003
16/07/30 22:03:31 INFO mapreduce.Job: The url to track the job: http://support-3.support.com:8088/proxy/application_1469916118318_0003/
16/07/30 22:03:31 INFO tools.DistCp: DistCp job-id: job_1469916118318_0003
16/07/30 22:03:31 INFO mapreduce.Job: Running job: job_1469916118318_0003
16/07/30 22:03:43 INFO mapreduce.Job: Job job_1469916118318_0003 running in uber mode : false
16/07/30 22:03:43 INFO mapreduce.Job: map 0% reduce 0%
16/07/30 22:03:52 INFO mapreduce.Job: map 100% reduce 0%
16/07/30 22:03:53 INFO mapreduce.Job: Job job_1469916118318_0003 completed successfully
16/07/30 22:03:53 INFO mapreduce.Job: Counters: 32
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=142927
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=346
HDFS: Number of bytes written=45
HDFS: Number of read operations=12
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=14324
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=7162
Total vcore-seconds taken by all map tasks=7162
Total megabyte-seconds taken by all map tasks=7333888
Map-Reduce Framework
Map input records=1
Map output records=1
Input split bytes=118
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=77
CPU time spent (ms)=1210
Physical memory (bytes) snapshot=169885696
Virtual memory (bytes) snapshot=2337554432
Total committed heap usage (bytes)=66584576
File Input Format Counters
Bytes Read=228
File Output Format Counters
Bytes Written=45
org.apache.hadoop.tools.mapred.CopyMapper$Counter
BYTESSKIPPED=0
SKIP=1

Note – hwx-1.hwx.com is the Active NameNode of Cluster 1.

Please comment if you have any feedback/questions/suggestions. Happy Hadooping!! 🙂

References:
http://crazyadmins.com
https://community.hortonworks.com/articles/18686/kerberos-cross-realm-trust-for-distcp.html
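Going back to step 5, a minimal sketch of creating the trust principals non-interactively (assuming you run it as root on each KDC host and substitute your own password, which is a placeholder here) could look like this:
# Run on the KDC of each cluster; the password must be identical on both sides
kadmin.local -q "addprinc -pw <common-trust-password> krbtgt/HWX.COM@SUPPORT.COM"
kadmin.local -q "addprinc -pw <common-trust-password> krbtgt/SUPPORT.COM@HWX.COM"
# Verify the principals exist
kadmin.local -q "listprincs krbtgt*"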
07-31-2016
03:44 AM
@Saurabh Kumar - Nice article! P.S. - I have removed the username and replaced it with $user in the logs.
07-29-2016
05:37 PM
@Artem Ervits There is a typo in the line below:
#! /usr/bin/env pythonimport os, pwd, sys
It should be:
#! /usr/bin/env python
import os, pwd, sys
07-29-2016
05:26 AM
4 Kudos
Below are the steps to run a Hive query in a shell script using the Oozie shell action.

1. Configure job.properties
Example:
#*************************************************
# job.properties
#*************************************************
nameNode=hdfs://<namenode-fqdn>:8020
jobTracker=<resourcemanager-host-fqdn>:8050
queueName=default
examplesRoot=examples
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/

2. Configure workflow.xml
Example:
<workflow-app xmlns="uri:oozie:workflow:0.3" name="shell-wf">
<credentials>
<credential name='my_auth' type='hcat'>
<property>
<name>hcat.metastore.uri</name>
<value>thrift://<hive-metastore-hostname>:9083</value>
</property>
<property>
<name>hcat.metastore.principal</name>
<value>hive/_HOST@HWX.COM</value>
</property>
</credential>
</credentials>
<start to="shell-node"/>
<action name="shell-node" cred="my_auth">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>test.sh</exec>
<file>/user/<username>/test.sh</file>
</shell>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
3. Write the sample shell script
Example:
#!/bin/bash
hive -e "SET mapreduce.job.credentials.binary=$HADOOP_TOKEN_FILE_LOCATION; select count(*) from test_hive;"

4. Upload workflow.xml and the shell script (test.sh) to the "oozie.wf.application.path" defined in job.properties.

5. Run the Oozie workflow with the command below:
oozie job -oozie http://<oozie-server-hostname>:11000/oozie -config /$PATH/job.properties -run

Please note - This has been successfully tested with hive.execution.engine=mr.

Please comment if you have any questions! Happy Hadooping!! 🙂
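To check on the workflow after submission, a minimal sketch (the job id shown here is hypothetical; use the one printed by the -run command) would be:
# The -run command prints a job id such as 0000001-...-oozie-oozi-W; use it to query the workflow status
oozie job -oozie http://<oozie-server-hostname>:11000/oozie -info <job-id>
# Fetch the workflow log if the action fails
oozie job -oozie http://<oozie-server-hostname>:11000/oozie -log <job-id>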
07-26-2016
06:54 PM
@Gerd Koenig Thanks. Sure, in the next post we can cover NN/RM HA via Blueprints.
07-26-2016
04:16 AM
11 Kudos
In the previous post we saw how to install a single-node HDP cluster using Ambari Blueprints. In this post we will see how to automate a multi-node HDP installation using Ambari Blueprints.

Note - From Ambari 2.6.x onwards, you must register a VDF to use an internal repository; otherwise Ambari will pick up the latest HDP version and use the public repos. Please see the document below for more information. For Ambari versions below 2.6.x, this guide works without any modifications.
Document - https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.0.0/bk_ambari-release-notes/content/ambari_relnotes-2.6.0.0-behavioral-changes.html

Below are the steps to install an HDP multi-node cluster using an internal repository via Ambari Blueprints.

Step 1: Install the Ambari server using the steps mentioned under the link below:
http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.2.1/bk_Installing_HDP_AMB/content/_download_the_ambari_repo_lnx6.html

Step 2: Register ambari-agent manually
Install the ambari-agent package on all the nodes in the cluster and set hostname to the Ambari server host (FQDN) in /etc/ambari-agent/conf/ambari-agent.ini.

Step 3: Configure blueprints
Please follow the steps below to create the Blueprints.

3.1 Create the hostmapping.json file as shown below:
Note – This file contains information about all the hosts that are part of your HDP cluster.
{
"blueprint" : "multinode-hdp",
"default_password" : "hadoop",
"host_groups" :[
{
"name" : "host2",
"hosts" : [
{
"fqdn" : "host2.crazyadmins.com"
}
]
},
{
"name" : "host3",
"hosts" : [
{
"fqdn" : "host3.crazyadmins.com"
}
]
},
{
"name" : "host4",
"hosts" : [
{
"fqdn" : "host4.crazyadmins.com"
}
]
}
]
}

3.2 Create the cluster_configuration.json file; it contains the mapping of hosts to HDP components.
{
"configurations": [],
"host_groups": [{
"name": "host2",
"components": [{
"name": "PIG"
}, {
"name": "METRICS_COLLECTOR"
}, {
"name": "KAFKA_BROKER"
}, {
"name": "HISTORYSERVER"
}, {
"name": "HBASE_REGIONSERVER"
}, {
"name": "OOZIE_CLIENT"
}, {
"name": "HBASE_CLIENT"
}, {
"name": "NAMENODE"
}, {
"name": "SUPERVISOR"
}, {
"name": "HCAT"
}, {
"name": "METRICS_MONITOR"
}, {
"name": "APP_TIMELINE_SERVER"
}, {
"name": "NODEMANAGER"
}, {
"name": "HDFS_CLIENT"
}, {
"name": "HIVE_CLIENT"
}, {
"name": "FLUME_HANDLER"
}, {
"name": "DATANODE"
}, {
"name": "WEBHCAT_SERVER"
}, {
"name": "ZOOKEEPER_CLIENT"
}, {
"name": "ZOOKEEPER_SERVER"
}, {
"name": "STORM_UI_SERVER"
}, {
"name": "HIVE_SERVER"
}, {
"name": "FALCON_CLIENT"
}, {
"name": "TEZ_CLIENT"
}, {
"name": "HIVE_METASTORE"
}, {
"name": "SQOOP"
}, {
"name": "YARN_CLIENT"
}, {
"name": "MAPREDUCE2_CLIENT"
}, {
"name": "NIMBUS"
}, {
"name": "DRPC_SERVER"
}],
"cardinality": "1"
}, {
"name": "host3",
"components": [{
"name": "ZOOKEEPER_SERVER"
}, {
"name": "OOZIE_SERVER"
}, {
"name": "SECONDARY_NAMENODE"
}, {
"name": "FALCON_SERVER"
}, {
"name": "ZOOKEEPER_CLIENT"
}, {
"name": "PIG"
}, {
"name": "KAFKA_BROKER"
}, {
"name": "OOZIE_CLIENT"
}, {
"name": "HBASE_REGIONSERVER"
}, {
"name": "HBASE_CLIENT"
}, {
"name": "HCAT"
}, {
"name": "METRICS_MONITOR"
}, {
"name": "FALCON_CLIENT"
}, {
"name": "TEZ_CLIENT"
}, {
"name": "SQOOP"
}, {
"name": "HIVE_CLIENT"
}, {
"name": "HDFS_CLIENT"
}, {
"name": "NODEMANAGER"
}, {
"name": "YARN_CLIENT"
}, {
"name": "MAPREDUCE2_CLIENT"
}, {
"name": "DATANODE"
}],
"cardinality": "1"
}, {
"name": "host4",
"components": [{
"name": "ZOOKEEPER_SERVER"
}, {
"name": "ZOOKEEPER_CLIENT"
}, {
"name": "PIG"
}, {
"name": "KAFKA_BROKER"
}, {
"name": "OOZIE_CLIENT"
}, {
"name": "HBASE_MASTER"
}, {
"name": "HBASE_REGIONSERVER"
}, {
"name": "HBASE_CLIENT"
}, {
"name": "HCAT"
}, {
"name": "RESOURCEMANAGER"
}, {
"name": "METRICS_MONITOR"
}, {
"name": "FALCON_CLIENT"
}, {
"name": "TEZ_CLIENT"
}, {
"name": "SQOOP"
}, {
"name": "HIVE_CLIENT"
}, {
"name": "HDFS_CLIENT"
}, {
"name": "NODEMANAGER"
}, {
"name": "YARN_CLIENT"
}, {
"name": "MAPREDUCE2_CLIENT"
}, {
"name": "DATANODE"
}],
"cardinality": "1"
}],
"Blueprints": {
"blueprint_name": "multinode-hdp",
"stack_name": "HDP",
"stack_version": "2.3"
}
}

Step 4: Create an internal repository map

4.1: HDP repository – copy the contents below, modify base_url to the hostname/IP address of your internal repository server, and save it as repo.json.
{
"Repositories" : {
"base_url" : "http://<ip-address-of-repo-server>/hdp/centos6/HDP-2.3.4.0",
"verify_base_url" : true
}
}

4.2: HDP-UTILS repository – copy the contents below, modify base_url to the hostname/IP address of your internal repository server, and save it as hdputils-repo.json.
{
"Repositories" : {
"base_url" : "http://<ip-address-of-repo-server>/hdp/centos6/HDP-UTILS-1.1.0.20",
"verify_base_url" : true
}
}

Step 5: Register the blueprint with the Ambari server by executing the command below:
curl -H "X-Requested-By: ambari" -X POST -u admin:admin http://<ambari-server-hostname>:8080/api/v1/blueprints/multinode-hdp -d @cluster_configuration.json

Step 6: Set up the internal repositories via the REST API. Execute the curl calls below (see the verification sketch at the end of this post):
curl -H "X-Requested-By: ambari" -X PUT -u admin:admin http://<ambari-server-hostname>:8080/api/v1/stacks/HDP/versions/2.3/operating_systems/redhat6/repositories/HDP-2.3 -d @repo.json
curl -H "X-Requested-By: ambari" -X PUT -u admin:admin http://<ambari-server-hostname>:8080/api/v1/stacks/HDP/versions/2.3/operating_systems/redhat6/repositories/HDP-UTILS-1.1.0.20 -d @hdputils-repo.json . Step 7: Pull the trigger! Below command will start cluster installation. curl -H "X-Requested-By: ambari" -X POST -u admin:admin http://<ambari-server-hostname>:8080/api/v1/clusters/multinode-hdp -d @hostmap.json . Note - Please refer third part of this tutorial if you want to setup a multinode cluster with Namenode HA . Please feel free to comment if you need any further help on this. Happy Hadooping!!
07-26-2016
04:09 AM
16 Kudos
What are Ambari Blueprints? Ambari Blueprints are a definition of your HDP cluster in JSON format. They contain information about all the hosts in your cluster, their components, the mapping of stack components to hosts or host groups, and other cool stuff. Using Blueprints we can call the Ambari APIs to completely automate the HDP installation process. Interesting stuff, isn't it? Let's get started with a single-node cluster installation. Below are the steps to set up a single-node HDP cluster with Ambari Blueprints.

Note - From Ambari 2.6.x onwards, you must register a VDF to use an internal repository; otherwise Ambari will pick up the latest HDP version and use the public repos. Please see the document below for more information. For Ambari versions below 2.6.x, this guide works without any modifications.
Document - https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.0.0/bk_ambari-release-notes/content/ambari_relnotes-2.6.0.0-behavioral-changes.html

Step 1: Install the Ambari server using the steps mentioned under the link below:
http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.2.1/bk_Installing_HDP_AMB/content/_download_the_ambari_repo_lnx6.html

Step 2: Register ambari-agent manually
Install the ambari-agent package on all the nodes in the cluster and set hostname to the Ambari server host (FQDN) in /etc/ambari-agent/conf/ambari-agent.ini (see the sketch at the end of this post).

Step 3: Configure blueprints
Please follow the steps below to create the Blueprints.

3.1 Create the hostmapping.json file as shown below:
{
"blueprint" : "single-node-hdp-cluster",
"default_password" : "admin",
"host_groups" :[
{
"name" : "host_group_1",
"hosts" : [
{
"fqdn" : "<fqdn-of-single-node-cluster-machine>"
}
]
}
]
}

3.2 Create the cluster_configuration.json file; it contains the mapping of hosts to HDP components.
{
"configurations" : [ ],
"host_groups" : [
{
"name" : "host_group_1",
"components" : [
{
"name" : "NAMENODE"
},
{
"name" : "SECONDARY_NAMENODE"
},
{
"name" : "DATANODE"
},
{
"name" : "HDFS_CLIENT"
},
{
"name" : "RESOURCEMANAGER"
},
{
"name" : "NODEMANAGER"
},
{
"name" : "YARN_CLIENT"
},
{
"name" : "HISTORYSERVER"
},
{
"name" : "APP_TIMELINE_SERVER"
},
{
"name" : "MAPREDUCE2_CLIENT"
},
{
"name" : "ZOOKEEPER_SERVER"
},
{
"name" : "ZOOKEEPER_CLIENT"
}
],
"cardinality" : "1"
}
],
"Blueprints" : {
"blueprint_name" : "single-node-hdp-cluster",
"stack_name" : "HDP",
"stack_version" : "2.3"
}
}

Step 4: Register the blueprint with the Ambari server by executing the command below:
curl -H "X-Requested-By: ambari" -X POST -u admin:admin http://<ambari-hostname>:8080/api/v1/blueprints/<blueprint-name> -d @cluster_configuration.json

Step 5: Pull the trigger! The command below will start the cluster installation:
curl -H "X-Requested-By: ambari" -X POST -u admin:admin http://<ambari-host>:8080/api/v1/clusters/<new-cluster-name> -d @hostmapping.json

Step 6: We can track the installation status with the REST calls below, or check the same from the Ambari UI:
curl -H "X-Requested-By: ambari" -X GET -u admin:admin http://<ambari-hostname>:8080/api/v1/clusters/<new-cluster-name>/requests/
curl -H "X-Requested-By: ambari" -X GET -u admin:admin http://<ambari-hostname>:8080/api/v1/clusters/<new-cluster-name>/requests/<request-number>

Thank you for your time! Please read the next part to see the installation of an HDP multi-node cluster using Ambari Blueprints.

Happy Hadooping!! 🙂
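As referenced in step 2, a minimal sketch of pointing the agents at the Ambari server (assuming the server FQDN ambari.example.com, which is a placeholder) could be run on every node:
# Point the agent at the Ambari server and restart it
sed -i 's/hostname=localhost/hostname=ambari.example.com/' /etc/ambari-agent/conf/ambari-agent.ini
ambari-agent restart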