Created 02-15-2016 03:48 AM
2016-02-14 22:40:05,909 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2016-02-14 22:40:06,618 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2016-02-14 22:40:06,760 - HdfsResource['/user/ambari-qa/mapredsmokeoutput'] {'security_enabled': True, 'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'keytab': '/etc/security/keytabs/hdfs.headless.keytab', 'default_fs': 'hdfs://HDPCA', 'hdfs_site': ..., 'kinit_path_local': '/usr/bin/kinit', 'principal_name': 'hdfs-HDPCA@EXAMPLE.COM', 'user': 'hdfs', 'action': ['delete_on_execute'], 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf', 'type': 'directory'}
2016-02-14 22:40:06,860 - Execute['/usr/bin/kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-HDPCA@EXAMPLE.COM'] {'user': 'hdfs'}
2016-02-14 22:40:11,788 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl --negotiate -u : -s '"'"'http://lnx0.localdomain.com:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpZmSrzL 2>/tmp/tmpxqBP9F''] {'quiet': False}
2016-02-14 22:40:15,601 - call returned (0, '')
2016-02-14 22:40:15,603 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl --negotiate -u : -s '"'"'http://lnx1.localdomain.com:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpWcmtSE 2>/tmp/tmp8ueZHF''] {'quiet': False}
2016-02-14 22:40:19,015 - call returned (0, '')
2016-02-14 22:40:19,017 - NameNode HA states: active_namenodes = [(u'nn1', 'lnx0.localdomain.com:50070')], standby_namenodes = [(u'nn2', 'lnx1.localdomain.com:50070')], unknown_namenodes = []
2016-02-14 22:40:19,018 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl --negotiate -u : -s '"'"'http://lnx0.localdomain.com:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpEDIPa1 2>/tmp/tmp1Xt3Yx''] {'quiet': False}
2016-02-14 22:40:22,856 - call returned (0, '')
2016-02-14 22:40:22,858 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl --negotiate -u : -s '"'"'http://lnx1.localdomain.com:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpKQmmZw 2>/tmp/tmpJ6YFEL''] {'quiet': False}
2016-02-14 22:40:26,162 - call returned (0, '')
2016-02-14 22:40:26,164 - NameNode HA states: active_namenodes = [(u'nn1', 'lnx0.localdomain.com:50070')], standby_namenodes = [(u'nn2', 'lnx1.localdomain.com:50070')], unknown_namenodes = []
2016-02-14 22:40:26,167 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -sS -L -w '"'"'%{http_code}'"'"' -X GET --negotiate -u : '"'"'http://lnx0.localdomain.com:50070/webhdfs/v1/user/ambari-qa/mapredsmokeoutput?op=GETFILESTATUS&user.name=hdfs'"'"' 1>/tmp/tmpC3nS44 2>/tmp/tmpYhq8nH''] {'logoutput': None, 'quiet': False}
2016-02-14 22:40:29,885 - call returned (0, '')
2016-02-14 22:40:30,159 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -sS -L -w '"'"'%{http_code}'"'"' -X DELETE --negotiate -u : '"'"'http://lnx0.localdomain.com:50070/webhdfs/v1/user/ambari-qa/mapredsmokeoutput?op=DELETE&user.name=hdfs&recursive=True'"'"' 1>/tmp/tmpKxdOxa 2>/tmp/tmpNflsmo''] {'logoutput': None, 'quiet': False}
2016-02-14 22:40:34,695 - call returned (0, '')
2016-02-14 22:40:34,697 - HdfsResource['/user/ambari-qa/mapredsmokeinput'] {'security_enabled': True, 'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'keytab': '/etc/security/keytabs/hdfs.headless.keytab', 'source': '/etc/passwd', 'default_fs': 'hdfs://HDPCA', 'hdfs_site': ..., 'kinit_path_local': '/usr/bin/kinit', 'principal_name': 'hdfs-HDPCA@EXAMPLE.COM', 'user': 'hdfs', 'action': ['create_on_execute'], 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf', 'type': 'file'}
2016-02-14 22:40:34,699 - Execute['/usr/bin/kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-HDPCA@EXAMPLE.COM'] {'user': 'hdfs'}
2016-02-14 22:40:36,030 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl --negotiate -u : -s '"'"'http://lnx0.localdomain.com:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpWi075d 2>/tmp/tmpfGORFu''] {'quiet': False}
2016-02-14 22:40:39,946 - call returned (0, '')
2016-02-14 22:40:39,948 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl --negotiate -u : -s '"'"'http://lnx1.localdomain.com:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpVeHNPf 2>/tmp/tmpXbZTBA''] {'quiet': False}
2016-02-14 22:40:41,300 - call returned (0, '')
2016-02-14 22:40:41,302 - NameNode HA states: active_namenodes = [(u'nn1', 'lnx0.localdomain.com:50070')], standby_namenodes = [(u'nn2', 'lnx1.localdomain.com:50070')], unknown_namenodes = []
2016-02-14 22:40:41,303 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl --negotiate -u : -s '"'"'http://lnx0.localdomain.com:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmp_2srqJ 2>/tmp/tmpaBMZT6''] {'quiet': False}
2016-02-14 22:40:41,588 - call returned (0, '')
2016-02-14 22:40:41,591 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl --negotiate -u : -s '"'"'http://lnx1.localdomain.com:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpzAgUPP 2>/tmp/tmp_JVFtg''] {'quiet': False}
2016-02-14 22:40:41,859 - call returned (0, '')
2016-02-14 22:40:41,861 - NameNode HA states: active_namenodes = [(u'nn1', 'lnx0.localdomain.com:50070')], standby_namenodes = [(u'nn2', 'lnx1.localdomain.com:50070')], unknown_namenodes = []
2016-02-14 22:40:41,867 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -sS -L -w '"'"'%{http_code}'"'"' -X GET --negotiate -u : '"'"'http://lnx0.localdomain.com:50070/webhdfs/v1/user/ambari-qa/mapredsmokeinput?op=GETFILESTATUS&user.name=hdfs'"'"' 1>/tmp/tmp1NN7xj 2>/tmp/tmpdtdIhc''] {'logoutput': None, 'quiet': False}
2016-02-14 22:40:42,493 - call returned (0, '')
2016-02-14 22:40:42,494 - DFS file /user/ambari-qa/mapredsmokeinput is identical to /etc/passwd, skipping the copying
2016-02-14 22:40:42,495 - HdfsResource[None] {'security_enabled': True, 'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'keytab': '/etc/security/keytabs/hdfs.headless.keytab', 'default_fs': 'hdfs://HDPCA', 'hdfs_site': ..., 'kinit_path_local': '/usr/bin/kinit', 'principal_name': 'hdfs-HDPCA@EXAMPLE.COM', 'user': 'hdfs', 'action': ['execute'], 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf'}
2016-02-14 22:40:42,495 - Execute['/usr/bin/kinit -kt /etc/security/keytabs/smokeuser.headless.keytab ambari-qa-HDPCA@EXAMPLE.COM;'] {'user': 'ambari-qa'}
2016-02-14 22:40:42,778 - ExecuteHadoop['jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-2.*.jar wordcount /user/ambari-qa/mapredsmokeinput /user/ambari-qa/mapredsmokeoutput'] {'bin_dir': '/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/var/lib/ambari-agent:/usr/hdp/current/hadoop-client/bin:/usr/hdp/current/hadoop-yarn-client/bin', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'logoutput': True, 'try_sleep': 5, 'tries': 1, 'user': 'ambari-qa'}
2016-02-14 22:40:42,881 - Execute['hadoop --config /usr/hdp/current/hadoop-client/conf jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-2.*.jar wordcount /user/ambari-qa/mapredsmokeinput /user/ambari-qa/mapredsmokeoutput'] {'logoutput': True, 'try_sleep': 5, 'environment': {}, 'tries': 1, 'user': 'ambari-qa', 'path': ['/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/var/lib/ambari-agent:/usr/hdp/current/hadoop-client/bin:/usr/hdp/current/hadoop-yarn-client/bin']}
WARNING: Use "yarn jar" to launch YARN applications.
16/02/14 22:42:50 INFO impl.TimelineClientImpl: Timeline service address: http://lnx1.localdomain.com:8188/ws/v1/timeline/
16/02/14 22:42:51 INFO client.RMProxy: Connecting to ResourceManager at Lnx1.localdomain.com/192.168.122.40:8050
16/02/14 22:42:53 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 599 for ambari-qa on ha-hdfs:HDPCA
16/02/14 22:42:54 INFO security.TokenCache: Got dt for hdfs://HDPCA; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:HDPCA, Ident: (HDFS_DELEGATION_TOKEN token 599 for ambari-qa)
16/02/14 22:42:54 WARN token.Token: Cannot find class for token kind kms-dt
16/02/14 22:42:54 INFO security.TokenCache: Got dt for hdfs://HDPCA; Kind: kms-dt, Service: 192.168.0.102:9292, Ident: 00 0f 61 6d 62 61 72 69 2d 71 61 2d 48 44 50 43 41 02 72 6d 00 8a 01 52 e3 06 19 54 8a 01 53 07 12 9d 54 04 02
16/02/14 22:43:03 INFO input.FileInputFormat: Total input paths to process : 1
16/02/14 22:43:06 INFO mapreduce.JobSubmitter: number of splits:1
16/02/14 22:43:10 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1455503320604_0004
16/02/14 22:43:10 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:HDPCA, Ident: (HDFS_DELEGATION_TOKEN token 599 for ambari-qa)
16/02/14 22:43:10 WARN token.Token: Cannot find class for token kind kms-dt
16/02/14 22:43:10 WARN token.Token: Cannot find class for token kind kms-dt
Kind: kms-dt, Service: 192.168.0.102:9292, Ident: 00 0f 61 6d 62 61 72 69 2d 71 61 2d 48 44 50 43 41 02 72 6d 00 8a 01 52 e3 06 19 54 8a 01 53 07 12 9d 54 04 02
16/02/14 22:43:19 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1455503320604_0004 is still in NEW
16/02/14 22:43:20 INFO impl.YarnClientImpl: Submitted application application_1455503320604_0004
16/02/14 22:43:21 INFO mapreduce.Job: The url to track the job: http://Lnx1.localdomain.com:8088/proxy/application_1455503320604_0004/
16/02/14 22:43:21 INFO mapreduce.Job: Running job: job_1455503320604_0004
Created 02-22-2016 07:34 PM
I finally got the solution.
A few failed MR jobs had eaten up the cluster's resources. After they were killed, subsequent jobs ran smoothly.
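For anyone hitting the same thing, here is a minimal Python sketch of how lingering jobs could be spotted. It takes JSON shaped like the ResourceManager's /ws/v1/cluster/apps response and builds a "yarn application -kill" command for every application still in ACCEPTED or RUNNING. The function name kill_commands, the sample data, and the application IDs are made up for illustration. On a Kerberized cluster the real JSON would have to be fetched with SPNEGO, for example curl --negotiate -u : against the ResourceManager web port. Review the list before killing anything.

```python
import json

# States in which an application still holds, or is waiting for, cluster resources.
ACTIVE_STATES = {"ACCEPTED", "RUNNING"}


def kill_commands(apps_json, user=None):
    """Return one `yarn application -kill` command per active application.

    apps_json: text shaped like the ResourceManager /ws/v1/cluster/apps response.
    user:      optional filter; only applications of this user are listed.
    """
    # The ResourceManager returns {"apps": null} when there are no applications.
    apps = (json.loads(apps_json).get("apps") or {}).get("app", [])
    return [
        "yarn application -kill " + app["id"]
        for app in apps
        if app["state"] in ACTIVE_STATES and (user is None or app["user"] == user)
    ]


# Hypothetical sample data; IDs, users, and sizes are invented for this sketch.
SAMPLE = json.dumps({"apps": {"app": [
    {"id": "application_0000000000000_0001", "user": "ambari-qa",
     "state": "RUNNING", "allocatedMB": 2048},
    {"id": "application_0000000000000_0002", "user": "ambari-qa",
     "state": "ACCEPTED", "allocatedMB": 0},
    {"id": "application_0000000000000_0003", "user": "ambari-qa",
     "state": "FINISHED", "allocatedMB": 0},
]}})

for command in kill_commands(SAMPLE):
    print(command)
```

The sketch only prints the commands rather than running them, so nothing is killed by accident.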
Created 02-15-2016 10:13 PM
Reading your logs, I see:
- 3 failed attempts to allocate resources on the host
- Blacklisted host Lnx1.localdomain.com
- "Container exited with a non-zero exit code 143", which is typical of a memory configuration problem; check your yarn-site.xml
This link should help you understand the Hadoop concept of a blacklisted node, which indicates that a node is unhealthy: @link
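As a side note on reading exit code 143: by the usual shell convention, an exit code above 128 means the process was ended by signal (code - 128), so 143 corresponds to signal 15, SIGTERM. That says the container was asked to stop, which can happen when it exceeds its memory limit but also when the application itself is killed, so it is a hint rather than proof of a memory problem. A minimal Python sketch of the arithmetic follows; describe_exit_code is a hypothetical helper name, not part of Hadoop.

```python
import signal


def describe_exit_code(code):
    """Interpret a process exit code using the shell convention 128 + signal number."""
    if code == 0:
        return "exited normally"
    if code > 128:
        try:
            return "killed by " + signal.Signals(code - 128).name
        except ValueError:
            pass  # not a known signal number on this platform
    return "exited with status %d" % code


print(describe_exit_code(143))
```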
Created 02-16-2016 03:59 AM
I tried increasing the YARN container memory from 1 GB to 1.5 GB, but it did not help. The problem seems to be Kerberos-related.
The job completes without issue when Kerberos is disabled.