MapReduce timeout

Contributor

2016-02-14 22:40:05,909 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf

2016-02-14 22:40:06,618 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2016-02-14 22:40:06,760 - HdfsResource['/user/ambari-qa/mapredsmokeoutput'] {'security_enabled': True, 'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'keytab': '/etc/security/keytabs/hdfs.headless.keytab', 'default_fs': 'hdfs://HDPCA', 'hdfs_site': ..., 'kinit_path_local': '/usr/bin/kinit', 'principal_name': 'hdfs-HDPCA@EXAMPLE.COM', 'user': 'hdfs', 'action': ['delete_on_execute'], 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf', 'type': 'directory'}
2016-02-14 22:40:06,860 - Execute['/usr/bin/kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-HDPCA@EXAMPLE.COM'] {'user': 'hdfs'}
2016-02-14 22:40:11,788 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl --negotiate -u : -s '"'"'http://lnx0.localdomain.com:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpZmSrzL 2>/tmp/tmpxqBP9F''] {'quiet': False}
2016-02-14 22:40:15,601 - call returned (0, '')
2016-02-14 22:40:15,603 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl --negotiate -u : -s '"'"'http://lnx1.localdomain.com:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpWcmtSE 2>/tmp/tmp8ueZHF''] {'quiet': False}
2016-02-14 22:40:19,015 - call returned (0, '')
2016-02-14 22:40:19,017 - NameNode HA states: active_namenodes = [(u'nn1', 'lnx0.localdomain.com:50070')], standby_namenodes = [(u'nn2', 'lnx1.localdomain.com:50070')], unknown_namenodes = []
2016-02-14 22:40:19,018 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl --negotiate -u : -s '"'"'http://lnx0.localdomain.com:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpEDIPa1 2>/tmp/tmp1Xt3Yx''] {'quiet': False}
2016-02-14 22:40:22,856 - call returned (0, '')
2016-02-14 22:40:22,858 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl --negotiate -u : -s '"'"'http://lnx1.localdomain.com:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpKQmmZw 2>/tmp/tmpJ6YFEL''] {'quiet': False}
2016-02-14 22:40:26,162 - call returned (0, '')
2016-02-14 22:40:26,164 - NameNode HA states: active_namenodes = [(u'nn1', 'lnx0.localdomain.com:50070')], standby_namenodes = [(u'nn2', 'lnx1.localdomain.com:50070')], unknown_namenodes = []
2016-02-14 22:40:26,167 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -sS -L -w '"'"'%{http_code}'"'"' -X GET --negotiate -u : '"'"'http://lnx0.localdomain.com:50070/webhdfs/v1/user/ambari-qa/mapredsmokeoutput?op=GETFILESTATUS&user.name=hdfs'"'"' 1>/tmp/tmpC3nS44 2>/tmp/tmpYhq8nH''] {'logoutput': None, 'quiet': False}
2016-02-14 22:40:29,885 - call returned (0, '')
2016-02-14 22:40:30,159 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -sS -L -w '"'"'%{http_code}'"'"' -X DELETE --negotiate -u : '"'"'http://lnx0.localdomain.com:50070/webhdfs/v1/user/ambari-qa/mapredsmokeoutput?op=DELETE&user.name=hdfs&recursive=True'"'"' 1>/tmp/tmpKxdOxa 2>/tmp/tmpNflsmo''] {'logoutput': None, 'quiet': False}
2016-02-14 22:40:34,695 - call returned (0, '')
2016-02-14 22:40:34,697 - HdfsResource['/user/ambari-qa/mapredsmokeinput'] {'security_enabled': True, 'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'keytab': '/etc/security/keytabs/hdfs.headless.keytab', 'source': '/etc/passwd', 'default_fs': 'hdfs://HDPCA', 'hdfs_site': ..., 'kinit_path_local': '/usr/bin/kinit', 'principal_name': 'hdfs-HDPCA@EXAMPLE.COM', 'user': 'hdfs', 'action': ['create_on_execute'], 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf', 'type': 'file'}
2016-02-14 22:40:34,699 - Execute['/usr/bin/kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-HDPCA@EXAMPLE.COM'] {'user': 'hdfs'}
2016-02-14 22:40:36,030 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl --negotiate -u : -s '"'"'http://lnx0.localdomain.com:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpWi075d 2>/tmp/tmpfGORFu''] {'quiet': False}
2016-02-14 22:40:39,946 - call returned (0, '')
2016-02-14 22:40:39,948 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl --negotiate -u : -s '"'"'http://lnx1.localdomain.com:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpVeHNPf 2>/tmp/tmpXbZTBA''] {'quiet': False}
2016-02-14 22:40:41,300 - call returned (0, '')
2016-02-14 22:40:41,302 - NameNode HA states: active_namenodes = [(u'nn1', 'lnx0.localdomain.com:50070')], standby_namenodes = [(u'nn2', 'lnx1.localdomain.com:50070')], unknown_namenodes = []
2016-02-14 22:40:41,303 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl --negotiate -u : -s '"'"'http://lnx0.localdomain.com:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmp_2srqJ 2>/tmp/tmpaBMZT6''] {'quiet': False}
2016-02-14 22:40:41,588 - call returned (0, '')
2016-02-14 22:40:41,591 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl --negotiate -u : -s '"'"'http://lnx1.localdomain.com:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpzAgUPP 2>/tmp/tmp_JVFtg''] {'quiet': False}
2016-02-14 22:40:41,859 - call returned (0, '')
2016-02-14 22:40:41,861 - NameNode HA states: active_namenodes = [(u'nn1', 'lnx0.localdomain.com:50070')], standby_namenodes = [(u'nn2', 'lnx1.localdomain.com:50070')], unknown_namenodes = []
2016-02-14 22:40:41,867 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -sS -L -w '"'"'%{http_code}'"'"' -X GET --negotiate -u : '"'"'http://lnx0.localdomain.com:50070/webhdfs/v1/user/ambari-qa/mapredsmokeinput?op=GETFILESTATUS&user.name=hdfs'"'"' 1>/tmp/tmp1NN7xj 2>/tmp/tmpdtdIhc''] {'logoutput': None, 'quiet': False}
2016-02-14 22:40:42,493 - call returned (0, '')
2016-02-14 22:40:42,494 - DFS file /user/ambari-qa/mapredsmokeinput is identical to /etc/passwd, skipping the copying
2016-02-14 22:40:42,495 - HdfsResource[None] {'security_enabled': True, 'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'keytab': '/etc/security/keytabs/hdfs.headless.keytab', 'default_fs': 'hdfs://HDPCA', 'hdfs_site': ..., 'kinit_path_local': '/usr/bin/kinit', 'principal_name': 'hdfs-HDPCA@EXAMPLE.COM', 'user': 'hdfs', 'action': ['execute'], 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf'}
2016-02-14 22:40:42,495 - Execute['/usr/bin/kinit -kt /etc/security/keytabs/smokeuser.headless.keytab ambari-qa-HDPCA@EXAMPLE.COM;'] {'user': 'ambari-qa'}
2016-02-14 22:40:42,778 - ExecuteHadoop['jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-2.*.jar wordcount /user/ambari-qa/mapredsmokeinput /user/ambari-qa/mapredsmokeoutput'] {'bin_dir': '/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/var/lib/ambari-agent:/usr/hdp/current/hadoop-client/bin:/usr/hdp/current/hadoop-yarn-client/bin', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'logoutput': True, 'try_sleep': 5, 'tries': 1, 'user': 'ambari-qa'}
2016-02-14 22:40:42,881 - Execute['hadoop --config /usr/hdp/current/hadoop-client/conf jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-2.*.jar wordcount /user/ambari-qa/mapredsmokeinput /user/ambari-qa/mapredsmokeoutput'] {'logoutput': True, 'try_sleep': 5, 'environment': {}, 'tries': 1, 'user': 'ambari-qa', 'path': ['/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/var/lib/ambari-agent:/usr/hdp/current/hadoop-client/bin:/usr/hdp/current/hadoop-yarn-client/bin']}
WARNING: Use "yarn jar" to launch YARN applications.
16/02/14 22:42:50 INFO impl.TimelineClientImpl: Timeline service address: http://lnx1.localdomain.com:8188/ws/v1/timeline/
16/02/14 22:42:51 INFO client.RMProxy: Connecting to ResourceManager at Lnx1.localdomain.com/192.168.122.40:8050
16/02/14 22:42:53 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 599 for ambari-qa on ha-hdfs:HDPCA
16/02/14 22:42:54 INFO security.TokenCache: Got dt for hdfs://HDPCA; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:HDPCA, Ident: (HDFS_DELEGATION_TOKEN token 599 for ambari-qa)
16/02/14 22:42:54 WARN token.Token: Cannot find class for token kind kms-dt
16/02/14 22:42:54 INFO security.TokenCache: Got dt for hdfs://HDPCA; Kind: kms-dt, Service: 192.168.0.102:9292, Ident: 00 0f 61 6d 62 61 72 69 2d 71 61 2d 48 44 50 43 41 02 72 6d 00 8a 01 52 e3 06 19 54 8a 01 53 07 12 9d 54 04 02
16/02/14 22:43:03 INFO input.FileInputFormat: Total input paths to process : 1
16/02/14 22:43:06 INFO mapreduce.JobSubmitter: number of splits:1
16/02/14 22:43:10 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1455503320604_0004
16/02/14 22:43:10 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:HDPCA, Ident: (HDFS_DELEGATION_TOKEN token 599 for ambari-qa)
16/02/14 22:43:10 WARN token.Token: Cannot find class for token kind kms-dt
16/02/14 22:43:10 WARN token.Token: Cannot find class for token kind kms-dt
Kind: kms-dt, Service: 192.168.0.102:9292, Ident: 00 0f 61 6d 62 61 72 69 2d 71 61 2d 48 44 50 43 41 02 72 6d 00 8a 01 52 e3 06 19 54 8a 01 53 07 12 9d 54 04 02
16/02/14 22:43:19 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1455503320604_0004 is still in NEW
16/02/14 22:43:20 INFO impl.YarnClientImpl: Submitted application application_1455503320604_0004
16/02/14 22:43:21 INFO mapreduce.Job: The url to track the job: http://Lnx1.localdomain.com:8088/proxy/application_1455503320604_0004/
16/02/14 22:43:21 INFO mapreduce.Job: Running job: job_1455503320604_0004
1 ACCEPTED SOLUTION

Contributor

I finally got the solution.

A few failed MR jobs had eaten up the cluster's resources. After they were killed, subsequent jobs ran smoothly.
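
For anyone hitting the same symptom, here is a rough sketch of how leftover applications that still hold memory could be spotted. It assumes the ResourceManager REST endpoint `/ws/v1/cluster/apps` is reachable on the RM web port (8088 in the logs above) and that its JSON carries the `id`, `user`, `state` and `allocatedMB` fields. The helper names `extract_apps` and `resource_hogs` are made up for this illustration. The script reads the JSON from stdin so that it can be fed by `curl --negotiate -u :`, the same way Ambari calls the NameNode on this Kerberized cluster.

```python
import json
import sys


def extract_apps(payload):
    """Pull the application list out of an RM /ws/v1/cluster/apps response.

    Handles {"apps": null}, which is assumed to be what the RM sends
    when no application matches the query.
    """
    return (payload.get("apps") or {}).get("app", [])


def resource_hogs(apps, min_mb=1):
    """Return (id, user, state, allocatedMB) for apps holding memory, largest first."""
    holding = [a for a in apps if a.get("allocatedMB", 0) >= min_mb]
    holding.sort(key=lambda a: a["allocatedMB"], reverse=True)
    return [(a["id"], a["user"], a["state"], a["allocatedMB"]) for a in holding]


def main():
    raw = sys.stdin.read()
    if not raw.strip():
        return  # nothing piped in
    for app_id, user, state, mb in resource_hogs(extract_apps(json.loads(raw))):
        print(f"{app_id}  {user}  {state}  {mb} MB")


if __name__ == "__main__":
    main()
```

Possible usage, with `find_hogs.py` as a hypothetical file name: `curl --negotiate -u : -s 'http://Lnx1.localdomain.com:8088/ws/v1/cluster/apps?states=ACCEPTED,RUNNING' | python find_hogs.py`. An application that should not be there can then be killed with `yarn application -kill <application id>`.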


12 REPLIES

Master Mentor

@wei yang

Reading your logs, I see:

- 3 failed attempts to allocate resources on the host

- Blacklisted host Lnx1.localdomain.com

- "Container exited with a non-zero exit code 143", which is typical of a memory configuration problem. Check your yarn-site.xml.

This should help you understand the Hadoop concept of a blacklisted node, which is how a node gets flagged as unhealthy: @link
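
As a side note on exit code 143: by the usual shell convention, an exit code above 128 means the process was terminated by signal number (code - 128), so 143 corresponds to signal 15, SIGTERM. YARN sends that signal when it kills a container, so the code on its own shows that the container was killed and does not prove a memory problem. A minimal sketch of the decoding follows; `describe_exit_code` is a made-up helper name, not part of Hadoop.

```python
import signal


def describe_exit_code(code):
    """Explain an exit code using the shell convention 128 + signal number."""
    if code == 0:
        return "success"
    if code > 128:
        signum = code - 128
        try:
            return f"terminated by {signal.Signals(signum).name} (signal {signum})"
        except ValueError:
            # Signal number not known on this platform
            return f"terminated by unknown signal {signum}"
    return f"exited with application error code {code}"


print(describe_exit_code(143))  # terminated by SIGTERM (signal 15)
```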

Contributor

I tried increasing the YARN container memory from 1 GB to 1.5 GB, but it did not help. The problem seems to be Kerberos related.

The job completes without issue when Kerberos is disabled.
