HDP 2.4 upgrade using ambari - service check failed for Tez at 82%

Expert Contributor

I am upgrading one of our clusters from HDP 2.2 to HDP 2.4.0.

About 80% of the upgrade is complete, and all the core, slave, Hive, and Spark components have been upgraded to HDP 2.4. During the service check phase, the Tez service check failed.

The status I see at the application monitoring URL is:

YARN Application Status: ACCEPTED: waiting for AM container to be allocated, launched and register with RM.

It stays in that state for some time (300 seconds), then dies, and the service check fails.

I verified that enough memory is available on the worker node:

[root@usw2stdpwo13 ~]# free -g
                total       used       free     shared    buffers     cached
Mem:            29         20          9          0          0         17
-/+ buffers/cache:          2         27
Swap:            0          0          0

Ambari uses this command to launch the Tez job:

2016-06-17 00:33:52,149 - Execute['hadoop --config /usr/hdp/2.4.0.0-169/hadoop/conf jar /usr/hdp/current/tez-client/tez-examples*.jar orderedwordcount /tmp/tezsmokeinput/sample-tez-test /tmp/tezsmokeoutput/'] {'logoutput': None, 'try_sleep': 5, 'environment': {}, 'tries': 3, 'user': 'ambari-qa', 'path': ['/usr/hdp/2.4.0.0-169/hadoop/bin']}

I am looking for any pointers on where to investigate the problem.

[Screenshot: 5081-screen-shot-2016-06-16-at-53213-pm.png]

Here is the log output:

2016-06-17 00:33:51,153 - hadoop-client is currently at version 2.4.0.0-169
2016-06-17 00:33:51,153 - In the middle of a stack upgrade/downgrade for Stack HDP and destination version 2.4.0.0-169, determining which hadoop conf dir to use.
2016-06-17 00:33:51,175 - hadoop-client is currently at version 2.4.0.0-169
2016-06-17 00:33:51,175 - Hadoop conf dir: /usr/hdp/2.4.0.0-169/hadoop/conf
2016-06-17 00:33:51,176 - The hadoop conf dir /usr/hdp/2.4.0.0-169/hadoop/conf exists, will call conf-select on it for version 2.4.0.0-169
2016-06-17 00:33:51,176 - Checking if need to create versioned conf dir /etc/hadoop/2.4.0.0-169/0
2016-06-17 00:33:51,176 - call['conf-select create-conf-dir --package hadoop --stack-version 2.4.0.0-169 --conf-version 0'] {'logoutput': False, 'sudo': True, 'quiet': False, 'stderr': -1}
2016-06-17 00:33:51,198 - call returned (1, '/etc/hadoop/2.4.0.0-169/0 exist already', '')
2016-06-17 00:33:51,198 - checked_call['conf-select set-conf-dir --package hadoop --stack-version 2.4.0.0-169 --conf-version 0'] {'logoutput': False, 'sudo': True, 'quiet': False}
2016-06-17 00:33:51,220 - checked_call returned (0, '/usr/hdp/2.4.0.0-169/hadoop/conf -> /etc/hadoop/2.4.0.0-169/0')
2016-06-17 00:33:51,220 - Ensuring that hadoop has the correct symlink structure
2016-06-17 00:33:51,220 - Using hadoop conf dir: /usr/hdp/2.4.0.0-169/hadoop/conf
2016-06-17 00:33:51,221 - File['/var/lib/ambari-agent/tmp/sample-tez-test'] {'content': 'foo\nbar\nfoo\nbar\nfoo', 'mode': 0755}
2016-06-17 00:33:51,224 - HdfsResource['/tmp/tezsmokeoutput'] {'security_enabled': False, 'hadoop_bin_dir': '/usr/hdp/2.4.0.0-169/hadoop/bin', 'keytab': [EMPTY], 'default_fs': 'hdfs://dfs-nameservices', 'hdfs_resource_ignore_file': '/var/lib/ambari-agent/data/.hdfs_resource_ignore', 'hdfs_site': ..., 'kinit_path_local': 'kinit', 'principal_name': [EMPTY], 'user': 'hdfs', 'action': ['delete_on_execute'], 'hadoop_conf_dir': '/usr/hdp/2.4.0.0-169/hadoop/conf', 'type': 'directory'}
2016-06-17 00:33:51,229 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -s '"'"'http://usw2stdpma01.glassdoor.local:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpODrjve 2>/tmp/tmpYnLWoG''] {'quiet': False}
2016-06-17 00:33:51,280 - call returned (0, '')
2016-06-17 00:33:51,281 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -s '"'"'http://usw2stdpma02.glassdoor.local:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpapa3XU 2>/tmp/tmpR_y0Gw''] {'quiet': False}
2016-06-17 00:33:51,323 - call returned (0, '')
2016-06-17 00:33:51,324 - NameNode HA states: active_namenodes = [('nn2', 'usw2stdpma02.glassdoor.local:50070')], standby_namenodes = [('nn1', 'usw2stdpma01.glassdoor.local:50070')], unknown_namenodes = []
2016-06-17 00:33:51,325 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -s '"'"'http://usw2stdpma01.glassdoor.local:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmp91MTHJ 2>/tmp/tmpPmJ5xb''] {'quiet': False}
2016-06-17 00:33:51,358 - call returned (0, '')
2016-06-17 00:33:51,359 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -s '"'"'http://usw2stdpma02.glassdoor.local:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpdHS4sv 2>/tmp/tmpeCOdPj''] {'quiet': False}
2016-06-17 00:33:51,394 - call returned (0, '')
2016-06-17 00:33:51,394 - NameNode HA states: active_namenodes = [('nn2', 'usw2stdpma02.glassdoor.local:50070')], standby_namenodes = [('nn1', 'usw2stdpma01.glassdoor.local:50070')], unknown_namenodes = []
2016-06-17 00:33:51,395 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -sS -L -w '"'"'%{http_code}'"'"' -X GET '"'"'http://usw2stdpma02.glassdoor.local:50070/webhdfs/v1/tmp/tezsmokeoutput?op=GETFILESTATUS&user.name=hdfs'"'"' 1>/tmp/tmphntRtB 2>/tmp/tmptIX11R''] {'logoutput': None, 'quiet': False}
2016-06-17 00:33:51,430 - call returned (0, '')
2016-06-17 00:33:51,431 - HdfsResource['/tmp/tezsmokeinput'] {'security_enabled': False, 'hadoop_bin_dir': '/usr/hdp/2.4.0.0-169/hadoop/bin', 'keytab': [EMPTY], 'default_fs': 'hdfs://dfs-nameservices', 'hdfs_resource_ignore_file': '/var/lib/ambari-agent/data/.hdfs_resource_ignore', 'hdfs_site': ..., 'kinit_path_local': 'kinit', 'principal_name': [EMPTY], 'user': 'hdfs', 'owner': 'ambari-qa', 'hadoop_conf_dir': '/usr/hdp/2.4.0.0-169/hadoop/conf', 'type': 'directory', 'action': ['create_on_execute']}
2016-06-17 00:33:51,432 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -s '"'"'http://usw2stdpma01.glassdoor.local:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmp_5bK33 2>/tmp/tmpwAzh_T''] {'quiet': False}
2016-06-17 00:33:51,464 - call returned (0, '')
2016-06-17 00:33:51,468 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -s '"'"'http://usw2stdpma02.glassdoor.local:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpPtsxnk 2>/tmp/tmpMWTzO7''] {'quiet': False}
2016-06-17 00:33:51,501 - call returned (0, '')
2016-06-17 00:33:51,501 - NameNode HA states: active_namenodes = [('nn2', 'usw2stdpma02.glassdoor.local:50070')], standby_namenodes = [('nn1', 'usw2stdpma01.glassdoor.local:50070')], unknown_namenodes = []
2016-06-17 00:33:51,502 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -s '"'"'http://usw2stdpma01.glassdoor.local:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmp88wDo7 2>/tmp/tmpug3GdS''] {'quiet': False}
2016-06-17 00:33:51,536 - call returned (0, '')
2016-06-17 00:33:51,537 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -s '"'"'http://usw2stdpma02.glassdoor.local:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpt_fd35 2>/tmp/tmpCAdmVr''] {'quiet': False}
2016-06-17 00:33:51,568 - call returned (0, '')
2016-06-17 00:33:51,568 - NameNode HA states: active_namenodes = [('nn2', 'usw2stdpma02.glassdoor.local:50070')], standby_namenodes = [('nn1', 'usw2stdpma01.glassdoor.local:50070')], unknown_namenodes = []
2016-06-17 00:33:51,569 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -sS -L -w '"'"'%{http_code}'"'"' -X GET '"'"'http://usw2stdpma02.glassdoor.local:50070/webhdfs/v1/tmp/tezsmokeinput?op=GETFILESTATUS&user.name=hdfs'"'"' 1>/tmp/tmpBQlBAe 2>/tmp/tmpxbrxBL''] {'logoutput': None, 'quiet': False}
2016-06-17 00:33:51,604 - call returned (0, '')
2016-06-17 00:33:51,605 - HdfsResource['/tmp/tezsmokeinput/sample-tez-test'] {'security_enabled': False, 'hadoop_bin_dir': '/usr/hdp/2.4.0.0-169/hadoop/bin', 'keytab': [EMPTY], 'source': '/var/lib/ambari-agent/tmp/sample-tez-test', 'default_fs': 'hdfs://dfs-nameservices', 'hdfs_resource_ignore_file': '/var/lib/ambari-agent/data/.hdfs_resource_ignore', 'hdfs_site': ..., 'kinit_path_local': 'kinit', 'principal_name': [EMPTY], 'user': 'hdfs', 'owner': 'ambari-qa', 'hadoop_conf_dir': '/usr/hdp/2.4.0.0-169/hadoop/conf', 'type': 'file', 'action': ['create_on_execute']}
2016-06-17 00:33:51,606 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -s '"'"'http://usw2stdpma01.glassdoor.local:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmp7BdsyF 2>/tmp/tmpgxhTFf''] {'quiet': False}
2016-06-17 00:33:51,637 - call returned (0, '')
2016-06-17 00:33:51,638 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -s '"'"'http://usw2stdpma02.glassdoor.local:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpRHgo2M 2>/tmp/tmpDmcbzl''] {'quiet': False}
2016-06-17 00:33:51,671 - call returned (0, '')
2016-06-17 00:33:51,671 - NameNode HA states: active_namenodes = [('nn2', 'usw2stdpma02.glassdoor.local:50070')], standby_namenodes = [('nn1', 'usw2stdpma01.glassdoor.local:50070')], unknown_namenodes = []
2016-06-17 00:33:51,672 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -s '"'"'http://usw2stdpma01.glassdoor.local:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpq5uw5q 2>/tmp/tmpCCZ2vd''] {'quiet': False}
2016-06-17 00:33:51,709 - call returned (0, '')
2016-06-17 00:33:51,710 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -s '"'"'http://usw2stdpma02.glassdoor.local:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpbU6C0X 2>/tmp/tmp_7uguj''] {'quiet': False}
2016-06-17 00:33:51,744 - call returned (0, '')
2016-06-17 00:33:51,745 - NameNode HA states: active_namenodes = [('nn2', 'usw2stdpma02.glassdoor.local:50070')], standby_namenodes = [('nn1', 'usw2stdpma01.glassdoor.local:50070')], unknown_namenodes = []
2016-06-17 00:33:51,746 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -sS -L -w '"'"'%{http_code}'"'"' -X GET '"'"'http://usw2stdpma02.glassdoor.local:50070/webhdfs/v1/tmp/tezsmokeinput/sample-tez-test?op=GETFILESTATUS&user.name=hdfs'"'"' 1>/tmp/tmpb9mzbG 2>/tmp/tmpAbdRYl''] {'logoutput': None, 'quiet': False}
2016-06-17 00:33:51,781 - call returned (0, '')
2016-06-17 00:33:51,781 - DFS file /tmp/tezsmokeinput/sample-tez-test is identical to /var/lib/ambari-agent/tmp/sample-tez-test, skipping the copying
2016-06-17 00:33:51,782 - Called copy_to_hdfs tarball: tez
2016-06-17 00:33:51,782 - Default version is 2.2.6.0-2800
2016-06-17 00:33:51,782 - Because this is a Stack Upgrade, will use version 2.4.0.0-169
2016-06-17 00:33:51,782 - Source file: /usr/hdp/2.4.0.0-169/tez/lib/tez.tar.gz , Dest file in HDFS: /hdp/apps/2.4.0.0-169/tez/tez.tar.gz
2016-06-17 00:33:51,783 - HdfsResource['/hdp/apps/2.4.0.0-169/tez'] {'security_enabled': False, 'hadoop_bin_dir': '/usr/hdp/2.4.0.0-169/hadoop/bin', 'keytab': [EMPTY], 'default_fs': 'hdfs://dfs-nameservices', 'hdfs_resource_ignore_file': '/var/lib/ambari-agent/data/.hdfs_resource_ignore', 'hdfs_site': ..., 'kinit_path_local': 'kinit', 'principal_name': [EMPTY], 'user': 'hdfs', 'owner': 'hdfs', 'hadoop_conf_dir': '/usr/hdp/2.4.0.0-169/hadoop/conf', 'type': 'directory', 'action': ['create_on_execute'], 'mode': 0555}
2016-06-17 00:33:51,783 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -s '"'"'http://usw2stdpma01.glassdoor.local:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpEcvbTZ 2>/tmp/tmpgBQVKQ''] {'quiet': False}
2016-06-17 00:33:51,815 - call returned (0, '')
2016-06-17 00:33:51,816 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -s '"'"'http://usw2stdpma02.glassdoor.local:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmphNPvbB 2>/tmp/tmpUSWk89''] {'quiet': False}
2016-06-17 00:33:51,855 - call returned (0, '')
2016-06-17 00:33:51,855 - NameNode HA states: active_namenodes = [('nn2', 'usw2stdpma02.glassdoor.local:50070')], standby_namenodes = [('nn1', 'usw2stdpma01.glassdoor.local:50070')], unknown_namenodes = []
2016-06-17 00:33:51,856 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -s '"'"'http://usw2stdpma01.glassdoor.local:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmplT82QX 2>/tmp/tmpN985O7''] {'quiet': False}
2016-06-17 00:33:51,891 - call returned (0, '')
2016-06-17 00:33:51,892 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -s '"'"'http://usw2stdpma02.glassdoor.local:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpqgXIjs 2>/tmp/tmpDlrgXa''] {'quiet': False}
2016-06-17 00:33:51,927 - call returned (0, '')
2016-06-17 00:33:51,927 - NameNode HA states: active_namenodes = [('nn2', 'usw2stdpma02.glassdoor.local:50070')], standby_namenodes = [('nn1', 'usw2stdpma01.glassdoor.local:50070')], unknown_namenodes = []
2016-06-17 00:33:51,928 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -sS -L -w '"'"'%{http_code}'"'"' -X GET '"'"'http://usw2stdpma02.glassdoor.local:50070/webhdfs/v1/hdp/apps/2.4.0.0-169/tez?op=GETFILESTATUS&user.name=hdfs'"'"' 1>/tmp/tmpwiAl5x 2>/tmp/tmpqrnbVc''] {'logoutput': None, 'quiet': False}
2016-06-17 00:33:51,968 - call returned (0, '')
2016-06-17 00:33:51,969 - HdfsResource['/hdp/apps/2.4.0.0-169/tez/tez.tar.gz'] {'security_enabled': False, 'hadoop_bin_dir': '/usr/hdp/2.4.0.0-169/hadoop/bin', 'keytab': [EMPTY], 'source': '/usr/hdp/2.4.0.0-169/tez/lib/tez.tar.gz', 'default_fs': 'hdfs://dfs-nameservices', 'replace_existing_files': False, 'hdfs_resource_ignore_file': '/var/lib/ambari-agent/data/.hdfs_resource_ignore', 'hdfs_site': ..., 'kinit_path_local': 'kinit', 'principal_name': [EMPTY], 'user': 'hdfs', 'owner': 'hdfs', 'group': 'hadoop', 'hadoop_conf_dir': '/usr/hdp/2.4.0.0-169/hadoop/conf', 'type': 'file', 'action': ['create_on_execute'], 'mode': 0444}
2016-06-17 00:33:51,970 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -s '"'"'http://usw2stdpma01.glassdoor.local:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpwPJrUN 2>/tmp/tmpmAwCWn''] {'quiet': False}
2016-06-17 00:33:52,002 - call returned (0, '')
2016-06-17 00:33:52,003 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -s '"'"'http://usw2stdpma02.glassdoor.local:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpkT0ZPQ 2>/tmp/tmpeT9gMN''] {'quiet': False}
2016-06-17 00:33:52,038 - call returned (0, '')
2016-06-17 00:33:52,038 - NameNode HA states: active_namenodes = [('nn2', 'usw2stdpma02.glassdoor.local:50070')], standby_namenodes = [('nn1', 'usw2stdpma01.glassdoor.local:50070')], unknown_namenodes = []
2016-06-17 00:33:52,039 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -s '"'"'http://usw2stdpma01.glassdoor.local:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpia0vG1 2>/tmp/tmpVYjzmf''] {'quiet': False}
2016-06-17 00:33:52,072 - call returned (0, '')
2016-06-17 00:33:52,073 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -s '"'"'http://usw2stdpma02.glassdoor.local:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpMKJxz2 2>/tmp/tmpQI2NrY''] {'quiet': False}
2016-06-17 00:33:52,109 - call returned (0, '')
2016-06-17 00:33:52,109 - NameNode HA states: active_namenodes = [('nn2', 'usw2stdpma02.glassdoor.local:50070')], standby_namenodes = [('nn1', 'usw2stdpma01.glassdoor.local:50070')], unknown_namenodes = []
2016-06-17 00:33:52,110 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -sS -L -w '"'"'%{http_code}'"'"' -X GET '"'"'http://usw2stdpma02.glassdoor.local:50070/webhdfs/v1/hdp/apps/2.4.0.0-169/tez/tez.tar.gz?op=GETFILESTATUS&user.name=hdfs'"'"' 1>/tmp/tmpa0QyZr 2>/tmp/tmpR0_UHC''] {'logoutput': None, 'quiet': False}
2016-06-17 00:33:52,147 - call returned (0, '')
2016-06-17 00:33:52,147 - DFS file /hdp/apps/2.4.0.0-169/tez/tez.tar.gz is identical to /usr/hdp/2.4.0.0-169/tez/lib/tez.tar.gz, skipping the copying
2016-06-17 00:33:52,148 - Will attempt to copy tez tarball from /usr/hdp/2.4.0.0-169/tez/lib/tez.tar.gz to DFS at /hdp/apps/2.4.0.0-169/tez/tez.tar.gz.
2016-06-17 00:33:52,148 - HdfsResource[None] {'security_enabled': False, 'hadoop_bin_dir': '/usr/hdp/2.4.0.0-169/hadoop/bin', 'keytab': [EMPTY], 'default_fs': 'hdfs://dfs-nameservices', 'hdfs_resource_ignore_file': '/var/lib/ambari-agent/data/.hdfs_resource_ignore', 'hdfs_site': ..., 'kinit_path_local': 'kinit', 'principal_name': [EMPTY], 'user': 'hdfs', 'action': ['execute'], 'hadoop_conf_dir': '/usr/hdp/2.4.0.0-169/hadoop/conf'}
2016-06-17 00:33:52,148 - ExecuteHadoop['jar /usr/hdp/current/tez-client/tez-examples*.jar orderedwordcount /tmp/tezsmokeinput/sample-tez-test /tmp/tezsmokeoutput/'] {'try_sleep': 5, 'tries': 3, 'bin_dir': '/usr/hdp/2.4.0.0-169/hadoop/bin', 'user': 'ambari-qa', 'conf_dir': '/usr/hdp/2.4.0.0-169/hadoop/conf'}
2016-06-17 00:33:52,149 - Execute['hadoop --config /usr/hdp/2.4.0.0-169/hadoop/conf jar /usr/hdp/current/tez-client/tez-examples*.jar orderedwordcount /tmp/tezsmokeinput/sample-tez-test /tmp/tezsmokeoutput/'] {'logoutput': None, 'try_sleep': 5, 'environment': {}, 'tries': 3, 'user': 'ambari-qa', 'path': ['/usr/hdp/2.4.0.0-169/hadoop/bin']}

1 ACCEPTED SOLUTION

Master Guru

Since the service checks of all major services were successful, you can just select "Ignore and Proceed" and handle the Tez issue once the upgrade is over.

4 REPLIES

Master Guru
@Anandha L Ranganathan

Can you please check the status of your NodeManagers? If they are unhealthy or crashing, your job will stay in the ACCEPTED state and never move to RUNNING.
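One way to run this check without clicking through the UI is the ResourceManager REST API, which lists every node with its state and health report. The snippet below is a minimal sketch, not something posted in this thread: the helper names are hypothetical, the sample response is made up for illustration, and the ResourceManager web address passed to `fetch_nodes` is an assumption you would replace with your own (8088 is the default web port).

```python
import json
from urllib.request import urlopen


def fetch_nodes(rm_url):
    """GET /ws/v1/cluster/nodes from the ResourceManager and decode the JSON."""
    with urlopen(rm_url.rstrip("/") + "/ws/v1/cluster/nodes", timeout=10) as resp:
        return json.load(resp)


def nodes_needing_attention(nodes_json):
    """Return (host, state, health report) for every node not in RUNNING state."""
    nodes = (nodes_json.get("nodes") or {}).get("node") or []
    return [
        (n.get("nodeHostName"), n.get("state"), n.get("healthReport", ""))
        for n in nodes
        if n.get("state") != "RUNNING"
    ]


# Hypothetical sample response, for illustration only (not data from this cluster).
sample = {
    "nodes": {
        "node": [
            {"nodeHostName": "worker-a", "state": "RUNNING", "healthReport": ""},
            {"nodeHostName": "worker-b", "state": "UNHEALTHY",
             "healthReport": "local-dirs are bad"},
        ]
    }
}
print(nodes_needing_attention(sample))
```

Against a live cluster you would call `nodes_needing_attention(fetch_nodes("http://<resourcemanager-host>:8088"))`; an empty list means every NodeManager reports RUNNING. The command `yarn node -list -all` shows similar information from the shell.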

Expert Contributor

@Kuldeep Kulkarni,

All nodes are live and in Active status.

Also, we are able to run an MR job as the user ambari-qa, but the Tez job fails.

We tried running the job manually, and it stops at this point:

hadoop --config /usr/hdp/2.4.0.0-169/hadoop/conf jar /usr/hdp/current/tez-client/tez-examples*.jar orderedwordcount /tmp/tezsmokeinput/sample-tez-test /tmp/tezsmokeoutput1/
WARNING: Use "yarn jar" to launch YARN applications.
16/06/17 19:04:47 INFO client.TezClient: Tez Client Version: [ component=tez-api, version=0.7.0.2.4.0.0-169, revision=3c1431f45faaca982ecc8dad13a107787b834696, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=20160210-0711 ]
16/06/17 19:04:47 INFO impl.TimelineClientImpl: Timeline service address: http://usw2stdpma03.glassdoor.local:8188/ws/v1/timeline/
16/06/17 19:04:48 INFO client.RMProxy: Connecting to ResourceManager at usw2stdpma03.glassdoor.local/172.17.212.107:8050
16/06/17 19:04:48 INFO client.TezClient: Using org.apache.tez.dag.history.ats.acls.ATSHistoryACLPolicyManager to manage Timeline ACLs
16/06/17 19:04:48 INFO impl.TimelineClientImpl: Timeline service address: http://usw2stdpma03.glassdoor.local:8188/ws/v1/timeline/
16/06/17 19:04:49 INFO examples.OrderedWordCount: Running OrderedWordCount
16/06/17 19:04:49 INFO client.TezClient: Submitting DAG application with id: application_1466115469995_0142
16/06/17 19:04:49 INFO client.TezClientUtils: Using tez.lib.uris value from configuration: /hdp/apps/2.4.0.0-169/tez/tez.tar.gz
16/06/17 19:04:49 INFO client.TezClient: Stage directory /tmp/root/staging doesn't exist and is created
16/06/17 19:04:49 INFO client.TezClient: Tez system stage directory hdfs://dfs-nameservices/tmp/root/staging/.tez/application_1466115469995_0142 doesn't exist and is created
16/06/17 19:04:49 INFO acls.ATSHistoryACLPolicyManager: Created Timeline Domain for History ACLs, domainId=Tez_ATS_application_1466115469995_0142
16/06/17 19:04:50 INFO client.TezClient: Submitting DAG to YARN, applicationId=application_1466115469995_0142, dagName=OrderedWordCount, callerContext={ context=TezExamples, callerType=null, callerId=null }
16/06/17 19:04:50 INFO impl.YarnClientImpl: Submitted application application_1466115469995_0142
16/06/17 19:04:50 INFO client.TezClient: The url to track the Tez AM: http://usw2stdpma03.glassdoor.local:8088/proxy/application_1466115469995_0142/
16/06/17 19:04:50 INFO impl.TimelineClientImpl: Timeline service address: http://usw2stdpma03.glassdoor.local:8188/ws/v1/timeline/
16/06/17 19:04:50 INFO client.RMProxy: Connecting to ResourceManager at usw2stdpma03.glassdoor.local/172.17.212.107:8050
16/06/17 19:04:51 INFO client.DAGClientImpl: Waiting for DAG to start running
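When the client hangs at "Waiting for DAG to start running", the ResourceManager's view of the application (state, queue, diagnostics) is usually more informative than the client log. As a rough sketch, the helpers below pull the application ID out of the client output and build the ResourceManager REST URL for it. The function names are hypothetical; the ResourceManager web address is taken from the tracking URL printed in the log above.

```python
import re

# YARN application IDs look like application_<cluster timestamp>_<sequence>.
APP_ID_RE = re.compile(r"application_\d+_\d+")


def extract_app_id(client_output):
    """Return the first YARN application ID found in the client output, or None."""
    match = APP_ID_RE.search(client_output)
    return match.group(0) if match else None


def app_status_url(rm_url, app_id):
    """Build the ResourceManager REST URL that describes one application."""
    return "{}/ws/v1/cluster/apps/{}".format(rm_url.rstrip("/"), app_id)


def summarize(app_json):
    """Pick the fields that matter when an application is stuck in ACCEPTED."""
    app = app_json.get("app", {})
    return {key: app.get(key) for key in ("state", "queue", "finalStatus", "diagnostics")}


log_line = ("16/06/17 19:04:49 INFO client.TezClient: Submitting DAG application "
            "with id: application_1466115469995_0142")
app_id = extract_app_id(log_line)
print(app_status_url("http://usw2stdpma03.glassdoor.local:8088", app_id))
```

Fetching that URL (for example with `curl`) returns a JSON document whose `app` object can be passed to `summarize`; the `queue` and `diagnostics` fields are the ones to look at while the state is still ACCEPTED.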

Master Guru

Since the service checks of all major services were successful, you can just select "Ignore and Proceed" and handle the Tez issue once the upgrade is over.

Expert Contributor

Thanks. That resolved the problem. I just continued with "Ignore and Proceed". Before finalizing, I ran the Tez job manually and it ran without any errors. Do you know what caused the Tez job to sit in the ACCEPTED state and time out after 300 seconds?
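The thread does not record an answer to this last question. If the problem comes back, one thing that could be checked is whether YARN itself had room for the Tez ApplicationMaster at that moment, since free OS memory on a worker does not guarantee free YARN capacity. The sketch below is an assumption-laden illustration: it reads the `clusterMetrics` object returned by the ResourceManager's `/ws/v1/cluster/metrics` endpoint and compares it with the AM size (the `tez.am.resource.memory.mb` setting). The function name and all sample numbers are hypothetical.

```python
def am_headroom(metrics_json, am_memory_mb):
    """Summarize whether the cluster reports enough free memory for one AM container."""
    metrics = metrics_json.get("clusterMetrics", {})
    available = metrics.get("availableMB", 0)
    return {
        "availableMB": available,
        "appsPending": metrics.get("appsPending", 0),
        "unhealthyNodes": metrics.get("unhealthyNodes", 0),
        # Cluster-wide total only; per-node and per-queue limits are not checked.
        "am_fits_cluster_wide": available >= am_memory_mb,
    }


# Hypothetical numbers for illustration; they are not measurements from this cluster.
sample_metrics = {
    "clusterMetrics": {"availableMB": 1024, "appsPending": 3, "unhealthyNodes": 0}
}
print(am_headroom(sample_metrics, am_memory_mb=2048))
```

Even when the cluster-wide figure looks fine, a queue can still refuse to start another AM, for example when the Capacity Scheduler's `yarn.scheduler.capacity.maximum-am-resource-percent` limit is reached, so the result is a hint rather than a diagnosis.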