Support Questions

Find answers, ask questions, and share your expertise

Error with Tez jobs

avatar
Expert Contributor

Hi, I am suddenly having these errors with my tez jobs scheduled in oozie.

Logging initialized using configuration in file:/etc/hive/2.4.0.0-169/0/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No cluster diagnostics found.
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:507)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:680)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No cluster diagnostics found.
	at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:730)
	at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:217)
	at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:117)
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:504)
	... 8 more
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.impl.MetricsSystemImpl).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info
15 REPLIES 15

avatar
Master Guru

@Joshua Adeleke -Can you please look into stderr section of oozie launcher logs? If possible, please attach complete oozie launcher logs in this thread

avatar
Expert Contributor

@Kuldeep Kulkarni Nothing more in the stderr log...

             Log Type: stderr
             Log Upload Time: Thu Dec 08 03:12:13 +0200 2016
             Log Length: 2297
          
WARNING: Use "yarn jar" to launch YARN applications.

Logging initialized using configuration in file:/etc/hive/2.4.0.0-169/0/hive-log4j.properties
OK
Time taken: 2.664 seconds
WARNING: Use "yarn jar" to launch YARN applications.

Logging initialized using configuration in file:/etc/hive/2.4.0.0-169/0/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No cluster diagnostics found.
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:507)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:680)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No cluster diagnostics found.
	at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:730)
	at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:217)
	at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:117)
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:504)
	... 8 more
WARNING: Use "yarn jar" to launch YARN applications.

Logging initialized using configuration in file:/etc/hive/2.4.0.0-169/0/hive-log4j.properties
Query ID = yarn_20161208031144_730168c4-3230-4dcc-b968-a6b42b97a8f2
Total jobs = 1
Launching Job 1 out of 1


Status: Running (Executing on YARN cluster with App id application_1474363291123_56265)

Map 1: -/-	Reducer 2: 0/1	
Map 1: -/-	Reducer 2: 0/1	
Map 1: -/-	Reducer 2: 0(+1)/1	
Map 1: -/-	Reducer 2: 1/1	
OK
Time taken: 17.308 seconds, Fetched: 1 row(s)
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.impl.MetricsSystemImpl).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

          

avatar
Master Guru

@Joshua Adeleke

Can you get complete application logs for application_1474363291123_56265.

avatar
Expert Contributor

Here... @Kuldeep Kulkarni

Log Type: stderr
Log Upload Time: Thu Dec 08 03:12:09 +0200 2016
Log Length: 120
2016-12-08 03:11:51 Running Dag: dag_1474363291123_56265_1
2016-12-08 03:12:01 Completed Dag: dag_1474363291123_56265_1
Log Type: stdout
Log Upload Time: Thu Dec 08 03:12:09 +0200 2016
Log Length: 1658
4.856: [GC (Metadata GC Threshold) [PSYoungGen: 251930K->18253K(611840K)] 251930K->18261K(2010112K), 1.4076599 secs] [Times: user=22.12 sys=6.93, real=1.41 secs] 
6.264: [Full GC (Metadata GC Threshold) [PSYoungGen: 18253K->0K(611840K)] [ParOldGen: 8K->16975K(761856K)] 18261K->16975K(1373696K), [Metaspace: 20928K->20928K(1069056K)], 1.2067695 secs] [Times: user=22.62 sys=3.73, real=1.21 secs] 
2016-12-08 03:11:51 Running Dag: dag_1474363291123_56265_1
20.906: [GC (Metadata GC Threshold) [PSYoungGen: 412868K->18856K(785920K)] 429844K->35840K(1547776K), 0.8018663 secs] [Times: user=16.64 sys=1.06, real=0.80 secs] 
21.708: [Full GC (Metadata GC Threshold) [PSYoungGen: 18856K->0K(785920K)] [ParOldGen: 16983K->22662K(1044992K)] 35840K->22662K(1830912K), [Metaspace: 34928K->34928K(1081344K)], 0.0379021 secs] [Times: user=0.36 sys=0.01, real=0.04 secs] 
2016-12-08 03:12:01 Completed Dag: dag_1474363291123_56265_1
Heap
 PSYoungGen      total 785920K, used 204117K [0x0000000762200000, 0x00000007a8a80000, 0x00000007c0000000)
  eden space 766976K, 26% used [0x0000000762200000,0x000000076e9557a0,0x0000000790f00000)
  from space 18944K, 0% used [0x00000007a7800000,0x00000007a7800000,0x00000007a8a80000)
  to   space 87040K, 0% used [0x000000079e080000,0x000000079e080000,0x00000007a3580000)
 ParOldGen       total 1044992K, used 22662K [0x00000006a6600000, 0x00000006e6280000, 0x0000000762200000)
  object space 1044992K, 2% used [0x00000006a6600000,0x00000006a7c21810,0x00000006e6280000)
 Metaspace       used 41929K, capacity 42269K, committed 42712K, reserved 1087488K
  class space    used 4965K, capacity 5051K, committed 5120K, reserved 1048576K

avatar
Contributor

This is not the app log. Please use "yarn log --applicationId <appId>" to retrieve the app log.

avatar
Super Guru

@Joshua Adeleke for

Causedby: org.apache.tez.dag.api.SessionNotRunning:TezSession has already shutdown.No cluster diagnostics found.at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:730)

at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:217)

at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:117)

it seems that your application master/cluster is busy or did not get enough resources to process DAG over RPC request thats why was not able to copy the dag from client.if there are EOFExceptions in Appliction master logs then do check your network, for a workaround you can increase tez.session.am.dag.submit.timeout.secs which is currently 300 seconds to 900 seconds.

avatar
Expert Contributor

@Rajkumar Singh Thank you. I have also increased the timeout to 900 (from 600sec) but issue persist.

avatar
Super Guru

@Joshua Adeleke did you see any exception/error in application master log

avatar
Expert Contributor

@Rajkumar Singh I saw the exception below;

2016-12-13 09:10:54,239 [INFO] [AMRM Callback Handler Thread] |impl.AMRMClientAsyncImpl|: Interrupted while waiting for queue
java.lang.InterruptedException
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:287)