Support Questions

Find answers, ask questions, and share your expertise

Error with Tez jobs

Expert Contributor

Hi, I am suddenly having these errors with my tez jobs scheduled in oozie.

Logging initialized using configuration in file:/etc/hive/2.4.0.0-169/0/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No cluster diagnostics found.
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:507)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:680)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No cluster diagnostics found.
	at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:730)
	at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:217)
	at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:117)
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:504)
	... 8 more
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.impl.MetricsSystemImpl).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info
15 REPLIES 15

Super Guru

@Joshua Adeleke -Can you please look into stderr section of oozie launcher logs? If possible, please attach complete oozie launcher logs in this thread

Expert Contributor

@Kuldeep Kulkarni Nothing more in the stderr log...

             Log Type: stderr
             Log Upload Time: Thu Dec 08 03:12:13 +0200 2016
             Log Length: 2297
          
WARNING: Use "yarn jar" to launch YARN applications.

Logging initialized using configuration in file:/etc/hive/2.4.0.0-169/0/hive-log4j.properties
OK
Time taken: 2.664 seconds
WARNING: Use "yarn jar" to launch YARN applications.

Logging initialized using configuration in file:/etc/hive/2.4.0.0-169/0/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No cluster diagnostics found.
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:507)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:680)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No cluster diagnostics found.
	at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:730)
	at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:217)
	at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:117)
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:504)
	... 8 more
WARNING: Use "yarn jar" to launch YARN applications.

Logging initialized using configuration in file:/etc/hive/2.4.0.0-169/0/hive-log4j.properties
Query ID = yarn_20161208031144_730168c4-3230-4dcc-b968-a6b42b97a8f2
Total jobs = 1
Launching Job 1 out of 1


Status: Running (Executing on YARN cluster with App id application_1474363291123_56265)

Map 1: -/-	Reducer 2: 0/1	
Map 1: -/-	Reducer 2: 0/1	
Map 1: -/-	Reducer 2: 0(+1)/1	
Map 1: -/-	Reducer 2: 1/1	
OK
Time taken: 17.308 seconds, Fetched: 1 row(s)
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.impl.MetricsSystemImpl).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

          

Super Guru

@Joshua Adeleke

Can you get complete application logs for application_1474363291123_56265.

Expert Contributor

Here... @Kuldeep Kulkarni

Log Type: stderr
Log Upload Time: Thu Dec 08 03:12:09 +0200 2016
Log Length: 120
2016-12-08 03:11:51 Running Dag: dag_1474363291123_56265_1
2016-12-08 03:12:01 Completed Dag: dag_1474363291123_56265_1
Log Type: stdout
Log Upload Time: Thu Dec 08 03:12:09 +0200 2016
Log Length: 1658
4.856: [GC (Metadata GC Threshold) [PSYoungGen: 251930K->18253K(611840K)] 251930K->18261K(2010112K), 1.4076599 secs] [Times: user=22.12 sys=6.93, real=1.41 secs] 
6.264: [Full GC (Metadata GC Threshold) [PSYoungGen: 18253K->0K(611840K)] [ParOldGen: 8K->16975K(761856K)] 18261K->16975K(1373696K), [Metaspace: 20928K->20928K(1069056K)], 1.2067695 secs] [Times: user=22.62 sys=3.73, real=1.21 secs] 
2016-12-08 03:11:51 Running Dag: dag_1474363291123_56265_1
20.906: [GC (Metadata GC Threshold) [PSYoungGen: 412868K->18856K(785920K)] 429844K->35840K(1547776K), 0.8018663 secs] [Times: user=16.64 sys=1.06, real=0.80 secs] 
21.708: [Full GC (Metadata GC Threshold) [PSYoungGen: 18856K->0K(785920K)] [ParOldGen: 16983K->22662K(1044992K)] 35840K->22662K(1830912K), [Metaspace: 34928K->34928K(1081344K)], 0.0379021 secs] [Times: user=0.36 sys=0.01, real=0.04 secs] 
2016-12-08 03:12:01 Completed Dag: dag_1474363291123_56265_1
Heap
 PSYoungGen      total 785920K, used 204117K [0x0000000762200000, 0x00000007a8a80000, 0x00000007c0000000)
  eden space 766976K, 26% used [0x0000000762200000,0x000000076e9557a0,0x0000000790f00000)
  from space 18944K, 0% used [0x00000007a7800000,0x00000007a7800000,0x00000007a8a80000)
  to   space 87040K, 0% used [0x000000079e080000,0x000000079e080000,0x00000007a3580000)
 ParOldGen       total 1044992K, used 22662K [0x00000006a6600000, 0x00000006e6280000, 0x0000000762200000)
  object space 1044992K, 2% used [0x00000006a6600000,0x00000006a7c21810,0x00000006e6280000)
 Metaspace       used 41929K, capacity 42269K, committed 42712K, reserved 1087488K
  class space    used 4965K, capacity 5051K, committed 5120K, reserved 1048576K

Explorer

This is not the app log. Please use "yarn log --applicationId <appId>" to retrieve the app log.

@Joshua Adeleke for

Causedby: org.apache.tez.dag.api.SessionNotRunning:TezSession has already shutdown.No cluster diagnostics found.at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:730)

at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:217)

at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:117)

it seems that your application master/cluster is busy or did not get enough resources to process DAG over RPC request thats why was not able to copy the dag from client.if there are EOFExceptions in Appliction master logs then do check your network, for a workaround you can increase tez.session.am.dag.submit.timeout.secs which is currently 300 seconds to 900 seconds.

Expert Contributor

@Rajkumar Singh Thank you. I have also increased the timeout to 900 (from 600sec) but issue persist.

@Joshua Adeleke did you see any exception/error in application master log

Expert Contributor

@Rajkumar Singh I saw the exception below;

2016-12-13 09:10:54,239 [INFO] [AMRM Callback Handler Thread] |impl.AMRMClientAsyncImpl|: Interrupted while waiting for queue
java.lang.InterruptedException
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:287)

Expert Contributor

@Rajkumar Singh log attached.

@Joshua Adeleke looks resources issue here, from memory side it looks good but no of core available is only 1.

2016-12-13 09:10:51,742 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated: <memory:0, vCores:0> Free: <memory:168960, vCores:1> pendingRequests: 0 delayedContainers: 0 heartbeats: 2401 lastPreemptionHeartbeat: 2400

Expert Contributor

Either the Tez application did not start - in which case you will not find any YARN application for this.

OR

Your Tez application started but did not receive any DAG to run and timed out. You will find the exact reason in the YARN application master log for this job.

OR

Your Tez application started and crashed unexpectedly. You will find the exact reason in the YARN application master log for this job.

Expert Contributor

@bikas I have ransacked the YARN logs. Would you please tell me where i can find the exact reason why these jobs are failing?

Expert Contributor

If the AM timed out, then in the AM log you will find "Session timed out".

If the AM crashed, you will find an exception in the AM log or some error in the AM stderr/stdout.

New Contributor

Try below command and let me know if it works for you. It worked for me.

hdfs dfs -chmod -R /tmp/hive

,

Try below command and let me know if it worked.

hdfs dfs -chmod -R 777 /tmp/hive