Created 12-08-2016 07:26 AM
Hi, I am suddenly having these errors with my tez jobs scheduled in oozie.
Logging initialized using configuration in file:/etc/hive/2.4.0.0-169/0/hive-log4j.properties Exception in thread "main" java.lang.RuntimeException: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No cluster diagnostics found. at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:507) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:680) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No cluster diagnostics found. at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:730) at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:217) at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:117) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:504) ... 8 more Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1] log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.impl.MetricsSystemImpl). log4j:WARN Please initialize the log4j system properly. log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info
Created 12-08-2016 02:29 PM
@Joshua Adeleke -Can you please look into stderr section of oozie launcher logs? If possible, please attach complete oozie launcher logs in this thread
Created 12-08-2016 10:28 PM
@Kuldeep Kulkarni Nothing more in the stderr log...
Log Type: stderr Log Upload Time: Thu Dec 08 03:12:13 +0200 2016 Log Length: 2297 WARNING: Use "yarn jar" to launch YARN applications. Logging initialized using configuration in file:/etc/hive/2.4.0.0-169/0/hive-log4j.properties OK Time taken: 2.664 seconds WARNING: Use "yarn jar" to launch YARN applications. Logging initialized using configuration in file:/etc/hive/2.4.0.0-169/0/hive-log4j.properties Exception in thread "main" java.lang.RuntimeException: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No cluster diagnostics found. at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:507) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:680) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No cluster diagnostics found. at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:730) at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:217) at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:117) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:504) ... 8 more WARNING: Use "yarn jar" to launch YARN applications. Logging initialized using configuration in file:/etc/hive/2.4.0.0-169/0/hive-log4j.properties Query ID = yarn_20161208031144_730168c4-3230-4dcc-b968-a6b42b97a8f2 Total jobs = 1 Launching Job 1 out of 1 Status: Running (Executing on YARN cluster with App id application_1474363291123_56265) Map 1: -/- Reducer 2: 0/1 Map 1: -/- Reducer 2: 0/1 Map 1: -/- Reducer 2: 0(+1)/1 Map 1: -/- Reducer 2: 1/1 OK Time taken: 17.308 seconds, Fetched: 1 row(s) log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.impl.MetricsSystemImpl). log4j:WARN Please initialize the log4j system properly. log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Created 12-09-2016 09:05 AM
Can you get complete application logs for application_1474363291123_56265.
Created 12-09-2016 09:13 AM
Here... @Kuldeep Kulkarni
Log Type: stderr Log Upload Time: Thu Dec 08 03:12:09 +0200 2016 Log Length: 120 2016-12-08 03:11:51 Running Dag: dag_1474363291123_56265_1 2016-12-08 03:12:01 Completed Dag: dag_1474363291123_56265_1 Log Type: stdout Log Upload Time: Thu Dec 08 03:12:09 +0200 2016 Log Length: 1658 4.856: [GC (Metadata GC Threshold) [PSYoungGen: 251930K->18253K(611840K)] 251930K->18261K(2010112K), 1.4076599 secs] [Times: user=22.12 sys=6.93, real=1.41 secs] 6.264: [Full GC (Metadata GC Threshold) [PSYoungGen: 18253K->0K(611840K)] [ParOldGen: 8K->16975K(761856K)] 18261K->16975K(1373696K), [Metaspace: 20928K->20928K(1069056K)], 1.2067695 secs] [Times: user=22.62 sys=3.73, real=1.21 secs] 2016-12-08 03:11:51 Running Dag: dag_1474363291123_56265_1 20.906: [GC (Metadata GC Threshold) [PSYoungGen: 412868K->18856K(785920K)] 429844K->35840K(1547776K), 0.8018663 secs] [Times: user=16.64 sys=1.06, real=0.80 secs] 21.708: [Full GC (Metadata GC Threshold) [PSYoungGen: 18856K->0K(785920K)] [ParOldGen: 16983K->22662K(1044992K)] 35840K->22662K(1830912K), [Metaspace: 34928K->34928K(1081344K)], 0.0379021 secs] [Times: user=0.36 sys=0.01, real=0.04 secs] 2016-12-08 03:12:01 Completed Dag: dag_1474363291123_56265_1 Heap PSYoungGen total 785920K, used 204117K [0x0000000762200000, 0x00000007a8a80000, 0x00000007c0000000) eden space 766976K, 26% used [0x0000000762200000,0x000000076e9557a0,0x0000000790f00000) from space 18944K, 0% used [0x00000007a7800000,0x00000007a7800000,0x00000007a8a80000) to space 87040K, 0% used [0x000000079e080000,0x000000079e080000,0x00000007a3580000) ParOldGen total 1044992K, used 22662K [0x00000006a6600000, 0x00000006e6280000, 0x0000000762200000) object space 1044992K, 2% used [0x00000006a6600000,0x00000006a7c21810,0x00000006e6280000) Metaspace used 41929K, capacity 42269K, committed 42712K, reserved 1087488K class space used 4965K, capacity 5051K, committed 5120K, reserved 1048576K
Created 12-09-2016 11:44 PM
This is not the app log. Please use "yarn log --applicationId <appId>" to retrieve the app log.
Created 12-10-2016 04:49 AM
@Joshua Adeleke for
Causedby: org.apache.tez.dag.api.SessionNotRunning:TezSession has already shutdown.No cluster diagnostics found.at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:730)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:217)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:117)
it seems that your application master/cluster is busy or did not get enough resources to process DAG over RPC request thats why was not able to copy the dag from client.if there are EOFExceptions in Appliction master logs then do check your network, for a workaround you can increase tez.session.am.dag.submit.timeout.secs which is currently 300 seconds to 900 seconds.
Created 12-13-2016 02:39 PM
@Rajkumar Singh Thank you. I have also increased the timeout to 900 (from 600sec) but issue persist.
Created 12-13-2016 02:40 PM
@Joshua Adeleke did you see any exception/error in application master log
Created 12-13-2016 03:32 PM
@Rajkumar Singh I saw the exception below;
2016-12-13 09:10:54,239 [INFO] [AMRM Callback Handler Thread] |impl.AMRMClientAsyncImpl|: Interrupted while waiting for queue java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:287)