Oozie Job Fails + NodeManager TimeOut

New Contributor

Hi,

When I try to run a Hive job under Oozie, it always fails with the following error:

Log Aggregation Status     TIME_OUT
Diagnostics:     
Application application_1464178993574_42107 failed 2 times due to Error launching appattempt_1464178993574_42107_000002. Got exception: java.io.IOException: Failed on local exception: java.io.IOException: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.2.1.11:49971 remote=node011.mapreduce.net/10.2.8.4:45454]; Host Details : local host is: "master001.mapreduce.net/10.2.1.11"; destination host is: "node011.mapreduce.net":45454;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
at org.apache.hadoop.ipc.Client.call(Client.java:1431)
at org.apache.hadoop.ipc.Client.call(Client.java:1358)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy89.startContainers(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.2.1.11:49971 remote=node011.mapreduce.net/10.2.8.4:45454]
at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:685)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:648)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:735)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:373)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1493)
at org.apache.hadoop.ipc.Client.call(Client.java:1397) 

So two questions:

  1. Do you have any idea what causes this error?
  2. When YARN gets a timeout from one of the NodeManagers, shouldn't it rerun the failed task on another node?

Thanks a lot in advance for your help.

Re: Oozie Job Fails + NodeManager TimeOut

Expert Contributor

I know this is old, but I am currently facing the exact same issue with Oozie and YARN. Have you solved it or found the root cause?

Re: Oozie Job Fails + NodeManager TimeOut

Expert Contributor

In my case, though, it is not constant: sometimes it happens and sometimes it does not.
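
For anyone comparing traces across failed attempts, here is a minimal sketch (not an official Hadoop tool) that pulls the timeout and the two endpoints out of a diagnostics message like the one in the original post. The function name `parse_timeout` and the returned dictionary keys are made up for illustration, and the regular expression assumes the message keeps the same wording as the trace above. It only reads the text; it does not diagnose the root cause.

```python
import re

# Matches the SocketTimeoutException text as it appears in the trace above.
# Assumption: the wording and field order stay the same across attempts.
TIMEOUT_RE = re.compile(
    r"(?P<millis>\d+) millis timeout while waiting for channel to be ready "
    r"for (?P<op>\w+)\. ch : java\.nio\.channels\.SocketChannel\[connected "
    r"local=(?P<local>\S+) remote=(?P<remote>[^\]\s]+)\]"
)


def split_endpoint(endpoint):
    """Split 'host/ip:port' (host may be empty) into (host, ip, port)."""
    host, _, address = endpoint.partition("/")
    ip, _, port = address.rpartition(":")
    return host, ip, int(port)


def parse_timeout(diagnostics):
    """Return the timeout details from a diagnostics string, or None if absent."""
    match = TIMEOUT_RE.search(diagnostics)
    if match is None:
        return None
    local_host, local_ip, local_port = split_endpoint(match.group("local"))
    remote_host, remote_ip, remote_port = split_endpoint(match.group("remote"))
    return {
        "timeout_ms": int(match.group("millis")),
        "operation": match.group("op"),  # "read" in the trace above
        "local_ip": local_ip,
        "local_port": local_port,
        "remote_host": remote_host,
        "remote_ip": remote_ip,
        "remote_port": remote_port,
    }
```

Running this over the diagnostics of each failed application and grouping by `remote_host` would show whether the timeouts always point at the same NodeManager (node011.mapreduce.net in the original post) or are spread across nodes, which seems relevant to the intermittent case described above. Note that the channel is reported as `connected` and the wait is for `read`, so the TCP connection itself was established before the 60000 ms wait expired.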