Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Oozie Job Fails + NodeManager TimeOut

Highlighted

Oozie Job Fails + NodeManager TimeOut

Hi,

When I try to run a hive job under oozie, it fails all the time with the following error:

Log Aggregation Status     TIME_OUT
Diagnostics:     
Application application_1464178993574_42107 failed 2 times due to Error launching appattempt_1464178993574_42107_000002. Got exception: java.io.IOException: Failed on local exception: java.io.IOException: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.2.1.11:49971 remote=node011.mapreduce.net/10.2.8.4:45454]; Host Details : local host is: "master001.mapreduce.net/10.2.1.11"; destination host is: "node011.mapreduce.net":45454;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
at org.apache.hadoop.ipc.Client.call(Client.java:1431)
at org.apache.hadoop.ipc.Client.call(Client.java:1358)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy89.startContainers(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.2.1.11:49971 remote=node011.mapreduce.net/10.2.8.4:45454]
at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:685)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:648)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:735)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:373)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1493)
at org.apache.hadoop.ipc.Client.call(Client.java:1397) 

So two questions:

  1. Do you have an idea about the reason of this error ?
  2. When YARN get a TimeOut from one of a NodeManagers, it should not runs the task failed on another Node ?

Thanks a lot in Advance for your help.

3 REPLIES 3
Highlighted

Re: Oozie Job Fails + NodeManager TimeOut

Expert Contributor

i know this old, but i am currently facing same exact issue with oozie and yarn, have you solved this or knew the root cause ?

Highlighted

Re: Oozie Job Fails + NodeManager TimeOut

Expert Contributor

but in my case, its not constant sometimes it happens and sometimes not


Highlighted

This is an intermittent issue on my Hadoop Cluster, I see this for all the jobs and cannot find a way to troubleshoot it

New Contributor

2020-05-11 15:43:16,239 ERROR [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_e88_1589171925074_0626_01_000002 : java.net.SocketTimeoutException: Call From wo3pfhadl12w/10.11.105.123 to wo3pfhadl34w.woo.sing..com:45454 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.11.105.123:40884 remote=wo3pfhadl34w.woo.sing..com/10.11.105.149:45454]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:775)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1501)
at org.apache.hadoop.ipc.Client.call(Client.java:1443)
at org.apache.hadoop.ipc.Client.call(Client.java:1353)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy85.startContainers(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:128)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy86.startContainers(Unknown Source)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:160)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:394)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.11.105.123:40884 remote=wo3pfhadl34w.woo.sing.com/10.11.105.149:45454]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at java.io.DataInputStream.read(DataInputStream.java:149)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:554)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1802)
at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1167)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1063)

 

Don't have an account?
Coming from Hortonworks? Activate your account here