Created on 11-18-2019 08:26 AM - last edited on 11-18-2019 08:47 AM by cjervis
In our production environment, any MapReduce job submitted to YARN (via Oozie or distcp), the jobs are being stuck for a while (~5 to 20 mins) and we don't know why.
I've figured out that in "syslog" of YARN jobs, below is the output which I found interesting.
I didn't understand what this is waiting for and why is it waiting. Could you please help me out.
2019-11-18 10:04:18,350 INFO [Thread-104] org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher is draining to stop, ignoring any new events.
2019-11-18 10:04:18,450 INFO [Thread-104] org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2019-11-18 10:04:18,550 INFO [Thread-104] org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2019-11-18 10:04:18,650 INFO [Thread-104] org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2019-11-18 10:04:18,750 INFO [Thread-104] org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. Thread state is :WAITING
...
...
...
2019-11-18 10:07:46,813 INFO [Thread-104] org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2019-11-18 10:07:46,896 ERROR [Job ATS Event Dispatcher] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Failed to process Event JOB_FINISHED for the job : job_1572682899050_1783
org.apache.hadoop.yarn.exceptions.YarnException: Failed while publishing entity
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher.dispatchEntities(TimelineV2ClientImpl.java:548)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putEntities(TimelineV2ClientImpl.java:149)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForNewTimelineService(JobHistoryEventHandler.java:1405)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleTimelineEvent(JobHistoryEventHandler.java:742)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.access$1200(JobHistoryEventHandler.java:93)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$ForwardingEventHandler.handle(JobHistoryEventHandler.java:1795)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$ForwardingEventHandler.handle(JobHistoryEventHandler.java:1791)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
at com.sun.jersey.api.client.Client.handle(Client.java:652)
at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
at com.sun.jersey.api.client.WebResource$Builder.put(WebResource.java:539)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.doPutObjects(TimelineV2ClientImpl.java:291)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.access$000(TimelineV2ClientImpl.java:66)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$1.run(TimelineV2ClientImpl.java:302)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$1.run(TimelineV2ClientImpl.java:299)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putObjects(TimelineV2ClientImpl.java:299)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putObjects(TimelineV2ClientImpl.java:251)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$EntitiesHolder$1.call(TimelineV2ClientImpl.java:374)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$EntitiesHolder$1.call(TimelineV2ClientImpl.java:367)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher$1.publishWithoutBlockingOnQueue(TimelineV2ClientImpl.java:495)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher$1.run(TimelineV2ClientImpl.java:433)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
... 1 more
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.read(InputRecord.java:503)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.net.<a href="http://www.http.HttpClient.parseHTTPHeader(HttpClient.java:704" target="_blank">www.http.HttpClient.parseHTTPHeader(HttpClient.java:704</a>)
at sun.net.<a href="http://www.http.HttpClient.parseHTTP(HttpClient.java:647" target="_blank">www.http.HttpClient.parseHTTP(HttpClient.java:647</a>)
at sun.net.<a href="http://www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1569" target="_blank">www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1569</a>)
at sun.net.<a href="http://www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474" target="_blank">www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474</a>)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at sun.net.<a href="http://www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338" target="_blank">www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338</a>)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153)
... 21 more
2019-11-18 10:07:46,897 INFO [Thread-104] org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2019-11-18 10:07:46,897 INFO [Thread-104] org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl: Stopping TimelineClient.
2019-11-18 10:07:46,898 INFO [pool-11-thread-1] org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl: Timeline dispatcher thread was interrupted
2019-11-18 10:07:46,898 INFO [Thread-104] org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl: Stopping TimelineClient.
2019-11-18 10:07:46,940 INFO [pool-12-thread-1] org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl: Timeline dispatcher thread was interrupted
2019-11-18 10:07:46,940 INFO [Thread-104] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopped JobHistoryEventHandler. super.stop()
Created 01-31-2020 08:00 AM
I had the same problem:
I had to stop all YARN services
I cleared path from this property:
yarn.timeline-service.leveldb-state-store.path
Start YARN services and from this point everything works now.
Created 01-31-2020 08:32 AM
Unfortunately I cannot test this as I don't have enough access to do the same. I will pass on the same information to the concerned team and will share you the feedback.