Support Questions
MR jobs hang for a while saying "AsyncDispatcher is draining to stop, ignoring any new events."

New Contributor

In our production environment, any MapReduce job submitted to YARN (via Oozie or distcp) gets stuck for a while (~5 to 20 minutes), and we don't know why.

Looking at the "syslog" of the YARN jobs, I found the output below, which seems relevant.

I don't understand what it is waiting for, or why. Could you please help me out?

2019-11-18 10:04:18,350 INFO [Thread-104] org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher is draining to stop, ignoring any new events.
2019-11-18 10:04:18,450 INFO [Thread-104] org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2019-11-18 10:04:18,550 INFO [Thread-104] org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2019-11-18 10:04:18,650 INFO [Thread-104] org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2019-11-18 10:04:18,750 INFO [Thread-104] org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. Thread state is :WAITING
...
...
...
2019-11-18 10:07:46,813 INFO [Thread-104] org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2019-11-18 10:07:46,896 ERROR [Job ATS Event Dispatcher] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Failed to process Event JOB_FINISHED for the job : job_1572682899050_1783
org.apache.hadoop.yarn.exceptions.YarnException: Failed while publishing entity
	at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher.dispatchEntities(TimelineV2ClientImpl.java:548)
	at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putEntities(TimelineV2ClientImpl.java:149)
	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForNewTimelineService(JobHistoryEventHandler.java:1405)
	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleTimelineEvent(JobHistoryEventHandler.java:742)
	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.access$1200(JobHistoryEventHandler.java:93)
	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$ForwardingEventHandler.handle(JobHistoryEventHandler.java:1795)
	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$ForwardingEventHandler.handle(JobHistoryEventHandler.java:1791)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
	at java.lang.Thread.run(Thread.java:745)
Caused by: com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
	at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
	at com.sun.jersey.api.client.Client.handle(Client.java:652)
	at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
	at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
	at com.sun.jersey.api.client.WebResource$Builder.put(WebResource.java:539)
	at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.doPutObjects(TimelineV2ClientImpl.java:291)
	at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.access$000(TimelineV2ClientImpl.java:66)
	at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$1.run(TimelineV2ClientImpl.java:302)
	at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$1.run(TimelineV2ClientImpl.java:299)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putObjects(TimelineV2ClientImpl.java:299)
	at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putObjects(TimelineV2ClientImpl.java:251)
	at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$EntitiesHolder$1.call(TimelineV2ClientImpl.java:374)
	at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$EntitiesHolder$1.call(TimelineV2ClientImpl.java:367)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher$1.publishWithoutBlockingOnQueue(TimelineV2ClientImpl.java:495)
	at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher$1.run(TimelineV2ClientImpl.java:433)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	... 1 more
Caused by: java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:170)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
	at sun.security.ssl.InputRecord.read(InputRecord.java:503)
	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
	at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
	at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
	at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
	at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1569)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
	at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
	at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338)
	at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253)
	at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153)
	... 21 more
2019-11-18 10:07:46,897 INFO [Thread-104] org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2019-11-18 10:07:46,897 INFO [Thread-104] org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl: Stopping TimelineClient.
2019-11-18 10:07:46,898 INFO [pool-11-thread-1] org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl: Timeline dispatcher thread was interrupted 
2019-11-18 10:07:46,898 INFO [Thread-104] org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl: Stopping TimelineClient.
2019-11-18 10:07:46,940 INFO [pool-12-thread-1] org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl: Timeline dispatcher thread was interrupted 
2019-11-18 10:07:46,940 INFO [Thread-104] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopped JobHistoryEventHandler. super.stop()

2 Replies

New Contributor

I had the same problem. What fixed it for me:

1. Stop all YARN services.

2. Clear the directory configured in this property:

yarn.timeline-service.leveldb-state-store.path

3. Start the YARN services again. From that point on, everything works.
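The steps above can be sketched as a small shell snippet. This is only an illustration: the sample property value (/hadoop/yarn/timeline) is an assumed example, not taken from this thread, and on a real cluster you would read the actual /etc/hadoop/conf/yarn-site.xml and stop/start services through Ambari or Cloudera Manager rather than by hand.

```shell
# Sketch only: parse the LevelDB state-store path out of yarn-site.xml.
# The file and value below are fabricated for demonstration.
YARN_SITE=$(mktemp)
cat > "$YARN_SITE" <<'EOF'
<configuration>
  <property>
    <name>yarn.timeline-service.leveldb-state-store.path</name>
    <value>/hadoop/yarn/timeline</value>
  </property>
</configuration>
EOF

# Grab the <value> on the line following the property name.
STATE_DIR=$(grep -A1 'yarn.timeline-service.leveldb-state-store.path' "$YARN_SITE" \
  | grep -o '<value>[^<]*</value>' \
  | sed -e 's|<value>||' -e 's|</value>||')
echo "timeline state store: $STATE_DIR"

# With all YARN services stopped, the reply clears this directory:
#   rm -rf "$STATE_DIR"    # destructive -- only run with services stopped
# ...then starts the YARN services again.
rm -f "$YARN_SITE"
```

Clearing the state store forces the timeline service to rebuild it on restart, which is why the reply stresses stopping YARN first.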

New Contributor

Unfortunately, I cannot test this myself, as I don't have enough access. I will pass this information on to the concerned team and share their feedback with you.