
Unable to connect to Timeline Server

Contributor

Hi,

I have a question about YARN and the Timeline Server: when a map task fails to connect to the Timeline Server to update its status, is it killed by the ResourceManager?

I had a problem with the ApplicationMaster: I saw 155 attempts killed, and only the last map attempt succeeded.

For all 155 attempts, I saw the same error in the log:

2016-09-21 10:10:33,549 INFO  [ATS Logger 0] hooks.ATSHook (ATSHook.java:run(136)) - Failed to submit plan to ATS: java.lang.RuntimeException: Failed to connect to timeline server. Connection retries limit exceeded. The posted timeline event may be missing
	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:206)
	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:245)
	at com.sun.jersey.api.client.Client.handle(Client.java:648)
	at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
	at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
	at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:474)
	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:323)
	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:320)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:320)
	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:305)
	at org.apache.hadoop.hive.ql.hooks.ATSHook.fireAndForget(ATSHook.java:200)
	at org.apache.hadoop.hive.ql.hooks.ATSHook$2.run(ATSHook.java:122)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

Thanks

5 REPLIES

Expert Contributor

@Ahmed ELJAMI This looks like just an INFO message. I would look at the logs for each attempt to see why it is actually failing; you should be able to find them in the RM UI.
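As an illustration only (not part of the original reply): if the RM UI is awkward to dig through, the per-attempt container logs can usually also be pulled with the YARN CLI, assuming log aggregation is enabled. The application ID below is a placeholder you would copy from the RM UI.

# Sketch: dump the aggregated container logs for one application (placeholder ID)
yarn logs -applicationId application_1474000000000_0001 > app_logs.txt
# Then search them for the real failure reason behind each killed attempt
grep -i -A5 "error" app_logs.txt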

New Contributor

We just upgraded fully working HDF + HDP clusters to HDF 3.0.1.1 and HDP 2.6.1 respectively, and we are now seeing a nearly identical error in our logs. I have tried to track this down in the RM UI and various log files, but cannot figure it out. On the HDF side, we're using HiveStreaming, which appears to be causing the errors.

2017-08-22 16:32:05,169 WARN [ATS Logger 0] org.apache.hadoop.hive.ql.hooks.ATSHook Failed to create ATS domain hive_bf609617-b443-4e32-a8af-3527b33dcb52
java.lang.RuntimeException: Failed to connect to timeline server. Connection retries limit exceeded. The posted timeline event may be missing
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:209)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:250)
    at com.sun.jersey.api.client.Client.handle(Client.java:652)
    at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
    at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
    at com.sun.jersey.api.client.WebResource$Builder.put(WebResource.java:539)
    at org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPostingObject(TimelineWriter.java:161)
    at org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:115)
    at org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:112)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
    at org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPosting(TimelineWriter.java:112)
    at org.apache.hadoop.yarn.client.api.impl.TimelineWriter.putDomain(TimelineWriter.java:98)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putDomain(TimelineClientImpl.java:355)
    at org.apache.hadoop.hive.ql.hooks.ATSHook.createTimelineDomain(ATSHook.java:122)
    at org.apache.hadoop.hive.ql.hooks.ATSHook.access$200(ATSHook.java:62)
    at org.apache.hadoop.hive.ql.hooks.ATSHook$2.run(ATSHook.java:179)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)


I'm facing the same problem. The NiFi logs are filling up with Timeline Server connection errors.

Have you found a solution?


For anybody having problems with Timeline Server warnings/errors when using HiveQL or Hive Streaming in NiFi:

Creating a symbolic link inside the NiFi conf directory to /etc/hadoop/conf/yarn-site.xml, so that NiFi can read the property that defines the actual Timeline Server address and port, solves this issue.
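For example, a minimal sketch of the suggestion above (the NiFi conf path is a placeholder; adjust it to your installation and restart NiFi afterwards):

# Link the cluster's yarn-site.xml into the NiFi conf directory so the Hive
# processors can resolve the real Timeline Server address and port
ln -s /etc/hadoop/conf/yarn-site.xml /path/to/nifi/conf/yarn-site.xml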

New Contributor

Add the setting timeout=15 to the property file.

It helped me.