Created 09-22-2016 09:41 AM
Hi,
I have a question about YARN and the Timeline Server: when a map task fails to connect to the Timeline Server and update its status, is it killed by the ResourceManager?
I had a problem with the application master: I saw 155 attempts killed, and only the last map attempt succeeded.
For all 155 attempts, I saw the same error in the log:
2016-09-21 10:10:33,549 INFO [ATS Logger 0] hooks.ATSHook (ATSHook.java:run(136)) - Failed to submit plan to ATS: java.lang.RuntimeException: Failed to connect to timeline server. Connection retries limit exceeded. The posted timeline event may be missing
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:206)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:245)
    at com.sun.jersey.api.client.Client.handle(Client.java:648)
    at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
    at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
    at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:474)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:323)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:320)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:320)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:305)
    at org.apache.hadoop.hive.ql.hooks.ATSHook.fireAndForget(ATSHook.java:200)
    at org.apache.hadoop.hive.ql.hooks.ATSHook$2.run(ATSHook.java:122)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Thanks
Created 09-26-2016 06:26 PM
@Ahmed ELJAMI this looks like just an INFO message. I would look at the logs for each attempt to see why it is failing. You should be able to see this in the RM UI.
Created 08-22-2017 11:37 PM
We just upgraded fully working HDF + HDP clusters to HDF 3.0.1.1 and HDP 2.6.1 respectively and are now seeing a nearly identical error in our logs. I have tried to track this down in RM UI and various log files, but cannot seem to figure it out. On the HDF side, we're using HiveStreaming, which appears to be causing the errors.
2017-08-22 16:32:05,169 WARN [ATS Logger 0] org.apache.hadoop.hive.ql.hooks.ATSHook Failed to create ATS domain hive_bf609617-b443-4e32-a8af-3527b33dcb52 java.lang.RuntimeException: Failed to connect to timeline server. Connection retries limit exceeded. The posted timeline event may be missing
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:209)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:250)
    at com.sun.jersey.api.client.Client.handle(Client.java:652)
    at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
    at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
    at com.sun.jersey.api.client.WebResource$Builder.put(WebResource.java:539)
    at org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPostingObject(TimelineWriter.java:161)
    at org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:115)
    at org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:112)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
    at org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPosting(TimelineWriter.java:112)
    at org.apache.hadoop.yarn.client.api.impl.TimelineWriter.putDomain(TimelineWriter.java:98)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putDomain(TimelineClientImpl.java:355)
    at org.apache.hadoop.hive.ql.hooks.ATSHook.createTimelineDomain(ATSHook.java:122)
    at org.apache.hadoop.hive.ql.hooks.ATSHook.access$200(ATSHook.java:62)
    at org.apache.hadoop.hive.ql.hooks.ATSHook$2.run(ATSHook.java:179)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Created 02-10-2018 06:11 PM
I'm facing the same problem. The NiFi logs are filling up with Timeline Server connection errors.
Have you found a solution?
Created 02-13-2018 04:37 PM
For anybody having problems with Timeline Server warning/error messages when using HiveQL or Hive Streaming in NiFi:
Create a symbolic link inside the NiFi conf directory pointing to /etc/hadoop/conf/yarn-site.xml. This lets NiFi read the property that defines the actual Timeline Server address and port, which solves the issue.
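In case it helps, here is a minimal Python sketch of that fix. It creates the symbolic link and then prints the Timeline Server address that NiFi would pick up through it. It assumes the address is set through the standard YARN property yarn.timeline-service.webapp.address; the helper names and the NiFi conf path in the usage note are my own placeholders, so adjust them to your install.

```python
import os
import xml.etree.ElementTree as ET

# Standard YARN property holding the Timeline Server web address (host:port).
# Assumption: the cluster sets the address through this property.
TIMELINE_ADDRESS_KEY = "yarn.timeline-service.webapp.address"


def read_timeline_address(yarn_site_path):
    """Return the Timeline Server address from a yarn-site.xml file, or None."""
    root = ET.parse(yarn_site_path).getroot()
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == TIMELINE_ADDRESS_KEY:
            return prop.findtext("value", "").strip()
    return None


def link_yarn_site(nifi_conf_dir, yarn_site_path="/etc/hadoop/conf/yarn-site.xml"):
    """Create <nifi_conf_dir>/yarn-site.xml as a symlink to the cluster file."""
    link_path = os.path.join(nifi_conf_dir, "yarn-site.xml")
    if os.path.islink(link_path):
        # Replace a stale link. A regular file at this path is left alone,
        # and os.symlink below raises FileExistsError instead.
        os.remove(link_path)
    os.symlink(yarn_site_path, link_path)
    return link_path
```

For example, link = link_yarn_site("/path/to/nifi/conf") followed by print(read_timeline_address(link)) should show the host and port of your Timeline Server. If it prints None, the property is not set in that yarn-site.xml and the link alone probably won't be enough. NiFi most likely needs a restart before it notices the new file.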
Created 09-17-2018 07:59 PM
Add this setting to the property file: timeout=15
It helped me.