
Oozie workflow actions fail with "response from timeline server" error


New Contributor

Hello,

When I start an Oozie workflow, it fails regardless of the action type (Sqoop, Spark, or SSH), always with the same error in syslog:

2019-04-08 14:54:33,393 ERROR [pool-10-thread-1] org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl: Response from the timeline server is not successful, HTTP error code: 500, Server response: {"exception":"WebApplicationException","message":"org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 280 actions: IOException: 280 times, servers with issues: null","javaClassName":"javax.ws.rs.WebApplicationException"}
2019-04-08 14:54:33,394 ERROR [Job ATS Event Dispatcher] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Exception while publishing configs on JOB_SUBMITTED Event  for the job : job_1554726387894_0011
org.apache.hadoop.yarn.exceptions.YarnException: Failed while publishing entity
    at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher.dispatchEntities(TimelineV2ClientImpl.java:548)
    at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putEntities(TimelineV2ClientImpl.java:149)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.publishConfigsOnJobSubmittedEvent(JobHistoryEventHandler.java:1254)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForNewTimelineService(JobHistoryEventHandler.java:1414)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleTimelineEvent(JobHistoryEventHandler.java:742)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.access$1200(JobHistoryEventHandler.java:93)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$ForwardingEventHandler.handle(JobHistoryEventHandler.java:1795)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$ForwardingEventHandler.handle(JobHistoryEventHandler.java:1791)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Response from the timeline server is not successful, HTTP error code: 500, Server response: {"exception":"WebApplicationException","message":"org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 280 actions: IOException: 280 times, servers with issues: null","javaClassName":"javax.ws.rs.WebApplicationException"}
    at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putObjects(TimelineV2ClientImpl.java:322)
    at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putObjects(TimelineV2ClientImpl.java:251)
    at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$EntitiesHolder$1.call(TimelineV2ClientImpl.java:374)
    at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$EntitiesHolder$1.call(TimelineV2ClientImpl.java:367)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher$1.publishWithoutBlockingOnQueue(TimelineV2ClientImpl.java:478)
    at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher$1.run(TimelineV2ClientImpl.java:433)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
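
For readers triaging the same log: the server response embedded in the first error line is JSON, and its message names an HBase client exception, which suggests the failure sits in the timeline service's HBase backend rather than in the Oozie action itself. A minimal Python sketch for pulling those fields out of such a line follows; the function name and the regular expression are illustrative assumptions, not part of any Hadoop tooling.

```python
import json
import re

# Assumption: the message format matches the syslog excerpt above.
_RESPONSE_RE = re.compile(
    r"HTTP error code: (?P<code>\d+), Server response: (?P<body>\{.*\})"
)


def parse_timeline_error(log_line):
    """Return (http_code, response_dict) from a TimelineV2ClientImpl error line, or None."""
    match = _RESPONSE_RE.search(log_line)
    if match is None:
        return None
    return int(match.group("code")), json.loads(match.group("body"))


# The first error line from the post, split across string literals for readability.
line = (
    '2019-04-08 14:54:33,393 ERROR [pool-10-thread-1] '
    'org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl: '
    'Response from the timeline server is not successful, HTTP error code: 500, '
    'Server response: {"exception":"WebApplicationException",'
    '"message":"org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: '
    'Failed 280 actions: IOException: 280 times, servers with issues: null",'
    '"javaClassName":"javax.ws.rs.WebApplicationException"}'
)

code, response = parse_timeline_error(line)
# Print the status code and the class name at the start of the server message.
print(code, response["message"].split(":", 1)[0])
```

The sketch only reads the log; it makes no attempt to contact the timeline server.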

What is causing this error?

Example workflow.xml:

<workflow-app xmlns = "uri:oozie:workflow:0.4" name="hadoop_main_workflow">


    <!-- start -->
    <start to = "spark_job"/>

    <action name="spark_job" retry-max="5" retry-interval="5">
            <spark xmlns="uri:oozie:spark-action:0.2">
                    <job-tracker>${resourceManager}</job-tracker>
                    <name-node>${nameNode}</name-node>
                    <master>yarn</master>
                    <mode>client</mode>
                    <name>spark_job</name>
                    <jar>spark_job.py</jar>
                    <spark-opts>
                            --master yarn
                            --deploy-mode client
                            --driver-memory 11288m
                            --executor-memory 24GB
                            --num-executors 8
                            --conf spark.dynamicAllocation.enabled=true
                            --conf spark.executor.cores=2
                            --conf spark.shuffle.service.enabled=true
                            --conf spark.yarn.driver.memoryOverhead=1024
                            --conf spark.yarn.executor.memoryOverhead=1024
                            --jars /usr/hdp/3.1.0.0-78/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar
                            --conf spark.security.credentials.hiveserver2.enabled=false
                            --py-files /usr/hdp/3.1.0.0-78/hive_warehouse_connector/pyspark_hwc-1.0.0.3.1.0.0-78.zip
                    </spark-opts>
                    <file>spark_job.py</file>
            </spark>
            <ok to="end"/>
            <error to="kill_job"/>
    </action>

    <kill name = "kill_job">
        <message>Job failed</message>
    </kill>
    <end name = "end" />

</workflow-app>


job.properties:

nameNode=hdfs://namenodehost:8020
resourceManager=namenodehost:8050
queueName=${nameNode}/user/oozie/workflows/hadoop_main_workflow
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/oozie/workflows/hadoop_main_workflow
oozie.action.sharelib.for.sqoop=sqoop
oozie.action.sharelib.for.spark=spark2


Stack:
HDP 3.1.0

Oozie 4.3.1.3.1.0.0-78


7 REPLIES

Re: Oozie workflow actions fail with "response from timeline server" error

New Contributor

@Grzegorz Jałocha I have the same problem. Have you solved it?

Re: Oozie workflow actions fail with "response from timeline server" error

New Contributor

@bin liu Not yet. I will let you know when I find a solution.

Re: Oozie workflow actions fail with "response from timeline server" error

New Contributor

@Grzegorz Jałocha I have solved the problem.

The fix I found is as follows:

  1. Check your YARN logs (/var/log/hadoop-yarn/yarn/ on HDP) for anything obvious, such as insufficient YARN memory, and fix it if relevant.
  2. Clean up the ATS data in HDFS as described in the HDP docs.
  3. Clean up the ATS data in ZooKeeper. The example here is for insecure clusters; a Kerberized cluster will probably use a different znode: zookeeper-client rmr /atsv2-hbase-unsecure
  4. Restart *all* YARN services.
  5. Restart the Ambari server (we had a case where the alert appeared to be wrongly cached).
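
Because step 3 deletes a znode recursively, it can be worth printing the exact command before running it. The sketch below wraps the `zookeeper-client rmr` command from step 3 in a dry-run helper. The helper names and the prefix guard are assumptions made for illustration, and the prefix is taken only from the znode named above, so adjust it for a Kerberized cluster.

```python
import shlex
import subprocess

# Assumption: ATS v2 HBase znodes start with this prefix, as in step 3 above.
ATS_ZNODE_PREFIX = "/atsv2-hbase-"


def ats_znode_cleanup_command(znode):
    """Build the zookeeper-client command from step 3 for a top-level ATS znode."""
    # Refuse anything that is not a single top-level znode with the expected prefix.
    if not znode.startswith(ATS_ZNODE_PREFIX) or "/" in znode[1:]:
        raise ValueError(f"refusing to remove unexpected znode: {znode!r}")
    return ["zookeeper-client", "rmr", znode]


def run(command, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(command))
    if not dry_run:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Dry run: only prints the command that step 3 would execute.
    run(ats_znode_cleanup_command("/atsv2-hbase-unsecure"))
```

Pass `dry_run=False` only after the YARN log check in step 1 and the HDFS cleanup in step 2, since the steps are meant to be done in order.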


I followed the link below:

https://thisdataguy.com/2019/01/11/ats-server-does-not-start/


In short, you need to clean up the ATS-related data in HDFS and ZooKeeper, and then restart the YARN services.

I hope this helps.


Re: Oozie workflow actions fail with "response from timeline server" error

New Contributor

You can check the YARN timeline reader log file:

/var/log/hadoop-yarn/yarn/hadoop-yarn-timelinereader-xxxxx-.log

It contains a hint: NoNode for /atsv2-hbase-unsecure/meta-region-server

Caused by: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /atsv2-hbase-unsecure/meta-region-server
    at org.apache.hadoop.hbase.client.ConnectionImplementation.get(ConnectionImplementation.java:2002)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateMeta(ConnectionImplementation.java:762)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:729)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.relocateRegion(ConnectionImplementation.java:707)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:911)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:732)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:325)
    ... 17 more
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /atsv2-hbase-unsecure/meta-region-server
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:164)
    at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:321)
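
To check a long timeline reader log for this hint without reading it by eye, something like the following Python sketch could be used. The function name is hypothetical, and the pattern assumes the message format shown in the trace above.

```python
import re

# Assumption: the log uses the "KeeperErrorCode = NoNode for <znode>" wording seen above.
_NONODE_RE = re.compile(r"KeeperErrorCode = NoNode for (?P<znode>/\S+)")


def missing_znodes(log_text):
    """Return the distinct znode paths reported as NoNode, in first-seen order."""
    seen = []
    for match in _NONODE_RE.finditer(log_text):
        znode = match.group("znode")
        if znode not in seen:
            seen.append(znode)
    return seen


# Two lines copied from the trace above.
sample = """\
Caused by: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /atsv2-hbase-unsecure/meta-region-server
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /atsv2-hbase-unsecure/meta-region-server
"""

print(missing_znodes(sample))
```

To scan a real log, read the file from the path above and pass its contents to `missing_znodes`.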

Re: Oozie workflow actions fail with "response from timeline server" error

New Contributor

I have the same problem. Have you solved it?

Re: Oozie workflow actions fail with "response from timeline server" error

Explorer

How did you fix your issue?

Re: Oozie workflow actions fail with "response from timeline server" error

New Contributor

I have the same symptom, but a slightly different message, on a newly built HDP 3.0.1 cluster. This is from the YARN application log for the failed Oozie application:

 

2019-10-03 09:06:54,805 INFO [Thread-75] org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2019-10-03 09:06:54,905 INFO [Thread-75] org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2019-10-03 09:06:54,986 ERROR [Job ATS Event Dispatcher] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Exception while publishing configs on JOB_SUBMITTED Event for the job : job_1570085949108_0002
org.apache.hadoop.yarn.exceptions.YarnException: Failed while publishing entity
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher.dispatchEntities(TimelineV2ClientImpl.java:548)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putEntities(TimelineV2ClientImpl.java:149)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.publishConfigsOnJobSubmittedEvent(JobHistoryEventHandler.java:1254)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForNewTimelineService(JobHistoryEventHandler.java:1414)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleTimelineEvent(JobHistoryEventHandler.java:742)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.access$1200(JobHistoryEventHandler.java:93)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$ForwardingEventHandler.handle(JobHistoryEventHandler.java:1795)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$ForwardingEventHandler.handle(JobHistoryEventHandler.java:1791)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
at com.sun.jersey.api.client.Client.handle(Client.java:652)
at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
at com.sun.jersey.api.client.WebResource$Builder.put(WebResource.java:539)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.doPutObjects(TimelineV2ClientImpl.java:291)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.access$000(TimelineV2ClientImpl.java:66)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$1.run(TimelineV2ClientImpl.java:302)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$1.run(TimelineV2ClientImpl.java:299)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putObjects(TimelineV2ClientImpl.java:299)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putObjects(TimelineV2ClientImpl.java:251)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$EntitiesHolder$1.call(TimelineV2ClientImpl.java:374)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$EntitiesHolder$1.call(TimelineV2ClientImpl.java:367)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher$1.publishWithoutBlockingOnQueue(TimelineV2ClientImpl.java:495)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher$1.run(TimelineV2ClientImpl.java:433)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
... 1 more
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1569)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153)
... 21 more
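
In this trace the deepest cause is a read timeout on the HTTP call to the timeline service, not the HTTP 500 response seen in the original post. A minimal Python sketch for pulling the last "Caused by:" entry out of a pasted trace follows; the function name is hypothetical, and it assumes each cause sits on its own line as it does here.

```python
def root_cause(stack_trace):
    """Return the text of the last 'Caused by:' line, or None if there is none."""
    prefix = "Caused by: "
    cause = None
    for line in stack_trace.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            # Later causes are deeper in the chain, so keep overwriting.
            cause = line[len(prefix):]
    return cause


# A few lines copied from the trace above.
sample = """\
org.apache.hadoop.yarn.exceptions.YarnException: Failed while publishing entity
at java.lang.Thread.run(Thread.java:745)
Caused by: com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
at com.sun.jersey.api.client.Client.handle(Client.java:652)
... 1 more
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
... 21 more
"""

print(root_cause(sample))
```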
