
Oozie workflow actions fail with "response from timeline server" error

New Contributor

Hello,

When I start an Oozie workflow, every action fails with the same error in syslog, regardless of the action type (Sqoop, Spark, or SSH):

2019-04-08 14:54:33,393 ERROR [pool-10-thread-1] org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl: Response from the timeline server is not successful, HTTP error code: 500, Server response: {"exception":"WebApplicationException","message":"org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 280 actions: IOException: 280 times, servers with issues: null","javaClassName":"javax.ws.rs.WebApplicationException"}
2019-04-08 14:54:33,394 ERROR [Job ATS Event Dispatcher] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Exception while publishing configs on JOB_SUBMITTED Event for the job : job_1554726387894_0011
org.apache.hadoop.yarn.exceptions.YarnException: Failed while publishing entity
    at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher.dispatchEntities(TimelineV2ClientImpl.java:548)
    at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putEntities(TimelineV2ClientImpl.java:149)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.publishConfigsOnJobSubmittedEvent(JobHistoryEventHandler.java:1254)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForNewTimelineService(JobHistoryEventHandler.java:1414)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleTimelineEvent(JobHistoryEventHandler.java:742)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.access$1200(JobHistoryEventHandler.java:93)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$ForwardingEventHandler.handle(JobHistoryEventHandler.java:1795)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$ForwardingEventHandler.handle(JobHistoryEventHandler.java:1791)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Response from the timeline server is not successful, HTTP error code: 500, Server response: {"exception":"WebApplicationException","message":"org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 280 actions: IOException: 280 times, servers with issues: null","javaClassName":"javax.ws.rs.WebApplicationException"}
    at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putObjects(TimelineV2ClientImpl.java:322)
    at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putObjects(TimelineV2ClientImpl.java:251)
    at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$EntitiesHolder$1.call(TimelineV2ClientImpl.java:374)
    at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$EntitiesHolder$1.call(TimelineV2ClientImpl.java:367)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher$1.publishWithoutBlockingOnQueue(TimelineV2ClientImpl.java:478)
    at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher$1.run(TimelineV2ClientImpl.java:433)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
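
The body of the HTTP 500 response in this log is JSON, and its "message" field names the exception raised behind the timeline server (here an HBase client exception). As a rough sketch only, the status code and that message could be pulled out of a log line as follows. The helper name parse_timeline_error and the abbreviated LOG_LINE constant are made up for illustration and are not part of Hadoop or Oozie.

```python
import json
import re

# Abbreviated copy of the first log line from the post (stack frames omitted).
LOG_LINE = (
    '2019-04-08 14:54:33,393 ERROR [pool-10-thread-1] '
    'org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl: '
    'Response from the timeline server is not successful, HTTP error code: 500, '
    'Server response: {"exception":"WebApplicationException",'
    '"message":"org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: '
    'Failed 280 actions: IOException: 280 times, servers with issues: null",'
    '"javaClassName":"javax.ws.rs.WebApplicationException"}'
)

# Matches the fixed text that precedes the JSON body in the log message.
PREFIX = re.compile(r"HTTP error code: (\d+), Server response: ")


def parse_timeline_error(line):
    """Return (http_status, response_dict), or None if the line has no such error."""
    match = PREFIX.search(line)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (for example trailing stack frames).
    body, _end = json.JSONDecoder().raw_decode(line, match.end())
    return int(match.group(1)), body


status, body = parse_timeline_error(LOG_LINE)
print(status)
print(body["message"].split(":", 1)[0])
```

Here the extracted class name is an HBase client exception, which suggests looking at the HBase instance that backs the timeline service rather than at the Oozie action itself.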

What is causing this error?

Example workflow.xml:

<workflow-app xmlns = "uri:oozie:workflow:0.4" name="hadoop_main_workflow">


    <!-- start -->
    <start to = "spark_job"/>

    <action name="spark_job" retry-max="5" retry-interval="5">
            <spark xmlns="uri:oozie:spark-action:0.2">
                    <job-tracker>${resourceManager}</job-tracker>
                    <name-node>${nameNode}</name-node>
                    <master>yarn</master>
                    <mode>client</mode>
                    <name>spark_job</name>
                    <jar>spark_job.py</jar>
                    <spark-opts>
                            --master yarn
                            --deploy-mode client
                            --driver-memory 11288m
                            --executor-memory 24GB
                            --num-executors 8
                            --conf spark.dynamicAllocation.enabled=true
                            --conf spark.executor.cores=2
                            --conf spark.shuffle.service.enabled=true
                            --conf spark.yarn.driver.memoryOverhead=1024
                            --conf spark.yarn.executor.memoryOverhead=1024
                            --jars /usr/hdp/3.1.0.0-78/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar
                            --conf spark.security.credentials.hiveserver2.enabled=false
                            --py-files /usr/hdp/3.1.0.0-78/hive_warehouse_connector/pyspark_hwc-1.0.0.3.1.0.0-78.zip
                    </spark-opts>
                    <file>spark_job.py</file>
            </spark>
            <ok to="end"/>
            <error to="kill_job"/>
    </action>

    <kill name = "kill_job">
        <message>Job failed</message>
    </kill>
    <end name = "end" />

</workflow-app>


job.properties:

nameNode=hdfs://namenodehost:8020
resourceManager=namenodehost:8050
queueName=${nameNode}/user/oozie/workflows/hadoop_main_workflow
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/oozie/workflows/hadoop_main_workflow
oozie.action.sharelib.for.sqoop=sqoop
oozie.action.sharelib.for.spark=spark2


Stack:
HDP 3.1.0

oozie 4.3.1.3.1.0.0-78


7 REPLIES

Explorer

@Grzegorz Jałocha I have the same problem. Have you solved it?

New Contributor

@bin liu Not yet. I will let you know when I find a solution.

Explorer

@Grzegorz Jałocha I have solved the problem.

The fix I found is as follows:

  1. Check your YARN logs (/var/log/hadoop-yarn/yarn/ on HDP) for anything obvious, for instance not enough YARN memory, and fix it if relevant.
  2. Clean up the HDFS ATS data as described in the HDP docs.
  3. Clean up the ZooKeeper ATS data (the example here is for insecure clusters; you will probably have a different znode on kerberised clusters): zookeeper-client rmr /atsv2-hbase-unsecure
  4. Restart *all* YARN services.
  5. Restart the Ambari server (we had a case where it looked like the alert was wrongly cached).


I followed the post below:

https://thisdataguy.com/2019/01/11/ats-server-does-not-start/

In short, you need to clean up the ATS-related data in HDFS and ZooKeeper, and then restart the YARN services.

I hope this helps.
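
Because step 3 deletes a znode recursively, it can help to build and print the command before running anything. The snippet below is only a sketch of that idea: it assembles the zookeeper-client command quoted in step 3 and refuses an empty or root path. The function name zk_cleanup_command is hypothetical, and the default znode is the insecure-cluster one from the post; on a kerberised cluster you would have to pass whatever znode your cluster actually uses.

```python
import shlex


def zk_cleanup_command(znode="/atsv2-hbase-unsecure"):
    """Build (but do not run) the ZooKeeper cleanup command from step 3."""
    # Guard against a relative path or the ZooKeeper root itself.
    if not znode.startswith("/") or znode.strip("/") == "":
        raise ValueError(f"refusing to build rmr command for znode {znode!r}")
    return ["zookeeper-client", "rmr", znode]


# Dry run: print the command so it can be reviewed before being executed by hand.
print(shlex.join(zk_cleanup_command()))
```

Printing rather than executing keeps the destructive step manual, which seems sensible for a command that removes cluster state.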


Explorer

You can check the YARN timeline reader log file:

/var/log/hadoop-yarn/yarn/hadoop-yarn-timelinereader-xxxxx-.log

It contains a hint: NoNode for /atsv2-hbase-unsecure/meta-region-server

Caused by: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /atsv2-hbase-unsecure/meta-region-server
at org.apache.hadoop.hbase.client.ConnectionImplementation.get(ConnectionImplementation.java:2002)
at org.apache.hadoop.hbase.client.ConnectionImplementation.locateMeta(ConnectionImplementation.java:762)
at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:729)
at org.apache.hadoop.hbase.client.ConnectionImplementation.relocateRegion(ConnectionImplementation.java:707)
at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:911)
at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:732)
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:325)
... 17 more
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /atsv2-hbase-unsecure/meta-region-server
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:164)
at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:321)
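
If the log is long, the NoNode hint can be searched for mechanically. This is a minimal sketch, assuming the message format shown in the trace above; missing_znodes is a made-up helper name, and SAMPLE is a shortened copy of the trace, not a real log file.

```python
import re

# Matches the ZooKeeper message format seen in the trace above.
NO_NODE = re.compile(r"KeeperErrorCode = NoNode for (\S+)")


def missing_znodes(log_text):
    """Return the distinct znode paths reported as NoNode, in order of first appearance."""
    seen = []
    for path in NO_NODE.findall(log_text):
        if path not in seen:
            seen.append(path)
    return seen


# Shortened copy of the trace from this reply.
SAMPLE = """\
Caused by: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /atsv2-hbase-unsecure/meta-region-server
at org.apache.hadoop.hbase.client.ConnectionImplementation.get(ConnectionImplementation.java:2002)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /atsv2-hbase-unsecure/meta-region-server
"""

print(missing_znodes(SAMPLE))  # ['/atsv2-hbase-unsecure/meta-region-server']
```

To run it against a real log, read the timeline reader log file from your own cluster and pass its contents to missing_znodes.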

Contributor

I have the same problem. Have you solved it?

Explorer

How did you fix your issue?

Explorer

I have the same symptom, but a slightly different message, on a newly built HDP 3.0.1 cluster. This is from the YARN app log for the failed Oozie application:

2019-10-03 09:06:54,805 INFO [Thread-75] org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2019-10-03 09:06:54,905 INFO [Thread-75] org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2019-10-03 09:06:54,986 ERROR [Job ATS Event Dispatcher] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Exception while publishing configs on JOB_SUBMITTED Event for the job : job_1570085949108_0002
org.apache.hadoop.yarn.exceptions.YarnException: Failed while publishing entity
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher.dispatchEntities(TimelineV2ClientImpl.java:548)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putEntities(TimelineV2ClientImpl.java:149)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.publishConfigsOnJobSubmittedEvent(JobHistoryEventHandler.java:1254)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForNewTimelineService(JobHistoryEventHandler.java:1414)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleTimelineEvent(JobHistoryEventHandler.java:742)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.access$1200(JobHistoryEventHandler.java:93)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$ForwardingEventHandler.handle(JobHistoryEventHandler.java:1795)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$ForwardingEventHandler.handle(JobHistoryEventHandler.java:1791)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
at com.sun.jersey.api.client.Client.handle(Client.java:652)
at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
at com.sun.jersey.api.client.WebResource$Builder.put(WebResource.java:539)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.doPutObjects(TimelineV2ClientImpl.java:291)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.access$000(TimelineV2ClientImpl.java:66)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$1.run(TimelineV2ClientImpl.java:302)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$1.run(TimelineV2ClientImpl.java:299)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putObjects(TimelineV2ClientImpl.java:299)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putObjects(TimelineV2ClientImpl.java:251)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$EntitiesHolder$1.call(TimelineV2ClientImpl.java:374)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$EntitiesHolder$1.call(TimelineV2ClientImpl.java:367)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher$1.publishWithoutBlockingOnQueue(TimelineV2ClientImpl.java:495)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher$1.run(TimelineV2ClientImpl.java:433)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
... 1 more
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1569)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153)
... 21 more
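
The difference from the original post is easiest to see in the last "Caused by:" line: here it is a read timeout on the HTTP call to the timeline server, not an HTTP 500 carrying an HBase exception. A small sketch for pulling that innermost cause out of a Java stack trace follows; root_cause is a hypothetical helper, and TRACE is a shortened copy of the log above.

```python
def root_cause(trace):
    """Return the text of the last 'Caused by:' line in a Java stack trace, or None."""
    prefix = "Caused by: "
    causes = [
        line.strip()[len(prefix):]
        for line in trace.splitlines()
        if line.strip().startswith(prefix)
    ]
    return causes[-1] if causes else None


# Shortened copy of the trace from this reply.
TRACE = """\
org.apache.hadoop.yarn.exceptions.YarnException: Failed while publishing entity
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putEntities(TimelineV2ClientImpl.java:149)
Caused by: com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
at com.sun.jersey.api.client.Client.handle(Client.java:652)
... 1 more
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
... 21 more
"""

print(root_cause(TRACE))  # java.net.SocketTimeoutException: Read timed out
```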