Created 04-09-2019 02:44 PM
Hello,
When I start an Oozie workflow, it always fails with the same error in syslog, regardless of the action type (sqoop, spark or ssh):
2019-04-08 14:54:33,393 ERROR [pool-10-thread-1] org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl: Response from the timeline server is not successful, HTTP error code: 500, Server response: {"exception":"WebApplicationException","message":"org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 280 actions: IOException: 280 times, servers with issues: null","javaClassName":"javax.ws.rs.WebApplicationException"}
2019-04-08 14:54:33,394 ERROR [Job ATS Event Dispatcher] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Exception while publishing configs on JOB_SUBMITTED Event for the job : job_1554726387894_0011
org.apache.hadoop.yarn.exceptions.YarnException: Failed while publishing entity
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher.dispatchEntities(TimelineV2ClientImpl.java:548)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putEntities(TimelineV2ClientImpl.java:149)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.publishConfigsOnJobSubmittedEvent(JobHistoryEventHandler.java:1254)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForNewTimelineService(JobHistoryEventHandler.java:1414)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleTimelineEvent(JobHistoryEventHandler.java:742)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.access$1200(JobHistoryEventHandler.java:93)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$ForwardingEventHandler.handle(JobHistoryEventHandler.java:1795)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$ForwardingEventHandler.handle(JobHistoryEventHandler.java:1791)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Response from the timeline server is not successful, HTTP error code: 500, Server response: {"exception":"WebApplicationException","message":"org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 280 actions: IOException: 280 times, servers with issues: null","javaClassName":"javax.ws.rs.WebApplicationException"}
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putObjects(TimelineV2ClientImpl.java:322)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putObjects(TimelineV2ClientImpl.java:251)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$EntitiesHolder$1.call(TimelineV2ClientImpl.java:374)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$EntitiesHolder$1.call(TimelineV2ClientImpl.java:367)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher$1.publishWithoutBlockingOnQueue(TimelineV2ClientImpl.java:478)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher$1.run(TimelineV2ClientImpl.java:433)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
What is causing this error?
Example workflow.xml:
<workflow-app xmlns="uri:oozie:workflow:0.4" name="hadoop_main_workflow">
  <!-- start -->
  <start to="spark_job"/>
  <action name="spark_job" retry-max="5" retry-interval="5">
    <spark xmlns="uri:oozie:spark-action:0.2">
      <job-tracker>${resourceManager}</job-tracker>
      <name-node>${nameNode}</name-node>
      <master>yarn</master>
      <mode>client</mode>
      <name>spark_job</name>
      <jar>spark_job.py</jar>
      <spark-opts>
        --master yarn
        --deploy-mode client
        --driver-memory 11288m
        --executor-memory 24GB
        --num-executors 8
        --conf spark.dynamicAllocation.enabled=true
        --conf spark.executor.cores=2
        --conf spark.shuffle.service.enabled=true
        --conf spark.yarn.driver.memoryOverhead=1024
        --conf spark.yarn.executor.memoryOverhead=1024
        --jars /usr/hdp/3.1.0.0-78/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar
        --conf spark.security.credentials.hiveserver2.enabled=false
        --py-files /usr/hdp/3.1.0.0-78/hive_warehouse_connector/pyspark_hwc-1.0.0.3.1.0.0-78.zip
      </spark-opts>
      <file>spark_job.py</file>
    </spark>
    <ok to="end"/>
    <error to="kill"/>
  </action>
  <kill name="kill_job">
    <message>Job failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
job.properties:
nameNode=hdfs://namenodehost:8020
resourceManager=namenodehost:8050
queueName=${nameNode}/user/oozie/workflows/hadoop_main_workflow
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/oozie/workflows/hadoop_main_workflow
oozie.action.sharelib.for.sqoop=sqoop
oozie.action.sharelib.for.spark=spark2
Stack:
HDP 3.1.0
oozie 4.3.1.3.1.0.0-78
Created 04-15-2019 04:51 AM
@Grzegorz Jałocha I have the same problem. Have you solved it?
Created 04-15-2019 11:40 AM
@bin liu Not yet. I will let you know when I find a solution.
Created 04-19-2019 02:53 AM
@Grzegorz Jałocha I have solved the problem.
The solution I found is as follows.
I referred to the link below:
https://thisdataguy.com/2019/01/11/ats-server-does-not-start/
You need to clean up the ATS-related data in HDFS and ZooKeeper, and then restart ATS.
I hope this helps.
Created 04-19-2019 05:27 AM
You can check the YARN timeline reader log file:
/var/log/hadoop-yarn/yarn/hadoop-yarn-timelinereader-xxxxx-.log
It contains a hint: NoNode for /atsv2-hbase-unsecure/meta-region-server
Caused by: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /atsv2-hbase-unsecure/meta-region-server
at org.apache.hadoop.hbase.client.ConnectionImplementation.get(ConnectionImplementation.java:2002)
at org.apache.hadoop.hbase.client.ConnectionImplementation.locateMeta(ConnectionImplementation.java:762)
at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:729)
at org.apache.hadoop.hbase.client.ConnectionImplementation.relocateRegion(ConnectionImplementation.java:707)
at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:911)
at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:732)
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:325)
... 17 more
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /atsv2-hbase-unsecure/meta-region-server
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:164)
at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:321)
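If the timeline reader log is large, a small script can pull out the missing znode paths instead of reading the whole file by eye. The following is only a rough sketch: the function name missing_znodes is made up for illustration, and it assumes the log uses the same "KeeperErrorCode = NoNode for <path>" wording as the excerpt above.

```python
import re

# Matches the ZooKeeper "NoNode" message and captures the znode path after it.
# Assumption: the log wording is the same as in the excerpt quoted in this post.
NO_NODE = re.compile(r"KeeperErrorCode = NoNode for (\S+)")


def missing_znodes(log_lines):
    """Return the distinct znode paths reported as missing, in first-seen order."""
    seen = []
    for line in log_lines:
        for path in NO_NODE.findall(line):
            if path not in seen:
                seen.append(path)
    return seen


# Example usage (the log path is whatever your timeline reader writes to):
#   with open(log_path, errors="replace") as fh:
#       for znode in missing_znodes(fh):
#           print(znode)
```

An empty result would suggest the failure has a different cause than a missing ATS HBase znode, for example the read timeout reported further down in this thread.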
Created 05-30-2019 10:32 AM
I have the same problem. Have you solved it?
Created 09-03-2019 10:00 AM
How did you fix your issue?
Created 10-03-2019 02:11 AM
I have the same symptom, but a slightly different message, on a newly built HDP 3.0.1 cluster. This is from the YARN app log for the failed Oozie application:
2019-10-03 09:06:54,805 INFO [Thread-75] org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2019-10-03 09:06:54,905 INFO [Thread-75] org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2019-10-03 09:06:54,986 ERROR [Job ATS Event Dispatcher] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Exception while publishing configs on JOB_SUBMITTED Event for the job : job_1570085949108_0002
org.apache.hadoop.yarn.exceptions.YarnException: Failed while publishing entity
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher.dispatchEntities(TimelineV2ClientImpl.java:548)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putEntities(TimelineV2ClientImpl.java:149)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.publishConfigsOnJobSubmittedEvent(JobHistoryEventHandler.java:1254)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForNewTimelineService(JobHistoryEventHandler.java:1414)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleTimelineEvent(JobHistoryEventHandler.java:742)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.access$1200(JobHistoryEventHandler.java:93)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$ForwardingEventHandler.handle(JobHistoryEventHandler.java:1795)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$ForwardingEventHandler.handle(JobHistoryEventHandler.java:1791)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
at com.sun.jersey.api.client.Client.handle(Client.java:652)
at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
at com.sun.jersey.api.client.WebResource$Builder.put(WebResource.java:539)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.doPutObjects(TimelineV2ClientImpl.java:291)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.access$000(TimelineV2ClientImpl.java:66)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$1.run(TimelineV2ClientImpl.java:302)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$1.run(TimelineV2ClientImpl.java:299)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putObjects(TimelineV2ClientImpl.java:299)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putObjects(TimelineV2ClientImpl.java:251)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$EntitiesHolder$1.call(TimelineV2ClientImpl.java:374)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$EntitiesHolder$1.call(TimelineV2ClientImpl.java:367)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher$1.publishWithoutBlockingOnQueue(TimelineV2ClientImpl.java:495)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher$1.run(TimelineV2ClientImpl.java:433)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
... 1 more
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1569)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153)
... 21 more