Created 05-22-2020 01:02 AM
Hello,
Ambari on HDP cluster (v. 3.1.0.0 - 78) shows the following alert for the service YARN TIMELINE SERVICE V2.0 READER:
The alert is still showing even after performing all the steps described here:
Furthermore:
- Oozie jobs get an unsuccessful response from Timeline Server v2, as follows:
2020-05-20 10:46:08,871 ERROR [pool-10-thread-1] org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl: Response from the timeline server is not successful, HTTP error code: 500, Server response:
{"exception":"WebApplicationException","message":"org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 201 actions: NotServingRegionException: 201 times, servers with issues:
xxx.xxx.local,61320,1576513062105","javaClassName":"javax.ws.rs.WebApplicationException"}
2020-05-20 10:46:08,872 ERROR [Job ATS Event Dispatcher] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Exception while publishing configs on JOB_SUBMITTED Event for the job : job_1589797714740_0025
org.apache.hadoop.yarn.exceptions.YarnException: Failed while publishing entity
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher.dispatchEntities(TimelineV2ClientImpl.java:548)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putEntities(TimelineV2ClientImpl.java:149)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.publishConfigsOnJobSubmittedEvent(JobHistoryEventHandler.java:1255)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForNewTimelineService(JobHistoryEventHandler.java:1414)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleTimelineEvent(JobHistoryEventHandler.java:742)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.access$1200(JobHistoryEventHandler.java:93)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$ForwardingEventHandler.handle(JobHistoryEventHandler.java:1795)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$ForwardingEventHandler.handle(JobHistoryEventHandler.java:1791)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Response from the timeline server is not successful, HTTP error code: 500, Server response:
{"exception":"WebApplicationException","message":"org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 201 actions: NotServingRegionException: 201 times, servers with issues:
xxx.xxx.local,61320,1576513062105","javaClassName":"javax.ws.rs.WebApplicationException"}
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putObjects(TimelineV2ClientImpl.java:322)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putObjects(TimelineV2ClientImpl.java:251)
- In HBase (v. 2.0.2) there are many regions in transition, and the hbase hbck -details command shows the following errors:
...
Number of regions in transition: 64
...
prod.timelineservice.entity,an,1576512763369.ca2577830eb1ec2858fecb3a182bb7ec. state=OPENING, ts=Thu May 21 08:50:49 CEST 2020 (PT29M32.253S ago), server=nul
...
prod.timelineservice.application,,1576512767939.d87a0e7a2f479d6f73e697bfcac9d415. state=OPENING, ts=Thu May 21 08:50:49 CEST 2020 (PT29M32.262S ago), server=null
...
---- Table 'prod.timelineservice.application': overlap groups
There are 0 overlap groups with 0 overlapping regions
ERROR: Found inconsistency in table prod.timelineservice.application
...
ERROR: There is a hole in the region chain between ad and an. You need to create a new .regioninfo and region dir in hdfs to plug the hole
...
Status: INCONSISTENT
- HBase prod.timelineservice.* tables are not online, as shown below in the HBase shell:
Any help would be really appreciated.
Thanks
Created 11-12-2020 10:11 AM
Hello @SimL
YARN Timeline Server v2 Reader uses HBase for Storage. The Log & HBCK Report show that the Table "prod.timelineservice.entity" has Regions in OPENING State, which would naturally cause any Client accessing the Table to report RetriesExhaustedWithDetailsException.
A Region is typically stuck in OPENING State if there are issues with WALEditReplay or for some other reason, which would be clear from the Logs of the RegionServer where the Region is being opened. Without the RegionServer Logs, any Comment is unlikely to be an accurate assessment of the issue your team is facing. I would recommend checking the RegionServer Logs for the Regions in OPENING State and, based on the cause, planning accordingly.
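To narrow down which Regions to look for in the RegionServer Logs, the HBCK output can be filtered for Regions in OPENING State. The following is only a minimal Python sketch: it assumes the region-in-transition lines look like the two quoted in the question (table, start key, region id, a 32-character encoded name, then "state=..."), and the function name stuck_regions is hypothetical, not part of any HBase tooling.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the hbck excerpt quoted in the question:
#   <table>,<start key>,<region id>.<32-hex encoded name>. state=<STATE>, ts=..., server=...
# Start keys that themselves contain commas are not handled by this sketch.
RIT_LINE = re.compile(
    r"^(?P<table>[^,\s]+),(?P<start_key>[^,]*),(?P<region_id>\d+)\."
    r"(?P<encoded>[0-9a-f]{32})\.\s+state=(?P<state>\w+),"
)


def stuck_regions(hbck_output, state="OPENING"):
    """Return {table: [encoded region names]} for regions in the given state."""
    by_table = defaultdict(list)
    for line in hbck_output.splitlines():
        match = RIT_LINE.match(line.strip())
        if match and match.group("state") == state:
            by_table[match.group("table")].append(match.group("encoded"))
    return dict(by_table)


if __name__ == "__main__":
    # Two lines copied from the hbck report in the question.
    sample = (
        "prod.timelineservice.entity,an,1576512763369."
        "ca2577830eb1ec2858fecb3a182bb7ec. state=OPENING, "
        "ts=Thu May 21 08:50:49 CEST 2020 (PT29M32.253S ago), server=nul\n"
        "prod.timelineservice.application,,1576512767939."
        "d87a0e7a2f479d6f73e697bfcac9d415. state=OPENING, "
        "ts=Thu May 21 08:50:49 CEST 2020 (PT29M32.262S ago), server=null\n"
    )
    for table, regions in stuck_regions(sample).items():
        print(table, regions)
```

The encoded Region names it returns can then be searched for in the RegionServer Logs to find the point where each open attempt failed.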
- Smarak