YARN issues critical error "ATSv2 HBase Application" (fresh installation of Ambari 2.7.1, HDP 3.0.1)

Contributor

After a fresh installation I got the following critical error alert in YARN:

Title: ATSv2 HBase Application

Response:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/alerts/alert_ats_hbase.py", line 183, in execute
    ats_hbase_app_info = make_valid_json(output)
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/alerts/alert_ats_hbase.py", line 226, in make_valid_json
    raise Fail("Couldn't validate the received output for JSON parsing.")
Fail: Couldn't validate the received output for JSON parsing.

Contributor

HDP 3.1, Ambari 2.7, Debian 9


2019-05-31 07:36:53,863 INFO zookeeper.ReadOnlyZKClient (ReadOnlyZKClient.java:run(315)) - 0x4d5650ae no activities for 60000 ms, close active connection. Will reconnect next time when there are new requests.
2019-05-31 07:37:53,542 INFO storage.HBaseTimelineReaderImpl (HBaseTimelineReaderImpl.java:run(170)) - Running HBase liveness monitor
2019-05-31 07:37:53,544 WARN storage.HBaseTimelineReaderImpl (HBaseTimelineReaderImpl.java:run(183)) - Got failure attempting to read from timeline storage, assuming HBase down
java.io.UncheckedIOException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location for replica 0
    at org.apache.hadoop.hbase.client.ResultScanner$1.hasNext(ResultScanner.java:55)
    at org.apache.hadoop.yarn.server.timelineservice.storage.reader.TimelineEntityReader.readEntities(TimelineEntityReader.java:283)
    at org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl$HBaseMonitor.run(HBaseTimelineReaderImpl.java:174)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location for replica 0
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:332)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:192)
    at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:269)
    at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:437)
    at org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:312)
    at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:597)
    at org.apache.hadoop.hbase.client.ResultScanner$1.hasNext(ResultScanner.java:53)
    ... 9 more
Caused by: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /atsv2-hbase-secure/meta-region-server
    at org.apache.hadoop.hbase.client.ConnectionImplementation.get(ConnectionImplementation.java:2002)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateMeta(ConnectionImplementation.java:762)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:729)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.relocateRegion(ConnectionImplementation.java:707)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:911)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:732)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:325)

Contributor

Create a separate queue for ats_hbase. ats_hbase can't work properly because it can't get enough resources.

Explorer

Hi all,

I'm not sure whether this issue is considered solved. In case it helps, here is how we fixed it.

We hit the same error after removing several nodes from our kerberized cluster (Ambari 2.7.4 and HDP 3.1.4).

$ /usr/hdp/current/hadoop-yarn-client/bin/yarn app -status ats-hbase
20/11/02 07:04:39 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
20/11/02 07:04:39 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
ats-hbase Failed : HTTP error code : 500
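
The output above contains only log lines and an HTTP 500 message, with no JSON at all, which would be consistent with the alert's "Couldn't validate the received output for JSON parsing" failure. As a rough illustration only (this is not the Ambari alert code; `classify_status_output` is a made-up helper, and it assumes a healthy status is printed as a single JSON line), the kinds of output seen in this thread could be told apart like this:

```python
import json


def classify_status_output(output):
    """Classify the text printed by `yarn app -status ats-hbase`.

    Returns a (kind, payload) tuple, where kind is "json",
    "not_found" or "unparseable". Hypothetical helper, not Ambari code.
    """
    for line in output.splitlines():
        line = line.strip()
        # Client log lines (e.g. "INFO client.AHSProxy: ...") never start with '{'.
        if line.startswith("{"):
            try:
                return "json", json.loads(line)
            except ValueError:
                # Looked like JSON but was truncated or malformed.
                return "unparseable", line
        if "not found" in line:
            return "not_found", line
    # Nothing JSON-like and no "not found" message, e.g. an HTTP 500 error.
    return "unparseable", output.strip()


log_line = ("20/11/02 07:04:39 INFO client.AHSProxy: Connecting to "
            "Application History server at XXXXX/YYY.YYY.YY.YY:10200\n")

print(classify_status_output(log_line + "ats-hbase Failed : HTTP error code : 500")[0])
print(classify_status_output(log_line + "Service ats-hbase not found")[0])
print(classify_status_output(log_line + '{"name":"ats-hbase","components":[]}')[0])
```

With the three sample outputs, the helper reports `unparseable`, `not_found` and `json` respectively. Only the last case gives a monitoring check something to parse.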


Following this thread, we carefully checked the YARN configuration to make sure all the settings were scaled correctly to the remaining nodes.

After that, we destroyed the YARN app:

$ yarn app -destroy ats-hbase
20/11/02 07:06:13 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
20/11/02 07:06:13 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
20/11/02 07:06:14 INFO client.ApiServiceClient: Successfully destroyed service ats-hbase
$ /usr/hdp/current/hadoop-yarn-client/bin/yarn app -status ats-hbase
20/11/02 07:06:19 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
20/11/02 07:06:19 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
Service ats-hbase not found

Then we restarted the whole YARN service from Ambari. Now everything is running fine.

$ /usr/hdp/current/hadoop-yarn-client/bin/yarn app -status ats-hbase
20/11/02 07:09:02 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
20/11/02 07:09:02 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
{"name":"ats-hbase","id":"application_1604297264331_0001","artifact":{"id":"/hdp/apps/3.1.4.0-315/hbase/rm2/hbase.tar.gz","type":"TARBALL"},"lifetime":-1,"components":[{"name":"master","dependencies":[],"artifact":{"id":"/hdp/apps/3.1.4.0-315/hbase/rm2/hbase.tar.gz","type":"TARBALL"},"resource":{"cpus":1,"memory":"4096","additional":{}},"state":"STABLE","configuration":{"properties":{"yarn.service.container-failure.retry.max":"10","yarn.service.framework.path":"/hdp/apps/3.1.4.0-315/yarn/rm2/service-dep.tar.gz"},"env":{"HBASE_LOG_PREFIX":"hbase-$HBASE_IDENT_STRING-master-$HOSTNAME","HBASE_LOGFILE":"$HBASE_LOG_PREFIX.log","HBASE_MASTER_OPTS":"-Xms3276m -Xmx3276m -Djava.security.auth.login.config=/usr/hdp/3.1.4.0-315/hadoop/conf/embedded-yarn-ats-hbase/yarn_hbase_master_jaas.conf",
[...]
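
To read a status document like the one above without scanning it by eye, a small script can list each component's state. This is a minimal sketch: it relies only on the `components`, `name` and `state` fields visible in the output above, the helper names are invented, and the sample JSON is a trimmed, hypothetical stand-in (the `regionserver` entry and its `FLEXING` value are illustrative, not taken from this cluster).

```python
import json


def component_states(status_json):
    """Map component name -> state for a YARN service status document."""
    service = json.loads(status_json)
    return {
        comp["name"]: comp.get("state", "UNKNOWN")
        for comp in service.get("components", [])
    }


def unstable_components(status_json, healthy="STABLE"):
    """Return the sorted names of components whose state is not `healthy`."""
    states = component_states(status_json)
    return sorted(name for name, state in states.items() if state != healthy)


# Trimmed, hypothetical sample; real output carries many more fields.
sample = json.dumps({
    "name": "ats-hbase",
    "components": [
        {"name": "master", "state": "STABLE"},
        {"name": "regionserver", "state": "FLEXING"},  # illustrative value
    ],
})

print(component_states(sample))
print(unstable_components(sample))
```

An empty list from `unstable_components` would mean every listed component reports `STABLE`, as the `master` component does in the output above.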