Support Questions

Find answers, ask questions, and share your expertise

YARN issues critical error "ATSv2 HBase Application" (fresh installation of Ambari 2.7.1, HDP 3.0.1)

avatar
Contributor

After fresh installation I got the following critical error alert in YARN:

Title: ATSv2 HBase Application

Response:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/alerts/alert_ats_hbase.py", line 183, in execute
    ats_hbase_app_info = make_valid_json(output)
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/alerts/alert_ats_hbase.py", line 226, in make_valid_json
    raise Fail("Couldn't validate the received output for JSON parsing.")
Fail: Couldn't validate the received output for JSON parsing.
12 REPLIES 12

avatar
Contributor

HDP3.1,ambari 2.7, debian 9


2019-05-31 07:36:53,863 INFO zookeeper.ReadOnlyZKClient (ReadOnlyZKClient.java:run(315)) - 0x4d5650ae no activities for 60000 ms, close active connection. Will reconnect next time when there are new requests.

2019-05-31 07:37:53,542 INFO storage.HBaseTimelineReaderImpl (HBaseTimelineReaderImpl.java:run(170)) - Running HBase liveness monitor

2019-05-31 07:37:53,544 WARN storage.HBaseTimelineReaderImpl (HBaseTimelineReaderImpl.java:run(183)) - Got failure attempting to read from timeline storage, assuming HBase down

java.io.UncheckedIOException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location for replica 0

at org.apache.hadoop.hbase.client.ResultScanner$1.hasNext(ResultScanner.java:55)

at org.apache.hadoop.yarn.server.timelineservice.storage.reader.TimelineEntityReader.readEntities(TimelineEntityReader.java:283)

at org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl$HBaseMonitor.run(HBaseTimelineReaderImpl.java:174)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)

at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)

at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748)

Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location for replica 0

at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:332)

at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)

at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)

at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:192)

at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:269)

at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:437)

at org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:312)

at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:597)

at org.apache.hadoop.hbase.client.ResultScanner$1.hasNext(ResultScanner.java:53)

... 9 more

Caused by: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /atsv2-hbase-secure/meta-region-server

at org.apache.hadoop.hbase.client.ConnectionImplementation.get(ConnectionImplementation.java:2002)

at org.apache.hadoop.hbase.client.ConnectionImplementation.locateMeta(ConnectionImplementation.java:762)

at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:729)

at org.apache.hadoop.hbase.client.ConnectionImplementation.relocateRegion(ConnectionImplementation.java:707)

at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:911)

at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:732)

at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:325)

avatar
Contributor

Create different queue for ats_hbase. ats_hbase can't work properly because it can't get enough resources.

avatar
Explorer

Hi all,

 

I'm not sure if this issue is considered solved. In case it helps, I explain how we did it.

 

We found the same error after removing several nodes of our kerberized cluster (Ambari 2.7.4 and HDP 3.1.4).

 

$ /usr/hdp/current/hadoop-yarn-client/bin/yarn app -status ats-hbase
20/11/02 07:04:39 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
20/11/02 07:04:39 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
ats-hbase Failed : HTTP error code : 500


Following this thread, we checked carefully the YARN configuration to ensure that all the variables were correctly scaled to the available nodes.

 

After that, we destroyed the yarn app:

 

$ yarn app -destroy ats-hbase
20/11/02 07:06:13 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
20/11/02 07:06:13 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
20/11/02 07:06:14 INFO client.ApiServiceClient: Successfully destroyed service ats-hbase
$ /usr/hdp/current/hadoop-yarn-client/bin/yarn app -status ats-hbase
20/11/02 07:06:19 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
20/11/02 07:06:19 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
Service ats-hbase not found

 

Thus, we restarted all the YARN service on ambari. Now, everything is running fine. 

 

$ /usr/hdp/current/hadoop-yarn-client/bin/yarn app -status ats-hbase
20/11/02 07:09:02 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
20/11/02 07:09:02 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
{"name":"ats-hbase","id":"application_1604297264331_0001","artifact":{"id":"/hdp/apps/3.1.4.0-315/hbase/rm2/hbase.tar.gz","type":"TARBALL"},"lifetime":-1,"components":[{"name":"master","dependencies":[],"artifact":{"id":"/hdp/apps/3.1.4.0-315/hbase/rm2/hbase.tar.gz","type":"TARBALL"},"resource":{"cpus":1,"memory":"4096","additional":{}},"state":"STABLE","configuration":{"properties":{"yarn.service.container-failure.retry.max":"10","yarn.service.framework.path":"/hdp/apps/3.1.4.0-315/yarn/rm2/service-dep.tar.gz"},"env":{"HBASE_LOG_PREFIX":"hbase-$HBASE_IDENT_STRING-master-$HOSTNAME","HBASE_LOGFILE":"$HBASE_LOG_PREFIX.log","HBASE_MASTER_OPTS":"-Xms3276m -Xmx3276m -Djava.security.auth.login.config=/usr/hdp/3.1.4.0-315/hadoop/conf/embedded-yarn-ats-hbase/yarn_hbase_master_jaas.conf",
[...]