YARN issues critical error "ATSv2 HBase Application" (fresh installation of Ambari 2.7.1, HDP 3.0.1)

Explorer

After a fresh installation, I got the following critical alert in YARN:

Title: ATSv2 HBase Application

Response:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/alerts/alert_ats_hbase.py", line 183, in execute
    ats_hbase_app_info = make_valid_json(output)
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/alerts/alert_ats_hbase.py", line 226, in make_valid_json
    raise Fail("Couldn't validate the received output for JSON parsing.")
Fail: Couldn't validate the received output for JSON parsing.

Super Mentor
@Zholaman Kubaliyev

Can you please check if the following "status" command is giving you proper output?

# su - yarn-ats -c "/usr/hdp/current/hadoop-yarn-client/bin/yarn app -status ats-hbase"


If the "ats-hbase" service is down, you will see this error. The failing alert runs the same status query and then tries to parse its JSON output, so if the status command does not return valid JSON, the alert fires.

Please check whether your "ats-hbase" service is running.
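
To make the failure mode concrete, here is a minimal Python sketch of the kind of check described above. It is not the code from alert_ats_hbase.py: the function name `extract_service_json` is hypothetical, the sample strings are illustrative, and it assumes that a healthy status command prints its log lines followed by a single JSON object on one line.

```python
import json


def extract_service_json(output):
    """Return the service description as a dict, or None if no JSON is found.

    Assumption: when the service is healthy, the JSON object sits on a
    single line that starts with '{'. Log lines and error messages are
    skipped.
    """
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            return json.loads(line)
        except ValueError:
            # The line looked like JSON but did not parse; keep scanning.
            continue
    return None


# Illustrative samples, shortened from the kind of output seen in this thread.
healthy = (
    "18/10/25 13:18:42 INFO client.RMProxy: Connecting to ResourceManager\n"
    '{"name": "ats-hbase", "state": "STABLE"}\n'
)
failing = (
    "18/10/25 13:18:42 INFO client.RMProxy: Connecting to ResourceManager\n"
    "ats-hbase Failed : HTTP error code : 500\n"
)

print(extract_service_json(healthy))
print(extract_service_json(failing))
```

With the failing sample there is no JSON line at all, so nothing can be parsed. That is the situation in which an alert of this kind would report "Couldn't validate the received output for JSON parsing."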

Reference:
https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.1/data-operating-system/content/options_to_re...

https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.1/data-operating-system/content/enable_system...

Super Mentor

@Zholaman Kubaliyev

Ambari also lets you disable alerts. If this is a test cluster and you just want to suppress the alert, click the alert link in the Ambari UI and then click the "Disable" button on the alerts page.

https://docs.hortonworks.com/HDPDocuments/Ambari-2.7.1.0/managing-and-monitoring-ambari/content/amb_...


Explorer

@Jay Kumar SenSharma

Hi Jay Kumar!

Here is the output of the status command:

[root@hadoop anaconda3]# su - yarn-ats -c "/usr/hdp/current/hadoop-yarn-client/bin/yarn app -status ats-hbase"
18/10/25 13:18:42 INFO client.RMProxy: Connecting to ResourceManager at hadoop.test.com/xxx.xxx.xxx.xxx:8050
18/10/25 13:18:42 INFO client.AHSProxy: Connecting to Application History server at hadoop.test.com/xxx.xxx.xxx.xxx:10200
18/10/25 13:18:42 INFO client.RMProxy: Connecting to ResourceManager at hadoop.test.com/xxx.xxx.xxx.xxx:8050
18/10/25 13:18:42 INFO client.AHSProxy: Connecting to Application History server at hadoop.test.com/xxx.xxx.xxx.xxx:10200
18/10/25 13:18:42 INFO util.log: Logging initialized @1294ms
ats-hbase Failed : HTTP error code : 500
[root@hadoop anaconda3]#


As I understand it, the "ats-hbase" service is down. To start it, I should follow the instructions at https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.1/data-operating-system/content/enable_system...

Am I right? How important is this service (ats-hbase) for production?

Thank you!

Explorer

 

  • Move ats-hbase from the default queue to the yarn-system queue.
    yarn application -changeQueue yarn-system -appId <app-id>

    Here, <app-id> is the ID of the ats-hbase service.

    I stopped at this step, and the alert is still showing even though I have restarted YARN and refreshed the YARN Capacity Scheduler. How can I find the app-id?
    I tried "yarn app -status ats-hbase", but it returns:

    20/07/30 01:42:49 INFO client.RMProxy: Connecting to ResourceManager at hdpdev02.bps.go.id/10.0.45.112:8050
    20/07/30 01:42:49 INFO client.AHSProxy: Connecting to Application History server at hdpdev02.bps.go.id/10.0.45.112:10200
    20/07/30 01:42:50 INFO client.RMProxy: Connecting to ResourceManager at hdpdev02.bps.go.id/10.0.45.112:8050
    20/07/30 01:42:50 INFO client.AHSProxy: Connecting to Application History server at hdpdev02.bps.go.id/10.0.45.112:10200
    ats-hbase Failed : HTTP error code : 500

    NB: The cluster is a kerberized cluster

 

I have the same problem.

Can you help me?
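
On finding the app-id: one possible approach is to list the applications with `yarn application -list` and pick out the row whose name is ats-hbase. The Python sketch below only parses such a listing. It assumes the usual layout with the application ID in the first column and the application name in the second, and the helper name `find_app_id` and the sample listing are hypothetical. Note that the listing shows only submitted, accepted, and running applications by default, so if ats-hbase is not running there may be no row to find.

```python
import re

# YARN application IDs look like application_<cluster timestamp>_<sequence>.
APP_ID = re.compile(r"^application_\d+_\d+$")


def find_app_id(list_output, app_name="ats-hbase"):
    """Return the ID of the first listed application named app_name, or None.

    Assumption: columns are whitespace-separated, with the application ID
    first and the application name second, and the name has no spaces.
    """
    for line in list_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and APP_ID.match(fields[0]) and fields[1] == app_name:
            return fields[0]
    return None


# Illustrative listing; the ID is the one that appears later in this thread.
sample = (
    "Application-Id\tApplication-Name\tApplication-Type\tUser\tQueue\tState\n"
    "application_1604297264331_0001\tats-hbase\tyarn-service\tyarn-ats\tdefault\tRUNNING\n"
)

print(find_app_id(sample))
```

The returned value is what would go in place of <app-id> in the -changeQueue command quoted above.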

Explorer

Hi, could you post your alert message here?

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/alerts/alert_ats_hbase.py", line 183, in execute
    ats_hbase_app_info = make_valid_json(output)
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/alerts/alert_ats_hbase.py", line 226, in make_valid_json
    raise Fail("Couldn't validate the received output for JSON parsing.")
Fail: Couldn't validate the received output for JSON parsing.

Explorer

Which OS and HDP versions do you use?

Ambari 2.7, HDP 3.1

HDP 3.1, Ambari 2.7, Debian 9


2019-05-31 07:36:53,863 INFO zookeeper.ReadOnlyZKClient (ReadOnlyZKClient.java:run(315)) - 0x4d5650ae no activities for 60000 ms, close active connection. Will reconnect next time when there are new requests.
2019-05-31 07:37:53,542 INFO storage.HBaseTimelineReaderImpl (HBaseTimelineReaderImpl.java:run(170)) - Running HBase liveness monitor
2019-05-31 07:37:53,544 WARN storage.HBaseTimelineReaderImpl (HBaseTimelineReaderImpl.java:run(183)) - Got failure attempting to read from timeline storage, assuming HBase down
java.io.UncheckedIOException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location for replica 0
    at org.apache.hadoop.hbase.client.ResultScanner$1.hasNext(ResultScanner.java:55)
    at org.apache.hadoop.yarn.server.timelineservice.storage.reader.TimelineEntityReader.readEntities(TimelineEntityReader.java:283)
    at org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl$HBaseMonitor.run(HBaseTimelineReaderImpl.java:174)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location for replica 0
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:332)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:192)
    at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:269)
    at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:437)
    at org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:312)
    at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:597)
    at org.apache.hadoop.hbase.client.ResultScanner$1.hasNext(ResultScanner.java:53)
    ... 9 more
Caused by: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /atsv2-hbase-secure/meta-region-server
    at org.apache.hadoop.hbase.client.ConnectionImplementation.get(ConnectionImplementation.java:2002)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateMeta(ConnectionImplementation.java:762)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:729)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.relocateRegion(ConnectionImplementation.java:707)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:911)
    at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:732)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:325)
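
The useful part of a trace like this is the innermost "Caused by" line: the timeline reader could not find the /atsv2-hbase-secure/meta-region-server znode, which is where HBase publishes the location of its meta region. That usually suggests the embedded ats-hbase instance is not up, which matches the "assuming HBase down" warning. As a rough sketch, the small Python helpers below pull that line and the znode path out of a pasted trace; the names `innermost_cause` and `missing_znode` are hypothetical and not part of any Hadoop tooling.

```python
import re

# Matches the path that follows "NoNode for" in a ZooKeeper error message.
ZNODE = re.compile(r"NoNode for (\S+)")


def innermost_cause(trace):
    """Return the message of the last 'Caused by:' line, or None if absent."""
    causes = [
        line.strip()
        for line in trace.splitlines()
        if line.strip().startswith("Caused by:")
    ]
    if not causes:
        return None
    return causes[-1][len("Caused by:"):].strip()


def missing_znode(trace):
    """Return the znode path named in the innermost NoNode error, or None."""
    cause = innermost_cause(trace)
    match = ZNODE.search(cause) if cause else None
    return match.group(1) if match else None


# Shortened excerpt of the trace posted above.
trace = """java.io.UncheckedIOException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location for replica 0
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location for replica 0
Caused by: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /atsv2-hbase-secure/meta-region-server
"""

print(innermost_cause(trace))
print(missing_znode(trace))
```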

Explorer

Create a separate queue for ats-hbase. It can't work properly when it can't get enough resources.

Explorer

Hi all,

 

I'm not sure if this issue is considered solved. In case it helps, here is how we fixed it.

 

We found the same error after removing several nodes of our kerberized cluster (Ambari 2.7.4 and HDP 3.1.4).

 

$ /usr/hdp/current/hadoop-yarn-client/bin/yarn app -status ats-hbase
20/11/02 07:04:39 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
20/11/02 07:04:39 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
ats-hbase Failed : HTTP error code : 500


Following this thread, we carefully checked the YARN configuration to make sure that all the settings were scaled correctly for the remaining nodes.

 

After that, we destroyed the yarn app:

 

$ yarn app -destroy ats-hbase
20/11/02 07:06:13 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
20/11/02 07:06:13 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
20/11/02 07:06:14 INFO client.ApiServiceClient: Successfully destroyed service ats-hbase
$ /usr/hdp/current/hadoop-yarn-client/bin/yarn app -status ats-hbase
20/11/02 07:06:19 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
20/11/02 07:06:19 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
Service ats-hbase not found

 

Then we restarted all YARN services in Ambari. Now everything is running fine.

 

$ /usr/hdp/current/hadoop-yarn-client/bin/yarn app -status ats-hbase
20/11/02 07:09:02 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
20/11/02 07:09:02 INFO client.AHSProxy: Connecting to Application History server at XXXXX/YYY.YYY.YY.YY:10200
{"name":"ats-hbase","id":"application_1604297264331_0001","artifact":{"id":"/hdp/apps/3.1.4.0-315/hbase/rm2/hbase.tar.gz","type":"TARBALL"},"lifetime":-1,"components":[{"name":"master","dependencies":[],"artifact":{"id":"/hdp/apps/3.1.4.0-315/hbase/rm2/hbase.tar.gz","type":"TARBALL"},"resource":{"cpus":1,"memory":"4096","additional":{}},"state":"STABLE","configuration":{"properties":{"yarn.service.container-failure.retry.max":"10","yarn.service.framework.path":"/hdp/apps/3.1.4.0-315/yarn/rm2/service-dep.tar.gz"},"env":{"HBASE_LOG_PREFIX":"hbase-$HBASE_IDENT_STRING-master-$HOSTNAME","HBASE_LOGFILE":"$HBASE_LOG_PREFIX.log","HBASE_MASTER_OPTS":"-Xms3276m -Xmx3276m -Djava.security.auth.login.config=/usr/hdp/3.1.4.0-315/hadoop/conf/embedded-yarn-ats-hbase/yarn_hbase_master_jaas.conf",
[...]
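
Once the status command returns JSON again, a quick sanity check is to confirm that every component reports the STABLE state, as the master component does in the output above. The Python sketch below is one way to do that. The helper name `unstable_components` is hypothetical, and the sample document is a cut-down illustration: the regionserver entry and its FLEXING state are made up to show the non-stable case and are not taken from the output above.

```python
import json


def unstable_components(service):
    """Return the names of components whose state is not STABLE.

    Expects the parsed status JSON: a dict with a "components" list whose
    entries carry "name" and "state" keys, as in the output above.
    """
    return [
        component.get("name")
        for component in service.get("components", [])
        if component.get("state") != "STABLE"
    ]


# Illustrative document; only the master entry mirrors the real output.
sample = json.loads(
    '{"name": "ats-hbase", "id": "application_1604297264331_0001",'
    ' "components": ['
    '{"name": "master", "state": "STABLE"},'
    '{"name": "regionserver", "state": "FLEXING"}]}'
)

print(unstable_components(sample))
```

An empty list means that every listed component is STABLE.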

 
