Fails to start ambari-metrics-collector

New Contributor

We are unable to start ambari-metrics-collector.


The following error appeared in hbase-ams-master.log.

I cannot find any other ERROR entries. What should I check?

----------------

/var/log/ambari-metrics-collector/hbase-ams-master-host.log

2019-07-11 15:34:41,040 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Master not initialized after 200000ms
    at org.apache.hadoop.hbase.util.JVMClusterUtil.waitForEvent(JVMClusterUtil.java:229)
    at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:197)
    at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:413)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:232)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3100)
2019-07-11 15:34:41,043 INFO  [shutdown-hook-0] regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@4a29f290
2019-07-11 15:34:41,044 INFO  [shutdown-hook-0] regionserver.HRegionServer: ***** STOPPING region server 'areaportal-kvm07,61320,1562826676313' *****
2019-07-11 15:34:41,044 INFO  [shutdown-hook-0] regionserver.HRegionServer: STOPPED: Shutdown hook

Accepted Solutions

Re: Fails to start ambari-metrics-collector

Super Mentor

@YOSUKE SHIBUYA

In your "hbase-ams-master-kvm07log.txt" log we see the following message.

2019-07-11 19:11:58,731 INFO  [Thread-23] wal.ProcedureWALFile: Opening file:/var/lib/ambari-metrics-collector/hbase/MasterProcWALs/pv2-00000000000000000001.log length=45336
2019-07-11 19:11:58,743 WARN  [Thread-23] wal.WALProcedureStore: Unable to read tracker for file:/var/lib/ambari-metrics-collector/hbase/MasterProcWALs/pv2-00000000000000000001.log
org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat$InvalidWALDataException: Invalid Trailer version. got 48 expected 1
    at org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat.readTrailer(ProcedureWALFormat.java:189)


It looks like the WAL data under "/var/lib/ambari-metrics-collector/hbase/MasterProcWALs/" is corrupted.

# ls -lart /var/lib/ambari-metrics-collector/hbase/MasterProcWALs/*


You could take a backup of the directory "/var/lib/ambari-metrics-collector/hbase/" and then remove the files inside "/var/lib/ambari-metrics-collector/hbase/MasterProcWALs/".


Then clean up the tmp directory. After taking a backup of "/var/lib/ambari-metrics-collector/hbase-tmp/", remove the AMS ZooKeeper data under 'hbase.tmp.dir'/zookeeper and any Phoenix spool files under 'hbase.tmp.dir'/phoenix-spool.

"hbase.tmp.dir" (default value: /var/lib/ambari-metrics-collector/hbase-tmp) is on the local filesystem in both modes:


# rm -fr /var/lib/ambari-metrics-collector/hbase-tmp/zookeeper/*
# rm -fr /var/lib/ambari-metrics-collector/hbase-tmp/phoenix-spool/*


Then try restarting AMS.

If you repeatedly see similar issues, it is also better to increase the Metrics Collector Heap Size to 1024MB and the HBase Master Maximum Memory to 2048MB (or 4096MB).
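
The backup and cleanup steps above can be collected into one small shell function. This is only a sketch under assumptions: the function name backup_and_clean_ams and the default backup location under /var/tmp are hypothetical and not part of AMS, and the other default paths are the ones quoted in this thread. AMS should be stopped before anything like this is run.

```shell
#!/bin/sh
# Sketch: back up the AMS HBase directories, then remove the procedure WAL files,
# the embedded ZooKeeper data and the Phoenix spool files.
# backup_and_clean_ams is a hypothetical helper name, not an AMS tool.

backup_and_clean_ams() {
    hbase_dir="${1:-/var/lib/ambari-metrics-collector/hbase}"
    tmp_dir="${2:-/var/lib/ambari-metrics-collector/hbase-tmp}"
    # Assumed backup location; pick any path with enough free space.
    backup_root="${3:-/var/tmp/ams-backup-$(date +%Y%m%d%H%M%S)}"

    mkdir -p "$backup_root" || return 1

    # Copy both directories first and stop if either copy fails,
    # so nothing is deleted without a backup.
    cp -Rp "$hbase_dir" "$backup_root/hbase" || return 1
    cp -Rp "$tmp_dir" "$backup_root/hbase-tmp" || return 1

    # ${var:?} aborts if the variable is empty, so rm can never expand to "/...".
    rm -rf "${hbase_dir:?}/MasterProcWALs/"*
    rm -rf "${tmp_dir:?}/zookeeper/"*
    rm -rf "${tmp_dir:?}/phoenix-spool/"*
}
```

Calling backup_and_clean_ams with no arguments uses the default paths; passing three directories explicitly makes it possible to try the function on a scratch copy first.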

Replies

Re: Fails to start ambari-metrics-collector

Super Mentor

@YOSUKE SHIBUYA

The error snippet you posted is only an aftereffect of the actual cause, and it is a very generic message.

Can you please share the following logs for initial review?

/var/log/ambari-metrics-collector/ambari-metrics-collector.log
/var/log/ambari-metrics-collector/hbase-ams-master-xxxxxxxx.log
/var/log/ambari-metrics-collector/gc.log
/var/log/ambari-metrics-collector/collector-gc.log
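
Before sharing the logs, it can help to pull out the WARN/ERROR and exception lines, since the real cause is often logged well before the final "Master exiting" message. The following is a rough sketch: scan_ams_logs is a hypothetical helper name, and the pattern assumes the usual "date time LEVEL [thread]" layout seen in the log lines in this thread.

```shell
#!/bin/sh
# Sketch: print WARN/ERROR/FATAL and exception lines from the AMS collector
# and embedded HBase master logs, with file name and line number.
# scan_ams_logs is a hypothetical helper name, not an AMS tool.

scan_ams_logs() {
    log_dir="${1:-/var/log/ambari-metrics-collector}"
    for f in "$log_dir"/ambari-metrics-collector.log "$log_dir"/hbase-ams-master-*.log; do
        # Skip patterns that matched no file.
        [ -f "$f" ] || continue
        # -H: print file name, -n: print line number, -E: extended regex.
        # "|| true" keeps going when a file has no matching lines.
        grep -HnE ' (WARN|ERROR|FATAL) |Exception' "$f" || true
    done
    return 0
}
```

Running scan_ams_logs with no argument reads the default log directory; a different directory can be passed as the first argument.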


Also, the AMS failure is most probably due to incorrect tuning or heavy load, so can you please let us know the following:

1. How many nodes are there in your cluster?

2. How much memory have you allocated to the AMS collector and HMaster?

3. I guess you are using the default Embedded Mode AMS (not Distributed Mode)? The two modes require slightly different tuning.



Re: Fails to start ambari-metrics-collector

New Contributor

@Jay Kumar SenSharma

I have attached a log file.


The cluster has four nodes. Each node has 32GB of memory.


The memory settings are as follows:

Metrics Collector Heap Size: 512MB

HBase Master Maximum Memory: 1408MB

hbase_master_maxperm_size: 128MB

HBase Master maximum value for Xmn: 1024MB

HBase RegionServer Maximum Memory: 768MB

Re: Fails to start ambari-metrics-collector

New Contributor

@Jay Kumar SenSharma

I am now able to start ambari-metrics-collector.

Thank you for your support.

Re: Fails to start ambari-metrics-collector

Super Mentor

@YOSUKE SHIBUYA

Good to know that your issue is resolved. It would be great if you could mark this thread as answered by clicking the "Accept" button on the helpful answer.

Re: Fails to start ambari-metrics-collector

Community Manager

The above question was originally posted in the Community Help track. On Sun Jul 14 17:04 UTC 2019, a member of the HCC moderation staff moved it to the Cloud & Operations track. The Community Help Track is intended for questions about using the HCC site itself.