Fails to start ambari-metrics-collector

New Contributor

We are unable to start ambari-metrics-collector.

The following error appears in hbase-ams-master.log.

I cannot find any other ERROR entries. What should I check?

----------------

/var/log/ambari-metrics-collector/hbase-ams-master-host.log

2019-07-11 15:34:41,040 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Master not initialized after 200000ms
    at org.apache.hadoop.hbase.util.JVMClusterUtil.waitForEvent(JVMClusterUtil.java:229)
    at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:197)
    at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:413)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:232)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3100)
2019-07-11 15:34:41,043 INFO  [shutdown-hook-0] regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@4a29f290
2019-07-11 15:34:41,044 INFO  [shutdown-hook-0] regionserver.HRegionServer: ***** STOPPING region server 'areaportal-kvm07,61320,1562826676313' *****
2019-07-11 15:34:41,044 INFO  [shutdown-hook-0] regionserver.HRegionServer: STOPPED: Shutdown hook

1 ACCEPTED SOLUTION

Master Mentor

@YOSUKE SHIBUYA

In your "hbase-ams-master-kvm07log.txt" log we see the following message.

2019-07-11 19:11:58,731 INFO  [Thread-23] wal.ProcedureWALFile: Opening file:/var/lib/ambari-metrics-collector/hbase/MasterProcWALs/pv2-00000000000000000001.log length=45336
2019-07-11 19:11:58,743 WARN  [Thread-23] wal.WALProcedureStore: Unable to read tracker for file:/var/lib/ambari-metrics-collector/hbase/MasterProcWALs/pv2-00000000000000000001.log
org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat$InvalidWALDataException: Invalid Trailer version. got 48 expected 1
    at org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat.readTrailer(ProcedureWALFormat.java:189)


It looks like the WAL data under "/var/lib/ambari-metrics-collector/hbase/MasterProcWALs/" got corrupted.

# ls -lart /var/lib/ambari-metrics-collector/hbase/MasterProcWALs/*
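To see which files the master itself complained about, one option is to pull the paths out of the "Unable to read tracker for" warnings. This is only a rough sketch: `unreadable_wals` is a made-up helper name, and it assumes the warning text looks exactly like the lines quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of AMS or HBase): list the procedure WAL files
# that the HBase master log reports as unreadable.
# Usage: unreadable_wals <hbase-master-log>
unreadable_wals() {
    # Keep only the path that follows the fixed message text, then de-duplicate.
    sed -n 's/.*Unable to read tracker for file:\(.*\)$/\1/p' "$1" | sort -u
}

# Example (the log file name is a placeholder for your own master log):
# unreadable_wals /var/log/ambari-metrics-collector/hbase-ams-master-xxxxxxxx.log
```

Comparing that list with the `ls` output shows whether the flagged files are the only ones in the directory.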


Maybe you can take a backup of the directory "/var/lib/ambari-metrics-collector/hbase/"

and then try cleaning out the files inside "/var/lib/ambari-metrics-collector/hbase/MasterProcWALs/".
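The backup-then-clean step could be scripted roughly as follows. Treat it as a sketch under assumptions: `backup_then_clear` is a hypothetical helper, the backup destination in the example is a placeholder, and the Metrics Collector is assumed to be stopped before you run it.

```shell
#!/bin/sh
# Hypothetical helper: copy a directory aside, then empty one of its subdirectories.
# Usage: backup_then_clear <dir-to-back-up> <subdir-to-empty> <backup-destination>
backup_then_clear() {
    src=$1
    sub=$2
    dest=$3
    # Refuse to run if the subdirectory is missing or the backup target already exists.
    [ -d "$src/$sub" ] || { echo "missing: $src/$sub" >&2; return 1; }
    [ ! -e "$dest" ] || { echo "backup target exists: $dest" >&2; return 1; }
    # -a preserves ownership, permissions and timestamps.
    cp -a "$src" "$dest" || return 1
    # Delete only the entries inside the subdirectory, not the directory itself.
    find "$src/$sub" -mindepth 1 -delete
}

# Example (the destination path is a placeholder, pick your own):
# backup_then_clear /var/lib/ambari-metrics-collector/hbase MasterProcWALs /root/ams-hbase-backup
```

Because the helper stops if the copy fails, nothing is deleted unless the backup exists first.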


Then try a tmp dir cleanup. After taking a backup of "/var/lib/ambari-metrics-collector/hbase-tmp/",

remove the AMS ZooKeeper data by removing the contents of 'hbase.tmp.dir'/zookeeper, and any Phoenix spool files from the 'hbase.tmp.dir'/phoenix-spool folder.

"hbase.tmp.dir" (default value: /var/lib/ambari-metrics-collector/hbase-tmp) is on the local filesystem in both modes:


# rm -fr /var/lib/ambari-metrics-collector/hbase-tmp/zookeeper/*
# rm -fr /var/lib/ambari-metrics-collector/hbase-tmp/phoenix-spool/*
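If you prefer the backup and the removal in one step, something along these lines might work. It is only a sketch: `clean_hbase_tmp` is a hypothetical helper, the archive directory in the example is a placeholder, and it assumes AMS is stopped while it runs.

```shell
#!/bin/sh
# Hypothetical helper: archive, then empty, the zookeeper and phoenix-spool
# subdirectories of an hbase.tmp.dir.
# Usage: clean_hbase_tmp <hbase.tmp.dir> <directory-for-archives>
clean_hbase_tmp() {
    tmp_dir=$1
    archive_dir=$2
    mkdir -p "$archive_dir" || return 1
    for sub in zookeeper phoenix-spool; do
        # Skip subdirectories that do not exist on this host.
        [ -d "$tmp_dir/$sub" ] || continue
        # Archive the subdirectory with paths relative to hbase.tmp.dir.
        tar -czf "$archive_dir/$sub.tar.gz" -C "$tmp_dir" "$sub" || return 1
        # Remove the contents but keep the (now empty) directory.
        find "$tmp_dir/$sub" -mindepth 1 -delete
    done
}

# Example (the archive directory is a placeholder, pick your own):
# clean_hbase_tmp /var/lib/ambari-metrics-collector/hbase-tmp /root/ams-tmp-backup
```

The archives can be unpacked back into 'hbase.tmp.dir' with `tar -xzf` if the cleanup needs to be undone.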


Then try restarting AMS.

It is also better to increase the Metrics Collector Heap Size to 1024MB and the HBase Master Maximum Memory to 2048MB (or 4096MB) if you repeatedly see similar issues.

6 REPLIES 6

Master Mentor

@YOSUKE SHIBUYA

The error snippet you posted is only an aftereffect of the actual cause, and a very generic message.

Can you please share the following logs for initial review?

/var/log/ambari-metrics-collector/ambari-metrics-collector.log
/var/log/ambari-metrics-collector/hbase-ams-master-xxxxxxxx.log
/var/log/ambari-metrics-collector/gc.log
/var/log/ambari-metrics-collector/collector-gc.log
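Before attaching the full logs, a quick scan for the earliest warnings and errors often points at the real cause. The following is just a sketch: `first_problems` is a made-up helper, and it assumes the log4j-style layout seen in the excerpts in this thread, with the level surrounded by spaces.

```shell
#!/bin/sh
# Hypothetical helper: print the first N WARN/ERROR/FATAL lines of a log file,
# each prefixed with its line number.
# Usage: first_problems <log-file> [max-lines]
first_problems() {
    log=$1
    max=${2:-20}
    # -n adds line numbers; -E enables the alternation in the pattern.
    grep -nE ' (WARN|ERROR|FATAL) ' "$log" | head -n "$max"
}

# Example:
# first_problems /var/log/ambari-metrics-collector/ambari-metrics-collector.log 10
```

The earliest entries are usually more telling than the final "Master exiting" error.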


Also, AMS failures are most often caused by incorrect tuning or heavy load. So can you please let us know the following:

1. How many nodes are there in your cluster?

2. How much memory have you allocated to the AMS collector and HMaster?

3. I guess you are using the default Embedded Mode AMS (not Distributed)? The two modes require slightly different tuning.



New Contributor

@Jay Kumar SenSharma

I have attached a log file.


The cluster has four nodes. Each node has 32GB of memory.


The memory is specified as follows.

Metrics Collector Heap Size 512MB

HBase Master Maximum Memory 1408 MB

hbase_master_maxperm_size 128MB

HBase Master maximum value for Xmn 1024MB

HBase RegionServer Maximum Memory 768 MB

New Contributor

@Jay Kumar SenSharma

I am now able to start ambari-metrics-collector.

Thank you for your support.

Master Mentor

@YOSUKE SHIBUYA

Good to know that your issue is resolved. It would be great if you could mark this thread as answered by clicking the "Accept" button on the helpful answer.

The above question was originally posted in the Community Help track. On Sun Jul 14 17:04 UTC 2019, a member of the HCC moderation staff moved it to the Cloud & Operations track. The Community Help Track is intended for questions about using the HCC site itself.

Bill Brooks, Community Moderator