
HBase Master Never Start up after HCP cluster restart


I've set up my HCP cluster with 4 nodes, but the HBase Master never starts up after a cluster restart. If I leave the cluster up and running from installation time, HBase works fine; but once I shut the cluster down and try to start it again, the Master fails to start.

When I looked into the master log, I found the following warning entries from shutdown time:

------------------------------------------------------------------------------------------

2018-06-14 14:48:14,099 WARN [MASTER_META_SERVER_OPERATIONS-vmbdsiwbdn2:16000-3] master.AssignmentManager: Unable to determine a plan to assign {ENCODED => 1588230740, NAME => 'hbase:meta,,1', STARTKEY => '', ENDKEY => ''}
2018-06-14 14:48:15,099 WARN [MASTER_META_SERVER_OPERATIONS-vmbdsiwbdn2:16000-3] master.AssignmentManager: Can't move 1588230740, there is no destination server available.
2018-06-14 14:48:15,099 WARN [MASTER_META_SERVER_OPERATIONS-vmbdsiwbdn2:16000-3] master.AssignmentManager: Unable to determine a plan to assign {ENCODED => 1588230740, NAME => 'hbase:meta,,1', STARTKEY => '', ENDKEY => ''}
2018-06-14 14:48:16,099 WARN [MASTER_META_SERVER_OPERATIONS-vmbdsiwbdn2:16000-3] master.AssignmentManager: Can't move 1588230740, there is no destination server available.
2018-06-14 14:48:16,099 WARN [MASTER_META_SERVER_OPERATIONS-vmbdsiwbdn2:16000-3] master.AssignmentManager: Unable to determine a plan to assign {ENCODED => 1588230740, NAME => 'hbase:meta,,1', STARTKEY => '', ENDKEY => ''}
2018-06-14 14:48:17,099 WARN [MASTER_META_SERVER_OPERATIONS-vmbdsiwbdn2:16000-3] master.AssignmentManager: Can't move 1588230740, there is no destination server available.

----------------------------------------------------------------------------------------------

And on Cluster restart, I found the following warning entries in the master log file:

----------------------------------------------------------------------------------------------

2018-06-15 14:12:41,118 INFO [vmbdsiwbdn2:16000.activeMasterManager] hdfs.DFSClient: No node available for BP-1307428289-10.0.0.4-1528240888625:blk_1073741830_1006 file=/apps/hbase/data/data/hbase/meta/.tabledesc/.tableinfo.0000000001
2018-06-15 14:12:41,118 INFO [vmbdsiwbdn2:16000.activeMasterManager] hdfs.DFSClient: Could not obtain BP-1307428289-10.0.0.4-1528240888625:blk_1073741830_1006 from any node: java.io.IOException: No live nodes contain block BP-1307428289-10.0.0.4-1528240888625:blk_1073741830_1006 after checking nodes = [], ignoredNodes = null No live nodes contain current block Block locations: Dead nodes: . Will get new block locations from namenode and retry...
2018-06-15 14:12:41,119 WARN [vmbdsiwbdn2:16000.activeMasterManager] hdfs.DFSClient: DFS chooseDataNode: got # 3 IOException, will wait for 11833.399457334988 msec.
2018-06-15 14:12:52,955 WARN [vmbdsiwbdn2:16000.activeMasterManager] hdfs.DFSClient: Could not obtain block: BP-1307428289-10.0.0.4-1528240888625:blk_1073741830_1006 file=/apps/hbase/data/data/hbase/meta/.tabledesc/.tableinfo.0000000001 No live nodes contain current block Block locations: Dead nodes: . Throwing a BlockMissingException
2018-06-15 14:12:52,955 WARN [vmbdsiwbdn2:16000.activeMasterManager] hdfs.DFSClient: Could not obtain block: BP-1307428289-10.0.0.4-1528240888625:blk_1073741830_1006 file=/apps/hbase/data/data/hbase/meta/.tabledesc/.tableinfo.0000000001 No live nodes contain current block Block locations: Dead nodes: . Throwing a BlockMissingException
2018-06-15 14:12:52,956 WARN [vmbdsiwbdn2:16000.activeMasterManager] hdfs.DFSClient: DFS Read org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1307428289-10.0.0.4-1528240888625:blk_1073741830_1006 file=/apps/hbase/data/data/hbase/meta/.tabledesc/.tableinfo.0000000001

2018-06-15 14:12:52,958 FATAL [vmbdsiwbdn2:16000.activeMasterManager] master.HMaster: Failed to become active master org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1307428289-10.0.0.4-1528240888625:blk_1073741830_1006 file=/apps/hbase/data/data/hbase/meta/.tabledesc/.tableinfo.0000000001

----------------------------------------------------------------------------------------------

Please suggest how I can resolve this.

1 ACCEPTED SOLUTION

Master Mentor

@Victor Sarkar

This is the culprit: "Could not obtain block: BP-1307428289-10.0.0.4-1528240888625:blk_1073741830_1006 file=/apps/hbase/data/data/hbase/meta/.tabledesc/.tableinfo.0000000001"

How did you shut down the cluster?

List the corrupt HDFS blocks and the files they belong to:

$ hdfs fsck / -list-corruptfileblocks

Determine which files have problems (filter out the dots fsck prints for healthy files):

$ hdfs fsck / | egrep -v '^\.+$'

Inspect each affected file to see its blocks and their locations:

$ hdfs fsck /path/to/corrupt/file -locations -blocks -files

Repeat until all files are healthy or you exhaust all alternatives looking for the blocks.

Once you determine what happened and you cannot recover any more blocks, delete the files with permanently missing blocks:

$ hdfs dfs -rm /path/to/file/with/permanently/missing/blocks

Alternatively, have fsck delete all corrupted files in one pass:

$ hdfs fsck / -delete
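The listing and inspection steps above can be chained together. The `extract_corrupt_paths` helper below is a sketch, and the awk pattern is an assumption about the fsck output format (each corrupt-block line mentions `blk_` and ends with the HDFS file path); verify it against your Hadoop version's actual output before relying on it.

```shell
# extract_corrupt_paths: reduce `hdfs fsck / -list-corruptfileblocks` output
# to just the affected HDFS file paths. Assumes each corrupt-block line
# contains "blk_" and ends with the path (check your fsck version's format).
extract_corrupt_paths() {
  awk '/blk_/ {print $NF}'
}

# Typical use on a cluster node (needs a live cluster, so shown as comments):
#   hdfs fsck / -list-corruptfileblocks | extract_corrupt_paths |
#   while read -r f; do
#     hdfs fsck "$f" -locations -blocks -files
#   done
```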

HTH


5 REPLIES

Expert Contributor

Can you check whether HDFS is healthy? Do you see any missing blocks in the NameNode?
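A quick way to check is sketched below. The actual commands need a live cluster, so they are shown as comments; the `missing_blocks` helper just greps the count out of the report text, and the "Missing blocks" wording is an assumption that may vary across Hadoop versions.

```shell
# Quick HDFS health check (run on a cluster node):
#   hdfs dfsadmin -report   # live/dead DataNodes, missing-block count
#   hdfs fsck /             # per-file health, with a summary at the end
#
# Helper that pulls the missing-block count out of the dfsadmin report.
# Assumption: the report contains a line like "Missing blocks: 7"
# (exact wording can differ between Hadoop versions).
missing_blocks() {
  grep -i 'missing blocks' | head -n 1 | grep -o '[0-9][0-9]*'
}
```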

2018-06-15 14:12:52,958 FATAL [vmbdsiwbdn2:16000.activeMasterManager] master.HMaster: Failed to become active master org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1307428289-10.0.0.4-

Expert Contributor

Looks like a problem with the DataNodes. Check whether all DataNodes are up.

Once HDFS and the DataNodes are healthy, you will be able to start HBase.


Yes... HDFS has reported missing blocks... NameNode Critical error: Total Blocks: [99], Missing Blocks: [7]



I followed the same steps to check for the corrupt blocks that were causing the HBase Master startup error and then deleted them. Once done, I restarted the HBase services successfully.