Created 06-21-2018 07:54 AM
I've set up my HCP cluster with 4 nodes. I've observed that the HBase Master never starts up after a cluster restart. If I leave the cluster up and running from installation time, HBase works fine. But once I shut down the cluster and then try to start it again, the Master fails to start.
When I looked into the master log, I found the following warning entries from shutdown time:
------------------------------------------------------------------------------------------
2018-06-14 14:48:14,099 WARN [MASTER_META_SERVER_OPERATIONS-vmbdsiwbdn2:16000-3] master.AssignmentManager: Unable to determine a plan to assign {ENCODED => 1588230740, NAME => 'hbase:meta,,1', STARTKEY => '', ENDKEY => ''}
2018-06-14 14:48:15,099 WARN [MASTER_META_SERVER_OPERATIONS-vmbdsiwbdn2:16000-3] master.AssignmentManager: Can't move 1588230740, there is no destination server available.
2018-06-14 14:48:15,099 WARN [MASTER_META_SERVER_OPERATIONS-vmbdsiwbdn2:16000-3] master.AssignmentManager: Unable to determine a plan to assign {ENCODED => 1588230740, NAME => 'hbase:meta,,1', STARTKEY => '', ENDKEY => ''}
2018-06-14 14:48:16,099 WARN [MASTER_META_SERVER_OPERATIONS-vmbdsiwbdn2:16000-3] master.AssignmentManager: Can't move 1588230740, there is no destination server available.
2018-06-14 14:48:16,099 WARN [MASTER_META_SERVER_OPERATIONS-vmbdsiwbdn2:16000-3] master.AssignmentManager: Unable to determine a plan to assign {ENCODED => 1588230740, NAME => 'hbase:meta,,1', STARTKEY => '', ENDKEY => ''}
2018-06-14 14:48:17,099 WARN [MASTER_META_SERVER_OPERATIONS-vmbdsiwbdn2:16000-3] master.AssignmentManager: Can't move 1588230740, there is no destination server available.
----------------------------------------------------------------------------------------------
And on cluster restart, I found the following warning and error entries in the master log file:
----------------------------------------------------------------------------------------------
2018-06-15 14:12:41,118 INFO [vmbdsiwbdn2:16000.activeMasterManager] hdfs.DFSClient: No node available for BP-1307428289-10.0.0.4-1528240888625:blk_1073741830_1006 file=/apps/hbase/data/data/hbase/meta/.tabledesc/.tableinfo.0000000001
2018-06-15 14:12:41,118 INFO [vmbdsiwbdn2:16000.activeMasterManager] hdfs.DFSClient: Could not obtain BP-1307428289-10.0.0.4-1528240888625:blk_1073741830_1006 from any node: java.io.IOException: No live nodes contain block BP-1307428289-10.0.0.4-1528240888625:blk_1073741830_1006 after checking nodes = [], ignoredNodes = null No live nodes contain current block Block locations: Dead nodes: . Will get new block locations from namenode and retry...
2018-06-15 14:12:41,119 WARN [vmbdsiwbdn2:16000.activeMasterManager] hdfs.DFSClient: DFS chooseDataNode: got # 3 IOException, will wait for 11833.399457334988 msec.
2018-06-15 14:12:52,955 WARN [vmbdsiwbdn2:16000.activeMasterManager] hdfs.DFSClient: Could not obtain block: BP-1307428289-10.0.0.4-1528240888625:blk_1073741830_1006 file=/apps/hbase/data/data/hbase/meta/.tabledesc/.tableinfo.0000000001 No live nodes contain current block Block locations: Dead nodes: . Throwing a BlockMissingException
2018-06-15 14:12:52,955 WARN [vmbdsiwbdn2:16000.activeMasterManager] hdfs.DFSClient: Could not obtain block: BP-1307428289-10.0.0.4-1528240888625:blk_1073741830_1006 file=/apps/hbase/data/data/hbase/meta/.tabledesc/.tableinfo.0000000001 No live nodes contain current block Block locations: Dead nodes: . Throwing a BlockMissingException
2018-06-15 14:12:52,956 WARN [vmbdsiwbdn2:16000.activeMasterManager] hdfs.DFSClient: DFS Read org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1307428289-10.0.0.4-1528240888625:blk_1073741830_1006 file=/apps/hbase/data/data/hbase/meta/.tabledesc/.tableinfo.0000000001
2018-06-15 14:12:52,958 FATAL [vmbdsiwbdn2:16000.activeMasterManager] master.HMaster: Failed to become active master org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1307428289-10.0.0.4-1528240888625:blk_1073741830_1006 file=/apps/hbase/data/data/hbase/meta/.tabledesc/.tableinfo.0000000001
----------------------------------------------------------------------------------------------
Please suggest how to overcome this scenario.
Created 06-22-2018 02:45 PM
Can you check whether HDFS is healthy? Do you see any missing blocks in the NameNode?
2018-06-15 14:12:52,958 FATAL [vmbdsiwbdn2:16000.activeMasterManager] master.HMaster: Failed to become active master org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1307428289-10.0.0.4-
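For a quick check from the command line (assuming you can run the HDFS client on one of the cluster nodes), something like:
$ hdfs dfsadmin -report
$ hdfs fsck /
Look for "Missing blocks" and "Corrupt blocks" in the fsck summary at the end of the output, and for dead DataNodes in the dfsadmin report.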
Created 06-23-2018 07:27 AM
Looks like a problem with the DataNodes. Check whether all DataNodes are up.
Once HDFS/DataNodes are healthy, you will be able to start HBase.
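For example (the exact output wording varies between Hadoop versions), the live/dead DataNode counts can be checked with:
$ hdfs dfsadmin -report | grep -i datanodes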
Created 06-23-2018 07:29 AM
Yes...HDFS has reported missing blocks...NameNode critical error: Total Blocks:[99], Missing Blocks:[7]
Created 06-23-2018 10:55 AM
This is the culprit: "Could not obtain block: BP-1307428289-10.0.0.4-1528240888625:blk_1073741830_1006 file=/apps/hbase/data/data/hbase/meta/.tabledesc/.tableinfo.0000000001"
How did you shut down the cluster?
List the corrupt HDFS blocks:
$ hdfs fsck / -list-corruptfileblocks
Or determine the problematic files from a full report (filtering out the dots printed for healthy files):
$ hdfs fsck / | egrep -v '^\.+$'
Once you have found the corrupt files, inspect their blocks and locations:
$ hdfs fsck /path/to/corrupt/file -files -blocks -locations
Repeat until all files are healthy or you exhaust all alternatives looking for the blocks.
Once you determine what happened and you cannot recover any more blocks, remove the affected files:
$ hdfs dfs -rm /path/to/file/with/permanently/missing/blocks
Alternatively, to delete every file with corrupt blocks in one pass:
$ hdfs fsck / -delete
HTH
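For reference, a minimal sketch of the above applied to the file named in your master log (adjust the path if fsck reports other files as well):
# 1. List the files that have corrupt/missing blocks
$ hdfs fsck / -list-corruptfileblocks
# 2. Inspect the reported file to see which blocks and DataNode locations are involved
$ hdfs fsck /apps/hbase/data/data/hbase/meta/.tabledesc/.tableinfo.0000000001 -files -blocks -locations
# 3. Only if the blocks are confirmed unrecoverable, remove the file
$ hdfs dfs -rm /apps/hbase/data/data/hbase/meta/.tabledesc/.tableinfo.0000000001
In this thread's case, removing the corrupt files and restarting the HBase services resolved the Master startup failure (see the follow-up below).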
Created 06-24-2018 02:16 PM
I followed the same steps to check for the corrupted blocks that were causing the HBase Master startup error and then deleted them. Once done, I restarted the HBase services successfully.