11-09-2018 04:15 AM
Hello Vinod,

Please refer to the previously mentioned hbck guide[1], or review Appendix C[2], which is referenced at the end of that documentation and discusses the usage of hbck in more depth.

Generally, nearly every hbck troubleshooting session is best begun with a single

$ sudo -u hbase hbase hbck -fixAssignments

as this will try to assign all regions that are not deployed at the time it runs. Holes can appear for many different reasons; the first step is to check whether the hole persists once every region has been assigned successfully. Reassigning regions successfully usually eliminates the holes in the region chain.

It's also good practice to check whether the Apache HBase reference guide ("the book") has any information about the issue at hand. As CDH5.8+ uses HBase 1.2, it's best to check the corresponding version of the Apache HBase documentation[3]. If you have CDH6.0.x, then it's best to review the same section of the HBase 2.0 documentation[4], which covers hbck2.

[1] - Checking and Repairing HBase tables CDH5.15.x - https://www.cloudera.com/documentation/enterprise/5-15-x/topics/admin_hbase_hbck.html
[2] - Apache HBase documentation v1.2 / Appendix C - http://hbase.apache.org/1.2/book.html#hbck.in.depth
[3] - Apache HBase documentation v1.2 / HBase hbck - http://hbase.apache.org/1.2/book.html#hbck
[4] - Apache HBase documentation / HBase HBCK2 - http://hbase.apache.org/book.html#HBCK2
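To make the order of operations above concrete, here is a minimal sketch for HBase 1.x (CDH5): a read-only hbck report, then -fixAssignments, then a second read-only report to see whether the hole is still there. The DRY_RUN switch and the function names are my own additions for illustration, not hbck options; by default the sketch only prints the commands so you can review them before running anything against the cluster.

```shell
#!/bin/sh
# Sketch only. DRY_RUN is a hypothetical switch of this script (not part of hbck):
# when it is 1, commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

hbck_first_pass() {
    # 1. Read-only report: lists inconsistencies such as unassigned regions and holes.
    run sudo -u hbase hbase hbck
    # 2. Try to assign the regions that are not deployed.
    run sudo -u hbase hbase hbck -fixAssignments
    # 3. Read-only report again, to see whether any hole persists.
    run sudo -u hbase hbase hbck
}

hbck_first_pass
```

Set DRY_RUN=0 to actually execute the three steps. If the second report still shows a hole, that is the point at which the further options described in the linked hbck guide become relevant.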
11-09-2018 03:57 AM
In addition to the previous solution, some best practices:

- hbck is basically just an HBase client command
- client commands should be run from nodes that have the relevant service's client configurations deployed on them. This can be done manually (not recommended, see below why) or via Cloudera Manager

Accordingly, whichever node you run hbck from should have the HBase client configs deployed, to make sure it actually uses the cluster's current settings (such as the heap size for client commands, the ZooKeeper ensemble hostnames, etc.). To achieve this, it's recommended to deploy an HBase GATEWAY role[1], which does just that: it deploys the active configs of the HBase service via Cloudera Manager. Additionally, if any HBase client config changes are made later via Cloudera Manager, those are propagated automatically as well, the same way config changes are propagated to every node that has HBase role instances installed.

There is some further reference on using hbck here[2], as this is an advanced topic.

[1] - Gateway roles CDH latest version - https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_managing_roles.html#managing_roles__section_scv_ywt_cn
[2] - Checking and Repairing HBase tables CDH5.15.x - https://www.cloudera.com/documentation/enterprise/5-15-x/topics/admin_hbase_hbck.html (please note that in CDH6.0.0 several of hbck's options are deprecated)
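A quick way to sanity-check a node before running hbck from it is to see whether a client configuration is present at all. The sketch below assumes the client configs are deployed under /etc/hbase/conf (adjust the path if your installation differs); the function name is hypothetical. It only checks that hbase-site.xml exists and names a ZooKeeper quorum, it does not verify that the file matches the cluster's current settings.

```shell
#!/bin/sh
# Assumption: /etc/hbase/conf is where the deployed client configuration lives.
HBASE_CONF_DIR="${HBASE_CONF_DIR:-/etc/hbase/conf}"

# Succeeds only if hbase-site.xml exists in the given directory and names a
# ZooKeeper quorum, i.e. the client would know which cluster to talk to.
has_hbase_client_conf() {
    conf_dir="$1"
    [ -f "$conf_dir/hbase-site.xml" ] &&
        grep -q '<name>hbase.zookeeper.quorum</name>' "$conf_dir/hbase-site.xml"
}

if has_hbase_client_conf "$HBASE_CONF_DIR"; then
    echo "client config found in $HBASE_CONF_DIR"
else
    echo "no usable hbase-site.xml in $HBASE_CONF_DIR; deploy client configs first"
fi
```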
11-09-2018 02:26 AM
It's worth checking whether the use case is actually suited to HDFS's NFS Gateway role[1], which is designed for such remote cluster access.
[1] - Adding and Configuring an NFS Gateway - https://www.cloudera.com/documentation/enterprise/5-12-x/topics/admin_hdfs_nfsgateway.html
10-23-2018 02:08 AM
Resurrecting this topic with some clarity on the issue and its remedy.

If RegionServers keep dead connections to the DataNodes, the same symptoms would be seen: many connections in CLOSE_WAIT and an increasing number of open file descriptors. In extreme cases the limit could be reached, which would cause the host node to fail because it has no more file descriptors to use.

There was a bug in HBase prior to CDH5.13, which is described in more detail in this upstream JIRA[1]: HBASE-9393 "Hbase does not closing a closed socket resulting in many CLOSE_WAIT".

This issue was patched in the following CDH releases: CDH5.13.0, CDH5.13.1, CDH5.13.2, CDH5.13.3, CDH5.14.0, CDH5.14.2, CDH5.14.4, CDH5.15.0, CDH5.15.1, CDH6.0.0.

[1] - upstream HBase JIRA - https://issues.apache.org/jira/browse/HBASE-9393?attachmentOrder=asc
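To check whether a RegionServer host shows these symptoms, the two numbers to watch are the count of sockets in CLOSE_WAIT and the count of open file descriptors of the RegionServer process. Below is a minimal sketch, assuming a Linux host with ss (or netstat) and /proc available; the function names and the RS_PID variable are hypothetical helpers of mine, not part of any HBase tooling.

```shell
#!/bin/sh
# Count sockets in CLOSE_WAIT from `ss -tan` or `netstat -tan` output on stdin.
# ss prints the state in the first column as CLOSE-WAIT, netstat prints it in
# the sixth column as CLOSE_WAIT; both forms are matched.
count_close_wait() {
    awk '$1 == "CLOSE-WAIT" || $6 == "CLOSE_WAIT" { n++ } END { print n + 0 }'
}

# Count open file descriptors of a process (Linux only, reads /proc/PID/fd).
count_open_fds() {
    ls "/proc/$1/fd" | wc -l
}

# Example usage; RS_PID is assumed to hold the RegionServer's process id and
# inspecting another user's process usually requires root:
#   ss -tanp | grep "pid=$RS_PID," | count_close_wait
#   count_open_fds "$RS_PID"
```

Sampling both values every few minutes and comparing them against the process's open-file limit (ulimit -n) shows whether the counts keep growing, which is the pattern described above.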
08-28-2018 03:34 AM
I am just sharing the relevant part of the linked docs, as it contains the instructions on how to enable the HBase balancer via the hbase shell:

Load Balancer

It is assumed that the Region Load Balancer is disabled while the graceful_stop script runs (otherwise the balancer and the decommission script will end up fighting over region deployments). Use the shell to disable the balancer:

    hbase(main):001:0> balance_switch false
    true
    0 row(s) in 0.3590 seconds

This turns the balancer OFF. To reenable, do:

    hbase(main):001:0> balance_switch true
    false
    0 row(s) in 0.3590 seconds

The graceful_stop will check the balancer and if enabled, will turn it off before it goes to work. If it exits prematurely because of error, it will not have reset the balancer. Hence, it is better to manage the balancer apart from graceful_stop, reenabling it after you are done with graceful_stop.
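Note that the value balance_switch prints is the previous state of the balancer, which is why switching it off prints true in the excerpt above. The advice to manage the balancer apart from graceful_stop could be scripted roughly as below. This is a sketch under assumptions: the path ./bin/graceful_stop.sh, the DRY_RUN switch, the function names and the example hostname are mine, so adapt them to your installation. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch only. DRY_RUN is a hypothetical switch of this script: when it is 1,
# commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"
# Assumption: location of the graceful_stop script relative to the HBase home.
GRACEFUL_STOP="${GRACEFUL_STOP:-./bin/graceful_stop.sh}"

# Send a single command to the hbase shell via stdin.
hbase_shell_cmd() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "echo '$1' | hbase shell"
    else
        echo "$1" | hbase shell
    fi
}

decommission_with_balancer_off() {
    host="$1"
    # Turn the balancer off ourselves instead of relying on graceful_stop.
    hbase_shell_cmd "balance_switch false"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$GRACEFUL_STOP $host"
    else
        "$GRACEFUL_STOP" "$host"
    fi
    # Runs even if graceful_stop exited early, so the balancer is not left off.
    hbase_shell_cmd "balance_switch true"
}

# Example (hostname is made up):
#   decommission_with_balancer_off rs1.example.com
```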