HBase FAQ: Sizing a Cluster

by Community Manager on ‎08-20-2015 01:14 PM

Is it OK for the HBase Master and Hadoop NameNode (+JobTracker) to run on the same server?

 

The NameNode needs memory. The HBase Master is normally not very busy: it only needs to be available when region servers check in, and to keep its ZooKeeper heartbeats timely. As long as the combined NameNode + Master (+ JobTracker) server has enough RAM that the system never swaps, running them on the same server is OK.
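As a rough illustration of the "never swaps" rule, the check below adds up the heap sizes of the co-located daemons plus a reserve for the operating system and compares the total with physical RAM. This is only a sketch: the function name, the heap figures, and the 4 GiB reserve are hypothetical, and a real JVM also uses memory outside its heap, so leave extra headroom.

```python
# Hypothetical heap sizes in GiB for daemons sharing one master host.
# None of these figures come from the article; substitute your own.
def fits_in_ram(daemon_heaps_gib, physical_ram_gib, os_reserve_gib=4.0):
    """Return True if the summed heaps plus an OS reserve fit in physical RAM."""
    needed = sum(daemon_heaps_gib.values()) + os_reserve_gib
    return needed <= physical_ram_gib


master_host = {"NameNode": 16.0, "HBase Master": 4.0, "JobTracker": 4.0}

# 16 + 4 + 4 + 4 (reserve) = 28 GiB needed in total.
print(fits_in_ram(master_host, physical_ram_gib=32.0))
print(fits_in_ram(master_host, physical_ram_gib=24.0))
```

With these made-up numbers the first host has room to spare and the second would be at risk of swapping.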

Consider running multiple HBase Masters to remove one single point of failure from the deployment. For a non-high-availability deployment, it makes sense to run everything on one server. For redundancy, we recommend running the HBase Masters alongside the NameNode and the Secondary/Standby NameNode.

 

Is it OK for HBase RegionServer and Hadoop DataNode (+ TaskTracker) to run on the same server?

 

Yes, this is advised to ensure data locality. Over time, background compaction rewrites the HDFS data that backs each region's stores onto the local DataNode. MapReduce jobs that run against HBase after this happens read data locally, because each input split corresponds to a region and the task is scheduled on the server hosting that region.
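The split-per-region idea can be sketched as follows. This is not HBase's input format code; `Region`, `splits_for`, and the host names are hypothetical, and the only point is that each split carries the host of the serving RegionServer as a scheduling hint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    start_key: str
    end_key: str
    server: str  # host of the RegionServer currently serving this region


def splits_for(regions):
    """Build one input split per region, tagged with its serving host."""
    return [(r.start_key, r.end_key, r.server) for r in regions]


# Two hypothetical regions covering the whole key space.
regions = [Region("", "m", "worker1"), Region("m", "", "worker2")]

for start, end, host in splits_for(regions):
    print(f"split [{start!r}, {end!r}) -> prefer a task on {host}")
```

Because the DataNode and TaskTracker run on the same box as the RegionServer, a scheduler that honors the hint reads the region's files from local disk.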

 

Is the HBase RegionServer a memory-hungry process?

 

Yes. The more RAM you can give to the region servers, the better for performance:

Read caching (block cache) to avoid needing to hit the file system to serve frequently accessed data
Write caching (MemStore) to ride over flushes and compactions without blocking clients
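To get a feel for how a RegionServer heap is divided between the two caches, here is a small sketch. It assumes the cache sizes are set as fractions of the heap (in the HBase versions we are aware of, via `hfile.block.cache.size` and `hbase.regionserver.global.memstore.size`, or `...memstore.upperLimit` in older releases). The 0.4 defaults and the 0.8 combined cap are assumptions to verify against your version's documentation, and `heap_split` is a hypothetical helper.

```python
def heap_split(heap_gib, block_cache_fraction=0.4, memstore_fraction=0.4,
               max_combined=0.8):
    """Split a RegionServer heap into block cache, MemStore, and the rest.

    The fractions and the combined cap are assumed values, not taken
    from the article.
    """
    combined = block_cache_fraction + memstore_fraction
    if combined > max_combined:
        raise ValueError(
            f"block cache + MemStore fractions ({combined}) exceed {max_combined}"
        )
    block_cache = heap_gib * block_cache_fraction
    memstore = heap_gib * memstore_fraction
    return {
        "block_cache_gib": block_cache,
        "memstore_gib": memstore,
        "remaining_gib": heap_gib - block_cache - memstore,
    }


for heap in (8, 16, 32):
    split = heap_split(heap)
    print(heap, {name: round(size, 1) for name, size in split.items()})
```

Whatever is left after the two caches has to cover everything else the RegionServer does, which is one reason a larger heap helps.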

 

Do I need dedicated boxes for each ZooKeeper?

 

We advise running ZooKeeper on dedicated hardware. If that is not an option, you can run ZooKeeper alongside the NameNode, JobTracker, and Standby node (Secondary NameNode). In a pinch you can co-locate ZooKeeper on DataNode/TaskTracker/RegionServer boxes, but this is not recommended. ZooKeeper does not take up a lot of resources on its own, but when it is starved for resources, RegionServers can time out.

A ZooKeeper ensemble of 2N+1 servers tolerates the loss of N of them, so deploy 3 servers if you can stand to lose only one, or 5 if you want to be able to lose up to 2, and so on. There are diminishing returns beyond 7 or 9 servers. Though this may seem like a lot of overhead just to run HBase, ZooKeeper is useful in its own right: it provides synchronization primitives for your service or application, hosts dynamic configuration (with watchers to notify you of changes), and manages presence and group membership.
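The 2N+1 arithmetic can be checked with a few lines. This sketch assumes the usual majority-quorum rule (more than half of the ensemble must be up); the function names are our own.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority still remains."""
    return (ensemble_size - 1) // 2


for n in (3, 4, 5, 7):
    print(f"{n} servers: quorum {quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

Note that 4 servers tolerate no more failures than 3, which is why odd ensemble sizes are the norm.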

 

What's the minimum cluster size?

 

For a non-high-availability system with local disk, we recommend three RegionServer-TaskTracker-DataNodes, plus one additional server for the HBase Master-NameNode-JobTracker and one for ZooKeeper, as the smallest cluster that is still useful. Also remember to tune HDFS for such a small cluster: set the minimum replication to 1 or 2.

For a high-availability system, we recommend the same three RegionServer-TaskTracker-DataNodes with two additional servers for each HBase Master-NameNode-JobTracker and ZooKeeper.

 

 

NOTE: This article was taken from our internal Knowledge Base.  To access the original article please use the following link (customer login required):

 

HBase FAQ: Sizing a Cluster

 

 
