HBase Cluster Setup

Hi,

I have a couple of questions while setting up a standalone HBase cluster.

1. Can I install and configure only the ZooKeeper and HBase services, without installing HDFS, YARN, etc.?

2. If the answer to the above is yes, what are the pros and cons of installing HBase with and without the NameNode and ResourceManager?

3. Can anyone share "best practices" for an HBase cluster?

Any information on HBase is highly appreciated.

Thanks in advance.

1 ACCEPTED SOLUTION

Super Guru
@SBandaru

Please see my replies inline below:

1. Can I install and configure only the ZooKeeper and HBase services, without installing HDFS, YARN, etc.?

You can do without YARN, but not without HDFS, because HDFS is where HBase stores its data. Without YARN, though, you will not be running any YARN-based Spark or MapReduce jobs on HBase. Pretty much the only way to access your data will be the HBase API.
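To make the HDFS and ZooKeeper dependency concrete, here is a minimal sketch that renders the handful of hbase-site.xml properties a distributed HBase typically needs. The property names (hbase.rootdir, hbase.cluster.distributed, hbase.zookeeper.quorum) are standard HBase settings; the host names, port, and the helper render_hbase_site are made up for illustration, so treat this as a starting point rather than a complete configuration.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical hosts: replace with your own NameNode and ZooKeeper nodes.
MINIMAL_PROPS = {
    # HBase keeps its data under this HDFS path, so a NameNode must exist.
    "hbase.rootdir": "hdfs://namenode.example.com:8020/hbase",
    # "true" means fully distributed mode rather than single-JVM standalone.
    "hbase.cluster.distributed": "true",
    # Comma-separated list of ZooKeeper hosts used for coordination.
    "hbase.zookeeper.quorum": "zk1.example.com,zk2.example.com,zk3.example.com",
}


def render_hbase_site(props):
    """Render a dict of property names and values as hbase-site.xml text."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # Pretty-print so the result is readable when written to disk.
    return minidom.parseString(ET.tostring(root)).toprettyxml(indent="  ")


if __name__ == "__main__":
    print(render_hbase_site(MINIMAL_PROPS))
```

Note that nothing here refers to YARN: there is no ResourceManager setting HBase itself needs in order to start.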

2. If the answer to the above is yes, what are the pros and cons of installing HBase with and without the NameNode and ResourceManager?

You cannot do it without the NameNode. That's a must for any HDFS cluster. I can't think of any real pros of not having YARN. It doesn't take a lot of resources or space by itself, and you need it to run YARN-based workloads on top of HBase, such as Hive, MapReduce, or Spark on YARN. There are a bunch of cons to leaving it out; maybe the only pro is a much simpler environment, with no projects beyond those required at a minimum.

3. Can anyone share "best practices" for an HBase cluster?

What are your application requirements? Depending on whether you want to optimize for reads or writes, there are different ways to go about the setup. One thing that remains consistent across use cases is good row key design. I cannot overemphasize this.
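As one illustration of what key design can involve, here is a minimal sketch of a salted row key with a reversed timestamp, two techniques described in the row key design chapter of the HBase book. The reply above does not prescribe a particular scheme, so everything here is an assumption for illustration: the function name salted_row_key, the bucket count of 16, the "|" separator, and the device_id/timestamp fields.

```python
import hashlib

NUM_BUCKETS = 16           # hypothetical; often chosen to match the region count
MAX_LONG = 2**63 - 1       # largest signed 64-bit value, used to reverse time


def salted_row_key(device_id: str, timestamp_ms: int,
                   buckets: int = NUM_BUCKETS) -> bytes:
    """Build a row key of the form <bucket>|<device_id>|<reversed timestamp>."""
    # A stable hash of the device id picks the bucket, so writes from many
    # devices spread across regions instead of hitting one hot region.
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    bucket = digest[0] % buckets
    # Reversing the timestamp makes newer rows sort before older ones,
    # which suits "latest N readings" scans.
    reversed_ts = MAX_LONG - timestamp_ms
    # Zero-padding keeps lexicographic order equal to numeric order.
    return f"{bucket:02d}|{device_id}|{reversed_ts:019d}".encode("utf-8")


if __name__ == "__main__":
    print(salted_row_key("sensor-42", 1_700_000_000_000))
```

The trade-off is the usual one: salting helps write distribution, but a scan across all devices in time order then has to fan out over every bucket, which is why the read or write question above matters before settling on a key.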

3 REPLIES 3

Super Guru

1. Yes, HBase only requires HDFS and ZooKeeper to function. YARN is not required.

2. This question is nonsensical. The NameNode is not an optional component of HDFS.

3. Use Ambari to install the cluster.

Please read the official HBase book to better understand the architecture. These basics are covered there.

http://hbase.apache.org/book.html

Contributor

In addition to the above, HBase can run on S3 instead of HDFS, but for that you must use the EMRFS implementation. To keep it simple, use EMR 5.2 or later. Even then, the NameNode is still mandatory.