
HBase as Computer Cluster

Expert Contributor

The latest webinar, "Accelerate your time-to-insight with CDP Data Center", showcased the potential of compute clusters.

Is HBase also supported as a compute cluster, i.e. will it be able to use the shared DataContext of HDFS and run with non-local region servers? Does that even work as a concept?

Or would the idea be for HBase to have its own local HDFS, because region servers are better off being data-local? The problem with scaling HBase is that region servers and region capacities are tied to the DataNodes, yet it is possible to have a lot of storage on fewer DataNodes, which is a poor fit for DataNodes sized for an HBase profile.
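To make the sizing concern concrete, here is a rough back-of-the-envelope sketch. It assumes one region server co-located with each DataNode, HBase data spread evenly across nodes, and illustrative numbers (disk sizes, a 10 GB region size, replication factor 3) that are not from any real cluster. The function name `regions_per_server` is made up for this example.

```python
def regions_per_server(raw_tb_per_node: float,
                       replication: int,
                       region_size_gb: float,
                       fill_ratio: float = 1.0) -> int:
    """Estimate how many regions a co-located region server ends up hosting
    if HBase data fills `fill_ratio` of the node's raw disk capacity.

    Assumption: data is evenly distributed, so each node's share of the
    logical (pre-replication) data is raw capacity / replication factor.
    """
    logical_gb = raw_tb_per_node * 1024 * fill_ratio / replication
    return int(logical_gb // region_size_gb)


# Illustrative only: a storage-dense node vs. a smaller "HBase profile" node.
dense_node = regions_per_server(raw_tb_per_node=48, replication=3, region_size_gb=10)
hbase_node = regions_per_server(raw_tb_per_node=12, replication=3, region_size_gb=10)
print(dense_node, hbase_node)
```

Under these assumptions the storage-dense node would have to serve roughly four times as many regions from the same single region server, which is the coupling between storage and serving capacity described above.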

The next question would then be about YARN: could that be a separate compute cluster, say for MR2, that uses the HBase compute cluster as a DataContext?



Re: HBase as Computer Cluster

Cloudera Employee



HBase supports both local and non-local HDFS for storefiles, independent of the VPC (Virtual Private Cluster) capability. HDFS is expected to be local for WAL files. With non-local HDFS, HBase latencies will be higher because data locality is lost. This is the same architecture that we support in the cloud, where we use HDFS on instance storage for the WAL and S3 for object storage. Any compute engine can be hosted separately from HBase and still access resources on HBase. MR2 has no locality dependency on HBase.
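As a minimal sketch of what a split between WAL and storefile locations could look like, the snippet below generates an `hbase-site.xml` fragment. It assumes an HBase release that supports the `hbase.wal.dir` property alongside `hbase.rootdir`; the nameservice names and paths are hypothetical, and whether CDP wires compute clusters this way is not stated in this thread.

```python
import xml.etree.ElementTree as ET


def hbase_site(props: dict) -> str:
    """Render a dict of properties as an hbase-site.xml style document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-printing helper, Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


props = {
    # Storefiles (HFiles) on the shared, non-local HDFS (hypothetical nameservice).
    "hbase.rootdir": "hdfs://base-cluster-ns/hbase",
    # WAL on the HDFS local to the region servers (hypothetical nameservice).
    "hbase.wal.dir": "hdfs://compute-local-ns/hbase-wal",
}

xml_text = hbase_site(props)
print(xml_text)
```

Treat this as an illustration of the idea rather than a supported deployment recipe; the actual properties and values should come from the platform documentation for the version in use.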

Re: HBase as Computer Cluster

Expert Contributor

@LakshmiR thanks.

You mention HDFS is expected to be local for WALs. Are you suggesting that a separate, non-local HDFS could be configured for storing the HFiles?

That would decouple the HFiles from the regions and, essentially, from the region servers. If this is possible, one could size region servers purely to serve compute requests against a storage layer (the non-local HDFS). HBase could then essentially boot up from scratch from the non-local HDFS, and the two layers could keep scaling independently.

This does introduce latency, since the region servers serve client requests from a non-local HDFS, but as long as HDFS read performance and the network hold up, it may be fine.

Which version of CDH supports this? Does it start with CDP?