Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Can I run the HDFS balancer?

Contributor

I use Cloudera CDH 4.0.4.

I run the balancer on HBase.

However, I have 10 DataNodes, and only 5 of those servers are used as HBase RegionServers.

The DataNodes have become imbalanced.

Is there a possibility that running the HDFS balancer will cause problems for HBase?

1 ACCEPTED SOLUTION

Mentor
There will not be any operational problems such as crashes or errors when
running the HDFS balancer on a cluster with HBase running, but there can
be a performance impact depending on which blocks the balancer decides
to move based on its space thresholds.
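
To make "space thresholds" a bit more concrete, here is a rough sketch of the idea, not the balancer's actual code. It assumes the commonly documented rule that a DataNode counts as balanced when its utilization is within the threshold (in percentage points) of the cluster-wide average. The function name and the sample node names are made up for illustration.

```python
def classify_nodes(used_by_node, capacity_by_node, threshold_percent=10.0):
    """Label each DataNode relative to the cluster-average utilization.

    Simplified model (assumption): utilization = used / capacity, and a node
    is a candidate for block moves only if it sits more than
    threshold_percent points away from the cluster average.
    """
    total_used = sum(used_by_node.values())
    total_capacity = sum(capacity_by_node.values())
    average = 100.0 * total_used / total_capacity

    labels = {}
    for node, used in used_by_node.items():
        utilization = 100.0 * used / capacity_by_node[node]
        if utilization > average + threshold_percent:
            labels[node] = "over-utilized"      # blocks may be moved away
        elif utilization < average - threshold_percent:
            labels[node] = "under-utilized"     # blocks may be moved here
        else:
            labels[node] = "within threshold"   # left alone
    return labels
```

In a setup like the one in the question, the DataNodes that also host RegionServers would tend to land in the first group, so those are the nodes most likely to lose local block replicas.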

The performance impact would come from loss of data locality: the blocks
of the HFiles a RegionServer needs may now be remote, so slightly higher
network usage can be observed until the next major compaction rewrites a
block replica locally.

If you'd like to narrow down the time-frame of impact, you can run the HDFS
balancer with the desired balancing threshold, and then once it is
complete, immediately follow up with a major compaction command on your
latency-sensitive HBase tables.
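
As a minimal sketch of that sequence, the wrapper below builds the `hdfs balancer -threshold` invocation and then pipes `major_compact` statements into `hbase shell`. The function names, the table name and the dry-run default are my own assumptions for illustration; check the exact commands against your CDH version before running anything.

```python
import subprocess


def balancer_command(threshold_percent=10.0):
    """Build the HDFS balancer invocation for the given threshold."""
    return ["hdfs", "balancer", "-threshold", str(threshold_percent)]


def major_compact_script(tables):
    """Build HBase shell input: one major_compact statement per table."""
    return "".join(f"major_compact '{table}'\n" for table in tables)


def rebalance_then_compact(tables, threshold_percent=10.0, dry_run=True):
    """Run the balancer, then request major compactions for the given tables.

    With dry_run=True (the default) nothing is executed; the command and the
    shell script are only returned so they can be reviewed first.
    """
    cmd = balancer_command(threshold_percent)
    script = major_compact_script(tables)
    if dry_run:
        return cmd, script

    # The balancer's exit code convention differs between Hadoop versions,
    # so it is not interpreted here; check the balancer output yourself.
    subprocess.run(cmd, check=False)

    # Feed the statements to the HBase shell on stdin.
    subprocess.run(["hbase", "shell"], input=script, text=True, check=True)
    return cmd, script
```

For example, `rebalance_then_compact(["orders"], threshold_percent=5)` only returns what would be run; pass `dry_run=False` to execute it. Note that `major_compact` only requests the compaction, which then proceeds in the background, so locality recovers as the compactions finish rather than the moment the command returns.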
