Can I run the balancer for HDFS?

Contributor

I use Cloudera CDH 4.0.4.

I run balancing on HBase.

However, I have 10 data nodes, and only 5 of those servers are used as HBase region servers.

As a result, the data nodes have become imbalanced.

Could running the Hadoop HDFS balancer cause problems for HBase?

1 ACCEPTED SOLUTION

Mentor
There will not be any operational problems such as crashes or errors when
running an HDFS balancer on a cluster with HBase running, but there can
potentially be a performance impact depending on what the balancer decides
to move based on its space thresholds.

The performance impact would come from loss of locality: the blocks of the
HFiles a RegionServer serves may end up on remote DataNodes, so reads go
over the network and slightly higher network usage can be observed until
the next major compaction rewrites the data with a local block replica.

If you'd like to narrow down the time frame of the impact, you can run the
HDFS balancer with the desired balancing threshold, and then once it is
complete, immediately follow up with a major compaction command on your
latency-sensitive HBase tables.
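
The sequence described above could be sketched roughly as follows. This is only an illustration, not a tested procedure for CDH 4.0.4: the threshold value (10), the table name my_table, the DRY_RUN switch, and the helper names run and rebalance_then_compact are all assumptions made for the example. It relies on the standard `hdfs balancer -threshold` option and the HBase shell's `major_compact` command, and by default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: rebalance HDFS, then request major compactions to restore locality.
# All values below are placeholders, not taken from the original post.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$1"
  else
    sh -c "$1"
  fi
}

# Hypothetical helper.
# Usage: rebalance_then_compact THRESHOLD_PERCENT TABLE [TABLE...]
rebalance_then_compact() {
  threshold="$1"
  shift

  # Step 1: run the balancer in the foreground so we know when it is done.
  # -threshold is the allowed deviation (in percent) of each DataNode's
  # utilization from the cluster average.
  run "hdfs balancer -threshold $threshold" || return 1

  # Step 2: once the balancer has exited, request a major compaction for
  # each latency-sensitive table through the HBase shell.
  for table in "$@"; do
    run "echo \"major_compact '$table'\" | hbase shell" || return 1
  done
}

# Example invocation with an assumed threshold and an assumed table name.
rebalance_then_compact 10 my_table
```

Setting DRY_RUN=0 would execute the commands instead of printing them. Note that major_compact in the HBase shell only queues the request; the compaction itself runs in the background on the RegionServers, so locality recovers gradually rather than the moment the command returns.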
