
Balance the skewed data on DataNodes and deal with under-replicated blocks


I had a DataNode failure caused by its Java heap size setting. Because writes continued while the node was down, this left a huge number of under-replicated blocks. I fixed the Java heap size and brought the node back up. However, when I try to re-replicate the blocks as mentioned here, the number of under-replicated blocks doesn't seem to come down, even while the setrep operation is running.

I also noticed that the data looks skewed across the DataNodes:

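  • 237.95 GB capacity, 2567977 blocks, 222.77 GB block pool used (93.62%)
  • 775.82 GB capacity, 2657650 blocks, 244.16 GB block pool used (31.47%)
  • 776.17 GB capacity, 2657711 blocks, 244.17 GB block pool used (31.46%)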
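To put a number on the skew, here is a small sketch that computes each node's utilization (used ÷ capacity, which reproduces the percentages above) and compares it with the cluster-wide utilization. The HDFS Balancer documentation treats a cluster as balanced when every DataNode's utilization is within `threshold` percentage points of the cluster's utilization, with 10 as the default threshold. The node names `dn1`, `dn2`, `dn3` are hypothetical placeholders, since the hostnames were not posted.

```python
# Figures copied from the listing above: (capacity GB, block pool used GB).
# Node names dn1..dn3 are hypothetical placeholders; hostnames were not posted.
NODES = {
    "dn1": (237.95, 222.77),
    "dn2": (775.82, 244.16),
    "dn3": (776.17, 244.17),
}
DEFAULT_THRESHOLD = 10.0  # percentage points, the balancer's default threshold


def utilization(capacity_gb: float, used_gb: float) -> float:
    """Used space as a percentage of capacity."""
    return 100.0 * used_gb / capacity_gb


def classify(nodes, threshold=DEFAULT_THRESHOLD):
    """Return the cluster utilization and each node's deviation from it."""
    total_cap = sum(cap for cap, _ in nodes.values())
    total_used = sum(used for _, used in nodes.values())
    cluster = utilization(total_cap, total_used)
    report = {}
    for name, (cap, used) in nodes.items():
        diff = utilization(cap, used) - cluster
        if diff > threshold:
            status = "over-utilized"
        elif diff < -threshold:
            status = "under-utilized"
        else:
            status = "within threshold"
        report[name] = (round(utilization(cap, used), 2), round(diff, 2), status)
    return round(cluster, 2), report


if __name__ == "__main__":
    cluster, report = classify(NODES)
    print(f"cluster utilization: {cluster}%")
    for name, (util, diff, status) in report.items():
        print(f"{name}: {util}% ({diff:+} points) -> {status}")
```

With these figures the cluster as a whole is about 39.7% used, so the first node sits roughly 54 points above it, while the other two sit about 8 points below it, inside the default band.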

Is the skewed data interfering with the setrep operation?

Is there a way I can deal with both the skew and the under-replicated blocks?
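One rough way to check whether the skew could be holding back re-replication is to compare each node's free space with how far its block pool lags behind the fullest node. This sketch rests on assumptions the thread does not confirm: a replication factor of 3 (the HDFS default) and exactly these three DataNodes, in which case every block needs one replica on every node. Free space is approximated as capacity minus block pool used; non-DFS usage was not posted, so real free space is probably lower. Node names are hypothetical.

```python
# Assumptions (not confirmed in the thread): replication factor 3 and exactly
# these three DataNodes, so every block needs one replica on every node.
# Free space is approximated as capacity minus block pool used; non-DFS usage
# was not posted, so the real free space is probably smaller.
# Node names are hypothetical placeholders.
NODES = {
    "dn1": (237.95, 222.77),  # (capacity GB, block pool used GB)
    "dn2": (775.82, 244.16),
    "dn3": (776.17, 244.17),
}


def replication_headroom(nodes):
    """Per node: (approx. free GB, GB behind the fullest node, whether it fits)."""
    target = max(used for _, used in nodes.values())
    result = {}
    for name, (cap, used) in nodes.items():
        free = cap - used
        missing = target - used
        result[name] = (round(free, 2), round(missing, 2), free >= missing)
    return result


if __name__ == "__main__":
    for name, (free, missing, fits) in replication_headroom(NODES).items():
        verdict = "fits" if fits else "does NOT fit"
        print(f"{name}: free ~{free} GB, missing ~{missing} GB -> {verdict}")
```

Under those assumptions the smallest node has roughly 15 GB free but is about 21 GB behind the others, so it could not absorb all of the missing replicas. Treat this as a ballpark only, since block pool size is an indirect proxy for missing replicas.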


@Swaapnika Guntaka Your problem is exactly what the Balancer was created for 🙂

Follow this link and this link to fix your problem.

Let me know if that works for you.


@Rahul Soni I tried to run the balancer as follows:

hdfs balancer -source <overloadedhost>

It ran for 3 iterations, reported that it needed to move around 100 GB, and then exited. There were no errors, but it didn't fix the imbalance.
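For a sense of where a figure like "around 100 GB" could come from, here is a rough sketch of the balancer's arithmetic. It assumes the documented model (a node is over-utilized when its utilization exceeds the cluster utilization by more than `threshold` percentage points, default 10) and further assumes that the amount to move off such a node is its excess beyond that band times its capacity. That second assumption is a simplification, not taken from the balancer's source, so the result is a ballpark. Node names are hypothetical.

```python
# Rough estimate of how much data must leave over-utilized nodes.
# Assumption: GB to move = (utilization - cluster utilization - threshold)
# percent of the node's capacity, only for nodes above the threshold band.
# Figures from the listing in the question; node names are hypothetical.
NODES = {
    "dn1": (237.95, 222.77),  # (capacity GB, block pool used GB)
    "dn2": (775.82, 244.16),
    "dn3": (776.17, 244.17),
}


def gb_to_move(nodes, threshold=10.0):
    """Map each over-utilized node to its estimated excess in GB."""
    total_cap = sum(cap for cap, _ in nodes.values())
    total_used = sum(used for _, used in nodes.values())
    cluster_pct = 100.0 * total_used / total_cap
    excess = {}
    for name, (cap, used) in nodes.items():
        over_pct = 100.0 * used / cap - cluster_pct - threshold
        if over_pct > 0:
            excess[name] = over_pct / 100.0 * cap
    return excess


if __name__ == "__main__":
    for name, gb in gb_to_move(NODES).items():
        print(f"{name}: ~{gb:.1f} GB above the threshold band")
```

With the posted figures only the first node exceeds the band, by roughly 104 GB, which is in the same range as the ~100 GB the balancer reported.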

Which HDP version are you using?