Balance the skewed data on Data nodes and deal with under replicated blocks

I had a DataNode failure due to its Java heap size, which left a huge number of under-replicated blocks because writes were in progress when the node failed. I fixed the Java heap size and brought the node back up. When I try to re-replicate the blocks as mentioned here, the number doesn't come down, even while the setrep operation is running.

I also noticed that the data looks skewed across the DataNodes:

  • 237.95 GB 2567977 222.77 GB (93.62%) 2.7.3.2.6.4.0-91
  • 775.82 GB 2657650 244.16 GB (31.47%) 2.7.3.2.6.4.0-91
  • 776.17 GB 2657711 244.17 GB (31.46%) 2.7.3.2.6.4.0-91

Is the skewed data interfering with the setrep operation?

Is there a way I can deal with both the skew and the under-replicated blocks?

3 Replies

Re: Balance the skewed data on Data nodes and deal with under replicated blocks

@Swaapnika Guntaka Your problem is exactly what the Balancer was created for :)

Follow this link and this one to fix your problem.

Let me know if that works for you.

Re: Balance the skewed data on Data nodes and deal with under replicated blocks

@Rahul Soni I ran the balancer as follows:

hdfs balancer -source <overloadedhost>

It ran for three iterations, reported that it needed to transfer around 100 GB, and then ended. There were no errors, but it didn't fix the imbalance.
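For context, the figure of around 100 GB is roughly what one would expect if the balancer compares each node against the cluster-wide average utilization plus its threshold. The sketch below is only a back-of-the-envelope check, not the balancer's actual code. It assumes that the first and third figures in the DataNode list in the question are capacity and DFS used, and that the threshold is the default of 10 percentage points. The `balance_report` helper is a hypothetical name introduced for illustration.

```shell
#!/bin/sh
# Rough estimate of how much data sits above "average + threshold" per node.
# Input: one line per DataNode, "capacity_GB used_GB" (assumed column meaning).
# threshold=10 is assumed to match the balancer default (percentage points).
balance_report() {
  awk -v threshold=10 '
    { cap[NR] = $1; used[NR] = $2; total_cap += $1; total_used += $2 }
    END {
      # Cluster-wide average utilization, in percent.
      avg = 100 * total_used / total_cap
      printf "cluster average: %.2f%%\n", avg
      for (i = 1; i <= NR; i++) {
        util = 100 * used[i] / cap[i]
        # Data above (average + threshold) on this node; zero if within range.
        excess = (util - avg - threshold) * cap[i] / 100
        if (excess < 0) excess = 0
        printf "node %d: %.2f%% used, %.1f GB above avg+threshold\n", i, util, excess
      }
    }'
}

# Figures copied from the DataNode list in the question.
balance_report <<'EOF'
237.95 222.77
775.82 244.16
776.17 244.17
EOF
```

With these inputs the first node comes out at about 104 GB above the limit, while the other two nodes sit within 10 points of the cluster average. If the threshold assumption holds, that would be consistent with the balancer reporting roughly 100 GB to move.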

Re: Balance the skewed data on Data nodes and deal with under replicated blocks

Which HDP version are you using?