
Hortonworks Cloudbreak: with Azure WASB as the default filesystem, decommissioning tries to rebalance DataNode data to HDFS (0% capacity) and fails


I configured Azure WASB storage as the default filesystem through Cloudbreak, which left the local HDFS capacity at 0 in Ambari (100% utilized). My default replication factor is 1, but when I try to decommission a node, the DataNode tries to rebalance about 28 KB of data to another available DataNode. Because our HDFS has 0 capacity, decommissioning fails with the error below:

New node(s) could not be removed from the cluster. Reason Trying to move '28672' bytes worth of data to nodes with '0' bytes of capacity is not allowed

The cluster information shows that the local HDFS still holds a few KB of data, which is what gets rebalanced, while the available capacity is 0:

"CapacityRemaining" : 0,
 "CapacityTotal" : 0,
 "CapacityUsed" : 131072,
 "DeadNodes" : "{}",
 "DecomNodes" : "{}",
 "HeapMemoryMax" : 1060372480,
 "HeapMemoryUsed" : 147668152,
 "NonDfsUsedSpace" : 0,
 "NonHeapMemoryMax" : -1,
 "NonHeapMemoryUsed" : 75319744,
 "PercentRemaining" : 0.0,
 "PercentUsed" : 100.0,
 "Safemode" : "",
 "StartTime" : 1518241019502,
 "TotalFiles" : 1,
 "UpgradeFinalized" : true,

There is an Ambari decommission jar that Cloudbreak uses to check whether HDFS is running out of space. Is there a way to change this jar?

if (remainingSpace < safetyUsedSpace) {
    throw new BadRequestException(
            String.format("Trying to move '%s' bytes worth of data to nodes with '%s' bytes of capacity is not allowed", usedSpace, remainingSpace)
    );
}

Reference link: https://github.com/hortonworks/cloudbreak/blob/1.16.4/core/src/main/java/com/sequenceiq/cloudbreak/s...
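
To illustrate why this check rejects the downscale, here is a minimal, self-contained sketch of the same comparison. It is not the Cloudbreak source: the class name, the method name, the use of IllegalStateException in place of BadRequestException, and the safety factor of 1.2 are all assumptions made for illustration, since the thread does not show how safetyUsedSpace is derived.

```java
public class DownscaleSpaceCheck {

    // Hypothetical safety factor; the real derivation of safetyUsedSpace
    // is not shown in the thread.
    public static final double ASSUMED_SAFETY_FACTOR = 1.2;

    // Rejects the downscale when the remaining capacity cannot hold the
    // data to be moved plus the assumed safety margin.
    public static void verifyCapacity(long usedSpace, long remainingSpace) {
        long safetyUsedSpace = (long) (usedSpace * ASSUMED_SAFETY_FACTOR);
        if (remainingSpace < safetyUsedSpace) {
            throw new IllegalStateException(String.format(
                    "Trying to move '%s' bytes worth of data to nodes with '%s' bytes of capacity is not allowed",
                    usedSpace, remainingSpace));
        }
    }

    public static void main(String[] args) {
        try {
            // Values from the error above: 28672 bytes to move, 0 bytes remaining.
            verifyCapacity(28672L, 0L);
            System.out.println("Downscale allowed");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a remaining capacity of 0, any non-zero amount of data to move fails the comparison, regardless of the safety factor chosen.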

1 ACCEPTED SOLUTION


@Abhishek Sakhuja

This looks like an issue with HDFS not calculating free space correctly when it is not used as defaultFS.

Unfortunately, this is not a setup that Hortonworks officially supports.

"When working with the cloud using cloud URIs do not change the value of fs.defaultFS to use a cloud storage connector as the filesystem for HDFS. This is not recommended or supported. Instead, when working with data stored in S3, ADLS, or WASB, use a fully qualified URL for that connector."

Therefore, you might fix this by:

  1. forking the Cloudbreak code,
  2. modifying this part to meet your needs,
  3. building your own version of cloudbreak.jar,
  4. copying it into the cbreak_cloudbreak_1 container,
  5. restarting the application.

You might also consider opening an issue in the Cloudbreak GitHub repo; the team can then investigate it in more depth.
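
For reference, the "fully qualified URL" mentioned in the quoted guidance has the form wasb://<container>@<account>.blob.core.windows.net/<path>. The sketch below only builds and inspects such a URI with the standard library; the class name, the helper name, and the container, account, and path values are hypothetical placeholders.

```java
import java.net.URI;

public class WasbUriSketch {

    // Builds a fully qualified WASB URI from its parts.
    // The path is expected to start with "/".
    public static URI wasbUri(String container, String account, String path) {
        return URI.create("wasb://" + container + "@" + account
                + ".blob.core.windows.net" + path);
    }

    public static void main(String[] args) {
        // Placeholder names, not taken from the thread.
        URI uri = wasbUri("mycontainer", "myaccount", "/data/events");
        System.out.println(uri);
        System.out.println("scheme = " + uri.getScheme());
        System.out.println("host   = " + uri.getHost());
        System.out.println("path   = " + uri.getPath());
    }
}
```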


5 REPLIES


Not sure what your plan is. If you decommission a DataNode without letting the rebalancing take place, you could lose data. At the very least, it will leave some file blocks without any redundant copy. So either delete some data on your HDFS to allow the rebalancing to succeed, or add some capacity to HDFS (e.g. with a new temporary node) before decommissioning the DataNode.


Thank you for your quick response. I don't plan to keep any data on local HDFS and am using Azure WASB for all storage, so data loss is not a concern. Redundancy is handled by WASB. So, if WASB is my default storage, why should I add capacity to local HDFS in order to decommission a node?


@pdarvasi Thank you so much for answering; this is what I was looking for. Let me see how we can make this work.


@pdarvasi Finally found the solution to this issue. Here are my findings:

When we started HDP using Cloudbreak, the default HDP configuration calculated the non-HDFS reserved storage, "dfs.datanode.du.reserved", as approximately 3.5% of the total disk of the smallest DataNode storage configuration among the compute config groups. That DataNode had three drives, one of them in the TB range. Our default DataNode data directory, "dfs.datanode.data.dir", pointed to the drive with the lowest capacity (around 3% of the overall DataNode storage). Because 3% < 3.5%, the reserved space exceeded the size of that drive, so the HDFS capacity came out as 0. The existing DataNode storage also held some supporting directories and files of a few KB, which left the DataNode with a negative capacity of a few KB. To fix the downscaling issue, we need to either lower the non-HDFS reserved capacity (below 3%) or point the DataNode at a drive with more capacity (greater than 3.5% of the total).

I tried this and it worked. There is no need to change the WASB URI, so we are keeping it as the default storage. Thank you for your suggestions, though.
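
The capacity arithmetic described above can be sketched as follows. This is a simplified model, assuming that the usable HDFS capacity of a data volume is its size minus the reserved bytes, clamped at zero. The class name, the method name, and the 1000 GiB total are hypothetical; only the 3% and 3.5% ratios come from the findings above.

```java
public class DataNodeCapacitySketch {

    // Simplified model: usable capacity of one data volume is its size
    // minus the reserved bytes, never below zero.
    public static long usableCapacity(long volumeBytes, long reservedBytes) {
        return Math.max(0L, volumeBytes - reservedBytes);
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;
        long totalDisk = 1000 * gib;              // hypothetical total across all drives
        long reserved = totalDisk * 35 / 1000;    // about 3.5% reserved for non-HDFS use
        long smallDrive = totalDisk * 30 / 1000;  // data dir on a drive with about 3% of the total

        // Original setup: reserved space exceeds the drive, so capacity is 0.
        System.out.println("original: " + usableCapacity(smallDrive, reserved));

        // Fix 1: lower the reserved space below the size of the small drive.
        long lowerReserved = totalDisk * 20 / 1000;
        System.out.println("lower reserved: " + usableCapacity(smallDrive, lowerReserved));

        // Fix 2: point the data directory at a larger drive.
        long largeDrive = totalDisk * 900 / 1000;
        System.out.println("larger drive: " + usableCapacity(largeDrive, reserved));
    }
}
```

Either fix yields a positive capacity, which is what the downscale check needs in order to pass.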