Created 02-14-2018 11:41 AM
We configured Azure WASB storage as the default file system (in place of HDFS) through Cloudbreak, which caused Ambari to report the local HDFS capacity as 0 (100% utilized). The default replication factor is 1, but when I try to decommission a node, the DataNode tries to move about 28 KB of data to another available DataNode. Because our HDFS has 0 capacity, decommissioning fails with the following error:
New node(s) could not be removed from the cluster. Reason Trying to move '28672' bytes worth of data to nodes with '0' bytes of capacity is not allowed
The cluster information shows that the local HDFS still holds a few KB of data, which is what gets rebalanced, while the available capacity is 0:
"CapacityRemaining" : 0,
"CapacityTotal" : 0,
"CapacityUsed" : 131072,
"DeadNodes" : "{}",
"DecomNodes" : "{}",
"HeapMemoryMax" : 1060372480,
"HeapMemoryUsed" : 147668152,
"NonDfsUsedSpace" : 0,
"NonHeapMemoryMax" : -1,
"NonHeapMemoryUsed" : 75319744,
"PercentRemaining" : 0.0,
"PercentUsed" : 100.0,
"Safemode" : "",
"StartTime" : 1518241019502,
"TotalFiles" : 1,
"UpgradeFinalized" : true,
There is an Ambari decommission jar that Cloudbreak uses to check whether HDFS is running out of space. Is there a way to change this jar?
if (remainingSpace < safetyUsedSpace) {
    throw new BadRequestException(
        String.format("Trying to move '%s' bytes worth of data to nodes with '%s' bytes of capacity is not allowed",
            usedSpace, remainingSpace)
    );
}
Reference link: https://github.com/hortonworks/cloudbreak/blob/1.16.4/core/src/main/java/com/sequenceiq/cloudbreak/s...
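To make the failure concrete, here is a minimal, self-contained sketch of a check of this shape. The class name, method names, and the 1.2 safety factor are assumptions made for illustration; only the comparison and the message format come from the snippet above, and the way Cloudbreak actually derives `safetyUsedSpace` may differ.

```java
// Sketch only: names and the safety factor are hypothetical, not Cloudbreak's API.
final class DecommissionSpaceCheck {

    // Assumed safety margin applied to the data that has to be moved.
    static final double ASSUMED_SAFETY_FACTOR = 1.2;

    /** Returns true when the remaining nodes can absorb the data being moved. */
    static boolean canMove(long usedSpace, long remainingSpace, int replication) {
        long safetyUsedSpace = (long) (usedSpace * replication * ASSUMED_SAFETY_FACTOR);
        // Same comparison as in the quoted snippet, inverted to express success.
        return remainingSpace >= safetyUsedSpace;
    }

    /** Returns "ok" or the error text in the format quoted in the thread. */
    static String describe(long usedSpace, long remainingSpace, int replication) {
        if (canMove(usedSpace, remainingSpace, replication)) {
            return "ok";
        }
        return String.format(
            "Trying to move '%s' bytes worth of data to nodes with '%s' bytes of capacity is not allowed",
            usedSpace, remainingSpace);
    }

    public static void main(String[] args) {
        // Values from the error above: 28672 bytes to move, 0 bytes remaining.
        System.out.println(describe(28672L, 0L, 1));
        // With any reasonable remaining capacity the same move is accepted.
        System.out.println(describe(28672L, 1_000_000L, 1));
    }
}
```

With a remaining capacity of 0, any non-zero amount of data fails the comparison, so the check rejects the move regardless of how small the data is or which safety factor is used.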
Created 02-14-2018 12:53 PM
This seems to be an issue with HDFS not calculating free space correctly when it is not used as defaultFS.
Unfortunately, this is not a setup that Hortonworks officially supports at present.
"When working with the cloud using cloud URIs do not change the value of fs.defaultFS to use a cloud storage connector as the filesystem for HDFS. This is not recommended or supported. Instead, when working with data stored in S3, ADLS, or WASB, use a fully qualified URL for that connector."
Therefore you might fix this by
You might consider opening an issue in the Cloudbreak GitHub repo so that the team can investigate it further.
Created 02-14-2018 11:57 AM
I'm not sure what your plan is. If you decommission a DataNode and prevent the rebalancing from taking place, you could lose data; at the very least, some file blocks would be left without any redundant copy. So either delete some data on HDFS so that the rebalancing can succeed, or add some capacity to HDFS (for example, a temporary new node) before decommissioning the DataNode.
Created 02-14-2018 12:25 PM
Thank you for your quick response. I don't plan to keep any data on the local HDFS; we use Azure WASB for all storage, so data loss is not a concern and redundancy is covered by WASB. Given that WASB is the default storage, why should I have to add capacity to the local HDFS in order to decommission a node?
Created 02-14-2018 01:52 PM
@pdarvasi Thank you so much for answering; this is what I was looking for. Let me see how we can make this work.
Created 02-19-2018 05:30 AM
@pdarvasi Finally found the solution to this issue. Here are my findings:
When we started HDP using Cloudbreak, the default HDP configuration calculated the non-HDFS reserved storage, "dfs.datanode.du.reserved", as approximately 3.5% of the total disk of the smallest storage configuration for a DataNode among the compute config groups; that configuration had three drives, one of them in the TB range. Our DataNode data directory, "dfs.datanode.data.dir", pointed to the drive with the lowest capacity (around 3% of the overall DataNode storage). Because 3% < 3.5%, the reserved space exceeded the size of that drive and the HDFS capacity came out as 0. The existing DataNode storage also held a few KB of supporting directories and files, which left the DataNode with a negative capacity of a few KB. To fix the downscaling issue, we either need to lower the non-HDFS reserved capacity (below 3%) or point the DataNode at a disk with more capacity (greater than 3.5%).
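The arithmetic behind this finding can be sketched as follows. This is an illustration only: it assumes the DataNode reports the usable capacity of a volume as the disk size minus the non-HDFS reserved space, floored at zero, and the class name and disk sizes (1000 GB in total, a 30 GB data drive, 35 GB reserved) are hypothetical numbers chosen to match the 3% and 3.5% ratios above.

```java
// Sketch only: hypothetical names and sizes, not the HDFS implementation.
final class DataNodeCapacitySketch {

    static final long GB = 1024L * 1024L * 1024L;

    /** Usable capacity of one data volume: disk size minus reserved bytes, never below zero. */
    static long usableCapacity(long volumeBytes, long reservedBytes) {
        return Math.max(0L, volumeBytes - reservedBytes);
    }

    public static void main(String[] args) {
        long dataDirVolume = 30L * GB; // about 3% of a hypothetical 1000 GB total
        long reserved = 35L * GB;      // about 3.5% of the same total

        // Reserved space exceeds the data drive, so nothing is left for HDFS.
        System.out.println(usableCapacity(dataDirVolume, reserved));

        // Option 1: lower the reserved space below the size of the data drive.
        System.out.println(usableCapacity(dataDirVolume, 20L * GB));

        // Option 2: point the data directory at a larger drive.
        System.out.println(usableCapacity(500L * GB, reserved));
    }
}
```

Either option leaves a positive usable capacity, which is what the decommission check needs in order to accept the move.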
I tried this and it worked. There is no need to change the WASB URI, so we are keeping it as the default storage. Thank you for your suggestions all the same.