Hortonworks Cloudbreak with default HDFS as Azure WASB tries rebalancing DataNode data to HDFS (0% capacity) on decommissioning and fails
- Labels:
  - Apache Hadoop
  - Hortonworks Cloudbreak
Created 02-14-2018 11:41 AM
I configured Azure WASB storage as the default HDFS location through Cloudbreak, which made the local HDFS capacity show as 0 in Ambari (100% utilized). The default replication factor is 1, but now, when I try to decommission a node, the DataNode tries to rebalance about 28 KB of data to another available DataNode. However, our HDFS has 0 capacity, so decommissioning fails with the error below:
New node(s) could not be removed from the cluster. Reason Trying to move '28672' bytes worth of data to nodes with '0' bytes of capacity is not allowed
Querying the cluster shows that the local HDFS still holds a few KB of data that gets rebalanced, while the available capacity is 0:
"CapacityRemaining" : 0,
"CapacityTotal" : 0,
"CapacityUsed" : 131072,
"DeadNodes" : "{}",
"DecomNodes" : "{}",
"HeapMemoryMax" : 1060372480,
"HeapMemoryUsed" : 147668152,
"NonDfsUsedSpace" : 0,
"NonHeapMemoryMax" : -1,
"NonHeapMemoryUsed" : 75319744,
"PercentRemaining" : 0.0,
"PercentUsed" : 100.0,
"Safemode" : "",
"StartTime" : 1518241019502,
"TotalFiles" : 1,
"UpgradeFinalized" : true,
Cloudbreak uses an Ambari decommission check in its jar to verify whether HDFS is running out of space. Is there a way to change this jar?
if (remainingSpace < safetyUsedSpace) {
    throw new BadRequestException(
        String.format("Trying to move '%s' bytes worth of data to nodes with '%s' bytes of capacity is not allowed",
            usedSpace, remainingSpace)
    );
}
Reference link: https://github.com/hortonworks/cloudbreak/blob/1.16.4/core/src/main/java/com/sequenceiq/cloudbreak/s...
Created 02-14-2018 11:57 AM
I'm not sure what your plan is. If you decommission a DataNode without letting the rebalancing take place, you risk data loss; at the very least it will leave some blocks without any redundant copy. So either delete some data on HDFS so that the rebalancing can succeed, or add some capacity to HDFS (e.g., a temporary node) before decommissioning the DataNode.
Created 02-14-2018 12:25 PM
Thank you for your quick response. I have no plan to keep data on the local HDFS; we use Azure WASB for all storage, so there is no risk of data loss, and redundancy is handled by WASB. So, if WASB is the default storage, why should I have to add capacity to the local HDFS just to decommission a node?
Created 02-14-2018 12:53 PM
This seems to be an issue with HDFS not calculating free space correctly when it is not the defaultFS. Unfortunately, this is no longer a setup officially supported by Hortonworks:
"When working with the cloud using cloud URIs do not change the value of fs.defaultFS to use a cloud storage connector as the filesystem for HDFS. This is not recommended or supported. Instead, when working with data stored in S3, ADLS, or WASB, use a fully qualified URL for that connector."
Therefore, you might fix this by:
- forking the Cloudbreak code,
- modifying this part to meet your needs (a hedged sketch follows this list),
- building your own version of cloudbreak.jar,
- copying it into the cbreak_cloudbreak_1 container, and
- restarting the application.
You might also consider opening an issue in the Cloudbreak GitHub repo; the team will then investigate it more deeply.
Created 02-14-2018 01:52 PM
@pdarvasi Thank you so much for answering; this is what I was looking for. Let me see how we can make this work.
Created 02-19-2018 05:30 AM
@pdarvasi Finally found the solution to this issue. Here are my findings:
When we started HDP through Cloudbreak, the default HDP configuration set the non-HDFS reserved storage (dfs.datanode.du.reserved) to roughly 3.5% of the total disk space of the DataNode with the smallest storage (among the compute config groups), which had three drives, one of them in the TB range. Our dfs.datanode.data.dir, however, pointed at the drive with the lowest capacity (around 3% of the overall DataNode storage). Since 3% < 3.5%, the reported HDFS capacity became 0, and the few KB of supporting directories and files already on that drive pushed the DataNode's reported capacity negative. To fix the downscaling issue, we either need to lower the non-HDFS reserved capacity below the data directory's size (under ~3%) or point the DataNode at a drive with higher capacity (above ~3.5%).
I tried this and it worked. There is no need to change the WASB URI, so I am keeping it as the default storage. Thanks again for your suggestions.
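To make the capacity arithmetic above concrete, here is a rough illustration with hypothetical numbers; only the ~3% vs. ~3.5% relationship and the ~128 KB of used space come from this thread, while the absolute disk sizes are assumptions:

// Rough illustration of why the DataNode reported zero (and effectively negative) capacity.
// Absolute sizes are assumptions; only the percentage relationship comes from the post.
public class CapacityIllustration {

    public static void main(String[] args) {
        long totalDiskBytes = 2_000L * 1024 * 1024 * 1024;      // assume ~2 TB total disk on the DataNode
        long dataDirBytes   = (long) (totalDiskBytes * 0.030);  // dfs.datanode.data.dir on the smallest drive (~3%)
        long reservedBytes  = (long) (totalDiskBytes * 0.035);  // dfs.datanode.du.reserved (~3.5% of total disk)
        long supportFiles   = 131_072L;                         // ~128 KB of supporting directories/files (CapacityUsed above)

        // HDFS roughly reports volume capacity as (volume size - reserved), floored at zero.
        long reportedCapacity = Math.max(dataDirBytes - reservedBytes, 0);
        System.out.println("Reported DataNode capacity: " + reportedCapacity + " bytes");  // prints 0
        System.out.println("Raw capacity minus support files: "
                + (dataDirBytes - reservedBytes - supportFiles) + " bytes");               // prints a negative value
    }
}

Lowering dfs.datanode.du.reserved below the data directory's size, or pointing dfs.datanode.data.dir at the larger drive, makes the reported capacity positive so the decommission check can pass.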
