How can I verify whether there is any orphaned or abandoned data on a datanode?
From the example below we see that /hadoop/sde is showing as 96% full, so before I run the intra-DataNode balancer I want to verify that the data on this node is not actually orphaned, and that the 96% is simply the result of massive file deletion or the addition of new DataNode disks.
One common reason for datanodes becoming unbalanced is ingestion/data load.
When the client loading data into HDFS runs on a datanode, the first copy of each block is stored on that same datanode. The second and third copies are stored on other datanodes. Within a datanode, new blocks are spread across its disks in round-robin fashion by default, regardless of how full each disk already is. You can make the datanode prefer disks with more available space instead of round robin by setting the "DataNode Volume Choosing Policy" appropriately.
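
To see why the choice of policy matters for a nearly full disk such as /hadoop/sde, here is a small Python sketch. It is a toy model, not Hadoop code: the volume names, capacities, and usage figures are made up, and the "most free space" strategy is a greedy simplification of what the real available-space policy does. In hdfs-site.xml the setting is the `dfs.datanode.fsdataset.volume.choosing.policy` property; check the documentation for your Hadoop version before changing it.

```python
# Toy model only: compares two ways of picking a disk for each new block.
# All names and numbers below are illustrative assumptions, not real cluster data.

def place_round_robin(capacity, used, n_blocks, block_size=1):
    """Place blocks on volumes in turn, skipping any volume that is full."""
    used = dict(used)  # work on a copy so the caller's data is untouched
    names = list(capacity)
    cursor = 0
    for _ in range(n_blocks):
        # Try each volume at most once, starting from the cursor.
        for _ in range(len(names)):
            name = names[cursor % len(names)]
            cursor += 1
            if capacity[name] - used[name] >= block_size:
                used[name] += block_size
                break
    return used


def place_most_free(capacity, used, n_blocks, block_size=1):
    """Greedy simplification: always pick the volume with the most free space."""
    used = dict(used)
    for _ in range(n_blocks):
        name = max(capacity, key=lambda v: capacity[v] - used[v])
        if capacity[name] - used[name] >= block_size:
            used[name] += block_size
    return used


# Hypothetical datanode: one old disk at 96% and three recently added disks.
capacity = {"sdb": 100, "sdc": 100, "sdd": 100, "sde": 100}
used = {"sdb": 10, "sdc": 10, "sdd": 10, "sde": 96}

rr = place_round_robin(capacity, used, n_blocks=30)
mf = place_most_free(capacity, used, n_blocks=30)

print("round robin:", rr)
print("most free  :", mf)
```

In this toy model, round robin keeps writing to sde until it is completely full, while the free-space strategy leaves sde untouched and fills the emptier disks. Neither strategy moves existing blocks, so an already unbalanced node still needs the intra-DataNode balancer.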