How to verify whether there is any orphaned/abandoned data on a DataNode?

Expert Contributor

Hi, 

How can I verify whether there is any orphaned or abandoned data on a DataNode?

In the example below, /hadoop/sde is at 96% usage. Before I run the intra-DataNode balancer, I want to verify that the data on this disk is not actually orphaned, and that the 96% is simply caused by massive file deletion or by the addition of new DataNode disks.

/dev/sde 1.1T 1.1T 50G 96% /hadoop/sde
/dev/sdn 1.1T 762G 357G 69% /hadoop/sdn
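To put a number on the imbalance shown above, the HDFS Disk Balancer documentation describes a "volume data density" metric: the node's ideal usage ratio (total used / total capacity across its volumes) minus each volume's own used/capacity ratio, where a negative value marks an over-utilized disk and a positive value an under-utilized one. The sketch below is a minimal illustration of that formula, not the Disk Balancer's own code. It assumes both disks have the same 1.1T capacity, so the df percentages (96% and 69%) can stand in for used/capacity; the function name `volume_data_density` is hypothetical.

```python
def volume_data_density(volumes):
    """Return ideal_ratio - used/capacity for each volume.

    volumes: dict of name -> (capacity, used), in any consistent unit.
    Negative result: volume is fuller than the node-wide ideal.
    Positive result: volume is emptier than the node-wide ideal.
    """
    total_capacity = sum(capacity for capacity, _ in volumes.values())
    total_used = sum(used for _, used in volumes.values())
    ideal = total_used / total_capacity  # node-wide target usage ratio
    return {name: ideal - used / capacity for name, (capacity, used) in volumes.items()}


if __name__ == "__main__":
    # Assumption: equal capacities, so df's Use% values are used as used/capacity.
    volumes = {"/hadoop/sde": (1.0, 0.96), "/hadoop/sdn": (1.0, 0.69)}
    for name, density in sorted(volume_data_density(volumes).items()):
        print(f"{name}: {density:+.3f}")
```

With these inputs the ideal ratio is 0.825, so /hadoop/sde comes out at about -0.135 (over-utilized) and /hadoop/sdn at about +0.135 (under-utilized).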

Steps I performed:

1. Picked a random block file on that node, ran hadoop fsck -blockId on it, and found that it has only 3 replicas.
2. Also checked files older than 300 days; they still show 3 live replicas.
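Checking one random block is a small sample. DataNode volumes store each replica as a file named blk_<blockId> next to a blk_<blockId>_<genstamp>.meta checksum file, so a sketch can walk the full disk, collect every block ID, and build hdfs fsck -blockId commands for a random sample. The path /hadoop/sde, the sample size of 20, and the helper names are assumptions; point the walk at the actual dfs.datanode.data.dir entry on that disk. The script only prints the commands; running them and looking for any block that fsck cannot map back to a file would flag candidate orphans.

```python
import os
import random
import re

# A replica is stored as blk_<blockId>; its checksum file
# blk_<blockId>_<genstamp>.meta does not match this full-name pattern.
BLOCK_FILE = re.compile(r"blk_(-?\d+)")


def block_ids_on_volume(volume_root):
    """Walk a DataNode data directory and yield every block ID found on disk."""
    for _dirpath, _dirnames, filenames in os.walk(volume_root):
        for name in filenames:
            match = BLOCK_FILE.fullmatch(name)
            if match:
                yield "blk_" + match.group(1)


def fsck_commands(block_ids):
    """Build (but do not run) one 'hdfs fsck -blockId' command per block ID."""
    return [["hdfs", "fsck", "-blockId", block_id] for block_id in block_ids]


if __name__ == "__main__":
    # Hypothetical location: replace with the dfs.datanode.data.dir entry on /hadoop/sde.
    ids = list(block_ids_on_volume("/hadoop/sde"))
    # Assumed sample size of 20; each command is printed, not executed.
    for cmd in fsck_commands(random.sample(ids, min(20, len(ids)))):
        print(" ".join(cmd))
```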

Is there any other way to verify this?

Re: How to verify whether there is any orphaned/abandoned data on a DataNode?

Contributor

One of the most common reasons for DataNodes becoming unbalanced is the way data is ingested (loaded) into HDFS.

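If the client loading the data runs on a DataNode, the first replica of each block is always stored on that local DataNode. The second and third replicas are placed on other DataNodes chosen by the NameNode's block placement policy (rack-aware when rack topology is configured). Within each DataNode, the disk (volume) that receives a new replica is chosen round-robin by default, without regard to how much free space each disk has. You can make DataNodes prefer disks with more available space instead by setting the "DataNode Volume Choosing Policy" (dfs.datanode.fsdataset.volume.choosing.policy) to AvailableSpaceVolumeChoosingPolicy.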
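Why round-robin volume choice fills a smaller or fuller disk first can be shown with a toy simulation. This is a simplified model, not Hadoop's implementation: free space is counted in abstract 1 GB units with one unit per block, `most_free` is a hypothetical stand-in for AvailableSpaceVolumeChoosingPolicy (the real policy is probabilistic and threshold-based rather than always picking the emptiest disk), and the starting values of 50 and 357 borrow the Avail column from the df output in the question.

```python
def round_robin(free, state):
    """Cycle through volumes in order, skipping any without room for a block."""
    n = len(free)
    for step in range(n):
        idx = (state["next"] + step) % n
        if free[idx] >= 1:
            state["next"] = (idx + 1) % n
            return idx
    raise RuntimeError("all volumes are full")


def most_free(free, state):
    """Hypothetical simplification: always choose the volume with the most free space.

    `state` is unused; it is kept so both policies share one signature.
    """
    idx = max(range(len(free)), key=lambda i: free[i])
    if free[idx] < 1:
        raise RuntimeError("all volumes are full")
    return idx


def simulate(policy, free, blocks):
    """Write `blocks` one-unit replicas and return the remaining free space per volume."""
    free = list(free)
    state = {"next": 0}
    for _ in range(blocks):
        free[policy(free, state)] -= 1
    return free


if __name__ == "__main__":
    start = [50, 357]  # free space on /hadoop/sde and /hadoop/sdn from df, in GB
    print("round robin:", simulate(round_robin, start, 100))
    print("most free:  ", simulate(most_free, start, 100))
```

After 100 blocks, round-robin leaves [0, 307]: /hadoop/sde is completely full even though /hadoop/sdn still has plenty of room. The free-space-aware stand-in sends every block to /hadoop/sdn and leaves [50, 257].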

How many nodes do you have in this cluster?

How do you push data into HDFS?
