Unable to delete HDFS Corrupt files

Contributor

I am unable to delete corrupt files present in my HDFS. The NameNode has entered safe mode. The total number of blocks is 980, of which 978 have reported. When I run the following command,

sudo -u hdfs hdfs dfsadmin -report

The report generated is,

Safe mode is ON
Configured Capacity: 58531520512 (54.51 GB)
Present Capacity: 35774078976 (33.32 GB)
DFS Remaining: 32374509568 (30.15 GB)
DFS Used: 3399569408 (3.17 GB)
DFS Used%: 9.50%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (1):

Name: 10.0.2.15:50010 (quickstart.cloudera)
Hostname: quickstart.cloudera
Decommission Status : Normal
Configured Capacity: 58531520512 (54.51 GB)
DFS Used: 3399569408 (3.17 GB)
Non DFS Used: 19777388544 (18.42 GB)
DFS Remaining: 32374509568 (30.15 GB)
DFS Used%: 5.81%
DFS Remaining%: 55.31%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Tue Nov 14 10:39:58 IST 2017

And when the following command is executed,

sudo -u hdfs hdfs fsck /

The output is,

Connecting to namenode via http://quickstart.cloudera:50070/fsck?ugi=hdfs&path=%2F
FSCK started by hdfs (auth:SIMPLE) from /10.0.2.15 for path / at Tue Nov 14 10:41:25 IST 2017
/hbase/oldWALs/quickstart.cloudera%2C60020%2C1509698296866.default.1509701903728: CORRUPT blockpool BP-1914853243-127.0.0.1-1500467607052 block blk_1073743141

/hbase/oldWALs/quickstart.cloudera%2C60020%2C1509698296866.default.1509701903728: MISSING 1 blocks of total size 83 B..
/hbase/oldWALs/quickstart.cloudera%2C60020%2C1509698296866.meta.1509701932269.meta: CORRUPT blockpool BP-1914853243-127.0.0.1-1500467607052 block blk_1073743142

/hbase/oldWALs/quickstart.cloudera%2C60020%2C1509698296866.meta.1509701932269.meta: MISSING 1 blocks of total size 83 B
Status: CORRUPT
Total size: 3368384392 B (Total open files size: 166 B)
Total dirs: 286
Total files:    966
Total symlinks:     0 (Files currently being written: 3)
Total blocks (validated):   980 (avg. block size 3437126 B) (Total open file blocks (not validated): 2)
********************************
UNDER MIN REPL'D BLOCKS:    2 (0.20408164 %)
dfs.namenode.replication.min:   1
CORRUPT FILES:  2
MISSING BLOCKS: 2
MISSING SIZE:       166 B
CORRUPT BLOCKS:     2
********************************
Minimally replicated blocks:    978 (99.79592 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks:    0 (0.0 %)
Mis-replicated blocks:      0 (0.0 %)
Default replication factor: 1
Average block replication:  0.9979592
Corrupt blocks:     2
Missing replicas:       0 (0.0 %)
Number of data-nodes:       1
Number of racks:        1
FSCK ended at Tue Nov 14 10:41:26 IST 2017 in 774 milliseconds
The filesystem under path '/' is CORRUPT

Can anyone please help me either repair the corrupt blocks or delete them? Thanks in advance.

1 ACCEPTED SOLUTION

Super Collaborator

Hi,

 

Well, for deleting corrupted blocks there is an option on the hdfs fsck command.

Add the option "-delete" and it should delete all corrupted (or missing) files.

 

You might need to leave safe mode before deleting the corrupted files.
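
For example (a sketch of the usual commands, run as the hdfs superuser; adjust the path to your environment):

# leave safe mode, then let fsck remove the files with missing/corrupt blocks
sudo -u hdfs hdfs dfsadmin -safemode leave
sudo -u hdfs hdfs fsck / -delete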

 

If you want to "restore" them, then you shoulld try to follow these guidances :

https://stackoverflow.com/questions/19205057/how-to-fix-corrupt-hdfs-files

 

In most cases, corrupted files cannot be restored.

 

regards,

Mathieu


9 REPLIES


Contributor

How do I delete a zero-size directory from an HDFS path?

Thanks,
HadoopHelp

Master Guru

@HadoopHelp,

 

Assuming you want to delete a directory that has no contents, you can use:

 

hdfs dfs -rmdir /tmp/emptydir

Contributor

Thanks @bgooley.

 

But my situation is as follows. I have directories like:

/user/root/Data/hl71

/user/root/Data/hl72

/user/root/Data/hl73

/user/root/Data/hl74

/user/root/Data/hl75

/user/root/Data/hl76

/user/root/Data/hl77

/user/root/Data/hl78

/user/root/Data/hl79

..............................

..............................

But inside /user/root/Data/* multiple directories are zero-size while others contain data, so as per your response, (hdfs dfs -rmdir /tmp/emptydir) would remove all directories from /user/root/Data/*.

Note: I only want to delete the zero-size (empty) directories, not the directories that contain data.

So please help with some other idea.

Thanks,
HadoopHelp

Master Guru

@HadoopHelp,

 

I think all directories are listed as 0 size.  Do you mean you are looking to delete "empty" directories?

Contributor

Hi @bgooley,

 

I only want to delete zero-size directories, i.e. empty directories, but there is a condition:

The current directory contains some directories with data, but it also contains some zero-size (empty) directories at the same level.

I hope you understand now.

Here, @michalis has given an idea that matches my requirements.

Thanks,
HadoopHelp

Master Collaborator

> Note: I only want to delete the zero-size (empty) directories, not the directories that contain data.

 

One idea involves a two-step process: generate a listing of the empty directories, then take that listing and run -rmdir on each.

 

# generate the empty directory listing
$ hadoop jar /opt/cloudera/parcels/CDH-*/jars/search-mr-*-job.jar org.apache.solr.hadoop.HdfsFindTool -find / -type d -empty # produces output
...
hdfs://ns1/user/impala
hdfs://ns1/user/spark/applicationHistory
hdfs://ns1/user/spark/spark2ApplicationHistory
hdfs://ns1/user/sqoop2
...

# OPTIONAL: pick a dir and confirm that the dir is empty eg:
$ hdfs dfs -du -s /user/impala
0 0 /user/impala

# remove the empty dir eg: /user/impala
$ hdfs dfs -rmdir /user/impala

https://www.cloudera.com/documentation/enterprise/5-16-x/topics/search_hdfsfindtool.html
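
To combine the two steps, the empty-directory listing can be piped straight into -rmdir. This is only a sketch: it reuses the jar path above and the /user/root/Data path mentioned earlier in the thread, and nested empty directories may need a second pass.

# sketch: feed the empty-directory listing directly into -rmdir
$ hadoop jar /opt/cloudera/parcels/CDH-*/jars/search-mr-*-job.jar \
    org.apache.solr.hadoop.HdfsFindTool -find /user/root/Data -type d -empty \
  | xargs -n 1 hdfs dfs -rmdir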

 

Contributor

Thanks @michalis,

 

Your idea is correct, but we need to add the jar file as you did above:

hadoop jar /opt/cloudera/parcels/CDH-*/jars/search-mr-*-job.jar org.apache.solr.hadoop.HdfsFindTool

 

Is there any option to find empty directories using an HDFS command directly?

I appreciate your idea.

Thanks,
HadoopHelp

 

Master Collaborator

> Is there any option to find empty directory using HDFS command Directly?

You can get a list/find empty directories using the 'org.apache.solr.hadoop.HdfsFindTool'.

And to check/test whether a single directory is empty with the hdfs tool, you can use -du or -test; please see the FileSystemShell documentation [0].

 

test

Usage: hadoop fs -test -[defsz] URI

Options:

    -d: if the path is a directory, return 0.
    -e: if the path exists, return 0.
    -f: if the path is a file, return 0.
    -s: if the path is not empty, return 0.
    -r: if the path exists and read permission is granted, return 0.
    -w: if the path exists and write permission is granted, return 0.
    -z: if the file is zero length, return 0.

Example:

    hadoop fs -test -e filename

du

Usage: hadoop fs -du [-s] [-h] [-x] URI [URI ...]

Displays sizes of files and directories contained in the given directory or the length of a file in case it's just a file.

Options:

    The -s option will result in an aggregate summary of file lengths being displayed, rather than the individual files. Without the -s option, calculation is done by going 1-level deep from the given path.
    The -h option will format file sizes in a “human-readable” fashion (e.g 64.0m instead of 67108864)
    The -x option will exclude snapshots from the result calculation. Without the -x option (default), the result is always calculated from all INodes, including all snapshots under the given path.

The du returns three columns with the following format:

size disk_space_consumed_with_all_replicas full_path_name

Example:

    hadoop fs -du /user/hadoop/dir1 /user/hadoop/file1 hdfs://nn.example.com/user/hadoop/dir1

Exit Code: Returns 0 on success and -1 on error.

 

[0] https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html
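
As a small worked example of the -du approach (a sketch only: the /user/root/Data path comes from the earlier posts, a directory is treated as a removal candidate when its content size is 0, and -rmdir will refuse to remove anything that is not actually empty):

# sketch: remove only the empty subdirectories directly under /user/root/Data
for d in $(hdfs dfs -ls /user/root/Data | awk '/^d/ {print $NF}'); do
  size=$(hdfs dfs -du -s "$d" | awk '{print $1}')
  if [ "$size" -eq 0 ]; then
    # -rmdir only succeeds on directories with no children
    hdfs dfs -rmdir "$d"
  fi
done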