Created on 11-13-2017 10:04 PM - edited 09-16-2022 05:31 AM
I am unable to delete the corrupt files present in my HDFS. The NameNode has entered safe mode. The total number of blocks is 980, of which 978 have reported. When I run the following command,
sudo -u hdfs hdfs dfsadmin -report
the generated report is:
Safe mode is ON
Configured Capacity: 58531520512 (54.51 GB)
Present Capacity: 35774078976 (33.32 GB)
DFS Remaining: 32374509568 (30.15 GB)
DFS Used: 3399569408 (3.17 GB)
DFS Used%: 9.50%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (1):

Name: 10.0.2.15:50010 (quickstart.cloudera)
Hostname: quickstart.cloudera
Decommission Status : Normal
Configured Capacity: 58531520512 (54.51 GB)
DFS Used: 3399569408 (3.17 GB)
Non DFS Used: 19777388544 (18.42 GB)
DFS Remaining: 32374509568 (30.15 GB)
DFS Used%: 5.81%
DFS Remaining%: 55.31%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Tue Nov 14 10:39:58 IST 2017
And when the following command is executed,
sudo -u hdfs hdfs fsck /
the output is:
Connecting to namenode via http://quickstart.cloudera:50070/fsck?ugi=hdfs&path=%2F
FSCK started by hdfs (auth:SIMPLE) from /10.0.2.15 for path / at Tue Nov 14 10:41:25 IST 2017
/hbase/oldWALs/quickstart.cloudera%2C60020%2C1509698296866.default.1509701903728: CORRUPT blockpool BP-1914853243-127.0.0.1-1500467607052 block blk_1073743141
/hbase/oldWALs/quickstart.cloudera%2C60020%2C1509698296866.default.1509701903728: MISSING 1 blocks of total size 83 B..
/hbase/oldWALs/quickstart.cloudera%2C60020%2C1509698296866.meta.1509701932269.meta: CORRUPT blockpool BP-1914853243-127.0.0.1-1500467607052 block blk_1073743142
/hbase/oldWALs/quickstart.cloudera%2C60020%2C1509698296866.meta.1509701932269.meta: MISSING 1 blocks of total size 83 B
Status: CORRUPT
 Total size: 3368384392 B (Total open files size: 166 B)
 Total dirs: 286
 Total files: 966
 Total symlinks: 0 (Files currently being written: 3)
 Total blocks (validated): 980 (avg. block size 3437126 B) (Total open file blocks (not validated): 2)
  ********************************
  UNDER MIN REPL'D BLOCKS: 2 (0.20408164 %)
  dfs.namenode.replication.min: 1
  CORRUPT FILES: 2
  MISSING BLOCKS: 2
  MISSING SIZE: 166 B
  CORRUPT BLOCKS: 2
  ********************************
 Minimally replicated blocks: 978 (99.79592 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 1
 Average block replication: 0.9979592
 Corrupt blocks: 2
 Missing replicas: 0 (0.0 %)
 Number of data-nodes: 1
 Number of racks: 1
FSCK ended at Tue Nov 14 10:41:26 IST 2017 in 774 milliseconds

The filesystem under path '/' is CORRUPT
Can anyone please help with either repairing the corrupted blocks or deleting them? Thanks in advance.
Created 11-14-2017 01:11 AM
Hi,
For deleting corrupted blocks there is an option on the hdfs fsck command.
Add the option "-delete" and it should delete all corrupted (or missing) files.
You might need to leave safe mode before deleting the corrupted files.
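For example, on the QuickStart VM the sequence might look like this (a sketch; run as the hdfs superuser, and note the -list-corruptfileblocks step is only an optional check before the irreversible -delete):

# leave safe mode so the namespace can be modified
sudo -u hdfs hdfs dfsadmin -safemode leave
# optional: double-check which files will be affected
sudo -u hdfs hdfs fsck / -list-corruptfileblocks
# delete the files whose blocks are corrupt or missing
sudo -u hdfs hdfs fsck / -delete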
If you want to "restore" them, then you shoulld try to follow these guidances :
https://stackoverflow.com/questions/19205057/how-to-fix-corrupt-hdfs-files
Most cases of corrupted files cannot be restored.
regards,
Mathieu
Created 07-22-2019 07:05 AM
How do I delete a zero-size directory from an HDFS path?
thanks
HadoopHelp
Created 07-22-2019 09:10 AM
Assuming you want to delete a directory that has no contents, you can use:
hdfs dfs -rmdir /tmp/emptydir
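Note that -rmdir only removes a directory that is already empty. If some of the paths you pass might still have contents, the documented --ignore-fail-on-non-empty flag suppresses the error for those instead of aborting:

hdfs dfs -rmdir --ignore-fail-on-non-empty /tmp/emptydir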
Created 07-23-2019 03:37 AM
Thanks @bgooley .
but my situation is as below:
I have directories like:
/user/root/Data/hl71
/user/root/Data/hl72
/user/root/Data/hl73
/user/root/Data/hl74
/user/root/Data/hl75
/user/root/Data/hl76
/user/root/Data/hl77
/user/root/Data/hl78
/user/root/Data/hl79
..............................
..............................
But inside /user/root/Data/* there are multiple directories of zero size as well as directories containing data. So, per your response, (hdfs dfs -rmdir /tmp/emptydir) this would remove all directories from /user/root/Data/*.
Note: I only want to delete the zero-size directories, not the directories that contain data.
So please help with some other idea.
Thanks
HadoopHelp
Created 07-23-2019 09:44 AM
I think all directories are listed as 0 size. Do you mean you are looking to delete "empty" directories?
Created 07-26-2019 08:07 AM
Hi @bgooley ,
I want to delete only the zero-size directories, i.e. the empty ones, but there is a condition:
the current directory contains some directories with data, but also some zero-size directories alongside them.
I hope that is clear now.
@michalis has given an idea below that matches my requirements.
Thanks
HadoopHelp
Created on 07-24-2019 04:37 AM - edited 07-24-2019 01:22 PM
> Note: I only want to delete the zero-size directories, not the directories that contain data.
One idea involves a two-step process: generate a listing of the empty directories, then feed that listing to -rmdir.
# generate the empty directory listing
$ hadoop jar /opt/cloudera/parcels/CDH-*/jars/search-mr-*-job.jar org.apache.solr.hadoop.HdfsFindTool -find / -type d -empty
# produces output:
...
hdfs://ns1/user/impala
hdfs://ns1/user/spark/applicationHistory
hdfs://ns1/user/spark/spark2ApplicationHistory
hdfs://ns1/user/sqoop2
...
# OPTIONAL: pick a dir and confirm that the dir is empty eg:
$ hdfs dfs -du -s /user/impala
0 0 /user/impala
# remove the empty dir, e.g. /user/impala
$ hdfs dfs -rmdir /user/impala
https://www.cloudera.com/documentation/enterprise/5-16-x/topics/search_hdfsfindtool.html
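The two steps can also be combined by piping the listing straight into -rmdir (a sketch, assuming none of the paths contain whitespace; -rmdir accepts the fully qualified hdfs:// URIs that HdfsFindTool prints, and /user/root/Data is the starting point from your example):

$ hadoop jar /opt/cloudera/parcels/CDH-*/jars/search-mr-*-job.jar org.apache.solr.hadoop.HdfsFindTool -find /user/root/Data -type d -empty | xargs -r -n1 hdfs dfs -rmdir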
Created 07-26-2019 08:01 AM
Thanks @michalis ,
Your idea is correct, but we need to add the jar file as you did above?
hadoop jar /opt/cloudera/parcels/CDH-*/jars/search-mr-*-job.jar org.apache.solr.hadoop.HdfsFindTool
Is there any option to find empty directories using an HDFS command directly?
I appreciate your idea.
Thanks
HadoopHelp
Created on 07-26-2019 02:18 PM - edited 07-26-2019 02:19 PM
> Is there any option to find empty directory using HDFS command Directly?
You can list/find empty directories using 'org.apache.solr.hadoop.HdfsFindTool'.
And to check/test whether a single directory is empty with plain HDFS commands, you can use -du or -test; please see the FileSystemShell documentation [0]:
test
Usage: hadoop fs -test -[defswrz] URI
Options:
-d: if the path is a directory, return 0.
-e: if the path exists, return 0.
-f: if the path is a file, return 0.
-s: if the path is not empty, return 0.
-r: if the path exists and read permission is granted, return 0.
-w: if the path exists and write permission is granted, return 0.
-z: if the file is zero length, return 0.
Example: hadoop fs -test -e filename
du
Usage: hadoop fs -du [-s] [-h] [-x] URI [URI ...]
Displays sizes of files and directories contained in the given directory, or the length of a file in case it's just a file.
Options:
The -s option will result in an aggregate summary of file lengths being displayed, rather than the individual files. Without the -s option, the calculation is done by going 1 level deep from the given path.
The -h option will format file sizes in a "human-readable" fashion (e.g. 64.0m instead of 67108864).
The -x option will exclude snapshots from the result calculation. Without the -x option (default), the result is always calculated from all INodes, including all snapshots under the given path.
The du returns three columns with the following format:
size disk_space_consumed_with_all_replicas full_path_name
Example: hadoop fs -du /user/hadoop/dir1 /user/hadoop/file1 hdfs://nn.example.com/user/hadoop/dir1
Exit Code: Returns 0 on success and -1 on error.
[0] https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html
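And to answer the "HDFS command directly" question with plain shell only, here is a sketch (assuming the /user/root/Data layout from earlier in the thread and paths without spaces): since -rmdir succeeds only on directories that are truly empty, you can simply attempt it on every immediate subdirectory and let the documented --ignore-fail-on-non-empty flag skip the rest:

# keep only the -ls entries whose permission string starts with 'd' (directories),
# then attempt rmdir on each; non-empty directories are skipped without an error
$ hdfs dfs -ls /user/root/Data | awk '$1 ~ /^d/ {print $NF}' | xargs -r -n1 hdfs dfs -rmdir --ignore-fail-on-non-empty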