HDFS Corrupt Blocks -- NameNode stays in Safe Mode

Rising Star

After adding 2 DataNodes to an existing CDH 5.4 cluster, HDFS has become corrupted.

There are quite a few (over 2,000) corrupted blocks!

1) How do I repair them?

2) If I cannot repair or restore them, can I make HDFS accessible again (leave Safe Mode)?

 

14 REPLIES

Re: HDFS Corrupt Block -- NameNode stays in Safe Mode

Please post the output of "sudo -u hdfs hdfs dfsadmin -report" here, as
well as the summary section of "sudo -u hdfs hdfs fsck /".
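
For a large filesystem the fsck output can run to thousands of lines, so here is a rough sketch of one way to keep only the summary section. The function name `fsck_summary` is made up for illustration, and it assumes the summary begins at the first line containing "Status:" (which fsck may print at the end of a progress line rather than on its own line).

```shell
# Print only the summary section of fsck output read from stdin.
# Assumption: the summary starts at the first line containing "Status:".
# fsck_summary is a hypothetical helper name, not part of Hadoop.
fsck_summary() {
  awk '
    found     { print; next }            # already in the summary: pass through
    /Status:/ {
      found = 1
      sub(/^.*Status:/, "Status:")       # drop any per-file text fused in front
      print
    }
  '
}

# Usage on a cluster node (not run here):
#   sudo -u hdfs hdfs fsck / | fsck_summary
```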

Regards,
Gautam Gopalakrishnan

Re: HDFS Corrupt Block -- NameNode stays in Safe Mode

Rising Star

Here is the first command:

<><><><><><><><><><><><><><><><><><><><><>

[root@ ~]# sudo -u hdfs hdfs dfsadmin -report
Safe mode is ON
Configured Capacity: 6807953326080 (6.19 TB)
Present Capacity: 5076746797056 (4.62 TB)
DFS Remaining: 5076745936896 (4.62 TB)
DFS Used: 860160 (840 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (5):

Name: 10.15.230.42:50010 (node2)
Hostname: node2
Rack: /default
Decommission Status : Normal
Configured Capacity: 1361590665216 (1.24 TB)
DFS Used: 172032 (168 KB)
Non DFS Used: 425847939072 (396.60 GB)
DFS Remaining: 935742554112 (871.48 GB)
DFS Used%: 0.00%
DFS Remaining%: 68.72%
Configured Cache Capacity: 4294967296 (4 GB)
Cache Used: 0 (0 B)
Cache Remaining: 4294967296 (4 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 2
Last contact: Thu May 07 21:52:10 EDT 2015


Name: 10.15.230.44:50010 (node4)
Hostname: node4
Rack: /default
Decommission Status : Normal
Configured Capacity: 1361590665216 (1.24 TB)
DFS Used: 172032 (168 KB)
Non DFS Used: 219371347968 (204.31 GB)
DFS Remaining: 1142219145216 (1.04 TB)
DFS Used%: 0.00%
DFS Remaining%: 83.89%
Configured Cache Capacity: 4294967296 (4 GB)
Cache Used: 0 (0 B)
Cache Remaining: 4294967296 (4 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 2
Last contact: Thu May 07 21:52:10 EDT 2015


Name: 10.15.230.45:50010 (node5)
Hostname: node5
Rack: /default
Decommission Status : Normal
Configured Capacity: 1361590665216 (1.24 TB)
DFS Used: 172032 (168 KB)
Non DFS Used: 218613866496 (203.60 GB)
DFS Remaining: 1142976626688 (1.04 TB)
DFS Used%: 0.00%
DFS Remaining%: 83.94%
Configured Cache Capacity: 4294967296 (4 GB)
Cache Used: 0 (0 B)
Cache Remaining: 4294967296 (4 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 2
Last contact: Thu May 07 21:52:10 EDT 2015


Name: 10.15.230.41:50010 (node1)
Hostname: node1
Rack: /default
Decommission Status : Normal
Configured Capacity: 1361590665216 (1.24 TB)
DFS Used: 172032 (168 KB)
Non DFS Used: 647755825152 (603.27 GB)
DFS Remaining: 713834668032 (664.81 GB)
DFS Used%: 0.00%
DFS Remaining%: 52.43%
Configured Cache Capacity: 4294967296 (4 GB)
Cache Used: 0 (0 B)
Cache Remaining: 4294967296 (4 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 2
Last contact: Thu May 07 21:52:10 EDT 2015


Name: 10.15.230.43:50010 (node3)
Hostname: node3
Rack: /default
Decommission Status : Normal
Configured Capacity: 1361590665216 (1.24 TB)
DFS Used: 172032 (168 KB)
Non DFS Used: 219617550336 (204.53 GB)
DFS Remaining: 1141972942848 (1.04 TB)
DFS Used%: 0.00%
DFS Remaining%: 83.87%
Configured Cache Capacity: 4294967296 (4 GB)
Cache Used: 0 (0 B)
Cache Remaining: 4294967296 (4 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 2
Last contact: Thu May 07 21:52:08 EDT 2015
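
As a quick sanity check on a report like this one, the per-DataNode "DFS Used" figures can be added up and compared with the cluster-wide value. The sketch below is only an illustration: `sum_datanode_dfs_used` is a made-up name, and it assumes each DataNode stanza starts with a "Name:" line and that the cluster-wide "DFS Used:" line comes before the first stanza.

```shell
# Sum the per-DataNode "DFS Used" byte counts from a dfsadmin report on stdin.
# Assumptions: each DataNode stanza begins with a "Name:" line, and the
# cluster-wide "DFS Used:" line appears before the first stanza.
# sum_datanode_dfs_used is a hypothetical helper name.
sum_datanode_dfs_used() {
  awk '
    /^Name:/                { in_node = 1 }
    in_node && /^DFS Used:/ { total += $3; nodes++ }   # "DFS Used%:" does not match
    END { printf "%d nodes, %.0f bytes\n", nodes + 0, total + 0 }
  '
}

# Usage on a cluster node (not run here):
#   sudo -u hdfs hdfs dfsadmin -report | sum_datanode_dfs_used
```

For the report above this works out to 5 nodes and 860160 bytes (5 x 172032), which matches the cluster-wide "DFS Used" of 840 KB.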

 


Re: HDFS Corrupt Block -- NameNode stays in Safe Mode

Rising Star

The second command lists all 2,000+ files with corrupted blocks!

Here are a few of them:

<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

Connecting to namenode via http://master:50070
FSCK started by hdfs (auth:SIMPLE) from /10.15.230.22 for path / at Thu May 07 21:56:17 EDT 2015
..
/accumulo/tables/!0/table_info/A00009pl.rf: CORRUPT blockpool BP-2034730372-10.15.230.22-1428441473000 block blk_1073806971

/accumulo/tables/!0/table_info/A00009pl.rf: MISSING 1 blocks of total size 891 B..
/accumulo/tables/!0/table_info/A00009pm.rf: CORRUPT blockpool BP-2034730372-10.15.230.22-1428441473000 block blk_1073806989

/accumulo/tables/!0/table_info/A00009pm.rf: MISSING 1 blocks of total size 891 B..
/accumulo/tables/!0/table_info/F00009pn.rf: CORRUPT blockpool BP-2034730372-10.15.230.22-1428441473000 block blk_1073807006

............................................

/user/oozie/share/lib/lib_20150408141046/sqoop/geronimo-jaspic_1.0_spec-1.0.jar: MISSING 1 blocks of total size 30548 B..
/user/oozie/share/lib/lib_20150408141046/sqoop/geronimo-jta_1.1_spec-1.1.1.jar: CORRUPT blockpool BP-2034730372-10.15.230.22-1428441473000 block blk_1073743585

/user/oozie/share/lib/lib_20150408141046/sqoop/geronimo-jta_1.1_spec-1.1.1.jar: MISSING 1 blocks of total size 16030 B..
/user/oozie/share/lib/lib_20150408141046/sqoop/groovy-all-2.1.6.jar: CORRUPT blockpool BP-2034730372-10.15.230.22-1428441473000 block blk_1073743592

............................................

/tmp/logs/fincalc/logs/application_1430405794825_0003/DAST-node5_8041: MISSING 9 blocks of total size 1117083218 B..
/tmp/logs/fincalc/logs/application_1430405794825_0004/DAST-node1_8041: CORRUPT blockpool BP-2034730372-10.15.230.22-1428441473000 block blk_1073800217

/tmp/logs/fincalc/logs/application_1430405794825_0004/DAST-node1_8041: CORRUPT blockpool BP-2034730372-10.15.230.22-1428441473000 block blk_1073800222

/tmp/logs/fincalc/logs/application_1430405794825_0004/DAST-node1_8041: CORRUPT blockpool BP-2034730372-10.15.230.22-1428441473000 block blk_1073800230

............................................

............................................

/user/spark/applicationHistory/application_1430931186993_0007.inprogress: MISSING 1 blocks of total size 124212 B..
/user/spark/applicationHistory/application_1430931186993_0008: CORRUPT blockpool BP-2034730372-10.15.230.22-1428441473000 block blk_1073804097

/user/spark/applicationHistory/application_1430931186993_0008: MISSING 1 blocks of total size 124056 B..
/user/spark/applicationHistory/test.txt: CORRUPT blockpool BP-2034730372-10.15.230.22-1428441473000 block blk_1073745652

/user/spark/applicationHistory/test.txt: MISSING 1 blocks of total size 16 B.Status: CORRUPT
 Total size:    154126314807 B (Total open files size: 186 B)
 Total dirs:    3350
 Total files:   1790
 Total symlinks:                0 (Files currently being written: 2)
 Total blocks (validated):      2776 (avg. block size 55521006 B) (Total open file blocks (not validated): 2)
  ********************************
  CORRUPT FILES:        1764
  MISSING BLOCKS:       2776
  MISSING SIZE:         154126314807 B
  CORRUPT BLOCKS:       2776
  ********************************
 Minimally replicated blocks:   0 (0.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    2
 Average block replication:     0.0
 Corrupt blocks:                2776
 Missing replicas:              0
 Number of data-nodes:          5
 Number of racks:               1
FSCK ended at Thu May 07 21:56:18 EDT 2015 in 516 milliseconds


The filesystem under path '/' is CORRUPT
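
With this many entries it can help to see where the corrupt blocks are concentrated. The following is a minimal sketch that tallies the "CORRUPT blockpool" lines of fsck output by top-level directory. `corrupt_by_top_dir` is a made-up name, and the sketch assumes each such line starts with the file path (optionally preceded by progress dots), as in the excerpt above. It counts block lines, not files.

```shell
# Count "CORRUPT blockpool" lines in fsck output (stdin) per top-level directory.
# Assumption: each such line starts with the absolute file path, possibly
# preceded by fsck progress dots. One line is one corrupt block, not one file.
# corrupt_by_top_dir is a hypothetical helper name.
corrupt_by_top_dir() {
  awk '
    /: CORRUPT blockpool / {
      sub(/^\.+/, "")              # drop leading progress dots, if any
      split($0, parts, "/")        # parts[1] is empty, parts[2] is the top dir
      count["/" parts[2]]++
    }
    END { for (d in count) print count[d], d }
  ' | sort -k1,1nr -k2,2
}

# Usage on a cluster node (not run here):
#   sudo -u hdfs hdfs fsck / | corrupt_by_top_dir
```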

 


Re: HDFS Corrupt Block -- NameNode stays in Safe Mode

Rising Star

Needless to say, the NameNode remains in Safe Mode!

I have tried to 'recover' HDFS using "hadoop namenode -recover", but without success!

 

How can I fix those 2,000+ corrupt blocks or, in the worst case, get rid of them?

 


Re: HDFS Corrupt Block -- NameNode stays in Safe Mode

Rising Star

I stopped the 2 new DataNodes that I added earlier, which is why you see a total of 5 DNs.

However, when I tried to decommission them, I had no success.

The process ran for more than 2 hours, and then I had to kill it.

So far, I have:

1 NN

1 SNN

7 DNs (5 DNs are up, 2 DNs are stopped).

 


Re: HDFS Corrupt Block -- NameNode stays in Safe Mode

Rising Star

If someone could suggest a way to repair the blocks, it would be greatly appreciated!

I need to have this CDH cluster available tomorrow morning (Fri. morning)!!!

 


Re: HDFS Corrupt Block -- NameNode stays in Safe Mode

New Contributor

I think you got hit with https://issues.apache.org/jira/browse/HDFS-7281

(missing block marked as corrupted block)

 

For the file that is missing, do:

hdfs dfs -ls '/accumulo/tables/!0/table_info/'

Check the replication factor, which is shown in the 2nd column of the output above.

If the replication factor is > 3, then you should have a block somewhere.
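
To pull just the replication column out of a long listing, something along these lines might do. This is a sketch, not a tested recipe for every Hadoop version: `ls_replication` is a made-up name, and it assumes file lines have at least 8 whitespace-separated fields, that directories show "-" in the 2nd column, and that paths contain no spaces.

```shell
# Print "<replication> <path>" for each file line of `hdfs dfs -ls` output (stdin).
# Assumptions: file lines have at least 8 whitespace-separated fields,
# directories show "-" in the replication column, and paths have no spaces.
# ls_replication is a hypothetical helper name.
ls_replication() {
  awk 'NF >= 8 && $2 != "-" { print $2, $NF }'
}

# Usage on a cluster node (not run here). The single quotes keep an
# interactive bash from treating "!0" as a history expansion:
#   hdfs dfs -ls '/accumulo/tables/!0/table_info/' | ls_replication
```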

 

Get a list of some of the missing blocks, then on your DataNode, run:

find /<path_to_the_data_directory> -type f | grep <missing block>

Eg:

find /<path_to_data_directory> -type f | grep 'BP-2034730372-10.15.230.22-1428441473000'

 

See if the block is there or not.
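
Since every block file in the pool sits under a directory named after the block pool ID, searching by block ID instead narrows the result to one block. Here is a small sketch of that search. `find_block_files` is a made-up name, the data directory is whatever your DataNode is configured with, and the sketch assumes the usual on-disk naming of a block file `blk_<id>` plus a checksum file `blk_<id>_<genstamp>.meta`.

```shell
# Search a DataNode data directory for the replica files of one block ID.
# Assumption: replicas are stored as blk_<id> with a matching
# blk_<id>_<genstamp>.meta checksum file.
# find_block_files is a hypothetical helper name.
find_block_files() {
  data_dir="$1"
  block_id="$2"
  find "$data_dir" -type f \( -name "$block_id" -o -name "${block_id}_*.meta" \)
}

# Usage on each DataNode (not run here; the path placeholder is yours to fill in):
#   find_block_files /<path_to_data_directory> blk_1073806971
```

No output on any DataNode would mean no replica of that block is present on disk in the directories searched.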


Re: HDFS Corrupt Block -- NameNode stays in Safe Mode

Rising Star

Thank you for your suggestions!

I am in the process of removing some files that I don't need!

Stay tuned...

 


Re: HDFS Corrupt Block -- NameNode stays in Safe Mode

Rising Star

I am trying to remove files/directories but I cannot...

[root@master ~]# hdfs dfs -rm -R -skipTrash /user/oozie
rm: Cannot delete /user/oozie. Name node is in safe mode.
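
The error above is expected while the NameNode is in safe mode, because the namespace is read-only. As a hedged sketch only: `hdfs dfsadmin -safemode get` reports the current state and `hdfs dfsadmin -safemode leave` forces the NameNode out of safe mode. Leaving safe mode does not bring back missing blocks, so the affected files would presumably still be unreadable afterwards. The helper `safemode_is_on` is a made-up name that just checks the status text.

```shell
# Return 0 if the given `hdfs dfsadmin -safemode get` output says safe mode is on.
# safemode_is_on is a hypothetical helper name; it only inspects the text
# passed as its first argument.
safemode_is_on() {
  case "$1" in
    *"Safe mode is ON"*) return 0 ;;
    *)                   return 1 ;;
  esac
}

# Usage on the cluster (not run here). Forcing the NameNode out of safe mode
# does not restore missing blocks; it only makes the namespace writable again.
#   status="$(sudo -u hdfs hdfs dfsadmin -safemode get)"
#   if safemode_is_on "$status"; then
#     sudo -u hdfs hdfs dfsadmin -safemode leave
#   fi
```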

 

 
