Created 07-08-2016 03:44 PM
I get an error "Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 18, <datanode>): java.io.IOException: Cannot obtain block length for LocatedBlock{BP-1426797840-1461158403571:blk_1089439824_15699635; getBlockSize()=0; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[,DISK], DatanodeInfoWithStorage[,DISK], DatanodeInfoWithStorage[,DISK]]}"
The background to this is: we changed our Oozie DB to MySQL, and on restarting the cluster the NameNode failed with a ConnectionRefused error. It was started manually from the CLI, afterwards restarted with Ambari, and then it worked fine. I have used hdfs fsck to check for corrupt files, but I get a 'HEALTHY' status report.
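One thing worth checking: this error usually means a file is still marked as open for write (under construction), and a plain `hdfs fsck /` can still report HEALTHY in that case. A sketch of how to surface those files, filtering the report down to just the open paths (the report snippet and paths below are illustrative, not from this cluster):

```shell
# On a real cluster, list the files stuck open for write with:
#   hdfs fsck / -files -openforwrite | grep OPENFORWRITE
# Illustrative fsck report snippet (hypothetical paths):
fsck_report='/tmp/logs/app1.log 35725218 bytes, 1 block(s), OPENFORWRITE:
/data/clean/part-0000 2048 bytes, 1 block(s):  OK'
# Keep only the paths still open for write.
printf '%s\n' "$fsck_report" | awk '/OPENFORWRITE/ {print $1}'
```

If the file your Spark job reads shows up in that list, the block length of its last block is unresolved, which matches the `getBlockSize()=0` in the error.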
Any clue as to how i can get past this issue? @Kuldeep Kulkarni @Artem Ervits @Sagar Shimpi @Benjamin Leonhardi
Created 08-01-2016 07:09 AM
This issue was resolved by restarting the namenode.
Created 07-08-2016 03:47 PM
Created 07-08-2016 04:00 PM
@srai Please find below the report I got from running hdfs fsck /
...............................Status: HEALTHY
 Total size: 84086783290897 B (Total open files size: 35725218 B)
 Total dirs: 255918
 Total files: 7090531
 Total symlinks: 0 (Files currently being written: 464)
 Total blocks (validated): 7131287 (avg. block size 11791249 B) (Total open file blocks (not validated): 86)
 Corrupt blocks: 0
 Number of data-nodes: 8
 Number of racks: 1
FSCK ended at Fri Jul 08 16:43:47 SAST 2016 in 141368 milliseconds
The filesystem under path '/' is HEALTHY
Created 07-08-2016 03:50 PM
Can you share the hdfs fsck command you ran? It definitely sounds like HDFS is not healthy.
Created 07-08-2016 04:37 PM
Here is another one below, Josh.
Status: HEALTHY
 Total size: 84184775260004 B (Total open files size: 36288883 B)
 Total dirs: 255954
 Total files: 7102482
 Total symlinks: 0 (Files currently being written: 456)
 Total blocks (validated): 7143238 (avg. block size 11785240 B) (Total open file blocks (not validated): 79)
 Minimally replicated blocks: 7143238 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 130 (0.0018199029 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 3
 Average block replication: 2.9979758
 Corrupt blocks: 0
 Missing replicas: 257 (0.0012000647 %)
 Number of data-nodes: 8
 Number of racks: 1
FSCK ended at Fri Jul 08 18:29:19 SAST 2016 in 239594 milliseconds
The filesystem under path '/' is HEALTHY
Created 07-08-2016 05:24 PM
Thanks, @Joshua Adeleke. As in the other question Srai linked, if you know the specific file(s) your job is reading, you could try the `hdfs debug recoverLease` command on those files. Normally, a lease on an HDFS file expires automatically when a writer goes away abnormally without closing the file. If you are sure no client is still writing the file, you could use recoverLease to force the NameNode to close it and let your read succeed.
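A minimal sketch of that recovery step, assuming the stuck files have already been identified (e.g. via `hdfs fsck / -openforwrite`). The path below is hypothetical, and the commands are printed rather than executed so they can be reviewed before running:

```shell
# Hypothetical list of files still open for write; replace with the
# real paths reported by 'hdfs fsck / -openforwrite'.
open_paths='/tmp/logs/app1.log'

# Print one recoverLease invocation per path; pipe the output to 'sh'
# (or run the lines by hand) once the list looks right.
for p in $open_paths; do
  echo "hdfs debug recoverLease -path $p -retries 5"
done
```

Note that `hdfs debug recoverLease` is only available in Hadoop 2.7 and later; on older clusters, restarting the writer or waiting out the hard lease limit are the alternatives.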