Created 04-04-2017 01:38 AM
Hello.
A MapReduce job couldn't start because one of its input files could not be read.
When I tried to access the file, the following error occurred:
org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBoundsException): java.lang.ArrayIndexOutOfBoundsException
        at org.apache.hadoop.ipc.Client.call(Client.java:1466)
        at org.apache.hadoop.ipc.Client.call(Client.java:1403)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
        at com.sun.proxy.$Proxy11.getListing(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:559)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
        at com.sun.proxy.$Proxy12.getListing(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2080)
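(For reference, the stack trace ends in getListing/listPaths, so a plain directory listing of the affected partition should be enough to reproduce it; the path here is taken from the fsck output below and is only an example.)

# listing the directory goes through the NameNode getListing call shown in the trace
hdfs dfs -ls /services/chikayo-dsp-bidder/click/hive/day=20170403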
My actions:
(1) sudo -u hdfs hdfs fsck /
fsck stops just before the problem file, and the result is "FAILED":
/services/chikayo-dsp-bidder/click/hive/day=20170403/13.fluentd01.sv.infra.log 244412 bytes, 1 block(s): OK
/services/chikayo-dsp-bidder/click/hive/day=20170403/13.fluentd02.sv.infra.log 282901 bytes, 1 block(s): OK
/services/chikayo-dsp-bidder/click/hive/day=20170403/13.fluentd03.sv.infra.log 280334 bytes, 1 block(s): OK
/services/chikayo-dsp-bidder/click/hive/day=20170403/14.fluentd01.sv.infra.log 258240 bytes, 1 block(s): OK
FSCK ended at Mon Apr 03 18:16:08 JST 2017 in 3074 milliseconds
null
Fsck on path '/services/chikayo-dsp-bidder' FAILED
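To narrow things down, you can point fsck at the affected directory and ask for block details. This is a general sketch using the path from the output above; in this particular case the same getListing error may abort fsck before it reaches the file.

# show per-file block IDs and the DataNodes holding each replica
sudo -u hdfs hdfs fsck /services/chikayo-dsp-bidder/click/hive/day=20170403 -files -blocks -locations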
(2) sudo -u hdfs hdfs dfsadmin -report
Configured Capacity: 92383798755328 (84.02 TB)
Present Capacity: 89209585066072 (81.14 TB)
DFS Remaining: 19736633480052 (17.95 TB)
DFS Used: 69472951586020 (63.19 TB)
DFS Used%: 77.88%
Under replicated blocks: 0
Blocks with corrupt replicas: 2
Missing blocks: 0
Missing blocks (with replication factor 1): 0
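dfsadmin -report only shows cluster-wide counters ("Blocks with corrupt replicas: 2"). To see which files those blocks belong to, fsck can list them directly; shown here as a sketch:

# print the blocks reported as corrupt together with the files they belong to
sudo -u hdfs hdfs fsck / -list-corruptfileblocks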
Now the problem file has been restored automatically, and "Blocks with corrupt replicas" is back to 0.
Questions:
(1) Can I restore such a problem file manually?
(2) What triggers the automatic restore?
Thank you.
Created 08-24-2017 11:32 AM
I ran into this issue myself. I was able to resolve it like this:
hadoop fs -setrep 2 /hdfs/path/to/file
hadoop fs -setrep 3 /hdfs/path/to/file
After changing the replication factor, I was able to access the file again.
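My understanding is that the setrep round-trip works because changing the replication factor forces the NameNode to re-evaluate the block's replicas, re-replicating from a healthy copy and dropping the excess one. If you want each step to block until replication has actually finished, -setrep also accepts a -w flag (same hypothetical path as above):

# -w waits for the replication change to complete before returning
hadoop fs -setrep -w 2 /hdfs/path/to/file
hadoop fs -setrep -w 3 /hdfs/path/to/file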
Created 07-06-2018 10:39 AM
Just to follow up: this was later determined to be caused by HDFS-11445.
The bug is fixed in CDH 5.12.2, and in CDH 5.13.1 and later.