New Contributor
Posts: 1
Registered: ‎11-05-2013

FileNotFoundException: Path is not a file when reading a directory containing files on HDFS

Hi folks,

 

We have our own recursive FileInputFormat that extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat. Basically, we call FileInputFormat.getSplits to get all the FileSplits for the input path; if a returned FileSplit points to a directory, we expand that directory, and in this way we flatten any nested directories under the input path.
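
For reference, here is a simplified sketch of the pattern we use (not our exact code; the class name and the expandDirectory helper are illustrative, only the Hadoop FileInputFormat/FileSystem/FileSplit APIs are real):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    // Illustrative sketch: take the splits computed by FileInputFormat.getSplits
    // and expand any split that points at a directory into splits for the files
    // underneath it.
    public class RecursiveSplitInputFormat extends TextInputFormat {

        @Override
        public List<InputSplit> getSplits(JobContext job) throws IOException {
            List<InputSplit> expanded = new ArrayList<InputSplit>();
            for (InputSplit split : super.getSplits(job)) {
                Path path = ((FileSplit) split).getPath();
                FileSystem fs = path.getFileSystem(job.getConfiguration());
                if (fs.getFileStatus(path).isDir()) {   // isDirectory() on Hadoop 2.x
                    expandDirectory(fs, path, expanded);
                } else {
                    expanded.add(split);
                }
            }
            return expanded;
        }

        // Hypothetical helper: one split per file for simplicity (no block-level
        // splitting), recursing into nested directories.
        private void expandDirectory(FileSystem fs, Path dir, List<InputSplit> out)
                throws IOException {
            for (FileStatus child : fs.listStatus(dir)) {
                if (child.isDir()) {
                    expandDirectory(fs, child.getPath(), out);
                } else {
                    out.add(new FileSplit(child.getPath(), 0, child.getLen(), new String[0]));
                }
            }
        }
    }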

 

This works well on HDP 1.3, Apache Hadoop 0.20.2, and Apache Hadoop 1.2. However, we find that FileInputFormat.getSplits throws "FileNotFoundException: Path is not a file" on CDH 4.3. This looks like a backward-compatibility issue. Is this behavior change expected, or is it a bug?

 

Here is the stack trace (note that /user/builder/rh6-intel64-70-test-1.marklogic.com/export-docs is the input path, under which we have a directory "dir1" that contains files):

 

java.io.FileNotFoundException: Path is not a file: /user/builder/rh6-intel64-70-test-1.marklogic.com/export-docs/dir1
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:42)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1341)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1293)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1269)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1242)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:392)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:172)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44938)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1701)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1697)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1695)
stderr/out from shell cmd:
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:884)
at org.apache.hadoop.hdfs.DFSClient.getBlockLocations(DFSClient.java:921)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:188)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:181)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:253)

 

Thanks,

Aries

Posts: 1,903
Kudos: 436
Solutions: 307
Registered: ‎07-31-2013

Re: FileNotFoundException: Path is not a file when reading a directory containing files on HDFS

It's hard to say what is triggering this without seeing your custom code, but calling getFileBlockLocations on a directory inode would certainly yield that exception. Perhaps something is causing your listing generator (typically a listStatus override) to now pass directories into the files list accidentally? See the sketch below for one way to keep directories out of that list.
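
For instance, something along these lines would flatten directories at listing time, so getSplits never asks the NameNode for a directory's block locations (just a sketch, not tested against your code; the class and helper names are illustrative):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    // Sketch: expand directories during listStatus() so that getSplits() only
    // ever sees plain files and never requests block locations for a directory.
    public class FlatteningInputFormat extends TextInputFormat {

        @Override
        protected List<FileStatus> listStatus(JobContext job) throws IOException {
            List<FileStatus> files = new ArrayList<FileStatus>();
            for (FileStatus status : super.listStatus(job)) {
                addRecursively(status, job.getConfiguration(), files);
            }
            return files;
        }

        // Hypothetical helper: replace a directory entry with its descendant files.
        private void addRecursively(FileStatus status, Configuration conf,
                                    List<FileStatus> out) throws IOException {
            if (status.isDir()) {   // isDirectory() on Hadoop 2.x
                FileSystem fs = status.getPath().getFileSystem(conf);
                for (FileStatus child : fs.listStatus(status.getPath())) {
                    addRecursively(child, conf, out);
                }
            } else {
                out.add(status);
            }
        }
    }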
New Contributor
Posts: 2
Registered: ‎07-05-2014

Re: FileNotFoundException: Path is not a file when reading a directory containing files on HDFS

I'm also getting the same type of error.

Initially the program was working fine with ToolRunner.run. Now I have removed it and am trying to run the same job. Please give me some input on this.
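
For reference, my driver with ToolRunner looked roughly like this (simplified; the class name and job wiring are placeholders, not my actual code):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // Standard ToolRunner driver skeleton; ToolRunner parses generic options
    // such as -D and -libjars and puts them into the configuration.
    public class MyDriver extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            // Use getConf() so the options parsed by ToolRunner reach the job.
            Job job = Job.getInstance(getConf(), "my job");
            // ... set input/output formats, mapper, reducer, and paths here ...
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
        }
    }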