
FileNotFoundException: Path is not a file when reading a directory containing files on HDFS



Hi folks,

 

We have our own recursive FileInputFormat that extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat. Basically, we call FileInputFormat.getSplits to get all the FileSplits for the input path; if a returned FileSplit is a directory, we expand it. In this way we flatten all directories (if any) under the input path.

 

This works well on HDP 1.3 and on Apache Hadoop 0.20.2 and 1.2. However, FileInputFormat.getSplits throws "FileNotFoundException: Path is not a file" on CDH 4.3. This looks like a backwards-compatibility issue. Is this behavior change expected, or is it a bug?

 

Here is the stack trace (note that /user/builder/rh6-intel64-70-test-1.marklogic.com/export-docs is the input path, under which we have a directory "dir1" that contains files):

 

java.io.FileNotFoundException: Path is not a file: /user/builder/rh6-intel64-70-test-1.marklogic.com/export-docs/dir1
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:42)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1341)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1293)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1269)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1242)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:392)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:172)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44938)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1701)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1697)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1695)
stderr/out from shell cmd:
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:884)
at org.apache.hadoop.hdfs.DFSClient.getBlockLocations(DFSClient.java:921)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:188)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:181)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:253)

 

Thanks,

Aries


Re: FileNotFoundException: Path is not a file when reading a directory containing files on HDFS

It's hard to say what is triggering this (without seeing your custom code), but calling getFileBlockLocations on a directory inode would certainly yield that exception. Perhaps something is causing your listing function (typically a listStatus override) to now pass directories into the file list by accident?
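
To illustrate the idea of expanding directories before anything asks for per-file block locations, here is a minimal sketch. It is not the poster's code and does not use the Hadoop API: it models the walk on the local filesystem with java.nio.file only, and the class name DirectoryFlattener and the method listFilesRecursively are hypothetical. In a custom FileInputFormat, the same walk would presumably live in the listStatus override, using FileSystem.listStatus and FileStatus.isDirectory, so that getSplits only ever sees regular files.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: flattens a directory tree into a list of regular files.
final class DirectoryFlattener {
    private DirectoryFlattener() {}

    // Returns every regular file at or under root, descending into subdirectories.
    static List<Path> listFilesRecursively(Path root) throws IOException {
        List<Path> files = new ArrayList<>();
        collect(root, files);
        return files;
    }

    private static void collect(Path path, List<Path> out) throws IOException {
        if (Files.isDirectory(path)) {
            // Expand the directory instead of passing it on as if it were a file.
            try (DirectoryStream<Path> children = Files.newDirectoryStream(path)) {
                for (Path child : children) {
                    collect(child, out);
                }
            }
        } else if (Files.isRegularFile(path)) {
            // Only regular files reach the output list.
            out.add(path);
        }
    }
}
```

With a layout like the one in the question (an input directory containing "dir1" with files inside), the returned list would contain the files under "dir1" but never "dir1" itself, which is the property the per-file block-location lookup needs.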

Re: FileNotFoundException: Path is not a file when reading a directory containing files on HDFS


I'm also getting the same type of error.

Initially the program worked fine with ToolRunner.run. I have now removed that call and am trying to run the same program. Could you give me some input on this?