Support Questions

pvganesh · 05-27-2015

Exceptions 'Failed to create file' and 'Failed to close file. Lease recovery is in progress. Try again later.'

We are facing an issue while working with Cloudera CDH 5.1.2 (default MR2/YARN) and would appreciate your expertise on this topic.

We extract data from a source via the Talend ETL tool and store the files (three files in this case) in HDFS. This works fine in most cases, but it fails randomly with the exceptions below.

Please note that the issue is intermittent: if we kill the job and restart the process, it works fine.

Exception in component tHDFSOutput_3
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to create file [/user/mya4kor/test/messages/Sensor_Data.txt] for [DFSClient_NONMAPREDUCE_929417152_1] on client [10.0.2.5], because this file is already being created by [DFSClient_NONMAPREDUCE_82386981_1] on [10.0.2.5]
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2548)
       .....
       .....
        at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:316)
        at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1161)
        at testproject.manufacturing_0_1.manufacturing.tInfiniteLoop_1Process(manufacturing.java:18254)
        at testproject.manufacturing_0_1.manufacturing.runJobInTOS(manufacturing.java:20359)
        at testproject.manufacturing_0_1.manufacturing.main(manufacturing.java:20003)
[statistics] disconnected
Job manufacturing ended at 11:58 26/05/2015. [exit code=1]

Error: 2
Exception in component tHDFSOutput_7
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.RecoveryInProgressException): Failed to close file /user/mya4kor/test/messages/Sensor_Data.txt. Lease recovery is in progress. Try again later.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2538)
       ...
       ...
        at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1161)
        at testproject.manufacturing_0_1.manufacturing.tInfiniteLoop_1Process(manufacturing.java:12892)
        at testproject.manufacturing_0_1.manufacturing.runJobInTOS(manufacturing.java:19588)
        at testproject.manufacturing_0_1.manufacturing.main(manufacturing.java:19232)
Job manufacturing ended at 17:29 25/05/2015. [exit code=1]

Thanks.

Regards,
Ganesh

Harsh J · 06-21-2015

You (or the software you use) appear to be appending to files that other concurrent jobs or workflows are modifying at the same time. HDFS uses a single-writer model for its files, so this error is expected if your software has no logic to handle it and to wait until it can obtain the writer lease before doing its work.
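As a rough illustration of that kind of wait-and-retry handling, here is a minimal sketch in plain Java. It is not Talend's or Hadoop's own logic. The class `LeaseRetry` and its methods are hypothetical names, and detecting the lease conflict by looking for the exception class names in the exception's string form (as they appear in the traces above) is an assumption made to keep the sketch free of Hadoop dependencies. In real client code you would more likely inspect the wrapped class via `RemoteException.getClassName()` or `unwrapRemoteException(...)`.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Hadoop or Talend.
final class LeaseRetry {

    // Assumption: a lease conflict is recognised by the exception class names
    // seen in the stack traces, matched against the exception's string form.
    static boolean isLeaseConflict(Throwable t) {
        String text = String.valueOf(t);
        return text.contains("AlreadyBeingCreatedException")
            || text.contains("RecoveryInProgressException");
    }

    // Runs the action, retrying with exponential backoff while another
    // writer still holds (or is recovering) the lease on the file.
    static <T> T withRetry(Callable<T> action, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (IOException e) {
                // Give up on unrelated errors or once the attempts are used up.
                if (!isLeaseConflict(e) || attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(delayMs);
                delayMs *= 2; // back off before the next attempt
            }
        }
    }
}
```

With a Hadoop `FileSystem` instance `fs` and a `Path` named `path` in scope, a call might look like `LeaseRetry.withRetry(() -> fs.append(path), 5, 1000L)`. Retrying only hides the contention; if two components of the same job append to one file, serialising them is the more robust fix.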

Without audit logs for the filenames involved, there's little more we can tell. We also advise against using appends unless your use case absolutely requires them.