File not found exception on _temporary directory

Hi,

I have a Hadoop cluster with 3 data nodes, running the following versions:

HDP - 2.5.3

Spark - 1.6.2

sbt version - 0.13.13

I am trying to write a DataFrame (a very basic one with 5 rows and 2 columns) to a file in HDFS using spark-submit, and I get the error below. At the destination path I can see only a _temporary folder, which contains the partition files. The job is not able to delete the contents of the temporary folder and write the output permanently. However, when I write an RDD to a text file in HDFS, the output is written successfully.

Error-

attempt_xyz: not committed because the driver did not authorize commit. Task was denied committing.

Task attempt_xyz aborted.

Job aborted due to stage failure: Task 1 in stage 0.0 failed 1 times, most recent failure: Lost task 1.151 in stage 0.0 (TID 300, localhost): org.apache.spark.SparkException: Task failed while writing rows
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:269)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$anonfun$run$1$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:148)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$anonfun$run$1$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:148)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Failed to commit task
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.org$apache$spark$sql$execution$datasources$DefaultWriterContainer$commitTask$1(WriterContainer.scala:283)
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer$anonfun$writeRows$1.apply$mcV$sp(WriterContainer.scala:265)
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer$anonfun$writeRows$1.apply(WriterContainer.scala:260)
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer$anonfun$writeRows$1.apply(WriterContainer.scala:260)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1277)
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:266)
    ... 8 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on ABC/day=2017-02-16/hour=15/_temporary/0/_temporary/attempt_201702161507_0000_m_000001_151/part-r-00001-3b30ab0b-dee9-429f-8a0a-f8e5704a6cc8 (inode 199900): File does not exist. Holder DFSClient_attempt_201702161507_0000_m_000001_151_2120850886_197 does not have any open files.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3521)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3611)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3578)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:905)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:544)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1552)
    at org.apache.hadoop.ipc.Client.call(Client.java:1496)
    at org.apache.hadoop.ipc.Client.call(Client.java:1396)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
    at com.sun.proxy.$Proxy13.complete(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:501)
    at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:278)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:194)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:176)
    at com.sun.proxy.$Proxy14.complete(Unknown Source)
    at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2361)
    at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:2338)
    at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2303)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
    at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:111)
    at org.apache.spark.sql.execution.datasources.text.TextOutputWriter.close(DefaultSource.scala:168)
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.org$apache$spark$sql$execution$datasources$DefaultWriterContainer$commitTask$1(WriterContainer.scala:275)
    ... 13 more

Kindly suggest the root cause of the issue and a possible solution.


Re: File not found exception on _temporary directory

That message is odd. At a guess (and this is a guess, as HDFS isn't something I know the internals of), HDFS is rejecting the attempt to close the file because the namenode doesn't think the file is open. Now, does this happen every time? I could imagine this being a transient event, such as a namenode rebooting, but I'd be very surprised to see it repeatedly.
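For anyone reading a trace like this, the entry that matters is usually the last "Caused by:" in the chain, which here is the LeaseExpiredException raised by the namenode when the writer closes the file. Below is a minimal Python sketch for pulling that chain out of a pasted trace. The function name `cause_chain` and the abbreviated `trace` string are illustrative assumptions, not part of Spark or Hadoop, and the regular expression only covers the two "Caused by:" shapes seen in this thread.

```python
import re

# Matches "Caused by: <exception class>" and the wrapped form
# "Caused by: <rpc class>(<remote class>)" used by Hadoop's RemoteException.
CAUSE_RE = re.compile(r"Caused by: ([\w.$]+)(?:\(([\w.$]+)\))?")


def cause_chain(trace):
    """Return the exception class of each 'Caused by:' entry, outermost first.

    For a wrapped remote exception, the inner (server-side) class is reported.
    """
    chain = []
    for outer, inner in CAUSE_RE.findall(trace):
        chain.append(inner or outer)
    return chain


# Abbreviated excerpt of the trace from the question; frames elided with "...".
trace = (
    "org.apache.spark.SparkException: Task failed while writing rows at ... "
    "Caused by: java.lang.RuntimeException: Failed to commit task at ... "
    "Caused by: org.apache.hadoop.ipc.RemoteException("
    "org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on ..."
)

chain = cause_chain(trace)
print(chain[-1])  # innermost cause, i.e. the exception raised on the namenode side
```

The innermost entry only says where the failure surfaced (the namenode refusing the close); it does not by itself explain why the lease or the file was gone.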