
Spark DataFrame to ORC write causing HDFS LeaseExpiredException followed by java.nio.channels.ClosedChannelException

I am working on a job on HDP 2.3.x with Spark 1.5.2/Scala that reads a list of Parquet files one by one, applies some transformations to each DataFrame, and saves the result as ORC files on HDFS. From what I can tell, whenever Spark tries to save a large split as ORC, the task ends up with a LeaseExpiredException followed by a java.nio.channels.ClosedChannelException. A minimal sketch of the write pattern is below, followed by the full stack trace. If anybody has faced the same issue, please share your thoughts.
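
For context, the write pattern looks roughly like the sketch below. This is an illustration only; the paths, column name, and transformation are placeholders, not my actual job.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object ParquetToOrc {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ParquetToOrc"))
    // ORC read/write in Spark 1.5.x goes through HiveContext
    val sqlContext = new HiveContext(sc)

    // Placeholder list of input files; the real job builds this list elsewhere
    val inputPaths = Seq(
      "hdfs:///data/parquet/file1.parquet",
      "hdfs:///data/parquet/file2.parquet")

    inputPaths.foreach { path =>
      val df = sqlContext.read.parquet(path)
      // Placeholder transformation
      val transformed = df.filter(df("some_column").isNotNull)
      transformed.write
        .format("orc")
        .mode("append")
        .save("hdfs:///data/orc/output_table")
    }
  }
}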

java.lang.RuntimeException: Failed to commit task
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.commitTask$1(WriterContainer.scala:265)
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:245)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /*********/data/orc/*********/_temporary/0/_temporary/attempt_201607141817_0055_m_000009_0/part-r-00009-ef66e830-5dae-4454-b228-252f3e4f2233.orc (inode 1104399): File does not exist. Holder DFSClient_NONMAPREDUCE_222813080_61 does not have any open files.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3430)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3520)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3487)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:787)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:536)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2133)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2131)
    at org.apache.hadoop.ipc.Client.call(Client.java:1468)
    at org.apache.hadoop.ipc.Client.call(Client.java:1399)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    at com.sun.proxy.$Proxy18.complete(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:443)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy19.complete(Unknown Source)
    at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2250)
    at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2234)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2429)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:106)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:91)
    at org.apache.spark.sql.hive.orc.OrcOutputWriter.close(OrcRelation.scala:144)
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.commitTask$1(WriterContainer.scala:257)
    ... 9 more
16/07/14 18:20:37 WARN output.FileOutputCommitter: Could not delete hdfs://SomeHost:8020/********/data/orc/table/_temporary/0/_temporary/attempt_201607141817_0055_m_000009_0
16/07/14 18:20:37 ERROR datasources.DefaultWriterContainer: Task attempt attempt_201607141817_0055_m_000009_0 aborted.
16/07/14 18:20:37 ERROR executor.Executor: Exception in task 9.0 in stage 55.0 (TID 188)
java.nio.channels.ClosedChannelException
    at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1622)
    at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:104)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl$DirectStream.output(WriterImpl.java:464)
    at org.apache.hadoop.hive.ql.io.orc.OutStream.flush(OutStream.java:242)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.writeMetadata(WriterImpl.java:2328)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2426)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:106)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:91)
    at org.apache.spark.sql.hive.orc.OrcOutputWriter.close(OrcRelation.scala:144)
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.abortTask$1(WriterContainer.scala:272)
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:249)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

1 REPLY

Can you try after setting speculative execution to false?

spark.speculation=false
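
For reference, one way to apply that setting in code is sketched below; the app name and the rest of the job setup are placeholders. The same property can also be passed on the command line with spark-submit --conf spark.speculation=false.

import org.apache.spark.{SparkConf, SparkContext}

// Disable speculative re-launches of slow tasks so that two attempts of the
// same task never race to write and commit the same ORC output file.
val conf = new SparkConf()
  .setAppName("ParquetToOrc") // placeholder app name
  .set("spark.speculation", "false")
val sc = new SparkContext(conf)

The reasoning, as I understand it: with speculation enabled, a duplicate attempt of a slow write task can finish first, and when the losing attempt's _temporary output is cleaned up, the slower ORC writer is left closing a file that no longer exists, which surfaces as the LeaseExpiredException and ClosedChannelException shown above.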