<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question CDH 5.5.0 Spark 1.5.0 Scalability issue: coalesce and persist(StorageLevel.DISK_ONLY) fails in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/CDH-5-5-0-Spark-1-5-0-Scalability-issue-coalesce-and-persist/m-p/40515#M27385</link>
    <description>&lt;P&gt;We have a Spark job that reads a large number of gzipped text files (roughly 40K) under production-level loads; it fails at that scale but works fine for smaller inputs. We were advised to reduce our partition count, so we implemented a persisted RDD that reads the inputs, coalesces them to an acceptable number of partitions (around 4K), and persists them using StorageLevel.DISK_ONLY. The local file systems and memory are well provisioned for our workload, yet despite lengthy efforts this solution consistently failed. As a workaround we now consume precious HDFS space (which is under severe contention in our environment) to write the coalesced partitions back out and load them into a fresh RDD. We would prefer to persist the coalesced data a different way, or to have Spark handle a larger number of partitions. Why can we save this data to HDFS while persist fails, even with DISK_ONLY?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The (somewhat sanitized) stack trace of the failing coalesce-and-persist follows. Is it actually possible to use local disk storage instead of HDFS for this?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;akka.event.Logging$Error$NoCause$: null&lt;BR /&gt;2016:05:05:15:32:55.829 [task-result-getter-1] ERROR o.a.spark.scheduler.TaskSetManager.logError:75 - Task 595 in stage 1.0 failed 4 times; aborting job&lt;BR /&gt;2016:05:05:15:32:55.854 [Driver] INFO c.a.g.myproject.common.MyProjectLogger.info:23 - Job completed&lt;BR /&gt;2016:05:05:15:32:55.856 [Driver] INFO c.a.g.myproject.common.MyProjectLogger.info:23 - postAction : begin&lt;BR /&gt;2016:05:05:15:32:57.070 [Driver] ERROR c.a.g.myproject.common.MyProjectLogger.error:31 - action [GalacticaPostProcessing] processing failed. 
stack trace :&lt;BR /&gt;org.apache.spark.SparkException: Job aborted due to stage failure: Task 595 in stage 1.0 failed 4 times, most recent failure: Lost task 595.3 in stage 1.0 (TID 5009, a-cluster-node.mycompany.com): java.lang.RuntimeException: java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE&lt;BR /&gt;at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:836)&lt;BR /&gt;at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:125)&lt;BR /&gt;at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:113)&lt;BR /&gt;at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1206)&lt;BR /&gt;at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:127)&lt;BR /&gt;at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:134)&lt;BR /&gt;at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:522)&lt;BR /&gt;at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:312)&lt;BR /&gt;at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:58)&lt;BR /&gt;at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:58)&lt;BR /&gt;at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)&lt;BR /&gt;at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)&lt;BR /&gt;at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)&lt;BR /&gt;at org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:58)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:115)&lt;BR /&gt;at 
org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:87)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:101)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)&lt;BR /&gt;at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)&lt;BR /&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)&lt;BR 
/&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)&lt;BR /&gt;at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;/P&gt;&lt;P&gt;at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:162)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:103)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)&lt;BR /&gt;at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)&lt;BR /&gt;at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)&lt;BR /&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)&lt;BR /&gt;at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;/P&gt;&lt;P&gt;Driver stacktrace:&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1294)&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1282)&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1281)&lt;BR /&gt;at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)&lt;BR /&gt;at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1281)&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)&lt;BR /&gt;at scala.Option.foreach(Option.scala:236)&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)&lt;BR 
/&gt;at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1507)&lt;BR /&gt;at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1469)&lt;BR /&gt;at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)&lt;BR /&gt;at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)&lt;BR /&gt;at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)&lt;BR /&gt;at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837)&lt;BR /&gt;at org.apache.spark.SparkContext.runJob(SparkContext.scala:1914)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1124)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1065)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1065)&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)&lt;BR /&gt;at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1065)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:989)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:965)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:965)&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)&lt;BR /&gt;at 
org.apache.spark.rdd.RDD.withScope(RDD.scala:306)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:965)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$3.apply$mcV$sp(PairRDDFunctions.scala:951)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$3.apply(PairRDDFunctions.scala:951)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$3.apply(PairRDDFunctions.scala:951)&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)&lt;BR /&gt;at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:950)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$2.apply$mcV$sp(PairRDDFunctions.scala:909)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$2.apply(PairRDDFunctions.scala:907)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$2.apply(PairRDDFunctions.scala:907)&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)&lt;BR /&gt;at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:907)&lt;BR /&gt;at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$2.apply$mcV$sp(RDD.scala:1444)&lt;BR /&gt;at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$2.apply(RDD.scala:1432)&lt;BR /&gt;at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$2.apply(RDD.scala:1432)&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)&lt;BR 
/&gt;at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)&lt;BR /&gt;at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1432)&lt;BR /&gt;at com.mycompany.myproject.mysubproject.postprocessing.MyClass.runAction(MyClass.scala:236)&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction.doRun(BaseSparkAction.scala:120)&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction$$anonfun$1$$anonfun$2.apply$mcZ$sp(BaseSparkAction.scala:153)&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction$$anonfun$1$$anonfun$2.apply(BaseSparkAction.scala:153)&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction$$anonfun$1$$anonfun$2.apply(BaseSparkAction.scala:153)&lt;BR /&gt;at scala.util.Try$.apply(Try.scala:161)&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction$$anonfun$1.apply$mcV$sp(BaseSparkAction.scala:152)&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction$$anonfun$1.apply(BaseSparkAction.scala:147)&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction$$anonfun$1.apply(BaseSparkAction.scala:147)&lt;BR /&gt;at scala.util.Try$.apply(Try.scala:161)&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction.run(BaseSparkAction.scala:147)&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction$.main(BaseSparkAction.scala:215)&lt;BR /&gt;at com.mycompany.myproject.mysubproject.postprocessing.MyClass$.main(MyClass.scala:387)&lt;BR /&gt;at com.mycompany.myproject.mysubproject.postprocessing.MyClass.main(MyClass.scala)&lt;BR /&gt;at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)&lt;BR /&gt;at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)&lt;BR /&gt;at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)&lt;BR /&gt;at java.lang.reflect.Method.invoke(Method.java:497)&lt;BR /&gt;at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:525)&lt;BR /&gt;Caused by: 
java.lang.RuntimeException: java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE&lt;BR /&gt;at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:836)&lt;BR /&gt;at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:125)&lt;BR /&gt;at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:113)&lt;BR /&gt;at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1206)&lt;BR /&gt;at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:127)&lt;BR /&gt;at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:134)&lt;BR /&gt;at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:522)&lt;BR /&gt;at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:312)&lt;BR /&gt;at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:58)&lt;BR /&gt;at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:58)&lt;BR /&gt;at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)&lt;BR /&gt;at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)&lt;BR /&gt;at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)&lt;BR /&gt;at org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:58)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:115)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:87)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:101)&lt;BR /&gt;at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)&lt;BR /&gt;at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)&lt;BR /&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)&lt;BR /&gt;at 
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)&lt;BR /&gt;at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;/P&gt;&lt;P&gt;at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:162)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:103)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)&lt;BR /&gt;at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)&lt;BR /&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)&lt;BR /&gt;at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;BR /&gt;&lt;BR /&gt;2016:05:05:15:32:57.073 [Driver] ERROR o.a.s.deploy.yarn.ApplicationMaster.logError:96 - User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 595 in stage 1.0 failed 4 times, most recent failure: Lost task 595.3 in stage 1.0 (TID 5009, a-cluster-node.mycompany.com): java.lang.RuntimeException: java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE&lt;BR /&gt;at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:836)&lt;BR /&gt;at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:125)&lt;BR /&gt;at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:113)&lt;BR /&gt;at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1206)&lt;BR /&gt;at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:127)&lt;BR /&gt;at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:134)&lt;BR /&gt;at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:522)&lt;BR /&gt;at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:312)&lt;BR /&gt;at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:58)&lt;BR /&gt;at 
org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:58)&lt;BR /&gt;at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)&lt;BR /&gt;at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)&lt;BR /&gt;at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)&lt;BR /&gt;at org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:58)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:115)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:87)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:101)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)&lt;BR /&gt;at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)&lt;BR /&gt;at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)&lt;BR /&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)&lt;BR /&gt;at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;/P&gt;&lt;P&gt;at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:162)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:103)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)&lt;BR /&gt;at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at 
io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)&lt;BR /&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)&lt;BR /&gt;at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;/P&gt;&lt;P&gt;Driver stacktrace:&lt;BR /&gt;org.apache.spark.SparkException: Job aborted due to stage failure: Task 595 in stage 1.0 failed 4 times, most recent failure: Lost task 595.3 in stage 1.0 (TID 5009, a-cluster-node.mycompany.com): java.lang.RuntimeException: 
java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE&lt;BR /&gt;at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:836)&lt;BR /&gt;at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:125)&lt;BR /&gt;at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:113)&lt;BR /&gt;at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1206)&lt;BR /&gt;at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:127)&lt;BR /&gt;at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:134)&lt;BR /&gt;at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:522)&lt;BR /&gt;at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:312)&lt;BR /&gt;at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:58)&lt;BR /&gt;at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:58)&lt;BR /&gt;at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)&lt;BR /&gt;at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)&lt;BR /&gt;at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)&lt;BR /&gt;at org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:58)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:115)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:87)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:101)&lt;BR /&gt;at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)&lt;BR /&gt;at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)&lt;BR /&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)&lt;BR /&gt;at 
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)&lt;BR /&gt;at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;/P&gt;&lt;P&gt;at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:162)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:103)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)&lt;BR /&gt;at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)&lt;BR /&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)&lt;BR /&gt;at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;/P&gt;&lt;P&gt;Driver stacktrace:&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1294) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1282) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1281) ~[main.jar:na]&lt;BR /&gt;at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) ~[main.jar:na]&lt;BR /&gt;at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1281) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697) ~[main.jar:na]&lt;BR /&gt;at scala.Option.foreach(Option.scala:236) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697) ~[main.jar:na]&lt;BR /&gt;at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1507) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1469) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.SparkContext.runJob(SparkContext.scala:1914) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1124) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1065) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1065) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDD.withScope(RDD.scala:306) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1065) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:989) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:965) ~[main.jar:na]&lt;BR /&gt;at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:965) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDD.withScope(RDD.scala:306) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:965) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$3.apply$mcV$sp(PairRDDFunctions.scala:951) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$3.apply(PairRDDFunctions.scala:951) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$3.apply(PairRDDFunctions.scala:951) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDD.withScope(RDD.scala:306) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:950) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$2.apply$mcV$sp(PairRDDFunctions.scala:909) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$2.apply(PairRDDFunctions.scala:907) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$2.apply(PairRDDFunctions.scala:907) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDD.withScope(RDD.scala:306) 
~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:907) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$2.apply$mcV$sp(RDD.scala:1444) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$2.apply(RDD.scala:1432) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$2.apply(RDD.scala:1432) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDD.withScope(RDD.scala:306) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1432) ~[main.jar:na]&lt;BR /&gt;at com.mycompany.myproject.mysubproject.postprocessing.MyClass.runAction(MyClass.scala:236) ~[main.jar:na]&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction.doRun(BaseSparkAction.scala:120) ~[main.jar:na]&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction$$anonfun$1$$anonfun$2.apply$mcZ$sp(BaseSparkAction.scala:153) ~[main.jar:na]&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction$$anonfun$1$$anonfun$2.apply(BaseSparkAction.scala:153) ~[main.jar:na]&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction$$anonfun$1$$anonfun$2.apply(BaseSparkAction.scala:153) ~[main.jar:na]&lt;BR /&gt;at scala.util.Try$.apply(Try.scala:161) ~[main.jar:na]&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction$$anonfun$1.apply$mcV$sp(BaseSparkAction.scala:152) ~[main.jar:na]&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction$$anonfun$1.apply(BaseSparkAction.scala:147) ~[main.jar:na]&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction$$anonfun$1.apply(BaseSparkAction.scala:147) ~[main.jar:na]&lt;BR /&gt;at scala.util.Try$.apply(Try.scala:161) ~[main.jar:na]&lt;BR 
/&gt;at com.mycompany.myproject.batch.action.BaseSparkAction.run(BaseSparkAction.scala:147) ~[main.jar:na]&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction$.main(BaseSparkAction.scala:215) ~[main.jar:na]&lt;BR /&gt;at com.mycompany.myproject.mysubproject.postprocessing.MyClass$.main(MyClass.scala:387) ~[main.jar:na]&lt;BR /&gt;at com.mycompany.myproject.mysubproject.postprocessing.MyClass.main(MyClass.scala) ~[main.jar:na]&lt;BR /&gt;at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_45]&lt;BR /&gt;at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_45]&lt;BR /&gt;at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_45]&lt;BR /&gt;at java.lang.reflect.Method.invoke(Method.java:497) ~[na:1.8.0_45]&lt;BR /&gt;at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:525) ~[main.jar:na]&lt;BR /&gt;Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE&lt;BR /&gt;at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:836)&lt;BR /&gt;at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:125)&lt;BR /&gt;at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:113)&lt;BR /&gt;at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1206)&lt;BR /&gt;at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:127)&lt;BR /&gt;at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:134)&lt;BR /&gt;at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:522)&lt;BR /&gt;at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:312)&lt;BR /&gt;at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:58)&lt;BR /&gt;at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:58)&lt;BR /&gt;at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)&lt;BR /&gt;at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)&lt;BR /&gt;at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)&lt;BR /&gt;at org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:58)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:115)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:87)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:101)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)&lt;BR /&gt;at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)&lt;BR /&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)&lt;BR /&gt;at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;/P&gt;&lt;P&gt;at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:162) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:103) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51) ~[main.jar:na]&lt;BR /&gt;at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) ~[main.jar:na]&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308) ~[main.jar:na]&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294) ~[main.jar:na]&lt;BR /&gt;at 
io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266) ~[main.jar:na]&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308) ~[main.jar:na]&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294) ~[main.jar:na]&lt;BR /&gt;at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) ~[main.jar:na]&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308) ~[main.jar:na]&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294) ~[main.jar:na]&lt;BR /&gt;at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244) ~[main.jar:na]&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308) ~[main.jar:na]&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294) ~[main.jar:na]&lt;BR /&gt;at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846) ~[main.jar:na]&lt;BR /&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131) ~[main.jar:na]&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) ~[main.jar:na]&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) ~[main.jar:na]&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) ~[main.jar:na]&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) ~[main.jar:na]&lt;BR /&gt;at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) ~[main.jar:na]&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745) 
~[na:1.8.0_45]&lt;/P&gt;</description>
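    <!-- Editorial note: the proximate cause visible in the trace above is the 2 GiB block cap.
         Spark 1.x's DiskStore memory-maps each DISK_ONLY block into a single java.nio ByteBuffer,
         whose size is an int, so any one persisted partition larger than Integer.MAX_VALUE bytes
         fails exactly as shown, while streaming writes to HDFS (saveAsTextFile) are unaffected.
         A minimal sketch of the partition-count floor this implies, using hypothetical,
         illustrative byte counts (the post gives no actual sizes):

    ```python
    # Back-of-the-envelope check of the "Size exceeds Integer.MAX_VALUE" failure above.
    # Spark 1.x's DiskStore memory-maps each persisted block into one java.nio ByteBuffer,
    # whose length is an int, so no DISK_ONLY block may exceed Integer.MAX_VALUE bytes.

    MAX_BLOCK_BYTES = 2**31 - 1  # Integer.MAX_VALUE, the mmap/ByteBuffer cap

    def min_partitions(total_bytes, max_block=MAX_BLOCK_BYTES):
        """Smallest partition count that keeps every evenly sized block under the cap."""
        return (total_bytes + max_block - 1) // max_block  # integer ceiling division

    # Hypothetical workload: 40,000 inputs averaging 256 MiB each once decompressed.
    total = 40_000 * 256 * 1024**2
    print(min_partitions(total))  # prints 5001, i.e. more partitions than the 4K used
    ```

         Under these assumed sizes, coalescing to roughly 4K partitions pushes the average
         block past the cap, which would explain why persist(DISK_ONLY) fails while writing
         the same data back to HDFS succeeds. -->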
    <pubDate>Fri, 16 Sep 2022 10:17:29 GMT</pubDate>
    <dc:creator>BillM.</dc:creator>
    <dc:date>2022-09-16T10:17:29Z</dc:date>
    <item>
      <title>CDH 5.5.0 Spark 1.5.0 Scalability issue: coalesce and persist(StorageLevel.DISK_ONLY) fails</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/CDH-5-5-0-Spark-1-5-0-Scalability-issue-coalesce-and-persist/m-p/40515#M27385</link>
      <description>&lt;P&gt;We have a Spark job that reads a large number of gzipped text files (around 40K) under production-level loads; it fails at that scale but works fine for smaller inputs. We were advised to reduce our partition count, so we implemented a persistent RDD that read the inputs, coalesced them to an acceptable number of partitions (around 4K), and persisted them using StorageLevel.DISK_ONLY. The local file systems and memory appear well provisioned for our workload, yet despite lengthy efforts this solution consistently failed. As a workaround we now consume precious HDFS space (which is under severe contention in our environment) to write back the coalesced partitions and load them into a fresh RDD. We would prefer to persist the coalesced data another way, or to have Spark handle a larger number of partitions. Why can we save this data to HDFS while persist fails, even with DISK_ONLY?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The (somewhat sanitized) stack trace of the failing coalesce-and-persist run follows. Is it actually possible to use local disk storage instead of HDFS for this?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;akka.event.Logging$Error$NoCause$: null&lt;BR /&gt;2016:05:05:15:32:55.829 [task-result-getter-1] ERROR o.a.spark.scheduler.TaskSetManager.logError:75 - Task 595 in stage 1.0 failed 4 times; aborting job&lt;BR /&gt;2016:05:05:15:32:55.854 [Driver] INFO c.a.g.myproject.common.MyProjectLogger.info:23 - Job completed&lt;BR /&gt;2016:05:05:15:32:55.856 [Driver] INFO c.a.g.myproject.common.MyProjectLogger.info:23 - postAction : begin&lt;BR /&gt;2016:05:05:15:32:57.070 [Driver] ERROR c.a.g.myproject.common.MyProjectLogger.error:31 - action [GalacticaPostProcessing] processing failed. 
stack trace :&lt;BR /&gt;org.apache.spark.SparkException: Job aborted due to stage failure: Task 595 in stage 1.0 failed 4 times, most recent failure: Lost task 595.3 in stage 1.0 (TID 5009, a-cluster-node.mycompany.com): java.lang.RuntimeException: java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE&lt;BR /&gt;at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:836)&lt;BR /&gt;at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:125)&lt;BR /&gt;at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:113)&lt;BR /&gt;at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1206)&lt;BR /&gt;at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:127)&lt;BR /&gt;at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:134)&lt;BR /&gt;at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:522)&lt;BR /&gt;at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:312)&lt;BR /&gt;at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:58)&lt;BR /&gt;at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:58)&lt;BR /&gt;at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)&lt;BR /&gt;at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)&lt;BR /&gt;at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)&lt;BR /&gt;at org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:58)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:115)&lt;BR /&gt;at 
org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:87)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:101)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)&lt;BR /&gt;at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)&lt;BR /&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)&lt;BR 
/&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)&lt;BR /&gt;at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;/P&gt;&lt;P&gt;at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:162)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:103)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)&lt;BR /&gt;at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)&lt;BR /&gt;at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)&lt;BR /&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)&lt;BR /&gt;at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;/P&gt;&lt;P&gt;Driver stacktrace:&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1294)&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1282)&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1281)&lt;BR /&gt;at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)&lt;BR /&gt;at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1281)&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)&lt;BR /&gt;at scala.Option.foreach(Option.scala:236)&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)&lt;BR 
/&gt;at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1507)&lt;BR /&gt;at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1469)&lt;BR /&gt;at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)&lt;BR /&gt;at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)&lt;BR /&gt;at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)&lt;BR /&gt;at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837)&lt;BR /&gt;at org.apache.spark.SparkContext.runJob(SparkContext.scala:1914)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1124)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1065)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1065)&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)&lt;BR /&gt;at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1065)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:989)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:965)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:965)&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)&lt;BR /&gt;at 
org.apache.spark.rdd.RDD.withScope(RDD.scala:306)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:965)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$3.apply$mcV$sp(PairRDDFunctions.scala:951)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$3.apply(PairRDDFunctions.scala:951)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$3.apply(PairRDDFunctions.scala:951)&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)&lt;BR /&gt;at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:950)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$2.apply$mcV$sp(PairRDDFunctions.scala:909)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$2.apply(PairRDDFunctions.scala:907)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$2.apply(PairRDDFunctions.scala:907)&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)&lt;BR /&gt;at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:907)&lt;BR /&gt;at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$2.apply$mcV$sp(RDD.scala:1444)&lt;BR /&gt;at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$2.apply(RDD.scala:1432)&lt;BR /&gt;at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$2.apply(RDD.scala:1432)&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)&lt;BR 
/&gt;at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)&lt;BR /&gt;at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1432)&lt;BR /&gt;at com.mycompany.myproject.mysubproject.postprocessing.MyClass.runAction(MyClass.scala:236)&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction.doRun(BaseSparkAction.scala:120)&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction$$anonfun$1$$anonfun$2.apply$mcZ$sp(BaseSparkAction.scala:153)&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction$$anonfun$1$$anonfun$2.apply(BaseSparkAction.scala:153)&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction$$anonfun$1$$anonfun$2.apply(BaseSparkAction.scala:153)&lt;BR /&gt;at scala.util.Try$.apply(Try.scala:161)&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction$$anonfun$1.apply$mcV$sp(BaseSparkAction.scala:152)&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction$$anonfun$1.apply(BaseSparkAction.scala:147)&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction$$anonfun$1.apply(BaseSparkAction.scala:147)&lt;BR /&gt;at scala.util.Try$.apply(Try.scala:161)&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction.run(BaseSparkAction.scala:147)&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction$.main(BaseSparkAction.scala:215)&lt;BR /&gt;at com.mycompany.myproject.mysubproject.postprocessing.MyClass$.main(MyClass.scala:387)&lt;BR /&gt;at com.mycompany.myproject.mysubproject.postprocessing.MyClass.main(MyClass.scala)&lt;BR /&gt;at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)&lt;BR /&gt;at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)&lt;BR /&gt;at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)&lt;BR /&gt;at java.lang.reflect.Method.invoke(Method.java:497)&lt;BR /&gt;at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:525)&lt;BR /&gt;Caused by: 
java.lang.RuntimeException: java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE&lt;BR /&gt;at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:836)&lt;BR /&gt;at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:125)&lt;BR /&gt;at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:113)&lt;BR /&gt;at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1206)&lt;BR /&gt;at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:127)&lt;BR /&gt;at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:134)&lt;BR /&gt;at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:522)&lt;BR /&gt;at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:312)&lt;BR /&gt;at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:58)&lt;BR /&gt;at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:58)&lt;BR /&gt;at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)&lt;BR /&gt;at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)&lt;BR /&gt;at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)&lt;BR /&gt;at org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:58)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:115)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:87)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:101)&lt;BR /&gt;at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)&lt;BR /&gt;at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)&lt;BR /&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)&lt;BR /&gt;at 
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)&lt;BR /&gt;at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;/P&gt;&lt;P&gt;at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:162)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:103)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)&lt;BR /&gt;at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)&lt;BR /&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)&lt;BR /&gt;at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;BR /&gt;&lt;BR /&gt;2016:05:05:15:32:57.073 [Driver] ERROR o.a.s.deploy.yarn.ApplicationMaster.logError:96 - User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 595 in stage 1.0 failed 4 times, most recent failure: Lost task 595.3 in stage 1.0 (TID 5009, a-cluster-node.mycompany.com): java.lang.RuntimeException: java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE&lt;BR /&gt;at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:836)&lt;BR /&gt;at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:125)&lt;BR /&gt;at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:113)&lt;BR /&gt;at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1206)&lt;BR /&gt;at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:127)&lt;BR /&gt;at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:134)&lt;BR /&gt;at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:522)&lt;BR /&gt;at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:312)&lt;BR /&gt;at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:58)&lt;BR /&gt;at 
org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:58)&lt;BR /&gt;at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)&lt;BR /&gt;at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)&lt;BR /&gt;at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)&lt;BR /&gt;at org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:58)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:115)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:87)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:101)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)&lt;BR /&gt;at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)&lt;BR /&gt;at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)&lt;BR /&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)&lt;BR /&gt;at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;/P&gt;&lt;P&gt;at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:162)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:103)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)&lt;BR /&gt;at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at 
io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)&lt;BR /&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)&lt;BR /&gt;at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;/P&gt;&lt;P&gt;Driver stacktrace:&lt;BR /&gt;org.apache.spark.SparkException: Job aborted due to stage failure: Task 595 in stage 1.0 failed 4 times, most recent failure: Lost task 595.3 in stage 1.0 (TID 5009, a-cluster-node.mycompany.com): java.lang.RuntimeException: 
java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE&lt;BR /&gt;at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:836)&lt;BR /&gt;at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:125)&lt;BR /&gt;at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:113)&lt;BR /&gt;at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1206)&lt;BR /&gt;at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:127)&lt;BR /&gt;at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:134)&lt;BR /&gt;at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:522)&lt;BR /&gt;at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:312)&lt;BR /&gt;at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:58)&lt;BR /&gt;at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:58)&lt;BR /&gt;at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)&lt;BR /&gt;at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)&lt;BR /&gt;at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)&lt;BR /&gt;at org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:58)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:115)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:87)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:101)&lt;BR /&gt;at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)&lt;BR /&gt;at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)&lt;BR /&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)&lt;BR /&gt;at 
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)&lt;BR /&gt;at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;/P&gt;&lt;P&gt;at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:162)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:103)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)&lt;BR /&gt;at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)&lt;BR /&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)&lt;BR /&gt;at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;/P&gt;&lt;P&gt;Driver stacktrace:&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1294) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1282) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1281) ~[main.jar:na]&lt;BR /&gt;at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) ~[main.jar:na]&lt;BR /&gt;at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1281) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697) ~[main.jar:na]&lt;BR /&gt;at scala.Option.foreach(Option.scala:236) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697) ~[main.jar:na]&lt;BR /&gt;at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1507) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1469) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.SparkContext.runJob(SparkContext.scala:1914) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1124) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1065) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1065) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDD.withScope(RDD.scala:306) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1065) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:989) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:965) ~[main.jar:na]&lt;BR /&gt;at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:965) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDD.withScope(RDD.scala:306) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:965) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$3.apply$mcV$sp(PairRDDFunctions.scala:951) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$3.apply(PairRDDFunctions.scala:951) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$3.apply(PairRDDFunctions.scala:951) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDD.withScope(RDD.scala:306) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:950) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$2.apply$mcV$sp(PairRDDFunctions.scala:909) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$2.apply(PairRDDFunctions.scala:907) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$2.apply(PairRDDFunctions.scala:907) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDD.withScope(RDD.scala:306) 
~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:907) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$2.apply$mcV$sp(RDD.scala:1444) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$2.apply(RDD.scala:1432) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$2.apply(RDD.scala:1432) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDD.withScope(RDD.scala:306) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1432) ~[main.jar:na]&lt;BR /&gt;at com.mycompany.myproject.mysubproject.postprocessing.MyClass.runAction(MyClass.scala:236) ~[main.jar:na]&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction.doRun(BaseSparkAction.scala:120) ~[main.jar:na]&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction$$anonfun$1$$anonfun$2.apply$mcZ$sp(BaseSparkAction.scala:153) ~[main.jar:na]&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction$$anonfun$1$$anonfun$2.apply(BaseSparkAction.scala:153) ~[main.jar:na]&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction$$anonfun$1$$anonfun$2.apply(BaseSparkAction.scala:153) ~[main.jar:na]&lt;BR /&gt;at scala.util.Try$.apply(Try.scala:161) ~[main.jar:na]&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction$$anonfun$1.apply$mcV$sp(BaseSparkAction.scala:152) ~[main.jar:na]&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction$$anonfun$1.apply(BaseSparkAction.scala:147) ~[main.jar:na]&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction$$anonfun$1.apply(BaseSparkAction.scala:147) ~[main.jar:na]&lt;BR /&gt;at scala.util.Try$.apply(Try.scala:161) ~[main.jar:na]&lt;BR 
/&gt;at com.mycompany.myproject.batch.action.BaseSparkAction.run(BaseSparkAction.scala:147) ~[main.jar:na]&lt;BR /&gt;at com.mycompany.myproject.batch.action.BaseSparkAction$.main(BaseSparkAction.scala:215) ~[main.jar:na]&lt;BR /&gt;at com.mycompany.myproject.mysubproject.postprocessing.MyClass$.main(MyClass.scala:387) ~[main.jar:na]&lt;BR /&gt;at com.mycompany.myproject.mysubproject.postprocessing.MyClass.main(MyClass.scala) ~[main.jar:na]&lt;BR /&gt;at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_45]&lt;BR /&gt;at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_45]&lt;BR /&gt;at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_45]&lt;BR /&gt;at java.lang.reflect.Method.invoke(Method.java:497) ~[na:1.8.0_45]&lt;BR /&gt;at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:525) ~[main.jar:na]&lt;BR /&gt;Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE&lt;BR /&gt;at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:836)&lt;BR /&gt;at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:125)&lt;BR /&gt;at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:113)&lt;BR /&gt;at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1206)&lt;BR /&gt;at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:127)&lt;BR /&gt;at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:134)&lt;BR /&gt;at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:522)&lt;BR /&gt;at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:312)&lt;BR /&gt;at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:58)&lt;BR /&gt;at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:58)&lt;BR /&gt;at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)&lt;BR /&gt;at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)&lt;BR /&gt;at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)&lt;BR /&gt;at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)&lt;BR /&gt;at org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:58)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:115)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:87)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:101)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)&lt;BR /&gt;at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)&lt;BR /&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)&lt;BR /&gt;at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;/P&gt;&lt;P&gt;at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:162) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:103) ~[main.jar:na]&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51) ~[main.jar:na]&lt;BR /&gt;at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) ~[main.jar:na]&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308) ~[main.jar:na]&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294) ~[main.jar:na]&lt;BR /&gt;at 
io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266) ~[main.jar:na]&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308) ~[main.jar:na]&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294) ~[main.jar:na]&lt;BR /&gt;at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) ~[main.jar:na]&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308) ~[main.jar:na]&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294) ~[main.jar:na]&lt;BR /&gt;at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244) ~[main.jar:na]&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308) ~[main.jar:na]&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294) ~[main.jar:na]&lt;BR /&gt;at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846) ~[main.jar:na]&lt;BR /&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131) ~[main.jar:na]&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) ~[main.jar:na]&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) ~[main.jar:na]&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) ~[main.jar:na]&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) ~[main.jar:na]&lt;BR /&gt;at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) ~[main.jar:na]&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745) 
~[na:1.8.0_45]&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 10:17:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/CDH-5-5-0-Spark-1-5-0-Scalability-issue-coalesce-and-persist/m-p/40515#M27385</guid>
      <dc:creator>BillM.</dc:creator>
      <dc:date>2022-09-16T10:17:29Z</dc:date>
    </item>
    <item>
      <title>Re: CDH 5.5.0 Spark 1.5.0 Scalability issue: coalesce and persist(StorageLevel.DISK_ONLY) fails</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/CDH-5-5-0-Spark-1-5-0-Scalability-issue-coalesce-and-persist/m-p/41208#M27386</link>
      <description>One way to work around the 2GB limitation is to increase the number of partitions, so that no single partition (and hence no single block) exceeds 2GB.</description>
      <pubDate>Tue, 24 May 2016 01:15:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/CDH-5-5-0-Spark-1-5-0-Scalability-issue-coalesce-and-persist/m-p/41208#M27386</guid>
      <dc:creator>Dat Pham</dc:creator>
      <dc:date>2016-05-24T01:15:27Z</dc:date>
    </item>
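The reply above refers to Spark's per-block size cap: a cached block is backed by a single byte buffer, so it cannot exceed Integer.MAX_VALUE (~2GB) bytes, and coalescing too aggressively can push individual partitions past that limit. A rough sizing check, as a hypothetical sketch (the `min_partitions` helper and its `safety_factor` heuristic are illustrative, not a Spark API):

```python
import math

# Spark 1.x stores each cached block in a single byte buffer,
# so no partition may exceed Integer.MAX_VALUE bytes (~2 GB).
MAX_BLOCK_BYTES = 2**31 - 1

def min_partitions(total_bytes, safety_factor=0.5):
    """Smallest partition count that keeps every partition under the
    2 GB block limit. safety_factor leaves headroom for skew
    (a hypothetical heuristic, not an official Spark recommendation)."""
    target = int(MAX_BLOCK_BYTES * safety_factor)
    return math.ceil(total_bytes / target)

# e.g. 8 TiB of decompressed input needs at least 8193 partitions at
# this safety factor, so coalescing 40K gzipped inputs down to ~4K
# partitions could easily have produced over-2GB blocks.
print(min_partitions(8 * 2**40))
```

Under this reading, the DISK_ONLY persist fails while the HDFS write succeeds because `saveAsTextFile` streams each partition to HDFS without materializing it as a single block, whereas persisting goes through the block manager and hits the buffer cap.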
  </channel>
</rss>

