<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Spark job running from a spark-shell fails with org.apache.spark.shuffle.FetchFailedException in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-job-running-from-a-spark-shell-fails-with-org-apache/m-p/60864#M69359</link>
    <description>&lt;P&gt;We don't see any storage space issues. We have two local.dirs set up on two mounted volumes, and both of them have more than 400 GB of free space. I don't see any Java GC issues either.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am using the following link to monitor the GC charts:&lt;/P&gt;&lt;P&gt;&lt;A href="http://10.0.0.247:7180/cmf/services/18/charts#q=Concurrent" target="_blank"&gt;http://10.0.0.247:7180/cmf/services/18/charts#q=Concurrent&lt;/A&gt; Mark Sweep Garbage Collection Time Across NodeManagers&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Please let me know if we should monitor anything else in addition to this.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Vishal&lt;/P&gt;</description>
    <pubDate>Thu, 12 Oct 2017 14:59:39 GMT</pubDate>
    <dc:creator>rabk</dc:creator>
    <dc:date>2017-10-12T14:59:39Z</dc:date>
    <item>
      <title>Spark job running from a spark-shell fails with org.apache.spark.shuffle.FetchFailedException</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-job-running-from-a-spark-shell-fails-with-org-apache/m-p/60809#M69354</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We have a Kerberized CDH 5.12 cluster with 2 datanodes, and we are running a spark-shell from the edge node with&amp;nbsp;master = yarn-client.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Using the sqlContext setup, we create a DataFrame using a simple SQL query.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;scala&amp;gt; val df = sqlContext.sql("SELECT T1.PRVDR_NUM, SUM(T1.NCH_PRMRY_PYR_CLM_PD_AMT) FROM METASENSE_DATALAKE_vishal.CMS_IP_CLAIMS AS T1 group by T1.PRVDR_NUM")&lt;BR /&gt;df: org.apache.spark.sql.DataFrame = [PRVDR_NUM: string, _c1: double]&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Following the DataFrame creation with a simple df.show() action kicks off a Spark job that fails with&amp;nbsp;org.apache.spark.shuffle.FetchFailedException.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Below is the exception:&lt;/P&gt;&lt;P&gt;[Stage 0:============================================&amp;gt; (3 + 1) / 4]17/10/10 16:51:39 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 4, ip-10-0-0-72.ec2.internal, executor 3): FetchFailed(BlockManagerId(2, ip-10-0-0-72.ec2.internal, 7337), shuffleId=0, mapId=2, reduceId=0, message=&lt;BR /&gt;org.apache.spark.shuffle.FetchFailedException: java.lang.RuntimeException: Failed to open file: /data01/yarn/nm/usercache/vishal/appcache/application_1507613154611_0799/blockmgr-d4f003e6-84ac-4357-8e10-a58b3d040c65/39/shuffle_0_2_0.index&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:243)&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getBlockData(ExternalShuffleBlockResolver.java:147)&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.handleMessage(ExternalShuffleBlockHandler.java:85)&lt;BR /&gt;at 
org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.receive(ExternalShuffleBlockHandler.java:72)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:154)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:102)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)&lt;BR /&gt;at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)&lt;BR /&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)&lt;BR /&gt;at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;BR /&gt;Caused by: java.io.FileNotFoundException: /data01/yarn/nm/usercache/vishal/appcache/application_1507613154611_0799/blockmgr-d4f003e6-84ac-4357-8e10-a58b3d040c65/39/shuffle_0_2_0.index (No such file or directory)&lt;BR /&gt;at java.io.FileInputStream.open(Native Method)&lt;BR /&gt;at java.io.FileInputStream.&amp;lt;init&amp;gt;(FileInputStream.java:146)&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:232)&lt;BR /&gt;... 
27 more&lt;/P&gt;&lt;P&gt;at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:383)&lt;BR /&gt;at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:361)&lt;BR /&gt;at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:55)&lt;BR /&gt;at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)&lt;BR /&gt;at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)&lt;BR /&gt;at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)&lt;BR /&gt;at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)&lt;BR /&gt;at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)&lt;BR /&gt;at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:511)&lt;BR /&gt;at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.&amp;lt;init&amp;gt;(TungstenAggregationIterator.scala:686)&lt;BR /&gt;at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:95)&lt;BR /&gt;at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)&lt;BR /&gt;at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)&lt;BR /&gt;at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)&lt;BR /&gt;at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)&lt;BR /&gt;at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)&lt;BR /&gt;at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)&lt;BR /&gt;at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)&lt;BR /&gt;at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)&lt;BR /&gt;at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:270)&lt;BR /&gt;at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)&lt;BR /&gt;at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)&lt;BR /&gt;at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)&lt;BR /&gt;at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)&lt;BR /&gt;at org.apache.spark.scheduler.Task.run(Task.scala:89)&lt;BR /&gt;at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;BR /&gt;Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Failed to open file: /data01/yarn/nm/usercache/vishal/appcache/application_1507613154611_0799/blockmgr-d4f003e6-84ac-4357-8e10-a58b3d040c65/39/shuffle_0_2_0.index&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:243)&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getBlockData(ExternalShuffleBlockResolver.java:147)&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.handleMessage(ExternalShuffleBlockHandler.java:85)&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.receive(ExternalShuffleBlockHandler.java:72)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:154)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:102)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)&lt;BR /&gt;at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)&lt;BR /&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)&lt;BR /&gt;at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;BR /&gt;Caused by: java.io.FileNotFoundException: /data01/yarn/nm/usercache/vishal/appcache/application_1507613154611_0799/blockmgr-d4f003e6-84ac-4357-8e10-a58b3d040c65/39/shuffle_0_2_0.index (No such file or directory)&lt;BR /&gt;at java.io.FileInputStream.open(Native Method)&lt;BR /&gt;at java.io.FileInputStream.&amp;lt;init&amp;gt;(FileInputStream.java:146)&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:232)&lt;BR /&gt;... 27 more&lt;/P&gt;&lt;P&gt;at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:186)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:106)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)&lt;BR /&gt;at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)&lt;BR /&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)&lt;BR /&gt;at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)&lt;BR /&gt;... 
1 more&lt;/P&gt;&lt;P&gt;)&lt;BR /&gt;17/10/10 16:51:40 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.1 (TID 6, ip-10-0-0-72.ec2.internal, executor 2): FetchFailed(BlockManagerId(1, ip-10-0-0-72.ec2.internal, 7337), shuffleId=0, mapId=0, reduceId=0, message=&lt;BR /&gt;org.apache.spark.shuffle.FetchFailedException: java.lang.RuntimeException: Failed to open file: /data01/yarn/nm/usercache/vishal/appcache/application_1507613154611_0799/blockmgr-2742a1a6-7cbd-46e4-8154-293ab27fcc22/38/shuffle_0_0_0.index&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:243)&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getBlockData(ExternalShuffleBlockResolver.java:147)&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.handleMessage(ExternalShuffleBlockHandler.java:85)&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.receive(ExternalShuffleBlockHandler.java:72)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:154)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:102)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)&lt;BR /&gt;at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)&lt;BR /&gt;at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)&lt;BR /&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)&lt;BR /&gt;at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;BR /&gt;Caused by: java.io.FileNotFoundException: /data01/yarn/nm/usercache/vishal/appcache/application_1507613154611_0799/blockmgr-2742a1a6-7cbd-46e4-8154-293ab27fcc22/38/shuffle_0_0_0.index (No such file or directory)&lt;BR /&gt;at java.io.FileInputStream.open(Native Method)&lt;BR /&gt;at 
java.io.FileInputStream.&amp;lt;init&amp;gt;(FileInputStream.java:146)&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:232)&lt;BR /&gt;... 27 more&lt;/P&gt;&lt;P&gt;at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:383)&lt;BR /&gt;at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:361)&lt;BR /&gt;at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:55)&lt;BR /&gt;at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)&lt;BR /&gt;at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)&lt;BR /&gt;at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)&lt;BR /&gt;at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)&lt;BR /&gt;at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)&lt;BR /&gt;at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:511)&lt;BR /&gt;at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.&amp;lt;init&amp;gt;(TungstenAggregationIterator.scala:686)&lt;BR /&gt;at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:95)&lt;BR /&gt;at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)&lt;BR /&gt;at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)&lt;BR /&gt;at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)&lt;BR /&gt;at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)&lt;BR /&gt;at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)&lt;BR /&gt;at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:270)&lt;BR /&gt;at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)&lt;BR /&gt;at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)&lt;BR /&gt;at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)&lt;BR /&gt;at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)&lt;BR /&gt;at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)&lt;BR /&gt;at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)&lt;BR /&gt;at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)&lt;BR /&gt;at org.apache.spark.scheduler.Task.run(Task.scala:89)&lt;BR /&gt;at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;BR /&gt;Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Failed to open file: /data01/yarn/nm/usercache/vishal/appcache/application_1507613154611_0799/blockmgr-2742a1a6-7cbd-46e4-8154-293ab27fcc22/38/shuffle_0_0_0.index&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:243)&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getBlockData(ExternalShuffleBlockResolver.java:147)&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.handleMessage(ExternalShuffleBlockHandler.java:85)&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.receive(ExternalShuffleBlockHandler.java:72)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:154)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:102)&lt;BR /&gt;at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)&lt;BR /&gt;at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)&lt;BR /&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)&lt;BR /&gt;at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)&lt;BR /&gt;at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;BR /&gt;Caused by: java.io.FileNotFoundException: /data01/yarn/nm/usercache/vishal/appcache/application_1507613154611_0799/blockmgr-2742a1a6-7cbd-46e4-8154-293ab27fcc22/38/shuffle_0_0_0.index (No such file or directory)&lt;BR /&gt;at java.io.FileInputStream.open(Native Method)&lt;BR /&gt;at java.io.FileInputStream.&amp;lt;init&amp;gt;(FileInputStream.java:146)&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:232)&lt;BR /&gt;... 27 more&lt;/P&gt;&lt;P&gt;at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:186)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:106)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)&lt;BR /&gt;at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)&lt;BR /&gt;at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)&lt;BR /&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)&lt;BR /&gt;at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)&lt;BR /&gt;... 
1 more&lt;/P&gt;&lt;P&gt;)&lt;BR /&gt;17/10/10 16:51:41 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.2 (TID 8, ip-10-0-0-72.ec2.internal, executor 3): FetchFailed(BlockManagerId(2, ip-10-0-0-72.ec2.internal, 7337), shuffleId=0, mapId=2, reduceId=0, message=&lt;BR /&gt;org.apache.spark.shuffle.FetchFailedException: java.lang.RuntimeException: Failed to open file: /data01/yarn/nm/usercache/vishal/appcache/application_1507613154611_0799/blockmgr-d4f003e6-84ac-4357-8e10-a58b3d040c65/39/shuffle_0_2_0.index&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:243)&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getBlockData(ExternalShuffleBlockResolver.java:147)&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.handleMessage(ExternalShuffleBlockHandler.java:85)&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.receive(ExternalShuffleBlockHandler.java:72)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:154)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:102)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)&lt;BR /&gt;at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)&lt;BR /&gt;at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)&lt;BR /&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)&lt;BR /&gt;at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;BR /&gt;Caused by: java.io.FileNotFoundException: /data01/yarn/nm/usercache/vishal/appcache/application_1507613154611_0799/blockmgr-d4f003e6-84ac-4357-8e10-a58b3d040c65/39/shuffle_0_2_0.index (No such file or directory)&lt;BR /&gt;at java.io.FileInputStream.open(Native Method)&lt;BR /&gt;at 
java.io.FileInputStream.&amp;lt;init&amp;gt;(FileInputStream.java:146)&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:232)&lt;BR /&gt;... 27 more&lt;/P&gt;&lt;P&gt;at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:383)&lt;BR /&gt;at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:361)&lt;BR /&gt;at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:55)&lt;BR /&gt;at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)&lt;BR /&gt;at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)&lt;BR /&gt;at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)&lt;BR /&gt;at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)&lt;BR /&gt;at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)&lt;BR /&gt;at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:511)&lt;BR /&gt;at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.&amp;lt;init&amp;gt;(TungstenAggregationIterator.scala:686)&lt;BR /&gt;at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:95)&lt;BR /&gt;at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)&lt;BR /&gt;at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)&lt;BR /&gt;at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)&lt;BR /&gt;at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)&lt;BR /&gt;at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)&lt;BR /&gt;at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:270)&lt;BR /&gt;at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)&lt;BR /&gt;at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)&lt;BR /&gt;at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)&lt;BR /&gt;at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)&lt;BR /&gt;at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)&lt;BR /&gt;at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)&lt;BR /&gt;at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)&lt;BR /&gt;at org.apache.spark.scheduler.Task.run(Task.scala:89)&lt;BR /&gt;at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;BR /&gt;Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Failed to open file: /data01/yarn/nm/usercache/vishal/appcache/application_1507613154611_0799/blockmgr-d4f003e6-84ac-4357-8e10-a58b3d040c65/39/shuffle_0_2_0.index&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:243)&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getBlockData(ExternalShuffleBlockResolver.java:147)&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.handleMessage(ExternalShuffleBlockHandler.java:85)&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.receive(ExternalShuffleBlockHandler.java:72)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:154)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:102)&lt;BR /&gt;at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)&lt;BR /&gt;at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)&lt;BR /&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)&lt;BR /&gt;at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)&lt;BR /&gt;at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;BR /&gt;Caused by: java.io.FileNotFoundException: /data01/yarn/nm/usercache/vishal/appcache/application_1507613154611_0799/blockmgr-d4f003e6-84ac-4357-8e10-a58b3d040c65/39/shuffle_0_2_0.index (No such file or directory)&lt;BR /&gt;at java.io.FileInputStream.open(Native Method)&lt;BR /&gt;at java.io.FileInputStream.&amp;lt;init&amp;gt;(FileInputStream.java:146)&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:232)&lt;BR /&gt;... 27 more&lt;/P&gt;&lt;P&gt;at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:186)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:106)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)&lt;BR /&gt;at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)&lt;BR /&gt;at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)&lt;BR /&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)&lt;BR /&gt;at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)&lt;BR /&gt;... 
1 more&lt;/P&gt;&lt;P&gt;)&lt;BR /&gt;17/10/10 16:51:41 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.3 (TID 10, ip-10-0-0-72.ec2.internal, executor 4): FetchFailed(BlockManagerId(1, ip-10-0-0-72.ec2.internal, 7337), shuffleId=0, mapId=0, reduceId=0, message=&lt;BR /&gt;org.apache.spark.shuffle.FetchFailedException: java.lang.RuntimeException: Failed to open file: /data01/yarn/nm/usercache/vishal/appcache/application_1507613154611_0799/blockmgr-2742a1a6-7cbd-46e4-8154-293ab27fcc22/38/shuffle_0_0_0.index&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:243)&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getBlockData(ExternalShuffleBlockResolver.java:147)&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.handleMessage(ExternalShuffleBlockHandler.java:85)&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.receive(ExternalShuffleBlockHandler.java:72)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:154)&lt;BR /&gt;at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:102)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)&lt;BR /&gt;at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)&lt;BR /&gt;at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)&lt;BR /&gt;at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)&lt;BR /&gt;at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)&lt;BR /&gt;at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)&lt;BR /&gt;at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)&lt;BR /&gt;at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)&lt;BR /&gt;at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:745)&lt;BR /&gt;Caused by: java.io.FileNotFoundException: /data01/yarn/nm/usercache/vishal/appcache/application_1507613154611_0799/blockmgr-2742a1a6-7cbd-46e4-8154-293ab27fcc22/38/shuffle_0_0_0.index (No such file or directory)&lt;BR /&gt;at java.io.FileInputStream.open(Native Method)&lt;BR /&gt;at 
java.io.FileInputStream.&amp;lt;init&amp;gt;(FileInputStream.java:146)&lt;BR /&gt;at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:232)&lt;BR /&gt;... 27 more&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here is the Spark configuration:&lt;/P&gt;&lt;P&gt;spark.dynamicAllocation.enabled true&lt;BR /&gt;spark.dynamicAllocation.executorIdleTimeout 60&lt;BR /&gt;spark.dynamicAllocation.minExecutors 0&lt;BR /&gt;spark.dynamicAllocation.schedulerBacklogTimeout 1&lt;BR /&gt;spark.eventLog.dir hdfs://ip-10-0-0-247.ec2.internal:8020/user/spark/applicationHistory&lt;BR /&gt;spark.eventLog.enabled true&lt;BR /&gt;spark.executor.extraLibraryPath /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/hadoop/lib/native&lt;BR /&gt;spark.executor.id driver&lt;BR /&gt;spark.externalBlockStore.folderName spark-08c7a6d2-eb56-40eb-b198-735d73c665d7&lt;BR /&gt;spark.extraListeners com.cloudera.spark.lineage.ClouderaNavigatorListener&lt;BR /&gt;spark.jars file:/home/vishal/.ivy2/jars/com.databricks_spark-csv_2.10-1.5.0.jar,file:/home/vishal/.ivy2/jars/org.apache.commons_commons-csv-1.1.jar,file:/home/vishal/.ivy2/jars/com.univocity_univocity-parsers-1.5.1.jar&lt;BR /&gt;spark.lineage.enabled false&lt;BR /&gt;spark.lineage.log.dir /var/log/spark/lineage&lt;BR /&gt;spark.master yarn-client&lt;BR /&gt;spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS ip-10-0-0-247.ec2.internal&lt;BR /&gt;spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES &lt;A href="http://ip-10-0-0-247.ec2.internal:8088/proxy/application_1507613154611_0799" target="_blank"&gt;http://ip-10-0-0-247.ec2.internal:8088/proxy/application_1507613154611_0799&lt;/A&gt;&lt;BR /&gt;spark.repl.class.outputDir /tmp/spark-e0de2a11-5402-48ec-8379-e190febe6ca0/repl-6fb66657-7680-4fd6-82db-3d7aa1daf641&lt;BR /&gt;spark.repl.class.uri spark://10.0.0.36:34996/classes&lt;BR 
/&gt;spark.scheduler.mode FIFO&lt;BR /&gt;spark.serializer org.apache.spark.serializer.KryoSerializer&lt;BR /&gt;spark.shuffle.encryption.enabled false&lt;BR /&gt;spark.shuffle.service.enabled true&lt;BR /&gt;spark.shuffle.service.port 7337&lt;BR /&gt;spark.sql.queryExecutionListeners com.cloudera.spark.lineage.ClouderaNavigatorListener&lt;BR /&gt;spark.submit.deployMode client&lt;BR /&gt;spark.ui.enabled true&lt;BR /&gt;spark.ui.filters org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter&lt;BR /&gt;spark.ui.killEnabled true&lt;BR /&gt;spark.yarn.am.extraLibraryPath /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/hadoop/lib/native&lt;BR /&gt;spark.yarn.config.gatewayPath /opt/cloudera/parcels&lt;BR /&gt;spark.yarn.config.replacementPath {{HADOOP_COMMON_HOME}}/../../..&lt;BR /&gt;spark.yarn.historyServer.address &lt;A href="http://ip-10-0-0-247.ec2.internal:18088" target="_blank"&gt;http://ip-10-0-0-247.ec2.internal:18088&lt;/A&gt;&lt;BR /&gt;spark.yarn.historyServer.allowTracking true&lt;BR /&gt;spark.yarn.jar local:/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/lib/spark-assembly.jar&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Yarn configuration:&lt;BR /&gt;yarn.nodemanager.resource.memory-mb=30720&lt;BR /&gt;yarn.scheduler.maximum-allocation-mb=4096&lt;BR /&gt;yarn.scheduler.increment-allocation-mb=1024&lt;BR /&gt;yarn.scheduler.minimum-allocation-mb=512&lt;BR /&gt;yarn.scheduler.maximum-allocation-vcores=8&lt;BR /&gt;yarn.scheduler.minimum-allocation-vcores=1&lt;BR /&gt;yarn.scheduler.increment-allocation-vcores=1&lt;BR /&gt;yarn.nodemanager.local-dirs=/data01/yarn/nm,/data02/yarn/nm&lt;BR /&gt;yarn.resourcemanager.nodemanagers.heartbeat-interval-ms=1000&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The dataset we are using to perform the task is not very huge (~18MB). We don't see any OOM errors. 
The executors are terminated due to SIGTERM.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We noticed that the shuffle file was actually created on data node 1, but the executor was looking for it on data node 2 and failing.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;[root@ip-10-0-0-82 ~]# ll /data01/yarn/nm/usercache/vishal/appcache/application_1507613154611_0799/blockmgr-d4f003e6-84ac-4357-8e10-a58b3d040c65/39/shuffle_0_2_0.index&lt;BR /&gt;-rw-r----- 1 vishal yarn 1608 Oct 10 16:51 /data01/yarn/nm/usercache/vishal/appcache/application_1507613154611_0799/blockmgr-d4f003e6-84ac-4357-8e10-a58b3d040c65/39/shuffle_0_2_0.index&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We are not sure what could be causing this issue. Please help.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 12:22:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-job-running-from-a-spark-shell-fails-with-org-apache/m-p/60809#M69354</guid>
      <dc:creator>rabk</dc:creator>
      <dc:date>2022-09-16T12:22:59Z</dc:date>
    </item>
    <item>
      <title>Re: Spark job running from a spark-shell fails with org.apache.spark.shuffle.FetchFailedException</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-job-running-from-a-spark-shell-fails-with-org-apache/m-p/60840#M69355</link>
      <description>&lt;P&gt;Can you verify that the shuffle auxiliary service is enabled within YARN?&lt;/P&gt;</description>
      <pubDate>Wed, 11 Oct 2017 20:18:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-job-running-from-a-spark-shell-fails-with-org-apache/m-p/60840#M69355</guid>
      <dc:creator>hubbarja</dc:creator>
      <dc:date>2017-10-11T20:18:16Z</dc:date>
    </item>
    <item>
      <title>Re: Spark job running from a spark-shell fails with org.apache.spark.shuffle.FetchFailedException</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-job-running-from-a-spark-shell-fails-with-org-apache/m-p/60841#M69356</link>
      <description>&lt;P&gt;Also, please make sure you do not see any host-level issues, such as running out of disk space or Java GC problems.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Oct 2017 20:58:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-job-running-from-a-spark-shell-fails-with-org-apache/m-p/60841#M69356</guid>
      <dc:creator>jahubbar</dc:creator>
      <dc:date>2017-10-11T20:58:39Z</dc:date>
    </item>
    <item>
      <title>Re: Spark job running from a spark-shell fails with org.apache.spark.shuffle.FetchFailedException</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-job-running-from-a-spark-shell-fails-with-org-apache/m-p/60846#M69357</link>
      <description>&lt;P&gt;Hi Rabk,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The log4j WARN message you provided shows a task that's failing with a FetchFailedException because a shuffle file (&lt;SPAN&gt;shuffle_0_2_0.index) can't be found. It does not show what the job ultimately fails with or what transpires during the job run. But let's assume that the job fails once a stage has failed 4 times because of the fetch failures.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;One possible cause for the&amp;nbsp;FetchFailedException is that you are running out of space on the NodeManager local-dirs (where shuffle files are stored), so look at the NodeManager logs (on datanode 2), from the timeframe when the job ran, for bad disk/local-dirs messages. When this happens, YARN will send a SIGTERM to the containers/executors, as you have observed. Are you able to run the query on just a fraction of the current data, or does it succeed if no other jobs are running on the cluster?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 11 Oct 2017 22:49:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-job-running-from-a-spark-shell-fails-with-org-apache/m-p/60846#M69357</guid>
      <dc:creator>bjorn.jonsson</dc:creator>
      <dc:date>2017-10-11T22:49:38Z</dc:date>
    </item>
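    <!--
    Editorial sketch for the reply above, which suggests checking whether the
    NodeManager local-dirs are running out of space. The snippet below is one
    hedged way to measure that on each node. The two paths come from
    yarn.nodemanager.local-dirs in the original post; the function name
    local_dir_usage and the 90.0 percent threshold are assumptions made for
    illustration, so compare against the disk health checker setting of your
    own cluster.

    ```python
    import shutil

    # Paths taken from yarn.nodemanager.local-dirs in the original post.
    LOCAL_DIRS = ["/data01/yarn/nm", "/data02/yarn/nm"]


    def local_dir_usage(paths, max_used_pct=90.0, usage=shutil.disk_usage):
        """Return {path: (used_pct, over_limit)} for each local dir.

        max_used_pct is an assumed threshold, not a value from the thread.
        usage is injectable so the function can be exercised without real mounts.
        """
        report = {}
        for path in paths:
            stats = usage(path)
            # Multiply before dividing so whole percentages stay exact.
            used_pct = 100 * stats.used / stats.total
            report[path] = (used_pct, used_pct > max_used_pct)
        return report
    ```

    Run local_dir_usage(LOCAL_DIRS) on each NodeManager host. In this thread
    the check would have come back clean (over 400 GB free on both volumes),
    which is what pushed the investigation elsewhere.
    -->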
    <item>
      <title>Re: Spark job running from a spark-shell fails with org.apache.spark.shuffle.FetchFailedException</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-job-running-from-a-spark-shell-fails-with-org-apache/m-p/60862#M69358</link>
      <description>&lt;P&gt;Hello,&lt;BR /&gt;Yes, the shuffle auxiliary&amp;nbsp;service is enabled. The external shuffle service for Spark is enabled as well.&amp;nbsp;&lt;BR /&gt;yarn.nodemanager.aux-services.spark_shuffle.class&amp;nbsp;is set to&amp;nbsp;&lt;BR /&gt;org.apache.spark.network.yarn.YarnShuffleService&lt;BR /&gt;Thanks,&lt;BR /&gt;Vishal&lt;/P&gt;</description>
      <pubDate>Thu, 12 Oct 2017 14:54:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-job-running-from-a-spark-shell-fails-with-org-apache/m-p/60862#M69358</guid>
      <dc:creator>rabk</dc:creator>
      <dc:date>2017-10-12T14:54:33Z</dc:date>
    </item>
    <item>
      <title>Re: Spark job running from a spark-shell fails with org.apache.spark.shuffle.FetchFailedException</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-job-running-from-a-spark-shell-fails-with-org-apache/m-p/60864#M69359</link>
      <description>&lt;P&gt;We don't see any storage space issues. We have two local-dirs set up on two mounted volumes, and both of them have more than 400 GB of free space. I don't see any Java GC issues either.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am using the following link to monitor the GC charts:&lt;/P&gt;&lt;P&gt;&lt;A href="http://10.0.0.247:7180/cmf/services/18/charts#q=Concurrent" target="_blank"&gt;http://10.0.0.247:7180/cmf/services/18/charts#q=Concurrent&lt;/A&gt; Mark Sweep Garbage Collection Time Across NodeManagers&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Please let me know if we should monitor anything else in addition to this.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Vishal&lt;/P&gt;</description>
      <pubDate>Thu, 12 Oct 2017 14:59:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-job-running-from-a-spark-shell-fails-with-org-apache/m-p/60864#M69359</guid>
      <dc:creator>rabk</dc:creator>
      <dc:date>2017-10-12T14:59:39Z</dc:date>
    </item>
    <item>
      <title>Re: Spark job running from a spark-shell fails with org.apache.spark.shuffle.FetchFailedException</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-job-running-from-a-spark-shell-fails-with-org-apache/m-p/60865#M69360</link>
      <description>&lt;P&gt;Hello,&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;As I mentioned in one of my earlier replies, there is more than 400 GB of space available on the configured local-dirs.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We do see the queries run successfully sometimes, but most of the time we see the file-not-found exceptions. The tasks are retried several times before being abandoned.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have YARN logs from two of my previous test runs. I will send them to you for further analysis.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Also, I will try running the shell again and capture more logs.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you,&lt;/P&gt;&lt;P&gt;Vishal&lt;/P&gt;</description>
      <pubDate>Thu, 12 Oct 2017 15:37:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-job-running-from-a-spark-shell-fails-with-org-apache/m-p/60865#M69360</guid>
      <dc:creator>rabk</dc:creator>
      <dc:date>2017-10-12T15:37:32Z</dc:date>
    </item>
    <item>
      <title>Re: Spark job running from a spark-shell fails with org.apache.spark.shuffle.FetchFailedException</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-job-running-from-a-spark-shell-fails-with-org-apache/m-p/61045#M69361</link>
      <description>&lt;P&gt;Hi there,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for following up. We have identified the cause and resolved it. Our DNS was set up incorrectly. This issue can be closed.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you,&lt;/P&gt;&lt;P&gt;Vishal&lt;/P&gt;</description>
      <pubDate>Wed, 18 Oct 2017 13:52:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-job-running-from-a-spark-shell-fails-with-org-apache/m-p/61045#M69361</guid>
      <dc:creator>rabk</dc:creator>
      <dc:date>2017-10-18T13:52:21Z</dc:date>
    </item>
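    <!--
    Editorial sketch for the resolution above. The thread does not say what
    exactly was wrong with DNS, so the following is only one plausible check,
    under the assumption that forward and reverse lookups disagreed on some
    node. That would fit the symptom in the original post, where the fetch was
    addressed to ip-10-0-0-72.ec2.internal while the shuffle index file was
    listed on ip-10-0-0-82. The function name check_dns and its injectable
    resolver parameters are hypothetical; socket.gethostbyname and
    socket.gethostbyaddr are the standard library calls it relies on.

    ```python
    import socket


    def check_dns(hostname, resolve=socket.gethostbyname,
                  reverse=socket.gethostbyaddr):
        """Return (ip, reverse_name, consistent) for a hostname.

        consistent is True when the reverse lookup of the resolved address
        yields the same name that was asked for (case and trailing dot ignored).
        """
        ip = resolve(hostname)
        reverse_name = reverse(ip)[0]  # first element is the primary host name
        same = reverse_name.lower().rstrip(".") == hostname.lower().rstrip(".")
        return ip, reverse_name, same
    ```

    Calling check_dns("ip-10-0-0-72.ec2.internal") from the edge node and from
    every NodeManager host, and comparing the results, is a quick way to spot a
    node that resolves a peer to the wrong address. A mismatch only indicates
    where to look; it does not by itself confirm the cause reported here.
    -->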
  </channel>
</rss>

