Member since: 06-02-2016
Posts: 4
Kudos Received: 0
Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 32665 | 05-19-2017 10:48 PM |
03-28-2018 09:16 PM
I am copying files and directories owned by quite a few users from Linux file systems onto HDFS. I usually run rsync as root from one Linux machine to another so that file and directory ownership and permissions are preserved. Now I am dealing with HDFS, and the HDFS admin account is not root. How do I do this so that it works like rsync, but for HDFS?
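For illustration only, here is a minimal sketch of the kind of two-step workaround I could imagine (copy everything as the HDFS admin account, then replay the original Linux ownership with `hdfs dfs -chown`). The paths are placeholders, and this is not a real rsync equivalent: no delta transfer and no permission bits.

```python
#!/usr/bin/env python
# Hypothetical sketch: copy a local tree into HDFS as the HDFS admin user,
# then replay the original Linux owner/group with `hdfs dfs -chown`.
# Not a real rsync replacement: no delta transfer, no permission bits.
import os
import pwd
import grp
import subprocess

LOCAL_ROOT = "/data/projects"         # hypothetical local source
HDFS_ROOT = "/user/archive/projects"  # hypothetical HDFS target

def run(cmd):
    """Run a command and fail loudly if it returns non-zero."""
    subprocess.check_call(cmd)

# 1. Bulk copy as whatever user runs this script (e.g. the HDFS admin).
run(["hdfs", "dfs", "-mkdir", "-p", os.path.dirname(HDFS_ROOT)])
run(["hdfs", "dfs", "-put", LOCAL_ROOT, HDFS_ROOT])

# 2. Walk the local tree, look up each path's owner/group, replay it on HDFS.
#    chown on HDFS needs superuser rights, which is why step 1 runs as admin.
for dirpath, dirnames, filenames in os.walk(LOCAL_ROOT):
    for name in dirnames + filenames:
        local_path = os.path.join(dirpath, name)
        rel = os.path.relpath(local_path, LOCAL_ROOT)
        hdfs_path = os.path.join(HDFS_ROOT, rel)
        st = os.lstat(local_path)
        owner = pwd.getpwuid(st.st_uid).pw_name
        group = grp.getgrgid(st.st_gid).gr_name
        run(["hdfs", "dfs", "-chown", "%s:%s" % (owner, group), hdfs_path])
```

This is slow (one chown per path) and only a starting point, hence the question about a proper rsync-like way to do it.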
Labels:
- Apache Hadoop
05-19-2017 10:48 PM
OK, I think I have found the answer to this problem: it was caused by a Netty version mismatch. I found one more piece of evidence as I dug into each node's log. Here is the node's error log (which did not show up in the YARN log):
17/05/17 11:54:07 ERROR server.TransportRequestHandler: Error sending result ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=790775716012, chunkIndex=11}, buffer=FileSegmentManagedBuffer{file=/hdfs2/yarn/local/usercache/chiachi/appcache/application_1495046487864_0001/blockmgr-7f6bd1f5-a0c7-43e5-a9a5-3bd2a9d8abbd/19/shuffle_0_54_0.data, offset=5963, length=1246}} to /10.16.0.9:44851; closing connection
io.netty.handler.codec.EncoderException: java.lang.NoSuchMethodError: io.netty.channel.DefaultFileRegion.<init>(Ljava/io/File;JJ)V
at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:107)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:658)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:716)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:651)
at io.netty.handler.timeout.IdleStateHandler.write(IdleStateHandler.java:266)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:658)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:716)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:706)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:741)
at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:895)
at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:240)
at org.apache.spark.network.server.TransportRequestHandler.respond(TransportRequestHandler.java:194)
at org.apache.spark.network.server.TransportRequestHandler.processFetchRequest(TransportRequestHandler.java:135)
at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:105)
at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:118)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoSuchMethodError: io.netty.channel.DefaultFileRegion.<init>(Ljava/io/File;JJ)V
at org.apache.spark.network.buffer.FileSegmentManagedBuffer.convertToNetty(FileSegmentManagedBuffer.java:133)
at org.apache.spark.network.protocol.MessageEncoder.encode(MessageEncoder.java:58)
at org.apache.spark.network.protocol.MessageEncoder.encode(MessageEncoder.java:33)
at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:89)
... 34 more
Why was there a Netty version discrepancy? I had upgraded both Hadoop and Spark but left HDFS running (with the old Netty version). After restarting HDFS and YARN, this error no longer occurred.
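For anyone who hits the same NoSuchMethodError: a quick sanity check is to list the Netty jars each component ships and compare versions. A minimal sketch, assuming installs under /opt/hadoop and /opt/spark (adjust to your layout):

```python
#!/usr/bin/env python
# Sketch: list the Netty jars bundled with Hadoop and Spark to spot a version
# mismatch like the one behind the NoSuchMethodError above.
# /opt/hadoop and /opt/spark are assumed install locations; adjust as needed.
import fnmatch
import os

INSTALL_DIRS = ["/opt/hadoop", "/opt/spark"]

for root_dir in INSTALL_DIRS:
    print("== %s ==" % root_dir)
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in fnmatch.filter(filenames, "netty*.jar"):
            # e.g. netty-3.x vs netty-all-4.x jars side by side is a red flag
            print(os.path.join(dirpath, name))
```

My guess at the mechanics: the long-running daemons were still loading an older Netty jar without the DefaultFileRegion(File, long, long) constructor that the upgraded Spark code calls, and restarting HDFS and YARN made the versions agree.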
05-18-2017 12:42 PM
I am experiencing massive shuffle errors and "connection reset by peer" IOExceptions when running a map/reduce word count on a big dataset. It works with a small dataset. I looked around on this forum as well as other places but could not find an answer to this problem. Hopefully someone who has a solution can shed some light. I am using hadoop-2.8.0 and spark-2.1.1 on an Ubuntu-based cluster with 152 vcores and 848 GB of memory (across 12 nodes). YARN Scheduler Metrics reports memory:1024, vCores:1 as the minimum allocation and memory:102400, vCores:32 as the maximum allocation. Here is how I submit the job:
spark-submit --master yarn --deploy-mode cluster \
--queue long \
--num-executors 4 \
--driver-cores 6 \
--executor-cores 4 \
--driver-memory 6g \
--executor-memory 4g \
1gram_count.py
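The actual 1gram_count.py is not reproduced here; as a rough sketch, and consistent with the traceback below, it is a plain word count that does reduceByKey(add) and then collect()s the result to the driver. The argument handling and input path in this sketch are assumptions:

```python
#!/usr/bin/env python
# Minimal sketch of a word-count job like 1gram_count.py (the real script is
# not shown above); consistent with the traceback's reduceByKey(add).collect().
import sys
from operator import add

from pyspark import SparkConf, SparkContext

def main(sc, filename):
    # Split each line into words, emit (word, 1), then sum counts per word.
    words = (sc.textFile(filename)
               .flatMap(lambda line: line.split())
               .map(lambda word: (word, 1)))
    wordcount = words.reduceByKey(add).collect()  # pulls all results to the driver
    for word, count in wordcount[:20]:
        print("%s\t%d" % (word, count))

if __name__ == "__main__":
    conf = SparkConf().setAppName("1gram_count")
    sc = SparkContext(conf=conf)
    filename = sys.argv[1]  # hypothetical: HDFS input path passed on the command line
    main(sc, filename)
```

As a side note, collect() brings the entire counted result back to the driver, which can be heavy for a big dataset on its own, independent of the shuffle fetch failures shown below.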
Below are the output error and a brief snapshot of the job error log.
OUTPUT ERROR:
Traceback (most recent call last):
File "1gram_count.py", line 43, in <module>
main(sc, filename)
File "1gram_count.py", line 14, in main
wordcount = words.reduceByKey(add).collect()
File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 808, in collect
File "/opt/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/opt/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: ResultStage 1 (collect at 1gram_count.py:14) has failed the maximum allowable number of times: 4. Most recent failure reason: org.apache.spark.shuffle.FetchFailedException: Failed to send request StreamChunkId{streamId=1317168709388, chunkIndex=6} to rnode4.mycluster.com/13.15.17.96:40448: java.io.IOException: Connection reset by peer
at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:361)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:336)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:54)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:504)
at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:328)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:269)
Caused by: java.io.IOException: Failed to send request StreamChunkId{streamId=1317168709388, chunkIndex=6} to rnode4.mycluster.com/13.15.17.96:40448: java.io.IOException: Connection reset by peer
at org.apache.spark.network.client.TransportClient$1.operationComplete(TransportClient.java:163)
at org.apache.spark.network.client.TransportClient$1.operationComplete(TransportClient.java:147)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
at io.netty.util.concurrent.DefaultPromise.notifyLateListener(DefaultPromise.java:621)
at io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:138)
at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:93)
at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:28)
at org.apache.spark.network.client.TransportClient.fetchChunk(TransportClient.java:146)
at org.apache.spark.network.shuffle.OneForOneBlockFetcher$1.onSuccess(OneForOneBlockFetcher.java:103)
at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:176)
at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:120)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:51)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:492)
at io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:270)
at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:707)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:315)
at io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:676)
at io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1059)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:688)
at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:669)
at io.netty.channel.ChannelOutboundHandlerAdapter.flush(ChannelOutboundHandlerAdapter.java:115)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:688)
at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:669)
at io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:117)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:688)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:718)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:706)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:741)
at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:895)
at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:240)
... 24 more
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1262)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1647)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1938)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1951)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1965)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:453)
at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
ERROR LOG: ...
17/05/16 09:54:51 WARN scheduler.TaskSetManager: Lost task 22.0 in stage 1.0 (TID 165, rnode4.mycluster.com, executor 5): FetchFailed(BlockManagerId(6, rnode6.mycluster.com, 43712, None), shuffleId=0, mapId=129, reduceId=22, message=
org.apache.spark.shuffle.FetchFailedException: Connection reset by peer
at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:361)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:336)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:54)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:504)
at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:328)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:269)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:384)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:225)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:51)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:492)
at io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:270)
at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:707)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:315)
at io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:676)
at io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1059)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:688)
at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:669)
at io.netty.channel.ChannelOutboundHandlerAdapter.flush(ChannelOutboundHandlerAdapter.java:115)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:688)
at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:669)
at io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:117)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:688)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:718)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:706)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:741)
at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:895)
at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:240)
... 24 more
)
17/05/16 09:54:51 INFO scheduler.TaskSetManager: Task 22.0 in stage 1.0 (TID 165) failed, but another instance of the task has already succeeded, so not re-queuing the task to be re-executed.
...
17/05/16 09:54:51 WARN scheduler.TaskSetManager: Lost task 28.0 in stage 1.0 (TID 171, tnode4.mycluster.com, executor 3): FetchFailed(BlockManagerId(5, rnode4.mycluster.com, 43171, None), shuffleId=0, mapId=40, reduceId=28, message=
org.apache.spark.shuffle.FetchFailedException: Failed to send request StreamChunkId{streamId=607660513078, chunkIndex=6} to rnode4.mycluster.com/13.15.17.96:43171: java.io.IOException: Connection reset by peer
at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:361)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:336)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:54)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:504)
at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:328)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:269)
Caused by: java.io.IOException: Failed to send request StreamChunkId{streamId=607660513078, chunkIndex=6} to rnode4.mycluster.com/13.15.17.96:43171: java.io.IOException: Connection reset by peer
at org.apache.spark.network.client.TransportClient$1.operationComplete(TransportClient.java:163)
at org.apache.spark.network.client.TransportClient$1.operationComplete(TransportClient.java:147)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
at io.netty.util.concurrent.DefaultPromise.notifyLateListener(DefaultPromise.java:621)
at io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:138)
at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:93)
at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:28)
at org.apache.spark.network.client.TransportClient.fetchChunk(TransportClient.java:146)
at org.apache.spark.network.shuffle.OneForOneBlockFetcher$1.onSuccess(OneForOneBlockFetcher.java:103)
at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:176)
at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:120)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:51)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:492)
at io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:270)
at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:707)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:315)
at io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:676)
at io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1059)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:688)
at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:669)
at io.netty.channel.ChannelOutboundHandlerAdapter.flush(ChannelOutboundHandlerAdapter.java:115)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:688)
at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:669)
at io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:117)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:688)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:718)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:706)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:741)
at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:895)
at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:240)
... 24 more
)
17/05/16 09:54:51 INFO scheduler.DAGScheduler: Shuffle files lost for executor: 1 (epoch 1)
17/05/16 09:54:51 INFO scheduler.TaskSetManager: Task 28.0 in stage 1.0 (TID 171) failed, but another instance of the task has already succeeded, so not re-queuing the task to be re-executed.
17/05/16 09:54:51 INFO scheduler.ShuffleMapStage: ShuffleMapStage 0 is now unavailable on executor 1 (71/143, false)
...
17/05/16 09:55:22 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (reduceByKey at 1gram_count.py:14) finished in 9.098 s
17/05/16 09:55:22 INFO scheduler.DAGScheduler: looking for newly runnable stages
17/05/16 09:55:22 INFO scheduler.DAGScheduler: running: Set()
17/05/16 09:55:22 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 1)
17/05/16 09:55:22 INFO scheduler.DAGScheduler: failed: Set(ShuffleMapStage 0, ResultStage 1)
17/05/16 09:55:22 INFO scheduler.DAGScheduler: Resubmitting ShuffleMapStage 0 (reduceByKey at 1gram_count.py:14) because some of its tasks had failed: 0, 2, 3, 5, 7, 8, 11, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 26, 28, 29, 30, 33, 35, 36, 37, 40, 42, 44, 45, 46, 47, 48, 49, 54, 56, 57, 59, 61, 63, 67, 69, 70, 72, 75, 78, 81, 84, 85, 87, 88, 89, 91, 92, 97, 98, 99, 104, 106, 107, 114, 116, 117, 120, 125, 127, 130, 133, 136, 137, 139, 140, 141, 142
...
17/05/16 09:55:39 INFO scheduler.DAGScheduler: Resubmitting ShuffleMapStage 0 (reduceByKey at 1gram_count.py:14) and ResultStage 1 (collect at 1gram_count.py:14) due to fetch failure
17/05/16 09:55:39 INFO scheduler.DAGScheduler: Executor lost: 8 (epoch 80)
17/05/16 09:55:39 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 8 from BlockManagerMaster.
17/05/16 09:55:39 INFO scheduler.TaskSetManager: Task 0.0 in stage 1.2 (TID 417) failed, but another instance of the task has already succeeded, so not re-queuing the task to be re-executed.
17/05/16 09:55:39 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(8, cage.mycluster.com, 43545, None)
17/05/16 09:55:39 INFO storage.BlockManagerMaster: Removed 8 successfully in removeExecutor
17/05/16 09:55:39 INFO scheduler.DAGScheduler: Shuffle files lost for executor: 8 (epoch 80)
17/05/16 09:55:39 INFO scheduler.ShuffleMapStage: ShuffleMapStage 0 is now unavailable on executor 8 (133/143, false)
17/05/16 09:55:39 INFO scheduler.DAGScheduler: Executor lost: 7 (epoch 80)
17/05/16 09:55:39 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 7 from BlockManagerMaster.
17/05/16 09:55:39 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(7, gardel.mycluster.com, 56871, None)
17/05/16 09:55:39 INFO storage.BlockManagerMaster: Removed 7 successfully in removeExecutor
17/05/16 09:55:39 INFO scheduler.DAGScheduler: Shuffle files lost for executor: 7 (epoch 80)
17/05/16 09:55:39 INFO scheduler.ShuffleMapStage: ShuffleMapStage 0 is now unavailable on executor 7 (118/143, false)
17/05/16 09:55:39 INFO scheduler.DAGScheduler: Executor lost: 6 (epoch 80)
17/05/16 09:55:39 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 6 from BlockManagerMaster.
17/05/16 09:55:39 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(6, rnode6.mycluster.com, 43712, None)
17/05/16 09:55:39 INFO storage.BlockManagerMaster: Removed 6 successfully in removeExecutor
17/05/16 09:55:39 INFO scheduler.DAGScheduler: Shuffle files lost for executor: 6 (epoch 80)
17/05/16 09:55:39 INFO scheduler.ShuffleMapStage: ShuffleMapStage 0 is now unavailable on executor 6 (93/143, false)
17/05/16 09:55:39 INFO scheduler.DAGScheduler: Resubmitting failed stages
17/05/16 09:55:39 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (PairwiseRDD[3] at reduceByKey at 1gram_count.py:14), which has no missing parents
17/05/16 09:55:39 INFO memory.MemoryStore: Block broadcast_8 stored as values in memory (estimated size 9.1 KB, free 3.0 GB)
17/05/16 09:55:39 INFO memory.MemoryStore: Block broadcast_8_piece0 stored as bytes in memory (estimated size 5.8 KB, free 3.0 GB)
17/05/16 09:55:39 INFO storage.BlockManagerInfo: Added broadcast_8_piece0 in memory on 13.15.17.113:38097 (size: 5.8 KB, free: 3.0 GB)
17/05/16 09:55:39 INFO spark.SparkContext: Created broadcast 8 from broadcast at DAGScheduler.scala:996
17/05/16 09:55:39 INFO scheduler.DAGScheduler: Submitting 50 missing tasks from ShuffleMapStage 0 (PairwiseRDD[3] at reduceByKey at 1gram_count.py:14)
17/05/16 09:55:39 INFO cluster.YarnClusterScheduler: Adding task set 0.4 with 50 tasks
17/05/16 09:55:41 INFO storage.BlockManagerMasterEndpoint: Registering block manager gardel.mycluster.com:56871 with 2004.6 MB RAM, BlockManagerId(7, gardel.mycluster.com, 56871, None)
17/05/16 09:55:41 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on gardel.mycluster.com:56871 (size: 29.0 KB, free: 2004.6 MB)
17/05/16 09:55:41 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on gardel.mycluster.com:56871 (size: 3.7 KB, free: 2004.6 MB)
17/05/16 09:55:41 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on gardel.mycluster.com:56871 (size: 5.8 KB, free: 2004.6 MB)
17/05/16 09:55:41 INFO storage.BlockManagerInfo: Added broadcast_7_piece0 in memory on gardel.mycluster.com:56871 (size: 3.7 KB, free: 2004.6 MB)
17/05/16 09:55:41 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on gardel.mycluster.com:56871 (size: 3.7 KB, free: 2004.6 MB)
17/05/16 09:55:41 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on gardel.mycluster.com:56871 (size: 5.8 KB, free: 2004.5 MB)
17/05/16 09:55:41 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on gardel.mycluster.com:56871 (size: 5.8 KB, free: 2004.5 MB)
17/05/16 09:55:41 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on gardel.mycluster.com:56871 (size: 5.8 KB, free: 2004.5 MB)
17/05/16 09:55:42 INFO storage.BlockManagerMasterEndpoint: Registering block manager tnode5.mycluster.com:50922 with 2004.6 MB RAM, BlockManagerId(4, tnode5.mycluster.com, 50922, None)
17/05/16 09:55:42 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on tnode5.mycluster.com:50922 (size: 29.0 KB, free: 2004.6 MB)
17/05/16 09:55:42 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on tnode5.mycluster.com:50922 (size: 3.7 KB, free: 2004.6 MB)
17/05/16 09:55:42 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on tnode5.mycluster.com:50922 (size: 5.8 KB, free: 2004.6 MB)
17/05/16 09:55:42 INFO storage.BlockManagerInfo: Added broadcast_7_piece0 in memory on tnode5.mycluster.com:50922 (size: 3.7 KB, free: 2004.6 MB)
17/05/16 09:55:42 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on tnode5.mycluster.com:50922 (size: 3.7 KB, free: 2004.6 MB)
17/05/16 09:55:42 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on tnode5.mycluster.com:50922 (size: 5.8 KB, free: 2004.5 MB)
17/05/16 09:55:42 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on tnode5.mycluster.com:50922 (size: 5.8 KB, free: 2004.5 MB)
17/05/16 09:55:42 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on tnode5.mycluster.com:50922 (size: 5.8 KB, free: 2004.5 MB)
17/05/16 09:55:42 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.4 (TID 450, rnode1.mycluster.com, executor 1, partition 7, NODE_LOCAL, 6061 bytes)
...
17/05/16 09:56:08 ERROR yarn.ApplicationMaster: User application exited with status 1
17/05/16 09:56:08 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 1, (reason: User application exited with status 1)
...
17/05/16 09:56:08 INFO spark.SparkContext: Invoking stop() from shutdown hook
17/05/16 09:56:08 INFO server.ServerConnector: Stopped Spark@295a0399{HTTP/1.1}{0.0.0.0:0}
17/05/16 09:56:08 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4650135{/stages/stage/kill,null,UNAVAILABLE,@Spark}
17/05/16 09:56:08 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7fa661e4{/jobs/job/kill,null,UNAVAILABLE,@Spark}
17/05/16 09:56:08 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@12cbc3d7{/api,null,UNAVAILABLE,@Spark}
17/05/16 09:56:08 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7ed7eff8{/,null,UNAVAILABLE,@Spark}
17/05/16 09:56:08 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@19854e4e{/static,null,UNAVAILABLE,@Spark}
17/05/16 09:56:08 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@d795813{/executors/threadDump/json,null,UNAVAILABLE,@Spark}
17/05/16 09:56:08 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7e534778{/executors/threadDump,null,UNAVAILABLE,@Spark}
17/05/16 09:56:08 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5555f52c{/executors/json,null,UNAVAILABLE,@Spark}
17/05/16 09:56:08 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7dfe96ab{/executors,null,UNAVAILABLE,@Spark}
17/05/16 09:56:08 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@6966dbd{/environment/json,null,UNAVAILABLE,@Spark}
17/05/16 09:56:08 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@28097bac{/environment,null,UNAVAILABLE,@Spark}
17/05/16 09:56:08 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@722feb6d{/storage/rdd/json,null,UNAVAILABLE,@Spark}
17/05/16 09:56:08 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5854bbf{/storage/rdd,null,UNAVAILABLE,@Spark}
17/05/16 09:56:08 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@332fc652{/storage/json,null,UNAVAILABLE,@Spark}
17/05/16 09:56:08 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@574bd394{/storage,null,UNAVAILABLE,@Spark}
17/05/16 09:56:08 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@71506d9a{/stages/pool/json,null,UNAVAILABLE,@Spark}
17/05/16 09:56:08 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4daa72b3{/stages/pool,null,UNAVAILABLE,@Spark}
17/05/16 09:56:08 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7a71edaa{/stages/stage/json,null,UNAVAILABLE,@Spark}
17/05/16 09:56:08 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@e89d2da{/stages/stage,null,UNAVAILABLE,@Spark}
17/05/16 09:56:08 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@70e609be{/stages/json,null,UNAVAILABLE,@Spark}
17/05/16 09:56:08 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5f7ca7b3{/stages,null,UNAVAILABLE,@Spark}
17/05/16 09:56:08 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@77d95d4d{/jobs/job/json,null,UNAVAILABLE,@Spark}
17/05/16 09:56:08 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@76b92f6f{/jobs/job,null,UNAVAILABLE,@Spark}
17/05/16 09:56:08 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7330862d{/jobs/json,null,UNAVAILABLE,@Spark}
17/05/16 09:56:08 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@76828613{/jobs,null,UNAVAILABLE,@Spark}
...
17/05/16 09:56:08 INFO ui.SparkUI: Stopped Spark web UI at http://13.15.17.113:35816
...
17/05/16 09:56:08 ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerTaskEnd(1,3,ResultTask,FetchFailed(BlockManagerId(2, tnode3.mycluster.com, 44720, None),0,6,2,org.apache.spark.shuffle.FetchFailedException: Connection from tnode3.mycluster.com/13.15.17.107:44720 closed
at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:361)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:336)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:54)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:504)
at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:328)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:269)
Caused by: java.io.IOException: Connection from tnode3.mycluster.com/13.15.17.107:44720 closed
at org.apache.spark.network.client.TransportResponseHandler.channelInactive(TransportResponseHandler.java:128)
at org.apache.spark.network.server.TransportChannelHandler.channelInactive(TransportChannelHandler.java:108)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:233)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:219)
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:247)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:233)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:219)
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:233)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:219)
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
at org.apache.spark.network.util.TransportFrameDecoder.channelInactive(TransportFrameDecoder.java:182)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:233)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:219)
at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:769)
at io.netty.channel.AbstractChannel$AbstractUnsafe$5.run(AbstractChannel.java:567)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:745)
),org.apache.spark.scheduler.TaskInfo@4a03abd0,null)
17/05/16 09:56:08 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s).
17/05/16 09:56:08 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
17/05/16 09:56:08 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
17/05/16 09:56:08 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
17/05/16 09:56:08 ERROR server.TransportRequestHandler: Error while invoking RpcHandler#receive() for one-way message.
org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:154)
at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:134)
at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:570)
at org.apache.spark.network.server.TransportRequestHandler.processOneWayMessage(TransportRequestHandler.java:180)
at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109)
at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:118)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:745)
...
17/05/16 09:56:08 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/05/16 09:56:08 INFO memory.MemoryStore: MemoryStore cleared
17/05/16 09:56:08 INFO storage.BlockManager: BlockManager stopped
17/05/16 09:56:08 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
17/05/16 09:56:08 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/05/16 09:56:08 INFO spark.SparkContext: Successfully stopped SparkContext
17/05/16 09:56:09 INFO util.ShutdownHookManager: Shutdown hook called
17/05/16 09:56:09 INFO util.ShutdownHookManager: Deleting directory /hdfs/yarn/local/usercache/jacky/appcache/application_1494630332824_0048/spark-11b09437-1b50-4953-a071-56ee1530b00f
17/05/16 09:56:09 INFO util.ShutdownHookManager: Deleting directory /hdfs/yarn/local/usercache/jacky/appcache/application_1494630332824_0048/spark-11b09437-1b50-4953-a071-56ee1530b00f/pyspark-5a8a2155-4e7a-4099-aaf6-d96bed36c671
17/05/16 09:56:09 INFO util.ShutdownHookManager: Deleting directory /hdfs2/yarn/local/usercache/jacky/appcache/application_1494630332824_0048/spark-a16dd72a-70b4-4da4-96c3-7c655f9eb23
Labels:
- Apache Spark
- Apache YARN