
Number of tasks on executors becomes negative after executor failures

Summary:

I am running Spark 1.5 on CDH 5.5.1. Under extreme load, I intermittently get the connection failure exception below, and afterwards the Spark UI shows a negative number of active tasks on the executors.

 

Exception:

TRACE: org.apache.hadoop.hbase.ipc.AbstractRpcClient - Call: Multi, callTime: 76ms
INFO : org.apache.spark.network.client.TransportClientFactory - Found inactive connection to xxxx/xxx.xxx.xxx.xxxx, creating a new one.
ERROR: org.apache.spark.network.shuffle.RetryingBlockFetcher - Exception while beginning fetch of 1 outstanding blocks (after 1 retries)
java.io.IOException: Failed to connect to xxxx/xxx.xxx.xxx.xxxx
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:193)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
    at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:88)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: xxxx/xxx.xxx.xxx.xxxx
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    ... 1 more

 

 

Related Defects:

https://issues.apache.org/jira/browse/SPARK-2319

https://issues.apache.org/jira/browse/SPARK-9591

 

 

 

SparkNegativeActiveProcess.jpg