Support Questions

Find answers, ask questions, and share your expertise

Even after configuring blockManager.driver.port and blockManager.port, and setting spark.port.maxRetries to 300, Spark still attempts to use random ports. Why does this happen?

I configured blockManager.driver.port as 21750, blockManager.port as 21700, and maxRetries as 300, but I still see the following:

25/02/05 17:16:08 WARN TransportChannelHandler: Exception in connection from /172.21.2X0.XXX:59698
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
        at io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:254)
        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1132)
        at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:357)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:151)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)

Community Manager

@thoufeeq1218, welcome to our community! To help you get the best possible answer, I have tagged our Spark experts @Babasaheb and @ggangadharan, who may be able to assist you further.

Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.



Regards,

Vidya Sargur,
Community Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.

Super Collaborator

Hello @thoufeeq1218,

We understand that you have configured spark.driver.blockManager.port and spark.blockManager.port, but Spark may still attempt to use random ports for the following reasons:

- Why is Spark using random ports?

Spark uses several ports beyond the BlockManager port for communication between the driver and executors.

The random port you see (59698) comes from the operating system's ephemeral port range and can be assigned because of:

spark.driver.port (default: random)
spark.executor.port (default: random)
- How to restrict Spark to specific ports?

Explicitly set spark.driver.port so the driver listens on a fixed port:

--conf spark.driver.port=21800

Also ensure spark.blockManager.port is actually free: if 21700 is occupied, Spark tries successive ports (governed by spark.port.maxRetries) and ends up listening somewhere else.

- Understanding spark.port.maxRetries

If spark.port.maxRetries > 0 (default: 16), Spark retries the next ports in sequence (port+1, port+2, and so on) up to that many times.

If spark.port.maxRetries = 0, Spark fails immediately if the specified port is unavailable.
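As a rough illustration of that retry behaviour (plain Python sockets, not Spark code; the port numbers are arbitrary), each retry simply probes the next port up from the configured one:

```python
import socket

def bind_with_retries(start_port, max_retries):
    """Sketch of the spark.port.maxRetries idea: try start_port,
    then start_port+1, ... allowing up to max_retries extra attempts."""
    for offset in range(max_retries + 1):
        candidate = start_port + offset
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind(("127.0.0.1", candidate))
            return s, candidate      # success: the service listens here
        except OSError:
            s.close()                # port busy: try the next one
    raise OSError(f"no free port in {start_port}..{start_port + max_retries}")

# With max_retries=0 the first failure is fatal, which is why
# spark.port.maxRetries=0 makes Spark fail fast instead of drifting
# onto a different port.
```

This is why a busy base port shows up in the logs as a nearby but different port rather than the one you configured.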

- Executors and Dynamic Ports

Executors start/stop dynamically and may request random ports.

If you must prevent Spark from using random ephemeral ports, use the following settings:

--conf spark.driver.port=21800
--conf spark.driver.blockManager.port=21750
--conf spark.blockManager.port=21700
--conf spark.executor.port=21810
--conf spark.port.maxRetries=0

These settings can be applied at the job level or in the Spark configuration file.
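For instance, a job-level form might look like this (a sketch only: the application file name is a placeholder, and the port values are the hypothetical ones discussed in this thread; pick ports known to be free in your environment):

```shell
spark-submit \
  --conf spark.driver.port=21800 \
  --conf spark.driver.blockManager.port=21750 \
  --conf spark.blockManager.port=21700 \
  --conf spark.port.maxRetries=0 \
  your_app.py
```

The same keys can instead be placed in spark-defaults.conf to apply cluster-wide.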

Note: If a port is already used by a running job, a new job may fail due to a port conflict.

If this response helped with your query, please take a moment to log in, give KUDOS 🙂, and click "Accept as Solution" below this post.

Thank you.


Hi @Babasaheb ,
Thanks for your time. I tried your suggestion, but I'm still facing the same issue. Here's more information: the error occurs consistently after about 2 hours and 15 minutes of runtime, and the YARN scheduler then exits with an error.
25/02/07 14:55:32 WARN TransportChannelHandler: Exception in connection from /1X1.XX.2XX.1XX:58360
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
        at io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:254)
        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1132)
        at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:357)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:151)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:750)

25/02/07 14:55:32 WARN TransportChannelHandler: Exception in connection from /1X1.XX.2XX.1XX:58388
java.io.IOException: Connection reset by peer
        (same stack trace as above)

25/02/07 14:55:32 WARN TransportChannelHandler: Exception in connection from /1X1.XX.2XX.1XX:58382
java.io.IOException: Connection reset by peer
        (same stack trace as above)

25/02/07 14:55:32 WARN TransportChannelHandler: Exception in connection from /1X1.XX.2XX.1XX:40712
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)

25/02/07 14:55:33 ERROR TransportClient: Failed to send RPC RPC 4871454967371228059 to /1X1.XX.2XX.1XX:58360: io.netty.channel.StacklessClosedChannelException
io.netty.channel.StacklessClosedChannelException
        at io.netty.channel.AbstractChannel$AbstractUnsafe.write(Object, ChannelPromise)(Unknown Source)

25/02/07 14:55:33 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint: Sending RequestExecutors(Map(),Map(),Map(),Set()) to AM was unsuccessful
java.io.IOException: Failed to send RPC RPC 4871454967371228059 to /1X1.XX.2XX.1XX:58360: io.netty.channel.StacklessClosedChannelException
        at org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:395)
        at org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:372)
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:557)
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)
        at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636)
        at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:629)
        at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:118)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:999)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:860)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367)
        at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:877)
        at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:940)
        at io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1247)
        at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:750)
Caused by: io.netty.channel.StacklessClosedChannelException
        at io.netty.channel.AbstractChannel$AbstractUnsafe.write(Object, ChannelPromise)(Unknown Source)

25/02/07 14:55:33 ERROR Utils: Uncaught exception in thread YARN application state monitor
org.apache.spark.SparkException: Exception thrown in awaitResult:
        at org.apache.spark.util.SparkThreadUtils$.awaitResult(SparkThreadUtils.scala:56)
        at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:310)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:847)
        at org.apache.spark.scheduler.cluster.YarnSchedulerBackend.stop(YarnSchedulerBackend.scala:114)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:178)
        at org.apache.spark.scheduler.TaskSchedulerImpl.$anonfun$stop$2(TaskSchedulerImpl.scala:992)
        at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1375)
        at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:992)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$stop$4(DAGScheduler.scala:2976)
        at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1375)
        at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:2976)
        at org.apache.spark.SparkContext.$anonfun$stop$12(SparkContext.scala:2263)
        at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1375)
        at org.apache.spark.SparkContext.stop(SparkContext.scala:2263)
        at org.apache.spark.SparkContext.stop(SparkContext.scala:2216)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$MonitorThread.run(YarnClientSchedulerBackend.scala:125)
Caused by: java.io.IOException: Failed to send RPC RPC 4871454967371228059 to /1X2.XX.2X5.XX7:58360: io.netty.channel.StacklessClosedChannelException
        at org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:395)
        at org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:372)
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:557)
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)
        at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636)
        at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:629)
        at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:118)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:999)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:860)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367)
        at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:877)
        at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:940)
        at java.lang.Thread.run(Thread.java:750)
Caused by: io.netty.channel.StacklessClosedChannelException
        at io.netty.channel.AbstractChannel$AbstractUnsafe.write(Object, ChannelPromise)(Unknown Source)

25/02/07 14:55:33 INFO CheckpointFileManager: Renamed temp file hdfs://ciXXXXf01:9000/user/cluster_pred/c458cbd3-8c00-4f10-8cce-c33377b74582/offsets/.2036.5b6dfa8e-882c-4ff1-9ae3-1c114319c719.tmp to hdfs://cichf1:9000/user/cluster_pred/c458cbd3-8c00-4f10-8cce-c33377b74582/offsets/2036

 


Hi everyone, could anyone in the community help identify the root cause of this Spark Streaming failure?




@thoufeeq1218 wrote:
ERROR YarnClientSchedulerBackend: YARN application has exited unexpectedly with state SUCCEEDED! Check the YARN application logs for more details.