How to configure spark.network.timeout for Spark on Ambari

Rising Star

I'm running Spark and my app suddenly died. I checked the log and found this problem:

17/08/15 12:29:40 ERROR TransportChannelHandler: Connection to /192.168.xx.109:44271 has been quiet for 120000 ms while there are outstanding requests. Assuming connection is dead; please adjust spark.network.timeout if this is wrong.
17/08/15 12:29:40 WARN NettyRpcEndpointRef: Error sending message [message = RetrieveSparkProps] in 1 attempts
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout
	at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:172)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:67)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
	at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:67)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:157)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:259)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
	at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
	at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
	at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
	at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
	at scala.concurrent.Await$.result(package.scala:107)
	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
	... 12 more
17/08/15 12:29:43 ERROR TransportClient: Failed to send RPC 8631131244922754830 to hdp05.xxx.local/192.168.xx.109:44271: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException


This means spark.network.timeout is still at its default value (120s): https://spark.apache.org/docs/1.6.3/configuration.html#networking

So I want to increase spark.network.timeout to 800s (a higher value than the default). I cannot find this property in the Ambari UI, so I added it under: Spark > Configs > Custom spark-defaults > Add Property.

I can see that Ambari creates the property and adds it to spark-defaults.conf.
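For reference, the added entry in spark-defaults.conf should be a plain key/value line, something like this (the value is the one from the change described above):

spark.network.timeout 800s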


But when I run the Spark app, I still get this ERROR:

ERROR TransportChannelHandler: Connection to /192.168.xx.109:44271 has been quiet for 120000 ms while there are outstanding requests. Assuming connection is dead; please adjust spark.network.timeout if this is wrong.

It seems the spark.network.timeout = 800s setting is not being applied to the running Spark application.
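
One way to check which value the application actually resolves (just a sketch; the app name is a placeholder and this assumes the job is submitted with spark-submit as usual) is to read the setting back from the SparkContext:

import org.apache.spark.{SparkConf, SparkContext}

object TimeoutCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("TimeoutCheck"))
    // Prints the timeout the driver actually resolved; "not set" means
    // spark-defaults.conf (and the Ambari change) was not picked up.
    println("spark.network.timeout = " + sc.getConf.get("spark.network.timeout", "not set"))
    sc.stop()
  }
}

The same property should also show up in the Environment tab of the Spark UI for the running application.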

Has anyone run into the same problem? If you have a solution, please help.

Thanks

1 REPLY

Contributor

It must be in "ms", so try 800000.
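
Following that suggestion, the entry in Custom spark-defaults would look like this (just a sketch; the same value can also be passed per job with spark-submit --conf spark.network.timeout=800000):

spark.network.timeout 800000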