<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Spark jobs failing in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Spark-jobs-failing/m-p/295898#M218019</link>
    <description>&lt;P&gt;Hello All,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We are running Spark jobs via yarn and its failing with the below error, any help / pointer to fix is much appericated.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Shell output: main : command provided 1
main : run as user is TEST1
main : requested yarn user is TEST1
Writing to tmp file /data/8/yarn/nm/nmPrivate/application_1587389136999_0013/container_e56_1587389136999_0013_01_000477/container_e56_1587389136999_0013_01_000477.pid.tmp
Writing to cgroup task files...


Container exited with a non-zero exit code 1

org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
	at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
	at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:216)
	at scala.util.Try$.apply(Try.scala:192)
	at scala.util.Failure.recover(Try.scala:216)
	at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
	at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
	at org.spark_project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
	at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:136)
	at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
	at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
	at scala.concurrent.Promise$class.complete(Promise.scala:55)
	at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153)
	at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
	at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
	at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:63)
	at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:78)
	at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
	at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
	at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
	at scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:54)
	at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
	at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:106)
	at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
	at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
	at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
	at scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
	at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153)
	at org.apache.spark.rpc.netty.NettyRpcEnv.org$apache$spark$rpc$netty$NettyRpcEnv$$onFailure$1(NettyRpcEnv.scala:205)
	at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:239)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply in 120 seconds
	... 8 more&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Amn&lt;/P&gt;</description>
    <pubDate>Thu, 14 May 2020 04:42:55 GMT</pubDate>
    <dc:creator>Amn_468</dc:creator>
    <dc:date>2020-05-14T04:42:55Z</dc:date>
    <item>
      <title>Spark jobs failing</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-jobs-failing/m-p/295898#M218019</link>
      <description>&lt;P&gt;Hello All,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We are running Spark jobs via yarn and its failing with the below error, any help / pointer to fix is much appericated.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Shell output: main : command provided 1
main : run as user is TEST1
main : requested yarn user is TEST1
Writing to tmp file /data/8/yarn/nm/nmPrivate/application_1587389136999_0013/container_e56_1587389136999_0013_01_000477/container_e56_1587389136999_0013_01_000477.pid.tmp
Writing to cgroup task files...


Container exited with a non-zero exit code 1

org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
	at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
	at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:216)
	at scala.util.Try$.apply(Try.scala:192)
	at scala.util.Failure.recover(Try.scala:216)
	at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
	at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:326)
	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
	at org.spark_project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
	at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:136)
	at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
	at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
	at scala.concurrent.Promise$class.complete(Promise.scala:55)
	at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153)
	at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
	at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
	at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:63)
	at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:78)
	at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
	at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
	at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
	at scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:54)
	at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
	at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:106)
	at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
	at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
	at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
	at scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
	at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153)
	at org.apache.spark.rpc.netty.NettyRpcEnv.org$apache$spark$rpc$netty$NettyRpcEnv$$onFailure$1(NettyRpcEnv.scala:205)
	at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:239)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply in 120 seconds
	... 8 more&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Amn&lt;/P&gt;</description>
      <pubDate>Thu, 14 May 2020 04:42:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-jobs-failing/m-p/295898#M218019</guid>
      <dc:creator>Amn_468</dc:creator>
      <dc:date>2020-05-14T04:42:55Z</dc:date>
    </item>
    <item>
      <title>Re: Spark jobs failing</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-jobs-failing/m-p/295903#M218021</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/32334"&gt;@Amn_468&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;To better assist you with this issue, it would be great if you could please help to provide the following additional information:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;1) Is this issue occurring for all jobs or only some jobs? If the issue has only started recently, does this coincide with any code or configuration changes in the job itself or configuration changes in the cluster?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 14 May 2020 06:40:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-jobs-failing/m-p/295903#M218021</guid>
      <dc:creator>Madhur</dc:creator>
      <dc:date>2020-05-14T06:40:10Z</dc:date>
    </item>
    <item>
      <title>Re: Spark jobs failing</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-jobs-failing/m-p/295911#M218028</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/76833"&gt;@Madhur&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is happening with all Spark jobs, there has been no changes in the code or cluster,&amp;nbsp; also the failure is random.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 14 May 2020 08:34:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-jobs-failing/m-p/295911#M218028</guid>
      <dc:creator>Amn_468</dc:creator>
      <dc:date>2020-05-14T08:34:09Z</dc:date>
    </item>
    <item>
      <title>Re: Spark jobs failing</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-jobs-failing/m-p/295921#M218034</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/32334"&gt;@Amn_468&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for replying back.&amp;nbsp;&lt;/P&gt;&lt;P class="p1"&gt;&amp;nbsp;&lt;/P&gt;&lt;P class="p1"&gt;Kindly try Increasing spark.rpc.askTimeout from default 120 seconds to a higher value in Ambari UI -&amp;gt; Spark Configs -&amp;gt; spark2-defaults. Recommendation is to increase it to at least 480 seconds and restart the necessary services.possibly the Driver and Executer are not able to&amp;nbsp; get Heartbeat response in configured timeout. If you don’t want to do any cluster level change then you may try overriding this value in the job level.&lt;/P&gt;&lt;P class="p2"&gt;&amp;nbsp;&lt;/P&gt;&lt;P class="p1"&gt;For example:&lt;SPAN class="Apple-converted-space"&gt;&amp;nbsp; &lt;/SPAN&gt;spark-submit by adding --conf spark.rpc.askTimeout=600s while submitting the job&lt;/P&gt;&lt;P class="p1"&gt;&amp;nbsp;&lt;/P&gt;&lt;P class="p1"&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 14 May 2020 11:29:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-jobs-failing/m-p/295921#M218034</guid>
      <dc:creator>Madhur</dc:creator>
      <dc:date>2020-05-14T11:29:57Z</dc:date>
    </item>
    <item>
      <title>Re: Spark jobs failing</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-jobs-failing/m-p/295925#M218038</link>
      <description>&lt;P&gt;Hi&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/76833"&gt;@Madhur&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Appreciate your assistance,&amp;nbsp; I am using CM, where would this setting be in CM for making the changes in Cluster level, and to confirm these values have to be passed in seconds ?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Could you also provide steps / document outlining how to change this while submitting spark jobs.&lt;/P&gt;</description>
      <pubDate>Thu, 14 May 2020 13:02:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-jobs-failing/m-p/295925#M218038</guid>
      <dc:creator>Amn_468</dc:creator>
      <dc:date>2020-05-14T13:02:34Z</dc:date>
    </item>
  </channel>
</rss>

