<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Spark Streaming: FileNotFoundException on files included in --jars after running a few days in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Spark-Streaming-FileNotFoundException-on-files-included-in/m-p/41453#M23577</link>
    <description>&lt;P&gt;This looks&amp;nbsp;weird. Can&amp;nbsp;you confirm that&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;http://192.168.88.28:55310/jars/phoenix-1.2.0-client.jar&lt;/PRE&gt;&lt;P&gt;is still not present?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Spark keeps all JARs specified by the --jars option in the job's temp directory on each executor node [&lt;A href="http://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management" target="_self"&gt;1&lt;/A&gt;]. Some OS setting must be deleting the existing Phoenix JAR from that temp directory; when the Spark context cannot find the JAR at its usual location, it tries to download it again from the given URL. However, this should not happen while the temp directory is actively being used by the job or process.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You can try bundling that JAR into your Spark JAR and then referencing it in spark-submit.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I suspect you will again need 20-odd days to test this workaround &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 30 May 2016 08:06:17 GMT</pubDate>
    <dc:creator>_Umesh</dc:creator>
    <dc:date>2016-05-30T08:06:17Z</dc:date>
    <item>
      <title>Spark Streaming: FileNotFoundException on files included in --jars after running a few days</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-Streaming-FileNotFoundException-on-files-included-in/m-p/41450#M23576</link>
      <description>&lt;P&gt;CDH 5.5.1 installed with parcels, CentOS 6.7&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a Spark Streaming job which uses Phoenix (jar &lt;FONT face="courier new,courier"&gt;phoenix-1.2.0-client.jar&lt;/FONT&gt;). After the job had run for a few days, it tried to reload the jar and got a &lt;FONT face="courier new,courier"&gt;FileNotFoundException&lt;/FONT&gt;.&lt;/P&gt;&lt;P&gt;Command used to start the job:&lt;/P&gt;&lt;PRE&gt;nohup spark-submit --master yarn --deploy-mode client --class com.myCompany.MyStreamProc --driver-class-path /opt/mycompany/my-spark.jar:/opt/cloudera/parcels/CLABS_PHOENIX/lib/phoenix/phoenix-1.2.0-client.jar:... --jars /opt/mycompany/my-spark.jar,/opt/cloudera/parcels/CLABS_PHOENIX/lib/phoenix/phoenix-1.2.0-client.jar,... my-spark.jar&lt;/PRE&gt;&lt;P&gt;&lt;BR /&gt;Log entries around the &lt;FONT face="courier new,courier"&gt;FileNotFoundException&lt;/FONT&gt; in the driver log:&lt;/P&gt;&lt;PRE&gt;[INFO] 2016-05-28 15:28:00,052 org.apache.spark.scheduler.TaskSetManager logInfo - Starting task 69.0 in stage 27723.0 (TID 1692793, node3.mycompany.com, partition 69,NODE_LOCAL, 2231 bytes)
[INFO] 2016-05-28 15:28:00,205 org.apache.spark.storage.BlockManagerInfo logInfo - Added input-0-1464420480000 in memory on node1.mycompany.com:47601 (size: 15.0 KB, free: 302.0 MB)
[INFO] 2016-05-28 15:28:00,213 org.apache.spark.storage.BlockManagerInfo logInfo - Added input-0-1464420480000 in memory on node2.mycompany.com:42510 (size: 15.0 KB, free: 308.7 MB)
[INFO] 2016-05-28 15:28:00,351 org.apache.spark.scheduler.TaskSetManager logInfo - Starting task 70.0 in stage 27723.0 (TID 1692794, node2.mycompany.com, partition 70,NODE_LOCAL, 2231 bytes)
[WARN] 2016-05-28 15:28:00,391 org.apache.spark.scheduler.TaskSetManager logWarning - Lost task 69.0 in stage 27723.0 (TID 1692793, node2.mycompany.com): java.io.FileNotFoundException: http://192.168.88.28:55310/jars/phoenix-1.2.0-client.jar
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1624)
        at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:556)
        at org.apache.spark.util.Utils$.fetchFile(Utils.scala:356)
        at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:405)
        at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:397)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
        at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:397)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:193)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)&lt;/PRE&gt;&lt;P&gt;(Note: node3.mycompany.com = 192.168.88.28)&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;According to the executor logs, when the job started (2016-05-09) the executors downloaded &lt;FONT face="courier new,courier"&gt;&lt;A href="http://192.168.88.28:55310/jars/phoenix-1.2.0-client.jar" target="_blank"&gt;http://192.168.88.28:55310/jars/phoenix-1.2.0-client.jar&lt;/A&gt;&lt;/FONT&gt; successfully.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It seems Spark somehow wanted to reload the jar, but it was missing. Any suggestions? Has the job been running too long (nearly 20 days already)?&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 10:22:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-Streaming-FileNotFoundException-on-files-included-in/m-p/41450#M23576</guid>
      <dc:creator>athtsang</dc:creator>
      <dc:date>2022-09-16T10:22:16Z</dc:date>
    </item>
    <item>
      <title>Re: Spark Streaming: FileNotFoundException on files included in --jars after running a few days</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-Streaming-FileNotFoundException-on-files-included-in/m-p/41453#M23577</link>
      <description>&lt;P&gt;This looks&amp;nbsp;weird. Can&amp;nbsp;you confirm that&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;http://192.168.88.28:55310/jars/phoenix-1.2.0-client.jar&lt;/PRE&gt;&lt;P&gt;is still not present?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Spark keeps all JARs specified by the --jars option in the job's temp directory on each executor node [&lt;A href="http://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management" target="_self"&gt;1&lt;/A&gt;]. Some OS setting must be deleting the existing Phoenix JAR from that temp directory; when the Spark context cannot find the JAR at its usual location, it tries to download it again from the given URL. However, this should not happen while the temp directory is actively being used by the job or process.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You can try bundling that JAR into your Spark JAR and then referencing it in spark-submit.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I suspect you will again need 20-odd days to test this workaround &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 30 May 2016 08:06:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-Streaming-FileNotFoundException-on-files-included-in/m-p/41453#M23577</guid>
      <dc:creator>_Umesh</dc:creator>
      <dc:date>2016-05-30T08:06:17Z</dc:date>
    </item>
    <item>
      <title>Re: Spark Streaming: FileNotFoundException on files included in --jars after running a few days</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-Streaming-FileNotFoundException-on-files-included-in/m-p/41455#M23578</link>
      <description>It is definitely not present. Actually, I forgot to mention that the Spark Streaming job killed itself after the FileNotFoundException.&lt;BR /&gt;&lt;BR /&gt;Where is the job's temp directory? Or where is it configured?</description>
      <pubDate>Mon, 30 May 2016 10:20:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-Streaming-FileNotFoundException-on-files-included-in/m-p/41455#M23578</guid>
      <dc:creator>athtsang</dc:creator>
      <dc:date>2016-05-30T10:20:47Z</dc:date>
    </item>
    <item>
      <title>Re: Spark Streaming: FileNotFoundException on files included in --jars after running a few days</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-Streaming-FileNotFoundException-on-files-included-in/m-p/41457#M23579</link>
      <description>&lt;P&gt;See the Environment tab of the Job History UI and locate &lt;SPAN&gt;"spark.local.dir".&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Yes, that is the expected behaviour,&amp;nbsp;as the JAR is required by the executors.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 30 May 2016 10:48:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-Streaming-FileNotFoundException-on-files-included-in/m-p/41457#M23579</guid>
      <dc:creator>_Umesh</dc:creator>
      <dc:date>2016-05-30T10:48:10Z</dc:date>
    </item>
    <item>
      <title>Re: Spark Streaming: FileNotFoundException on files included in --jars after running a few days</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-Streaming-FileNotFoundException-on-files-included-in/m-p/41467#M23580</link>
      <description>&lt;P&gt;I can't find &lt;FONT face="courier new,courier"&gt;spark.local.dir&lt;/FONT&gt; in either the Job History UI (it got an &lt;FONT face="courier new,courier"&gt;OutOfMemoryException&lt;/FONT&gt;, and all job history was gone after a restart) or the Application UI. However, according to the documentation, &lt;FONT face="courier new,courier"&gt;spark.local.dir&lt;/FONT&gt; defaults to &lt;FONT face="courier new,courier"&gt;/tmp&lt;/FONT&gt;, and the jar files are found in &lt;FONT face="courier new,courier"&gt;/tmp/spark-.../&lt;/FONT&gt; . So the &lt;FONT face="courier new,courier"&gt;FileNotFoundException&lt;/FONT&gt; was likely caused by housekeeping of &lt;FONT face="courier new,courier"&gt;/tmp&lt;/FONT&gt;.&lt;/P&gt;</description>
      <pubDate>Tue, 31 May 2016 02:19:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-Streaming-FileNotFoundException-on-files-included-in/m-p/41467#M23580</guid>
      <dc:creator>athtsang</dc:creator>
      <dc:date>2016-05-31T02:19:11Z</dc:date>
    </item>
    <item>
      <title>Re: Spark Streaming: FileNotFoundException on files included in --jars after running a few days</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-Streaming-FileNotFoundException-on-files-included-in/m-p/53214#M23581</link>
      <description>&lt;P&gt;Have you fixed this issue? I am suffering from the same issue in a Spark Streaming application on 1.6.2.&lt;/P&gt;</description>
      <pubDate>Wed, 05 Apr 2017 06:16:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-Streaming-FileNotFoundException-on-files-included-in/m-p/53214#M23581</guid>
      <dc:creator>fairchild</dc:creator>
      <dc:date>2017-04-05T06:16:02Z</dc:date>
    </item>
    <item>
      <title>Re: Spark Streaming: FileNotFoundException on files included in --jars after running a few days</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-Streaming-FileNotFoundException-on-files-included-in/m-p/53249#M23582</link>
      <description>&lt;P&gt;The cause in my case is described in messages 4-5 of this thread. Here are some possible solutions:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;Set &lt;/SPAN&gt;&lt;FONT face="courier new,courier"&gt;spark.local.dir&lt;/FONT&gt;&lt;SPAN&gt; to somewhere outside &lt;/SPAN&gt;&lt;FONT face="courier new,courier"&gt;/tmp&lt;/FONT&gt;&lt;SPAN&gt;.&amp;nbsp;Refer to&amp;nbsp;&lt;A title="Spark Configuration" href="http://spark.apache.org/docs/latest/configuration.html#available-properties" target="_blank"&gt;Spark Configuration&lt;/A&gt;&amp;nbsp;for how to configure the value.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Disable housekeeping of &lt;FONT face="courier new,courier"&gt;/tmp/spark-...&lt;/FONT&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Periodically restart your Spark Streaming job.&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 06 Apr 2017 02:33:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-Streaming-FileNotFoundException-on-files-included-in/m-p/53249#M23582</guid>
      <dc:creator>athtsang</dc:creator>
      <dc:date>2017-04-06T02:33:52Z</dc:date>
    </item>
  </channel>
</rss>

