<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Spark job fails in cluster mode. in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-job-fails-in-cluster-mode/m-p/58803#M66510</link>
    <description>&lt;P&gt;Two points:&lt;/P&gt;&lt;P&gt;1) In cluster mode, use "--conf spark.driver.extraJavaOptions=" instead of "--driver-java-options".&lt;/P&gt;&lt;P&gt;2) Your --files list contains only application.conf; log4.properties is missing. Either distribute log4.properties to every YARN node yourself, or add it to the --files list and reference it with "-Dlog4j.configuration=./log4.properties".&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For cluster mode, the full command should look like the following:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;spark-submit \
--master yarn \
--deploy-mode cluster \
--class myCLASS \
--properties-file /home/abhig/spark.conf \
--files /home/abhig/application.conf,/home/abhig/log4.properties \
--conf "spark.executor.extraJavaOptions=-Dconfig.resource=application.conf -Dlog4j.configuration=./log4.properties" \
--conf spark.driver.extraJavaOptions="-Dconfig.file=./application.conf -Dlog4j.configuration=./log4.properties" \
/loca/project/gateway/mypgm.jar&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Sun, 13 Aug 2017 11:04:44 GMT</pubDate>
    <dc:creator>Yuexin Zhang</dc:creator>
    <dc:date>2017-08-13T11:04:44Z</dc:date>
    <item>
      <title>Spark job fails in cluster mode.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-job-fails-in-cluster-mode/m-p/58772#M66509</link>
      <description>&lt;P&gt;Hi All&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have been trying to submit the Spark job below in cluster mode from a bash shell.&lt;/P&gt;&lt;P&gt;Submitting in client mode works perfectly fine, but when I switch to cluster mode the job fails with an error saying that no app file is present.&lt;/P&gt;&lt;P&gt;The app file in the error refers to the missing application.conf.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;spark-submit \&lt;BR /&gt;--master yarn \&lt;BR /&gt;--deploy-mode cluster \&lt;BR /&gt;--class myCLASS \&lt;BR /&gt;--properties-file /home/abhig/spark.conf \&lt;BR /&gt;--files /home/abhig/application.conf \&lt;BR /&gt;--conf "spark.executor.extraJavaOptions=-Dconfig.resource=application.conf -Dlog4j.configuration=/home/abhig/log4.properties" \&lt;BR /&gt;--driver-java-options "-Dconfig.file=/home/abhig/application.conf -Dlog4j.configuration=/home/abhig/log4.properties" \&lt;BR /&gt;/loca/project/gateway/mypgm.jar&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I followed a similar post at the link below:&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-File-not-found-error-works-fine-in-local-mode-but-failed/m-p/32306#M1149" target="_blank"&gt;https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-File-not-found-error-works-fine-in-local-mode-but-failed/m-p/32306#M1149&lt;/A&gt;&lt;/P&gt;&lt;P&gt;The solution mentioned there is still not clear to me.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I even tried&lt;/P&gt;&lt;P&gt;--files $CONFIG_FILE#application.conf&lt;/P&gt;&lt;P&gt;but it still doesn't work.&lt;/P&gt;&lt;P&gt;Any help will be appreciated.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;BR /&gt;AB&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 12:05:06 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-job-fails-in-cluster-mode/m-p/58772#M66509</guid>
      <dc:creator>ABaaya</dc:creator>
      <dc:date>2022-09-16T12:05:06Z</dc:date>
    </item>
    <item>
      <title>Re: Spark job fails in cluster mode.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-job-fails-in-cluster-mode/m-p/58803#M66510</link>
      <description>&lt;P&gt;Two points:&lt;/P&gt;&lt;P&gt;1) In cluster mode, use "--conf spark.driver.extraJavaOptions=" instead of "--driver-java-options".&lt;/P&gt;&lt;P&gt;2) Your --files list contains only application.conf; log4.properties is missing. Either distribute log4.properties to every YARN node yourself, or add it to the --files list and reference it with "-Dlog4j.configuration=./log4.properties".&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For cluster mode, the full command should look like the following:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;spark-submit \
--master yarn \
--deploy-mode cluster \
--class myCLASS \
--properties-file /home/abhig/spark.conf \
--files /home/abhig/application.conf,/home/abhig/log4.properties \
--conf "spark.executor.extraJavaOptions=-Dconfig.resource=application.conf -Dlog4j.configuration=./log4.properties" \
--conf spark.driver.extraJavaOptions="-Dconfig.file=./application.conf -Dlog4j.configuration=./log4.properties" \
/loca/project/gateway/mypgm.jar&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 13 Aug 2017 11:04:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-job-fails-in-cluster-mode/m-p/58803#M66510</guid>
      <dc:creator>Yuexin Zhang</dc:creator>
      <dc:date>2017-08-13T11:04:44Z</dc:date>
    </item>
    <item>
      <title>Re: Spark job fails in cluster mode.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-job-fails-in-cluster-mode/m-p/58832#M66511</link>
      <description>&lt;P&gt;Hi All&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/15085"&gt;@Yuexin Zhang&lt;/a&gt;&amp;nbsp;for the response.&lt;/P&gt;&lt;P&gt;I figured out the solution for this.&lt;/P&gt;&lt;P&gt;Below is the actual submit command that worked for me.&lt;/P&gt;&lt;P&gt;The catch is that when we submit in cluster mode, Spark uploads the file to a staging directory on HDFS.&lt;/P&gt;&lt;P&gt;The path and name of the file on HDFS are then different from what the program expects.&lt;/P&gt;&lt;P&gt;To make the file available to the program, you have to give it an alias with '#', as shown below (that's the only trick).&lt;/P&gt;&lt;P&gt;Then, everywhere you need to refer to that file on the spark-submit command, just use the alias.&lt;/P&gt;&lt;P&gt;The links below, which I referred to, give the complete walkthrough of how to reach the solution.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Issue also discussed here - &lt;A href="https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-File-not-found-error-works-fine-in-local-mode-but-failed/m-p/32306#M1149" target="_blank"&gt;https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-File-not-found-error-works-fine-in-local-mode-but-failed/m-p/32306#M1149&lt;/A&gt; - (it didn't actually help me resolve the problem, so I posted separately)&lt;/P&gt;&lt;P&gt;Section "Important notes" in&amp;nbsp;&lt;A href="http://spark.apache.org/docs/latest/running-on-yarn.html" target="_blank"&gt;http://spark.apache.org/docs/latest/running-on-yarn.html&lt;/A&gt; (you kind of have to read between the lines)&lt;/P&gt;&lt;P&gt;Blog explaining the reason - &lt;A href="http://progexc.blogspot.com/2014/12/spark-configuration-mess-solved.html" target="_blank"&gt;http://progexc.blogspot.com/2014/12/spark-configuration-mess-solved.html&lt;/A&gt; (Nice blog &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;spark-submit \
--master yarn \
--deploy-mode cluster \
--class myCLASS \
--properties-file /home/abhig/spark.conf \
--files /home/abhig/application.conf#application.conf,/home/abhig/log4.properties#log4j \
--conf "spark.executor.extraJavaOptions=-Dconfig.resource=application.conf -Dlog4j.configuration=log4j" \
--conf spark.driver.extraJavaOptions="-Dconfig.file=application.conf -Dlog4j.configuration=log4j" \
/local/project/gateway/mypgm.jar&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hope this helps the next person facing a similar issue!&lt;/P&gt;</description>
      <pubDate>Mon, 14 Aug 2017 15:00:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-job-fails-in-cluster-mode/m-p/58832#M66511</guid>
      <dc:creator>ABaaya</dc:creator>
      <dc:date>2017-08-14T15:00:03Z</dc:date>
    </item>
    <item>
      <title>Re: Spark job fails in cluster mode.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-job-fails-in-cluster-mode/m-p/313783#M66512</link>
      <description>&lt;P&gt;Hey &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/19259"&gt;@ABaaya&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Could you help me with a similar issue, in which the file accessed in the code has already been uploaded to HDFS? What should the spark-submit command look like to run in cluster mode in this case?&lt;/P&gt;&lt;P&gt;I have attached the error snippet and the code snippet where we try to access the file.&lt;/P&gt;&lt;P&gt;Kindly suggest.&lt;/P&gt;&lt;P&gt;Thank you.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Here's the error snippet" style="width: 999px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/30749iA68F8FDD5385AD5E/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screenshot from 2021-03-27 18-01-37.png" alt="Here's the error snippet" /&gt;&lt;span class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;Here's the error snippet&lt;/span&gt;&lt;/span&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Here's the code snippet" style="width: 999px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/30750iC026A7A2F0417069/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screenshot from 2021-03-27 18-02-09.png" alt="Here's the code snippet" /&gt;&lt;span class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;Here's the code snippet&lt;/span&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 27 Mar 2021 12:34:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-job-fails-in-cluster-mode/m-p/313783#M66512</guid>
      <dc:creator>ishika</dc:creator>
      <dc:date>2021-03-27T12:34:37Z</dc:date>
    </item>
    <item>
      <title>Re: Spark job fails in cluster mode.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-job-fails-in-cluster-mode/m-p/313791#M66513</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/87079"&gt;@ishika&lt;/a&gt;&amp;nbsp;as this is an older post, you would have a better chance of receiving a resolution by &lt;A href="https://community.cloudera.com/t5/forums/postpage/board-id/Questions" target="_blank"&gt;starting a new thread&lt;/A&gt;. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post.&lt;/P&gt;</description>
      <pubDate>Mon, 29 Mar 2021 07:36:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-job-fails-in-cluster-mode/m-p/313791#M66513</guid>
      <dc:creator>VidyaSargur</dc:creator>
      <dc:date>2021-03-29T07:36:28Z</dc:date>
    </item>
  </channel>
</rss>

