<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Running a Spark Job with NiFi using Execute Process in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Running-a-Spark-Job-with-NiFi-using-Execute-Process/m-p/230193#M61124</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/13785/arsalan-siddiqi.html" nodeid="13785"&gt;@Arsalan Siddiqi&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You should just be able to bring up the execute processor and configure the command you have there as the command to execute. Just make sure you give it the full path the the spark-submit2.cmd executable (e.g. /usr/bin/spark-submit). As long as the file and path you are referencing is on the same machine as where Nifi is running (assuming it is only 1 box and is not clustered), and Spark client is present and configured correctly, the processor should just kick off the spark-submit. Make sure you change the scheduling to be something more than 0 seconds. Otherwise, you will quickly fill up the cluster where the job is being submitted with duplicate jobs. You can also set it to be CRON scheduled.  &lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 16 May 2017 03:44:55 GMT</pubDate>
    <dc:creator>vvaks</dc:creator>
    <dc:date>2017-05-16T03:44:55Z</dc:date>
    <item>
      <title>Running a Spark Job with NiFi using Execute Process</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Running-a-Spark-Job-with-NiFi-using-Execute-Process/m-p/230192#M61123</link>
      <description>&lt;P&gt;Hi, I know there are a number of threads about how to run a Spark job from NiFi, but most of them explain a setup on HDP.&lt;/P&gt;&lt;P&gt;I am using Windows, and I have Spark and NiFi installed locally.&lt;/P&gt;&lt;P&gt;Can anyone explain how I can configure the ExecuteProcess processor to run the following command (which works when I run it from the command line)?&lt;/P&gt;&lt;P&gt;spark-submit2.cmd --class "SimpleApp" --master local[4] file:///C:/Simple_Project/target/scala-2.10/simple-project_2.10-1.0.jar&lt;/P&gt;</description>
      <pubDate>Mon, 15 May 2017 01:55:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Running-a-Spark-Job-with-NiFi-using-Execute-Process/m-p/230192#M61123</guid>
      <dc:creator>arsalan_siddiqi</dc:creator>
      <dc:date>2017-05-15T01:55:35Z</dc:date>
    </item>
    <item>
      <title>Re: Running a Spark Job with NiFi using Execute Process</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Running-a-Spark-Job-with-NiFi-using-Execute-Process/m-p/230193#M61124</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/13785/arsalan-siddiqi.html" nodeid="13785"&gt;@Arsalan Siddiqi&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You should just be able to bring up the execute processor and configure the command you have there as the command to execute. Just make sure you give it the full path the the spark-submit2.cmd executable (e.g. /usr/bin/spark-submit). As long as the file and path you are referencing is on the same machine as where Nifi is running (assuming it is only 1 box and is not clustered), and Spark client is present and configured correctly, the processor should just kick off the spark-submit. Make sure you change the scheduling to be something more than 0 seconds. Otherwise, you will quickly fill up the cluster where the job is being submitted with duplicate jobs. You can also set it to be CRON scheduled.  &lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 16 May 2017 03:44:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Running-a-Spark-Job-with-NiFi-using-Execute-Process/m-p/230193#M61124</guid>
      <dc:creator>vvaks</dc:creator>
      <dc:date>2017-05-16T03:44:55Z</dc:date>
    </item>
    <item>
      <title>Re: Running a Spark Job with NiFi using Execute Process</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Running-a-Spark-Job-with-NiFi-using-Execute-Process/m-p/230194#M61125</link>
      <description>&lt;P&gt;Hi &lt;A href="#"&gt;@Arsalan Siddiqi&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;As an alternative to the above response, you can use Livy, which means you don't need to worry about configuring the NiFi environment with Spark-specific configuration. Since Livy accepts REST requests, this works with the same ExecuteProcess or ExecuteStreamCommand processor; you just need to issue a curl command. This is very handy when your NiFi and Spark are running on different servers.&lt;/P&gt;&lt;P&gt;Please refer to the &lt;A href="http://livy.io/quickstart.html"&gt;Livy Documentation&lt;/A&gt; on that front.&lt;/P&gt;</description>
      <pubDate>Tue, 16 May 2017 11:44:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Running-a-Spark-Job-with-NiFi-using-Execute-Process/m-p/230194#M61125</guid>
      <dc:creator>bkosaraju</dc:creator>
      <dc:date>2017-05-16T11:44:08Z</dc:date>
    </item>
  </channel>
</rss>