<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Running a Spark Job with NiFi using Execute Process in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Running-a-Spark-Job-with-NiFi-using-Execute-Process/m-p/230193#M61124</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/13785/arsalan-siddiqi.html" nodeid="13785"&gt;@Arsalan Siddiqi&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You should just be able to bring up the execute processor and configure the command you have there as the command to execute. Just make sure you give it the full path the the spark-submit2.cmd executable (e.g. /usr/bin/spark-submit). As long as the file and path you are referencing is on the same machine as where Nifi is running (assuming it is only 1 box and is not clustered), and Spark client is present and configured correctly, the processor should just kick off the spark-submit. Make sure you change the scheduling to be something more than 0 seconds. Otherwise, you will quickly fill up the cluster where the job is being submitted with duplicate jobs. You can also set it to be CRON scheduled.  &lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 16 May 2017 03:44:55 GMT</pubDate>
    <dc:creator>vvaks</dc:creator>
    <dc:date>2017-05-16T03:44:55Z</dc:date>
    <item>
      <title>Running a Spark Job with NiFi using Execute Process</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Running-a-Spark-Job-with-NiFi-using-Execute-Process/m-p/230192#M61123</link>
      <description>&lt;P&gt;Hi, I know there are a number of threads about how to run a Spark job from NiFi, but most of them explain a setup on HDP.&lt;/P&gt;&lt;P&gt;I am using Windows, and I have Spark and NiFi installed locally.&lt;/P&gt;&lt;P&gt;Can anyone explain how I can configure the ExecuteProcess processor to run the following command (which works when I run it from the command line)?&lt;/P&gt;&lt;P&gt;spark-submit2.cmd --class "SimpleApp" --master local[4] file:///C:/Simple_Project/target/scala-2.10/simple-project_2.10-1.0.jar&lt;/P&gt;</description>
      <pubDate>Mon, 15 May 2017 01:55:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Running-a-Spark-Job-with-NiFi-using-Execute-Process/m-p/230192#M61123</guid>
      <dc:creator>arsalan_siddiqi</dc:creator>
      <dc:date>2017-05-15T01:55:35Z</dc:date>
    </item>
    <item>
      <title>Re: Running a Spark Job with NiFi using Execute Process</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Running-a-Spark-Job-with-NiFi-using-Execute-Process/m-p/230193#M61124</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/13785/arsalan-siddiqi.html" nodeid="13785"&gt;@Arsalan Siddiqi&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You should just be able to bring up the execute processor and configure the command you have there as the command to execute. Just make sure you give it the full path the the spark-submit2.cmd executable (e.g. /usr/bin/spark-submit). As long as the file and path you are referencing is on the same machine as where Nifi is running (assuming it is only 1 box and is not clustered), and Spark client is present and configured correctly, the processor should just kick off the spark-submit. Make sure you change the scheduling to be something more than 0 seconds. Otherwise, you will quickly fill up the cluster where the job is being submitted with duplicate jobs. You can also set it to be CRON scheduled.  &lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 16 May 2017 03:44:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Running-a-Spark-Job-with-NiFi-using-Execute-Process/m-p/230193#M61124</guid>
      <dc:creator>vvaks</dc:creator>
      <dc:date>2017-05-16T03:44:55Z</dc:date>
    </item>
    <item>
      <title>Re: Running a Spark Job with NiFi using Execute Process</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Running-a-Spark-Job-with-NiFi-using-Execute-Process/m-p/230194#M61125</link>
      <description>&lt;P&gt;Hi &lt;A href="#"&gt;@Arsalan Siddiqi&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;As an alternative to the above response, you can use Livy, which means you don't need to worry about configuring the NiFi environment with Spark-specific configuration. Since Livy accepts REST requests, this works with the same ExecuteProcess or ExecuteStreamCommand processor; you just need to issue a curl command. This is very handy when your NiFi and Spark are running on different servers.&lt;/P&gt;&lt;P&gt;Please refer to the &lt;A href="http://livy.io/quickstart.html"&gt;Livy Documentation&lt;/A&gt; on that front.&lt;/P&gt;</description>
      <pubDate>Tue, 16 May 2017 11:44:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Running-a-Spark-Job-with-NiFi-using-Execute-Process/m-p/230194#M61125</guid>
      <dc:creator>bkosaraju</dc:creator>
      <dc:date>2017-05-16T11:44:08Z</dc:date>
    </item>
  </channel>
</rss>