Running a Spark Job with NiFi using Execute Process

Expert Contributor

Hi, I know there are a number of threads about how to run a Spark job from NiFi, but most of them describe a setup on HDP.

I am using Windows, with Spark and NiFi installed locally.

Can anyone explain how I can configure the ExecuteProcess processor to run the following command? (It works when I run it from the command line.)

spark-submit2.cmd --class "SimpleApp" --master local[4] file:///C:/Simple_Project/target/scala-2.10/simple-project_2.10-1.0.jar

ACCEPTED SOLUTION

Re: Running a Spark Job with NiFi using Execute Process

Guru

@Arsalan Siddiqi

You should just be able to bring up the ExecuteProcess processor and configure the command you have there as the command to execute. Just make sure you give it the full path to the spark-submit2.cmd executable (e.g. /usr/bin/spark-submit on Linux; in your case, the full Windows path to spark-submit2.cmd). As long as the file and path you are referencing are on the same machine where NiFi is running (assuming it is a single box and not clustered), and the Spark client is present and configured correctly, the processor should just kick off the spark-submit. Make sure you change the scheduling to something more than 0 seconds; otherwise, you will quickly fill up the cluster where the job is being submitted with duplicate jobs. You can also set it to be CRON scheduled. A sketch of the processor configuration follows below.
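
For reference, here is a minimal sketch of that configuration, assuming Spark is installed under C:\spark (a hypothetical path; substitute your actual install location):

Command: C:\spark\bin\spark-submit2.cmd
Command Arguments: --class "SimpleApp" --master local[4] file:///C:/Simple_Project/target/scala-2.10/simple-project_2.10-1.0.jar
Scheduling Strategy: Timer driven
Run Schedule: 60 sec

The 60-second Run Schedule is only an illustration; pick an interval (or a CRON expression) that matches how often the job should actually run.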

Re: Running a Spark Job with NiFi using Execute Process

Super Collaborator

Hi @Arsalan Siddiqi,

As an alternative to the above response, you can use Livy, so you don't need to configure the NiFi environment with Spark-specific settings. Livy accepts REST requests, so this works with the same ExecuteProcess or ExecuteStreamCommand processor; you just need to issue a curl command. This is very handy when NiFi and Spark are running on different servers.

Please refer to the Livy documentation on that front. A rough sketch of such a curl call is shown below.
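
As a minimal sketch, assuming a Livy server on its default port 8998 at localhost and a jar path readable by that server (both are assumptions; adjust host, port, and path to your setup), the command configured in ExecuteProcess or ExecuteStreamCommand could be:

curl -X POST -H "Content-Type: application/json" \
  --data '{ "file": "/path/to/simple-project_2.10-1.0.jar", "className": "SimpleApp" }' \
  http://localhost:8998/batches

This POSTs a batch job to Livy's /batches endpoint (shown with Unix-style quoting and line continuations; Windows cmd quoting differs). Livy returns a JSON description of the batch, whose id you can poll via GET /batches/<id> to track the job's state.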
