Trigger Spark from HDF

We have a two-node cluster. HDF is installed on one node and Spark on the other.

On a single-node cluster I was able to trigger Spark from NiFi with the 'ExecuteStreamCommand' processor, which ran a spark-submit command placed in a shell script.

Can you please share guidelines for triggering Spark from NiFi in a multi-node cluster for the scenario described above?

4 REPLIES

Re: Trigger Spark from HDF

Sure. Make sure the Spark CLI dependencies are available on every node, i.e. you are able to submit your Spark job from any node in the NiFi cluster.

Next, assuming you want to submit the job only once within the cluster, configure ExecuteStreamCommand by opening its Scheduling tab and selecting On Primary Node in the strategy dropdown. This makes the processor a cluster-wide singleton. Note that you can't pin the primary node, for failover reasons: the role is elected automatically by the cluster and may change during its lifecycle, for example after a recovery event.
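
As a rough illustration of the first point, the script that ExecuteStreamCommand calls could be a small Python wrapper rather than a shell script. This is only a sketch: the jar path, the class name, and the choice of YARN in cluster deploy mode are assumptions for illustration, not details from this thread.

```python
import subprocess


def build_spark_submit_cmd(app_jar, main_class, app_args=(),
                           master="yarn", deploy_mode="cluster"):
    """Build the spark-submit argument list (defaults are assumptions)."""
    cmd = [
        "spark-submit",
        "--master", master,
        "--deploy-mode", deploy_mode,
        "--class", main_class,
        app_jar,
    ]
    cmd.extend(app_args)
    return cmd


def submit(cmd):
    """Run spark-submit and return its exit code.

    Requires the Spark client to be installed and on PATH on the node
    where NiFi runs the script.
    """
    return subprocess.run(cmd, check=False).returncode


# Hypothetical jar and class name, shown only to illustrate the call.
example_cmd = build_spark_submit_cmd(
    "/opt/jobs/example-job.jar", "com.example.ExampleJob", ["input-dir"]
)
print(" ".join(example_cmd))
```

Because the primary node can change, the same script and Spark client would need to be present on every NiFi node, which is why the dependencies have to be installed cluster-wide.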

Re: Trigger Spark from HDF

Thanks @Andrew Grande. We are now planning to use the Livy job server (http://livy.io/). Can anyone please guide me through this? I searched for documentation but could not find anything useful.

Re: Trigger Spark from HDF

Here is a good article on calling Livy over REST, with curl examples. It is very easy to move those into NiFi.

https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-apache-spark-livy-rest-interface
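
For readers who want a feel for what such a call looks like, here is a minimal sketch of a Livy batch submission in Python. It assumes Livy is listening on its default port 8998 and that the application jar is stored somewhere the cluster can read; the host name, jar path, and class name are hypothetical placeholders.

```python
import json
import urllib.request


def build_batch_request(livy_url, file_path, class_name=None, args=None):
    """Build a POST /batches request for Livy without sending it."""
    payload = {"file": file_path}
    if class_name:
        payload["className"] = class_name
    if args:
        payload["args"] = list(args)
    return urllib.request.Request(
        url=livy_url.rstrip("/") + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_batch(request, timeout=30):
    """Send the request and return Livy's JSON response as a dict."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical host, jar path, and class name; nothing is sent here.
req = build_batch_request(
    "http://livy-host:8998",
    "/user/nifi/example-job.jar",
    class_name="com.example.ExampleJob",
    args=["input-dir"],
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

In NiFi, the same POST could probably be issued with the InvokeHTTP processor, using the JSON body as the flow file content, so a script like this is mainly useful for trying the request out before building the flow.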

Re: Trigger Spark from HDF

What I like to do is run Spark Streaming rather than batch jobs. You can feed it from NiFi via Site-to-Site or Kafka, and the job is then always running and ready to process data.