We have a two-node cluster: HDF is installed on one node and Spark on the other.
In a single-node cluster I was able to trigger Spark from NiFi with the ExecuteStreamCommand processor, using a spark-submit command wrapped in a shell script.
Can you please share guidelines for triggering Spark from NiFi in a multi-node cluster for the scenario above?
Sure. First, make sure the Spark CLI dependencies (the spark-submit client and its configuration) are available on every NiFi node, i.e. you should be able to submit your Spark job from any node in the NiFi cluster, since you cannot predict which node will run the processor.
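As a sketch, a wrapper script along these lines could be what ExecuteStreamCommand invokes on whichever node runs it. The jar path, main class, and master URL below are placeholders, not values from your setup; the script also fails soft with a printed command if the Spark client is missing, which makes the "available on every node" requirement visible immediately.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (e.g. submit-job.sh) for ExecuteStreamCommand to call.
# APP_JAR, MAIN_CLASS, and the master/deploy-mode flags are assumptions --
# adjust them to your actual job and cluster.
set -u

APP_JAR="/opt/jobs/my-spark-job.jar"     # assumed jar location, same path on every node
MAIN_CLASS="com.example.MySparkJob"      # assumed entry-point class

# Build the spark-submit invocation once so it can be run or printed.
CMD=(spark-submit --master yarn --deploy-mode cluster \
     --class "$MAIN_CLASS" "$APP_JAR")

if command -v spark-submit >/dev/null 2>&1; then
    "${CMD[@]}"                 # real submission: Spark client present on this node
else
    # Spark client missing on this node -- surface the problem instead of failing silently.
    echo "spark-submit not found on $(hostname); would have run: ${CMD[*]}" >&2
fi
```

Any extra arguments from NiFi (Command Arguments, flow-file attributes, etc.) could be appended to the array in the same way; the key point is that the script must behave identically on every node.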
Next, assuming you want the job submitted only once per cluster rather than once per node, open the processor's Scheduling tab and set Execution to Primary node. This makes ExecuteStreamCommand a cluster-wide singleton. Note that you cannot pin which node is primary: the role is elected automatically by the cluster for failover reasons, so it may move to another node over the cluster's lifetime, e.g. after a recovery event.
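Putting both points together, the processor configuration would look roughly like this (the script path is a hypothetical location matching wherever you deploy your wrapper script on every node):

```
ExecuteStreamCommand
  Properties:
    Command Path:        /opt/scripts/submit-job.sh    (assumed path, identical on all nodes)
    Command Arguments:   <optional args passed through to the script>
  Scheduling:
    Execution:           Primary node
```

With this setup, any node can become primary and still find the script and the Spark client locally, so a primary-node change does not break the flow.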