I have a Spark job deployed on a Hadoop cluster.
In my case there is more than one edge node pointing to the same Hadoop cluster. My requirement is that if there is an issue with edge node 1, my Spark job should get triggered from another available edge node.
What is the best way to do this?
@RAUI deploy using --master yarn --deploy-mode cluster instead of --deploy-mode client. That way your application won't have any dependencies running on the edge nodes. As far as best practices go, it is best that the edge nodes are managed by Ambari so that the client configurations are always up to date. This way you avoid any configuration-related issues if you need to deploy applications from different edge nodes.
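A cluster-mode submission looks roughly like the sketch below. The class name, jar path, and resource settings are placeholders, not from this thread; the point is that with --deploy-mode cluster the driver runs inside a YARN container, so the edge node is only needed for the brief submission step.

```shell
# Sketch of a cluster-mode submission; the jar path, class name,
# and resource settings below are placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyJob \
  --num-executors 4 \
  --executor-memory 2g \
  /path/to/my-job.jar arg1 arg2
```

Once the job is accepted by YARN, the edge node that submitted it can go down without affecting the running application.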
@Felix Albani Thanks for your answer.
In my scenario we have 3 edge nodes, and the same Spark jar has been deployed to all three. Let's say there is maintenance work going on on node 1 and my Spark jar is not available there. Now my job should get triggered by the job scheduler from another available node (node 2 or node 3).
Currently we have deployed in cluster mode only, but my confusion is: how should the job get triggered from the next available edge node? What is the best approach to scheduling a Spark job across multiple edge nodes?
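One common pattern is a small wrapper that the scheduler runs, which tries each edge node in order and submits from the first one that responds. This is only a sketch: the host names and the reachability check are assumptions, not from this thread, so swap in whatever fits your environment.

```shell
#!/usr/bin/env bash
# Hypothetical failover wrapper: the scheduler calls this script,
# which tries each edge node in order and submits the Spark job
# from the first reachable one. Host names are placeholders.
EDGE_NODES=("edge1" "edge2" "edge3")

# Reachability check; replace with whatever fits your setup
# (ping, an HTTP health endpoint, a hadoop CLI probe, ...).
is_reachable() {
  ssh -o ConnectTimeout=5 "$1" true 2>/dev/null
}

submit_from_first_available() {
  local node
  for node in "${EDGE_NODES[@]}"; do
    if is_reachable "$node"; then
      echo "Submitting from $node"
      # ssh "$node" spark-submit --master yarn --deploy-mode cluster ...
      return 0
    fi
  done
  echo "No edge node available" >&2
  return 1
}
```

The actual spark-submit call is left as a comment because it depends on your jar location and arguments; the script's job is only to pick a healthy node.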
What are you using to trigger the application from the edge nodes as of now? In this case I would suggest you review Oozie as a good alternative. If you would like to automate or schedule the execution of Spark applications in the cluster, Oozie provides a spark action and a shell action you could use to run the Spark code. Oozie also supports HA, so that will have you covered in case one node is down for some time.
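To give an idea of how that looks in practice, an Oozie workflow wrapping a spark action is submitted with the oozie CLI. The Oozie server URL and the contents of job.properties below are placeholders, not from this thread:

```shell
# Sketch: submitting an Oozie workflow that wraps a spark action.
# The Oozie URL and job.properties contents are placeholders.
oozie job -oozie http://oozie-host:11000/oozie \
  -config job.properties -run

# Check the workflow's status by its job id:
oozie job -oozie http://oozie-host:11000/oozie -info <job-id>
```

Because the workflow definition lives in HDFS and Oozie itself can run highly available, the submission no longer depends on any single edge node.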
If jobs are triggered by an external scheduler, then consider an HA load-balancer configuration for the edge nodes. You can give the scheduler the balancer's name/IP, and high availability is then managed automatically by the load balancer.
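As a sketch of that setup (the virtual hostname and job details are assumptions, not from this thread), the scheduler only ever targets the balancer address:

```shell
# Hypothetical: the scheduler targets the load balancer's virtual
# hostname; the balancer forwards the connection to a healthy edge
# node, so a single down node never blocks the submission.
ssh spark-edge-vip.example.com \
  spark-submit --master yarn --deploy-mode cluster \
  --class com.example.MyJob /path/to/my-job.jar
```

The balancer's health checks take the place of the failover logic you would otherwise have to script yourself.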