I want to know whether we can run a job without the Spark master server or not.
Since we can integrate Spark with the YARN resource manager, what is the use of the Spark master in that case?
Could you give the exact flow diagram of job submission using the Spark master and YARN?
In YARN mode, YARN itself acts as the Spark master; a separate master server is only relevant in standalone or Mesos mode.
There are two deploy modes that can be used to launch Spark applications on YARN. In cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
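To make the two modes concrete, here is a sketch of how each would be submitted with spark-submit; the jar path and application argument are placeholders, and SparkPi is just the stock example class shipped with Spark:

```shell
# Cluster mode: the driver runs inside a YARN application master on the
# cluster, so the submitting client can disconnect after launch.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  /path/to/spark-examples.jar 100

# Client mode: the driver runs in this local process; the YARN
# application master only negotiates executor containers.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --class org.apache.spark.examples.SparkPi \
  /path/to/spark-examples.jar 100
```

Cluster mode is the usual choice for production jobs; client mode is handy for interactive work (e.g. spark-shell), since the driver's output comes back to your terminal.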
Unlike Spark standalone and Mesos modes, in which the master’s address is specified in the
--master parameter, in YARN mode the ResourceManager’s address is picked up from the Hadoop configuration. Thus, the --master parameter is simply yarn.
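The difference in the --master parameter looks like this (hostnames, ports, and paths below are placeholders):

```shell
# Standalone or Mesos: the master's address must be spelled out.
spark-submit --master spark://master-host:7077 \
  --class com.example.MyApp /path/to/my-app.jar

spark-submit --master mesos://master-host:5050 \
  --class com.example.MyApp /path/to/my-app.jar

# YARN: no address needed. spark-submit finds the ResourceManager from
# the Hadoop config files pointed to by HADOOP_CONF_DIR (or YARN_CONF_DIR).
export HADOOP_CONF_DIR=/etc/hadoop/conf
spark-submit --master yarn \
  --class com.example.MyApp /path/to/my-app.jar
```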
Diagram of execution https://spark.apache.org/docs/latest/img/cluster-overview.png
Thanks a lot for your explanation.
So ideally, when we want to run a Spark cluster using YARN, we do not need to configure a Spark master?
@satya gaurav No. If you deploy with HDP, everything you need will be deployed by the platform; take a look here: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_spark-component-guide/content/install-spa.... I highly recommend reading through our comprehensive guides for Spark (http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_spark-component-guide/content/ch_introduc...) and Zeppelin (http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_zeppelin-component-guide/content/index.ht...). If you have any other questions, open an HCC thread or file a support ticket. If my answer was at all helpful, please consider accepting the answer as best. Thanks!