Spark "Master Node" and "Worker Node" in Cloudera?

Contributor

I'm using a tool in which I have to specify the master node (driver node) of the Cloudera Spark cluster (spark://<some-spark-master>:7077). From what I've learned, Spark has a "Master Node" (Driver Node) and "Worker Nodes".

So I went to Cloudera Manager and checked the Configuration tab of the Spark service, but all I found were a "Gateway instance" and a "History Server instance". Where are the "Driver instance" and the "Worker instance"? I can't add these two instances through "Add Role Instances" either.

[Screenshot: Spark service Configuration tab in Cloudera Manager]

My guess is that it's in the YARN service configuration, but I can't find anything related to "Master"/"Driver" or "Worker" there either.

[Screenshot: YARN service Configuration tab in Cloudera Manager]

So what is the "Spark Master" link that ends with 7077? I can't find it anywhere in the Configuration tab.

1 ACCEPTED SOLUTION

Super Collaborator

Hello @quangbilly79

Thanks for using the Cloudera Community. The "Spark Master" refers to the resource manager responsible for allocating resources. Since you are using YARN, your team needs to use "--master yarn". The "--master spark://<IP Address>:7077" form is for a Spark Standalone cluster, which isn't the case for your team.
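For illustration, a minimal sketch of a YARN submission is below; the example jar path assumes a typical CDH parcel layout and may differ on your cluster:

    # Submit against YARN rather than a standalone spark:// master URL.
    # The SparkPi example jar path assumes a standard CDH parcel install.
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --class org.apache.spark.examples.SparkPi \
      /opt/cloudera/parcels/CDH/lib/spark/examples/jars/spark-examples*.jar 100

With "--deploy-mode cluster" the driver runs inside a YARN container; with "--deploy-mode client" it runs on the submitting host, which is what the "Gateway" role you saw in Cloudera Manager provides the client configuration for.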

 

Regarding your observation about adding a "Driver Instance" & "Worker Instance" via "Add Role Instances": there is no such option, because YARN is the resource manager and it allocates the resources for the Spark driver and executors.
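If your team wants to control how much YARN allocates, the standard spark-submit resource flags apply. A minimal sketch follows; the class name and jar are placeholders for your application, and the sizes are examples to tune for your cluster:

    # These flags tell YARN how to size the driver and executor containers.
    # com.example.MyApp and my-app.jar are placeholders for your own application.
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --driver-memory 2g \
      --executor-memory 4g \
      --executor-cores 2 \
      --num-executors 4 \
      --class com.example.MyApp \
      my-app.jar

YARN then starts one container for the driver and one per executor, which is why there is no "Worker" role to add in Cloudera Manager.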

 

See [1] for the usage of "--master" as well. Hope the above answers your team's queries.

 

Regards, Smarak

 

[1] https://spark.apache.org/docs/latest/submitting-applications.html#launching-applications-with-spark-...
