Spark "Master Node" and "Worker Node" in Cloudera?

Contributor

I'm using a tool that requires me to point it at the master node (driver node) of my Cloudera Spark cluster, in the form spark://<some-spark-master>:7077. From what I've learned, Spark has a "Master Node" (driver node) and "Worker Nodes".

So I went to Cloudera Manager and checked the Configuration tab of the Spark service, but all I found were the "Gateway" and "History Server" role instances. Where are the "Driver" and "Worker" instances? I can't add these two via "Add Role Instances" either.

[Screenshot: Spark service Configuration tab in Cloudera Manager]

My guess is that it's somewhere in the YARN service configuration, but I can't find anything related to "Master"/"Driver" or "Worker" there either.

[Screenshot: YARN service Configuration tab in Cloudera Manager]

So what is the "Spark Master" link that ends with 7077? I can't find it anywhere in the Configuration tab.

1 ACCEPTED SOLUTION

Super Collaborator

Hello @quangbilly79 

Thanks for using the Cloudera Community. The "Spark Master" refers to the resource manager responsible for allocating resources. Since your cluster uses YARN, your team needs to use "--master yarn". The "--master spark://<IP Address>:7077" form applies only to a Spark Standalone cluster, which isn't the case for your team; that is why no such URL appears anywhere in Cloudera Manager.
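
As a minimal sketch (the class name and JAR path below are placeholders, not part of your setup), a submission on a YARN-managed cluster looks like this. Note that no host:port is needed, because spark-submit resolves the ResourceManager address from the cluster's Hadoop configuration:

# Submit to YARN; the ResourceManager is resolved from the Hadoop client
# config files that the Cloudera Gateway role deploys to the host.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  /path/to/my-app.jar

# For comparison, only a Standalone cluster would use:
# spark-submit --master spark://<some-spark-master>:7077 ...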

Regarding your observation that "Driver" and "Worker" instances can't be added via "Add Role Instances": there is no such option because YARN is the resource manager, and it allocates the containers for the Spark driver and executors when each application runs.
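
To illustrate: instead of adding roles in Cloudera Manager, driver and executor resources are requested per application at submit time, and YARN carves out matching containers. A hedged sketch (the values below are illustrative, not recommendations):

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 2g \
  --executor-memory 4g \
  --executor-cores 2 \
  --num-executors 3 \
  --class com.example.MyApp \
  /path/to/my-app.jar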

See [1] for the usage of "--master" as well. Hope the above answers your team's queries.

Regards, Smarak

[1] https://spark.apache.org/docs/latest/submitting-applications.html#launching-applications-with-spark-...
