I have a CDH cluster with the configuration below. The Spark gateway is installed on all machines, and each node has 22 cores:
2 name nodes
3 Kafka nodes
4 data nodes
1 node running the MySQL content store
I am trying to find out how many worker/slave nodes are assigned to run Spark applications in the cluster. When I look at the Hadoop UI, it shows the total vcores assigned as 88. Does that mean only the 4 data nodes are configured as worker nodes? Where and how can I see which worker/slave nodes are configured?
We are using YARN as the resource manager.
In your setup, your 4 data nodes are your worker nodes: 4 nodes x 22 cores = 88 vcores, which matches what the YARN UI reports. HDFS (i.e. where the data is stored) lives on the data nodes, and because of data locality that is also where your 'compute' or 'worker' nodes run.
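As a quick sanity check of that arithmetic (a sketch only — the per-node figure depends on your `yarn.nodemanager.resource.cpu-vcores` setting, assumed here to be 22):

```python
# Sanity-check the vcore math: total vcores YARN reports should equal
# (number of NodeManager hosts) x (vcores advertised per node).
# Both values below are assumptions matching the cluster described above.
worker_nodes = 4        # the 4 data nodes
vcores_per_node = 22    # assumes yarn.nodemanager.resource.cpu-vcores = 22

total_vcores = worker_nodes * vcores_per_node
print(total_vcores)  # 88 — matching the "total vcores" shown in the YARN UI
```

If the UI number were, say, 176, that would instead suggest 8 worker nodes (or a doubled vcore setting per node).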
If you log into Cloudera Manager, click 'Hosts' at the top and then 'Roles' in the dropdown, you will see which services (roles) are running on which nodes. Your worker nodes are the hosts running the YARN NodeManager role.
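If you prefer the command line, the YARN CLI can list the worker (NodeManager) hosts directly. Run this from any node with the YARN client/gateway configured (`<node-id>` is a placeholder for one of the node IDs printed by the first command):

```shell
# List all NodeManagers (i.e. the worker nodes) known to the ResourceManager
yarn node -list -all

# Show the resources (memory, vcores) advertised by one specific node
yarn node -status <node-id>
```

The first command should print 4 entries in your cluster, one per data node.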
I hope that helps.