Do we need to install all HDP service clients on all nodes?

Explorer

We want to deploy HDP 3.1.5 in a production environment.
We have 3 servers for master nodes and 6 servers for worker nodes.
We have already planned the component layout across these 9 nodes, but we want to confirm where to place the service clients below.

 

1. YARN clients

  • Our first plan is to install this on all 9 nodes. Is that okay, or should we install it only on the 3 master nodes? As far as we know, YARN is needed on all nodes, including the ResourceManagers and NodeManagers.
  • Or is it only needed for launching YARN apps, or for something else?

 

2. MapReduce2 clients

  • Same as above: we plan to install it on all 9 nodes because it is required for MapReduce jobs.
    Do we need to install it across all 9 nodes?

 

3. Hive clients

  • We plan to install it on the 3 master nodes, or do we only need it on one master node?
    Is it only needed for submitting Hive apps from beeline (CLI)?

 

4. Infra Solr clients

  • We plan to install it on all 9 nodes, but we don't know enough about how this client works.

 

5. Kerberos clients

  • Do all nodes need Kerberos clients? They were installed automatically on all nodes when we deployed the development environment.

 

6. Oozie clients

  • Same as the Infra Solr clients point: 9 nodes (planned).

 

7. Pig clients

  • We plan to install it on only the 3 master nodes. Is it used for running Pig via the CLI or for submitting Pig applications?

 

8. Spark2 clients

  • We plan to install it on one master node only, because we want a single server from which Spark apps can be submitted.
  • But in the development environment it was installed on all nodes. How do we uninstall the Spark2 client from the worker nodes?

 

9. Sqoop clients

  • Same point as number 8: only on one master node.

 

10. Tez client

  • We plan to install it on all 9 nodes, but we don't have any information on how this client works.
1 ACCEPTED SOLUTION

Master Mentor

@zetta4ever 

In a Hadoop cluster there are three types of nodes: master, worker, and edge nodes. This separation of roles helps maintain efficiency.

Master nodes control which nodes perform which tasks and which processes run on which nodes. Most of the work is assigned to worker nodes, which store most of the data and perform most of the computation. Edge nodes (also called gateway nodes) handle communication between end users and the master and worker nodes.


The 3 master nodes should host the NameNode (active and standby), the YARN ResourceManager (active and standby), the ZooKeeper quorum (all 3 masters), and the other master components you intend to install. On the 6 worker nodes (also called slave nodes), install the NodeManager, the DataNode, and all the clients.
There is no need to install the clients on the master nodes.
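
To make that layout concrete, here is a rough sketch of it as a host-to-component map. The hostnames are hypothetical, the placement of the active/standby pairs on the first two masters is an assumption, and the client names are plain labels from the question rather than Ambari component identifiers. It only mirrors the suggestion above; Ambari may still place some clients on its own (the question notes the Kerberos client was installed on every node automatically).

```python
# Hypothetical hostnames, used only for illustration.
MASTERS = ["master1", "master2", "master3"]
WORKERS = [f"worker{i}" for i in range(1, 7)]

# Plain labels taken from the question, not Ambari component identifiers.
CLIENTS = [
    "YARN client", "MapReduce2 client", "Hive client", "Infra Solr client",
    "Kerberos client", "Oozie client", "Pig client", "Spark2 client",
    "Sqoop client", "Tez client",
]


def build_layout():
    """Return a dict mapping each host to the components suggested for it."""
    # ZooKeeper quorum on all three masters.
    layout = {host: ["ZooKeeper"] for host in MASTERS}
    # Assumption: the HA pairs sit on the first two masters.
    layout["master1"] += ["NameNode (active)", "ResourceManager (active)"]
    layout["master2"] += ["NameNode (standby)", "ResourceManager (standby)"]
    # Workers get the DataNode, the NodeManager, and all the clients.
    for host in WORKERS:
        layout[host] = ["DataNode", "NodeManager"] + CLIENTS
    return layout


def hosts_with(layout, component):
    """List the hosts that carry the given component, sorted by name."""
    return sorted(h for h, comps in layout.items() if component in comps)


if __name__ == "__main__":
    layout = build_layout()
    for host in MASTERS + WORKERS:
        print(host, "->", ", ".join(layout[host]))
```

Calling `hosts_with(layout, "Spark2 client")` is a quick way to check where a given client would end up before you commit the layout in the install wizard.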


Some nodes run important tasks, and interrupting them may hurt performance. Edge nodes let end users reach worker nodes when necessary, providing a network interface to the cluster without leaving the entire cluster open to communication. That restriction improves reliability and security. Because work is distributed evenly between worker nodes, the edge node’s role helps avoid data skew and performance issues.

See my post on edge nodes: https://community.cloudera.com/t5/Support-Questions/Edge-node-or-utility-node-packages/td-p/202164#

Hope that helps


2 REPLIES

Explorer

If we want to limit how HDP/Hadoop developers, data analysts, and data scientists interact with the cluster, does that mean we don't need to install the clients on all worker nodes?

We have also found that, as a special case, the Sqoop and Oozie clients need to be installed on all nodes, both master and worker. Is that related to how Sqoop and Oozie work?