What are worker and Edge nodes?

What are worker nodes and edge nodes?

Why are we using these nodes?

What is the role of these nodes?

What role do they play when a job is executed?

1 ACCEPTED SOLUTION

Master Mentor

@Bala Vignesh N V

An edge node is a dedicated node (machine) where no Hadoop services are running and where you install only the so-called Hadoop clients (HDFS, Hive, HBase, etc.). In your case, your BI tool will also play the role of a Hadoop client. "Client" here means that only the respective component's client libraries and scripts are installed, together with its config files. If you change the config through Ambari, Ambari will automatically refresh the config files on the edge node as well.

In a small test cluster without an edge node, you can select one node where Hadoop services are running (for example, a master node) to play the role of your edge node. In a large cluster with many users, there are usually multiple edge nodes. As the "edge node folder" you can use any folder on the edge node you decide to use. We usually execute Sqoop, hdfs, Oozie, and similar commands from an edge node. An edge node is a client-facing machine that has all the client tools needed to operate on a cluster. It is not a good idea to use the NameNode or a node running other HDP components as your edge node. Typically you'd want a separate node designated just for client access.

Worker nodes make up the majority of the virtual machines in a cluster and do the job of storing the data and running computations. A worker node usually runs both a DataNode and a NodeManager, along with similar services.
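To make the distinction concrete, here is a minimal Python sketch that labels a node by what is installed on it. It is only an illustration of the description above, not how Ambari or Hadoop actually assign roles: the function `classify_node`, the host names, and the two service sets are hypothetical and were chosen just for this example.

```python
# Hypothetical groupings for illustration only; not an official Hadoop or Ambari list.
MASTER_SERVICES = {"NameNode", "ResourceManager"}
WORKER_SERVICES = {"DataNode", "NodeManager"}


def classify_node(services, clients):
    """Return a rough role label from a node's running services and installed clients."""
    services = set(services)
    if services & MASTER_SERVICES:
        return "master"
    if services & WORKER_SERVICES:
        return "worker"
    # An edge node runs no Hadoop services and has only client tools installed.
    if not services and clients:
        return "edge"
    return "unknown"


# Made-up hosts describing a tiny cluster.
cluster = {
    "edge01": {"services": [], "clients": ["hdfs", "hive", "hbase", "sqoop"]},
    "worker01": {"services": ["DataNode", "NodeManager"], "clients": []},
    "master01": {"services": ["NameNode"], "clients": ["hdfs"]},
}

roles = {host: classify_node(**spec) for host, spec in cluster.items()}
for host, role in roles.items():
    print(host, role)
```

In this sketch a master node that also has clients installed is still labeled "master", which mirrors the small-cluster case where a master node doubles as the edge node.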

https://community.hortonworks.com/questions/87884/which-node-to-use.html

3 REPLIES

Thanks @Jay SenSharma

Will a worker node also be a dedicated node, like an edge node? Also, when a job is executed, will all the intermediate staging data be stored on the worker node? How does the worker node access data from the data node? Forgive me if these are lame questions. I'm trying to understand worker nodes.

@Bala Vignesh N V Your worker node is the same as your data node. Worker nodes are the ones that actually do the work in the cluster.