
Do we really need an edge node in production Hortonworks clusters?

Expert Contributor

Hi Team, we are going to deploy HDP 2.3.4 for a big environment setup. Can someone please explain the architecture of an edge node in Hadoop?

I am able to find only the definition on the internet. I have some questions:

1) What is an edge node?

2) When and why do we need it?

3) Does every production cluster contain an edge node?

4) Is the edge node part of the cluster? What advantages do we have if it is inside the cluster? Does it store any blocks of HDFS data? Is there any performance improvement?

5) Should the edge node be outside the cluster?

6) Please point me to any docs where I can learn about it, preferably Hortonworks docs.

1 ACCEPTED SOLUTION


@ripunjay godhani

1. Edge nodes are the interface between the Hadoop cluster and the outside network.

2. They’re also often used as staging areas for data being transferred into the Hadoop cluster. As such, Oozie, Pig, Sqoop, and management tools such as Hue and Ambari run well there.

3. Yes, it's always better to have one.

4. Yes. It doesn't store any HDFS data; it is used for accessing the cluster and for processing/accessing the data.

5. Yes, it's always better for it to be outside of the secured VLAN.

Link might help you more.


4 REPLIES


Expert Contributor

@Sbandaru:

I researched this more deeply, and my conclusion is that we don't need an edge node:

  • We don't need an edge node if the Hadoop cluster and the application are in the same network.
  • It's only needed when the Hadoop cluster and the application are in different networks; in that case the edge node acts as a gateway to the Hadoop cluster (like a proxy).
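
A quick way to see which of the two cases applies is to test, from the application server, whether the cluster's service ports are directly reachable. The snippet below is only a minimal sketch: the host names are hypothetical, and the ports (8020 for NameNode RPC, 8050 for the ResourceManager) are assumed HDP-style defaults that should be checked against your own configuration. A real client also needs to reach the DataNodes, which this does not cover.

```python
import socket


def can_reach(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def needs_gateway(endpoints, timeout=2.0):
    """Return True if any (host, port) endpoint is unreachable from here."""
    return not all(can_reach(host, port, timeout) for host, port in endpoints)


# Hypothetical endpoints; replace with the values from your cluster.
CLUSTER_ENDPOINTS = [
    ("namenode.example.internal", 8020),         # assumed NameNode RPC port
    ("resourcemanager.example.internal", 8050),  # assumed ResourceManager port
]
```

Calling `needs_gateway(CLUSTER_ENDPOINTS)` on the application server returns `True` when at least one endpoint cannot be reached directly, which would suggest the "different network" case where an edge node acts as the gateway.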

Thanks for your input.

Super Guru

@ripunjay godhani

If your application has direct access to the Hadoop cluster, then that application server is your "edge" node. However, the fact that you don't need a separate one in your particular case does not mean that an edge node is not a good practice; being in the same network is not the explanation. @Sbandaru's explanation is valid and is a best practice for the cases he mentioned.

Cloudera Employee

It is recommended not to place data transfer utilities like Sqoop on anything but an edge node, as the high data transfer volumes could risk the ability of Hadoop services on the same node to communicate. It is also recommended to minimize the deployment of administrative tools on master and slave nodes, so that critical Hadoop services like the NameNode have as little competition for resources as possible.
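
The placement advice in this thread can be written down as a simple check over a cluster layout. This is only an illustrative sketch: it assumes, as in the accepted solution, that Sqoop, Hue, and Ambari belong on the edge node, and the role labels, host names, and layout format are all made up for the example rather than taken from any Hortonworks tool.

```python
# Tool names are taken from this thread; the role labels are assumptions.
DATA_TRANSFER_TOOLS = {"sqoop"}
ADMIN_TOOLS = {"hue", "ambari"}


def placement_warnings(layout):
    """Return advisory messages for tools placed on non-edge nodes.

    layout maps a host name to a (role, services) pair, where role is
    "edge", "master", or "slave".
    """
    warnings = []
    for host in sorted(layout):
        role, services = layout[host]
        if role == "edge":
            continue  # edge nodes are where these tools are expected
        for service in sorted(services):
            name = service.lower()
            if name in DATA_TRANSFER_TOOLS:
                warnings.append(
                    f"{host} ({role}): {service} moves large data volumes; "
                    "consider an edge node"
                )
            elif name in ADMIN_TOOLS:
                warnings.append(
                    f"{host} ({role}): {service} competes with Hadoop "
                    "services for resources"
                )
    return warnings


# Hypothetical layout used only to illustrate the check.
EXAMPLE_LAYOUT = {
    "edge1": ("edge", ["Sqoop", "Hue"]),
    "master1": ("master", ["NameNode", "Ambari"]),
    "worker1": ("slave", ["DataNode", "Sqoop"]),
}
```

With the example layout, `placement_warnings(EXAMPLE_LAYOUT)` flags Ambari on `master1` and Sqoop on `worker1`, and reports nothing for `edge1`.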