Created 07-16-2016 03:34 AM
Hi Team, we are going to deploy HDP 2.3.4 for a large environment setup. Can someone
please explain the architecture of an edge node in Hadoop?
I am able to find only the definition on the internet. I have some questions:
1) What is an edge node?
2) When and why do we need it?
3) Does every production cluster contain an edge node?
4) Is the edge node part of the cluster? (What advantages do we have if it is inside the cluster? Does it store any blocks of data in HDFS? Is there any performance improvement?)
5) Should the edge node be outside the cluster?
6) Please point me to any docs where I can learn more about it, preferably Hortonworks docs.
Created 07-16-2016 04:01 AM
1. Edge nodes are the interface between the Hadoop cluster and the outside network.
2. They’re also often used as staging areas for data being transferred into the Hadoop cluster. As such, Oozie, Pig, Sqoop, and management tools such as Hue and Ambari run well there.
3. Yes, it's always better to have one.
4. Yes. It doesn't store any HDFS data; it is used for accessing the cluster and for processing/accessing the data.
5. Yes, it's always better for it to be outside of the secured VLAN.
Link might help you more.
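To make points 1, 2 and 4 concrete, here is a minimal sketch of a cluster layout in Python. The host names and the exact role assignments are assumptions made up for illustration, not taken from any real HDP deployment; the only things it encodes from the answer above are that client and management tools sit on the edge node and that the edge node holds no HDFS blocks.

```python
# Hypothetical layout for illustration only: host names and role
# assignments are assumptions, not a recommended HDP topology.
CLUSTER = {
    "master1": {"NameNode", "ResourceManager"},
    "worker1": {"DataNode", "NodeManager"},
    "worker2": {"DataNode", "NodeManager"},
    "edge1": {"Sqoop", "Pig", "Oozie", "Hue", "Ambari"},
}

# Core HDFS/YARN daemons; a host running none of these is treated as an edge node here.
CORE_DAEMONS = {"NameNode", "ResourceManager", "DataNode", "NodeManager"}


def stores_hdfs_blocks(host):
    """Only hosts running a DataNode hold HDFS blocks."""
    return "DataNode" in CLUSTER[host]


def edge_nodes():
    """Return hosts that run no core HDFS/YARN daemons, sorted by name."""
    return sorted(h for h, roles in CLUSTER.items() if not roles & CORE_DAEMONS)


if __name__ == "__main__":
    for host in edge_nodes():
        print(host, "stores HDFS blocks:", stores_hdfs_blocks(host))
```

In this sketch users log in to `edge1` and run their jobs from there, while the master and worker hosts are never accessed directly.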
Created 07-19-2016 05:55 AM
@Sbandaru:
I researched this more deeply, and the conclusion is that we don't need an edge node.
Thanks for your input.
Created 03-15-2017 01:59 AM
If your application has direct access to the Hadoop cluster, then that application server is your "edge" node. However, the fact that you don't need a separate one in your particular case does not mean that using one is not a good practice. Being in the same network is not the explanation. @SBandaru's explanation is valid and is a best practice for the cases he mentioned.
Created 03-14-2017 05:37 PM
It is recommended not to place data transfer utilities like Sqoop on anything but an edge node, as the high data transfer volumes could risk the ability of Hadoop services on the same node to communicate. It is also recommended to minimize the deployment of administrative tools on master and slave nodes, so that critical Hadoop services like the NameNode have as little competition for resources as possible.
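The point about administrative tools on master and slave nodes can be checked mechanically. The following is a rough sketch, assuming a made-up inventory format (host mapped to a node type and a set of services) and treating Hue and Ambari as the administrative tools; none of these names come from an actual Ambari or HDP API.

```python
# Assumption for illustration: these are the tools counted as "administrative".
ADMIN_TOOLS = {"Hue", "Ambari"}


def admin_tool_warnings(layout):
    """Flag administrative tools deployed on master or slave nodes.

    layout maps host -> (node_type, services), where node_type is
    "master", "slave" or "edge" (a hypothetical inventory format).
    """
    warnings = []
    for host, (node_type, services) in sorted(layout.items()):
        if node_type in ("master", "slave"):
            for tool in sorted(services & ADMIN_TOOLS):
                warnings.append(f"{host}: {tool} competes with Hadoop services for resources")
    return warnings


if __name__ == "__main__":
    # Hypothetical inventory: Ambari has been co-located with the NameNode.
    layout = {
        "master1": ("master", {"NameNode", "Ambari"}),
        "slave1": ("slave", {"DataNode"}),
        "edge1": ("edge", {"Hue"}),
    }
    for line in admin_tool_warnings(layout):
        print(line)
```

With the sample inventory above, only `master1` is flagged, because Hue on `edge1` does not share a host with any critical Hadoop service.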