
Edge node setup

Contributor

I want to set up an edge node for HDP 3.1, so I need your help. Please share the steps to build it.

1 ACCEPTED SOLUTION

Master Mentor

@irfangk1 

 

It's NOT a requirement, but it is best practice: an edge node gives you better control over, and filtering of, who has access to your cluster, and that control sits at the edge. You can also firewall your cluster by deploying Knox, much like a DMZ in a classic network.

2 masters and 6 data nodes (2M + 6D) is fine, so one of the 3 ZK masters will sit on a data node, right?
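To make the ZooKeeper placement concrete, here is a minimal Python sketch. It assumes hypothetical host names (`master1`, `data1`, and so on) and a three-server ensemble; it simply fills the ensemble from the master nodes first and spills the remainder onto data nodes. The failure-tolerance function is ordinary majority-quorum arithmetic, not anything specific to HDP.

```python
def place_zookeeper(masters, data_nodes, ensemble_size=3):
    """Pick hosts for a ZooKeeper ensemble: masters first, then data nodes."""
    candidates = list(masters) + list(data_nodes)
    if len(candidates) < ensemble_size:
        raise ValueError("not enough hosts for the requested ensemble size")
    return candidates[:ensemble_size]


def tolerated_failures(ensemble_size):
    """An ensemble stays available while a majority of its servers is up."""
    majority = ensemble_size // 2 + 1
    return ensemble_size - majority


# Hypothetical host names for a 2-master, 6-data-node (2M + 6D) cluster.
masters = ["master1", "master2"]
data_nodes = [f"data{i}" for i in range(1, 7)]

zk_hosts = place_zookeeper(masters, data_nodes)
print(zk_hosts)                # ['master1', 'master2', 'data1']
print(tolerated_failures(3))   # 1
```

With only two masters, the third ZooKeeper server has to land on a data node, and the three-server ensemble can lose one server and keep running.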

Here is a document that should help you set up an edge node in an HDP cluster.


3 REPLIES

Master Mentor

@irfangk1 

Edge nodes are the interface between the Hadoop cluster and the outside network. They're also often used as staging areas for data being transferred into the Hadoop cluster. Installing an edge node is as easy as adding a node to the cluster. The only difference is that on the edge node you deploy client software ONLY, e.g. Sqoop, Pig, HDFS, YARN, HBase, Spark, ZK, Hive or Hue, so that you can, for example, run HDFS commands from the edge node.
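As an illustration of the "client software only" rule, the sketch below checks a planned component list for an edge node. The component labels and the `SERVER_COMPONENTS` set are hypothetical names chosen for this example; they are not identifiers taken from Ambari or HDP.

```python
# Hypothetical labels for long-running server daemons; not official HDP names.
SERVER_COMPONENTS = {
    "NAMENODE", "SECONDARY_NAMENODE", "DATANODE",
    "RESOURCEMANAGER", "NODEMANAGER", "ZOOKEEPER_SERVER",
    "HBASE_MASTER", "HBASE_REGIONSERVER",
}


def server_components_on_edge(planned):
    """Return, sorted, any planned components that are server daemons."""
    return sorted(set(planned) & SERVER_COMPONENTS)


# Hypothetical plans: one with clients only, one that wrongly adds a daemon.
edge_plan = ["HDFS_CLIENT", "YARN_CLIENT", "HIVE_CLIENT", "SQOOP", "PIG"]
bad_plan = edge_plan + ["DATANODE"]

print(server_components_on_edge(edge_plan))  # []
print(server_components_on_edge(bad_plan))   # ['DATANODE']
```

Once the HDFS client is installed and configured on the edge node, a command such as `hdfs dfs -ls /` should work from there without any HDFS daemon running on that host.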


To enable communication between the outside network and the Hadoop cluster, edge nodes need to be multi-homed into the private subnet of the Hadoop cluster as well as into the corporate network.

A multi-homed computer is one that has dedicated connections to multiple networks. This is a practical illustration of why edge nodes are well suited for interaction with the world outside the Hadoop cluster. Keeping your Hadoop cluster in its own private subnet is an excellent practice, so these edge nodes serve as a controlled window into the cluster.
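A quick way to sanity-check the multi-homed requirement is to compare a host's addresses against both networks. This is only a sketch: the subnets and addresses below are made-up examples, and in practice you would feed in the addresses reported by the edge node's own interfaces.

```python
import ipaddress


def is_multi_homed(host_addresses, cluster_subnet, corporate_subnet):
    """True if the host has at least one address in each of the two networks."""
    cluster = ipaddress.ip_network(cluster_subnet)
    corporate = ipaddress.ip_network(corporate_subnet)
    addrs = [ipaddress.ip_address(a) for a in host_addresses]
    in_cluster = any(a in cluster for a in addrs)
    in_corporate = any(a in corporate for a in addrs)
    return in_cluster and in_corporate


# Hypothetical networks: private Hadoop subnet and corporate LAN.
CLUSTER_SUBNET = "10.0.1.0/24"
CORPORATE_SUBNET = "192.168.20.0/24"

edge_node = ["10.0.1.15", "192.168.20.7"]   # one interface on each network
data_node = ["10.0.1.21"]                   # private subnet only

print(is_multi_homed(edge_node, CLUSTER_SUBNET, CORPORATE_SUBNET))  # True
print(is_multi_homed(data_node, CLUSTER_SUBNET, CORPORATE_SUBNET))  # False
```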

If you're using Knox for perimeter security, then all client software should reside on a dedicated Knox gateway machine to which end users can submit their requests.

It's good practice to divide the cluster into master nodes, worker nodes, edge node(s), and a management node. Services such as NameNode, ZooKeeper, YARN ResourceManager, and Secondary NameNode usually run on the master node machines. Worker nodes, aka DataNodes, should be further divided into two categories: those running HDFS and YARN, and those running Storm, Kafka and other components.
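Coming back to the Knox gateway: end users typically reach cluster services through a gateway URL rather than talking to the NameNode directly. The helper below is a rough sketch of how a WebHDFS request URL through Knox is usually shaped. The host name is hypothetical, and the port `8443`, gateway path `gateway`, and topology `default` are common Knox defaults that I am assuming here; check your own gateway configuration.

```python
from urllib.parse import quote, urlencode


def knox_webhdfs_url(gateway_host, hdfs_path, op="LISTSTATUS",
                     port=8443, gateway_path="gateway", topology="default"):
    """Build a WebHDFS URL routed through a Knox gateway (assumed defaults)."""
    path = quote(hdfs_path.lstrip("/"))   # keeps "/" between path segments
    query = urlencode({"op": op})
    return (f"https://{gateway_host}:{port}/{gateway_path}/{topology}"
            f"/webhdfs/v1/{path}?{query}")


# Hypothetical gateway host.
url = knox_webhdfs_url("knox.example.com", "/tmp")
print(url)
# https://knox.example.com:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS
```

The function only builds the URL; authentication and the actual HTTP call depend on how your Knox topology is secured.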


As a minimum best practice, have 3-5 master nodes and more than 5 data nodes.

HTH

Contributor

Hi Shelton,

Thanks for the reply. I am actually deploying a 2M+6D node cluster. Is an edge node required for this setup?
