Created on 10-16-2019 09:22 PM - last edited on 10-16-2019 10:50 PM by ask_bill_brooks
I want to set up an edge node for HDP 3.1 and need your help. Please share the steps to build it.
Created 10-17-2019 12:42 PM
Edge nodes are the interface between the Hadoop cluster and the outside network. They’re also often used as staging areas for data being transferred into the Hadoop cluster. Installing an edge node is as easy as adding a node to the cluster. The only difference is that on the edge node you deploy ONLY client software, e.g. the Sqoop, Pig, HDFS, YARN, HBase, Spark, ZooKeeper, Hive, or Hue clients, so that you can, for example, run HDFS commands from the edge node.
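As a rough sanity check after deploying the clients, a small Python sketch along these lines could confirm that the client commands are reachable on the edge node's PATH. The list of command names is an assumption for illustration (adjust it to whichever clients you actually installed), and `missing_clients` is a hypothetical helper, not part of HDP or Ambari.

```python
import shutil

# Assumed client commands; adjust to the clients actually installed on the edge node.
EXPECTED_CLIENTS = ["hdfs", "yarn", "hive", "sqoop", "pig", "hbase", "spark-submit"]


def missing_clients(expected=EXPECTED_CLIENTS, which=shutil.which):
    """Return the commands from `expected` that cannot be found on PATH."""
    return [cmd for cmd in expected if which(cmd) is None]


if __name__ == "__main__":
    missing = missing_clients()
    if missing:
        print("Missing client commands:", ", ".join(missing))
    else:
        print("All expected client commands were found on PATH.")
```

An empty result only tells you the commands are installed; it does not prove they can reach the cluster, which still depends on the client configuration deployed to the node.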
To enable communication between the outside network and the Hadoop cluster, edge nodes need to be multi-homed into the private subnet of the Hadoop cluster as well as into the corporate network.
A multi-homed computer is one that has dedicated connections to multiple networks. This is a practical illustration of why edge nodes are perfectly suited for interaction with the world outside the Hadoop cluster. Keeping your Hadoop cluster in its own private subnet is an excellent practice, so these edge nodes serve as a controlled window into the cluster.
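To make the multi-homing idea concrete, here is a minimal Python sketch that checks whether a host has an address in both the cluster subnet and the corporate network. The subnets and addresses are made-up example values, and `is_multi_homed` is a hypothetical helper name.

```python
import ipaddress


def is_multi_homed(host_addresses, cluster_subnet, corporate_subnet):
    """True if the host has at least one address in each of the two networks."""
    cluster_net = ipaddress.ip_network(cluster_subnet)
    corporate_net = ipaddress.ip_network(corporate_subnet)
    addrs = [ipaddress.ip_address(a) for a in host_addresses]
    in_cluster = any(a in cluster_net for a in addrs)
    in_corporate = any(a in corporate_net for a in addrs)
    return in_cluster and in_corporate


# Hypothetical example values, not taken from any real cluster.
edge_node = ["10.0.1.20", "192.168.50.7"]
data_node = ["10.0.1.31"]
print(is_multi_homed(edge_node, "10.0.1.0/24", "192.168.50.0/24"))  # True
print(is_multi_homed(data_node, "10.0.1.0/24", "192.168.50.0/24"))  # False
```

In this picture only the edge node straddles both networks, while data nodes stay reachable solely from inside the private subnet.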
If you're using Knox for perimeter security, then all client software should reside on a dedicated Knox gateway machine to which end users can submit their requests. It's good practice to divide the cluster into master nodes, worker nodes, edge node(s), and a management node.
Services such as the NameNode, ZooKeeper, YARN ResourceManager, and Secondary NameNode usually run on the master node machines. Worker nodes (aka DataNodes) should be further divided into two categories: those running HDFS and YARN, and those running Storm, Kafka, and other components.
As a minimum best practice, have 3-5 master nodes and more than 5 data nodes.
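If it helps when planning, the rule of thumb above can be written down as a tiny Python check. This is only a sketch of that guideline, with a hypothetical function name; it is a planning aid, not something Ambari or HDP enforces.

```python
def check_layout(masters, data_nodes):
    """Compare node counts with the rule of thumb: 3-5 masters, more than 5 data nodes."""
    notes = []
    if not 3 <= masters <= 5:
        notes.append(f"{masters} master node(s): outside the suggested 3-5 range")
    if data_nodes <= 5:
        notes.append(f"{data_nodes} data node(s): the suggestion is more than 5")
    return notes


# An empty list means the planned layout matches the guideline.
print(check_layout(3, 6))
print(check_layout(1, 4))
```

A non-empty result is a prompt to think about where the master services will live, not a hard failure.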
HTH
Created 10-17-2019 08:48 PM
Hi Shelton,
Thanks for the reply. Actually, I am deploying a 2M+6D node cluster. Is an edge node required for this setup?
Created on 10-17-2019 09:08 PM - edited 10-17-2019 09:09 PM
It's NOT a requirement, but it is best practice: you get better control over, and can filter, who has access to your cluster, and that control sits at the edge. You firewall your cluster by deploying Knox there, much like a DMZ in a classic network.
2M and 6D is fine, so one of the 3 ZK masters will sit on a data node, right?
Here is a document that should guide your setup of an edge node in an HDP cluster.
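The reason a third ZooKeeper server ends up on a data node is that ZooKeeper needs a strict majority of its ensemble to be up, so two servers tolerate no failures while three tolerate one. A minimal Python sketch of that arithmetic (the function name is hypothetical):

```python
def zk_quorum(ensemble_size):
    """Return (servers needed for a majority, server failures tolerated)."""
    majority = ensemble_size // 2 + 1
    tolerated = ensemble_size - majority
    return majority, tolerated


for n in (2, 3, 5):
    majority, tolerated = zk_quorum(n)
    print(f"{n} ZK servers: majority={majority}, failures tolerated={tolerated}")
```

With only 2 master nodes, a 2-server ensemble would stop serving as soon as either master went down, which is why the third server is worth placing on a data node.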