We are trying to set up a new Hadoop cluster. We have 4 Linux machines (1 Edge Node, 1 Name Node and 2 Data Nodes). Could someone explain which ports need to be opened on which nodes before we start the installation? Or does Ambari take care of it? Is there anything else that needs to be taken care of as well, like SSH or TCP/IP port opening?
I have gone through the default ports mentioned in the HDP reference guide, but I am not sure whether they need to be opened on the Edge Node, the Name Node, the Data Nodes, or on all nodes.
Ambari requires every host in the cluster to have some basic ports open so that the hosts can communicate with each other.
Because certain ports must be open and available during installation, you should temporarily disable iptables. If the security protocols at your installation do not allow you to disable iptables, you can proceed with them on if all of the relevant ports are open and available; otherwise, cluster installation fails.
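If you want to confirm that a port is "available" on a host before installing, a small pre-install check along these lines may help. This is only a sketch: the function names port_is_free and busy_ports and the DEFAULT_PORTS list are illustrative assumptions, not part of Ambari, and the port numbers are the usual Ambari/HDP 2.x defaults, which you should confirm against the reference guide. It only tells you whether something on the host already occupies the port; it does not tell you whether a firewall will let remote traffic through.

```python
import socket

# Assumed defaults (verify against the HDP reference guide for your version):
# Ambari server UI 8080, Ambari agent registration 8440 and heartbeat 8441,
# NameNode RPC 8020, NameNode web UI 50070, DataNode 50010/50020/50075.
DEFAULT_PORTS = [8080, 8440, 8441, 8020, 50070, 50010, 50020, 50075]


def port_is_free(port, host="0.0.0.0"):
    """Return True if this process can bind the TCP port on this host.

    A failed bind usually means another process already uses the port
    (or, for ports below 1024, that root privileges are required).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
            return True
        except OSError:
            return False


def busy_ports(ports, host="0.0.0.0"):
    """Return the subset of ports that cannot be bound on this host."""
    return [p for p in ports if not port_is_free(p, host)]


if __name__ == "__main__":
    taken = busy_ports(DEFAULT_PORTS)
    if taken:
        print("Already in use on this host:", taken)
    else:
        print("None of the checked ports are in use on this host.")
```

Run it on each host before starting the install; only the ports of the services that will actually be placed on that host matter.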
The default ports for the various services, which should be opened, are listed here: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.3/bk_reference/content/reference_chap2.html
Depending on which services and clients you use, your system administrator should unblock those ports on the cluster nodes that host the services (for example the NameNode and DataNodes), and the Edge Nodes should be able to reach those ports without issues.
Edge node refers to a dedicated node (machine) where no Hadoop services are running and where only Hadoop clients are installed (for example the HDFS, Hive and HBase clients). So usually we do not open any ports on the Edge Nodes. However, the edge nodes must have access to the hosts and ports of the individual services, such as the NameNode ports, DataNode ports, etc.
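To verify from the Edge Node that the service ports are reachable, something like the following could be used. It is a minimal sketch, not an official tool: the hostnames in EXAMPLE_TARGETS are placeholders for your own machines, the helper names can_connect and check_targets are made up for illustration, and the ports are assumed to be the HDP 2.x defaults (NameNode 8020 and 50070, DataNode 50010, 50020 and 50075), so adjust them to your configuration.

```python
import socket

# Placeholder hostnames and assumed HDP 2.x default ports; replace with your own.
EXAMPLE_TARGETS = {
    "namenode.example.com": [8020, 50070],
    "datanode1.example.com": [50010, 50020, 50075],
    "datanode2.example.com": [50010, 50020, 50075],
}


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


def check_targets(targets, timeout=3.0):
    """Map each (host, port) pair to True (reachable) or False (not reachable)."""
    return {
        (host, port): can_connect(host, port, timeout)
        for host, ports in targets.items()
        for port in ports
    }


def print_report(targets):
    """Print one line per host:port with its reachability."""
    for (host, port), ok in check_targets(targets).items():
        print(f"{host}:{port} {'reachable' if ok else 'NOT reachable'}")
```

Calling print_report(EXAMPLE_TARGETS) on the Edge Node after the services are started prints one line per port. A port reported as not reachable can mean a firewall rule is blocking it, but also simply that the service is not running yet, so check both.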