Has anybody already prepared a list of ephemeral ports for the different applications in HDP and HDF? I have been asked to configure tight firewall rules to segregate master nodes, edge nodes, streaming nodes and data nodes. I have noticed that, for example, Zookeeper, Kafka and the Spark Thrift Server require some ephemeral ports, but I am looking for the full list.
The full list of ports is actually spread across multiple docs:
The only ports NiFi requires are the UI ports (HTTP and/or HTTPS) and any ports that processors need for data. The two config properties are nifi.web.http.port and nifi.web.https.port. For this reason, there isn't one place to go in our HDF docs. However, the link I provided above is another HCC thread that walks you through it pretty well. Ports for all other HDF components (Storm, Kafka, Ambari) are covered in the other two links I provided above.
Ephemeral ports are assigned by the OS and allocated automatically from a predefined range. For example, Linux defaults to the range 32768 to 60999 (32768 to 61000 on older kernels). HDFS does default to ephemeral ports for some HTTP/RPC endpoints. This can cause bind exceptions on service startup if the port is already in use. For this reason, HDFS-9427 was created to move the HDFS default HTTP/RPC ports out of the ephemeral range. This is resolved in Hadoop 3.0. Although the services you mention use other ephemeral ports, those ports are not exposed through config.
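To see the OS assigning a port, here is a minimal Python sketch. It assumes nothing about HDP or HDF: it only asks the local OS for a free port by binding to port 0. The function name os_assigned_port is made up for illustration.

```python
import socket


def os_assigned_port() -> int:
    """Bind to port 0 so the OS picks a free port, then return that port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 means "let the OS choose"; on Linux the choice normally
        # comes from the range in net.ipv4.ip_local_port_range.
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


if __name__ == "__main__":
    # The printed number will usually differ from run to run.
    print(os_assigned_port())
```

Because the number is chosen at bind time, it cannot be known in advance, which is why these ports do not show up in any config file.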
When configuring firewalls, you cannot account for every random port that the OS may assign. That is why firewall rules are often directional. For example, you wouldn’t make a firewall rule that said “Allow traffic from local port 53446 (randomly assigned) to remote port 50070”. Your firewall rule would be more like “allow a TCP connection originating locally destined for port 50070 on host XXXX”.
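As a rough illustration of what "directional" means, the sketch below models such a rule in Python. This is not a real firewall API: the Connection and OutboundRule classes and the host name namenode.example.com are hypothetical, and only port 50070 comes from the example above. The point is that the rule never looks at the source port.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """An outbound connection attempt (hypothetical model)."""
    src_port: int          # ephemeral, chosen by the client's OS
    dst_host: str
    dst_port: int
    protocol: str = "tcp"


@dataclass(frozen=True)
class OutboundRule:
    """Allow locally originated traffic to one fixed destination."""
    dst_host: str
    dst_port: int
    protocol: str = "tcp"

    def allows(self, conn: Connection) -> bool:
        # The source port is deliberately ignored: it is random.
        return (
            conn.protocol == self.protocol
            and conn.dst_host == self.dst_host
            and conn.dst_port == self.dst_port
        )


# Hypothetical host name; 50070 is the port used in the example above.
rule = OutboundRule(dst_host="namenode.example.com", dst_port=50070)
```

With this model, rule.allows(Connection(53446, "namenode.example.com", 50070)) and the same call with any other source port both return True, while a connection to a different destination port returns False.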
@Tom McCuch Thanks, but I was already aware of these ports. Unfortunately, the list is not complete, and there are other ports that I haven't seen officially documented, like the ephemeral ports I mentioned. For example, I've found that Kafka, Zookeeper and Spark Thrift Server open lots of ephemeral ports whose purpose I don't know. It is important to me to find all of these ports and their use cases.
The ephemeral ports assigned by the OS are chosen randomly from the unused ports above 1024. The OS can typically restrict this pool further, but doing so is not common practice. Since the OS picks these ports at random for the client side of the communication, you cannot craft effective firewall rules that attempt to limit communication to a specific ephemeral port. This is why firewall rules are typically directional.
For example, you would never craft a firewall rule that says "only local client port 53349 (randomly assigned) can talk bi-directionally to remote port 50070". Instead, you would write a firewall rule that says something like "allow TCP traffic originating locally destined for remote port 50070 on host myhost.example.com". That way, only traffic that originates locally is allowed.
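The client-side behaviour described here can be observed with a small Python sketch. It is only a local demonstration under stated assumptions: a listener on 127.0.0.1 stands in for a service with a fixed port, and client_source_ports is a made-up helper name. Each client connects to the same destination port, yet the OS gives each one a different source port.

```python
import socket
from typing import List, Tuple


def client_source_ports(connections: int = 2) -> Tuple[int, List[int]]:
    """Open several client connections to one local listener and
    return (service_port, [source port of each client])."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        # Stand-in for a service port such as 50070; port 0 is used here
        # only so the demo never collides with a real service.
        server.bind(("127.0.0.1", 0))
        server.listen(connections)
        service_port = server.getsockname()[1]

        clients = []
        try:
            for _ in range(connections):
                clients.append(
                    socket.create_connection(("127.0.0.1", service_port))
                )
            # The source port of each client was picked by the OS.
            source_ports = [c.getsockname()[1] for c in clients]
        finally:
            for c in clients:
                c.close()
    return service_port, source_ports


if __name__ == "__main__":
    port, sources = client_source_ports()
    print("destination port:", port)
    print("client source ports:", sources)
```

The destination port stays the same for every connection while the source ports vary, so the destination side is the only stable thing a rule can match on.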