This is a standard DMZ network architecture in which a subset of hosts (the Knox gateway and the edge node) forms a communication layer between the external network and the rest of the hosts in the internal network. Hosts in the DMZ can be seen as belonging to both the internal and the external network. Their purpose is to isolate the remaining hosts (the Hadoop clusters) from any direct communication with the external network.
In the above example, the first firewall forces all internet traffic to talk only to the Knox gateway. Traffic that passes the security checks at the gateway (IP, ports, Kerberos/LDAP authentication, and so on) is routed to the cluster.
In theory, the first firewall should be sufficient to secure the cluster. However, this firewall is exposed to the entire internet, with all of its attackers and evolving attack techniques. With only that one firewall in place, there is still a risk of attacks from the internet reaching directly into the cluster and its data, mission-critical operations, etc.
The second firewall further isolates the cluster by forcing the cluster to only accept communication from the gateway, which is a known host on the internal network.
The overall result is that malicious attacks are contained in the DMZ hosts and cannot reach directly into the cluster. Any compromise is isolated to the DMZ.
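The two-firewall logic described above can be sketched as a small model. This is only an illustration, not real firewall or Knox configuration: the host names, the port 8443, and the function names are all assumptions made up for the example.

```python
from dataclasses import dataclass

# Hypothetical host labels and port, chosen only for illustration.
GATEWAY = "knox-gateway"
CLUSTER = "hadoop-cluster"
GATEWAY_PORT = 8443  # assumed gateway port


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    port: int


def outer_firewall_allows(p: Packet) -> bool:
    # First firewall: internet traffic may only talk to the gateway.
    return p.dst == GATEWAY and p.port == GATEWAY_PORT


def inner_firewall_allows(p: Packet) -> bool:
    # Second firewall: the cluster only accepts traffic from the gateway.
    return p.src == GATEWAY and p.dst == CLUSTER


def request_reaches_cluster(src: str, via_gateway: bool, authenticated: bool) -> bool:
    if via_gateway:
        # Hop 1: external host to gateway, through the first firewall.
        if not outer_firewall_allows(Packet(src, GATEWAY, GATEWAY_PORT)):
            return False
        # The gateway applies its own checks (Kerberos/LDAP etc.), modeled as a flag.
        if not authenticated:
            return False
        # Hop 2: gateway to cluster, through the second firewall.
        return inner_firewall_allows(Packet(GATEWAY, CLUSTER, GATEWAY_PORT))
    # Direct attempt: the same packet would have to pass both firewalls.
    direct = Packet(src, CLUSTER, GATEWAY_PORT)
    return outer_firewall_allows(direct) and inner_firewall_allows(direct)
```

In this model, an authenticated request that goes through the gateway reaches the cluster, while a direct request from the internet, or an unauthenticated one, does not.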
The DMZ concept takes its name from demilitarized zones in the military: a zone is set up to hold buildings and so on that are used by parties both inside and outside the military, but only the military personnel in the DMZ can communicate with the militarized zone (the internal network).
For details on HDP Knox Gateway security settings: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_Knox_Gateway_Admin_Guide/content/ch01.ht...