I ran into some issues while planning a secure Hadoop cluster in a multihomed environment.
The situation is as follows:
- All hosts are multihomed, i.e. they have two separate IP addresses connected to two separate LANs. One is intended for interaction with the enterprise network and one for intra-cluster traffic. The intra-cluster network is not accessible from the outside.
- Each IP address is associated with a separate hostname, e.g. host-internal.domain.com and host-enterprise.domain.com. The internal hostname is the primary hostname and is used in all Hadoop configurations.
- DNS resolves the hostnames consistently in all networks. There is no way to resolve the same hostname differently depending on the network you are in.
Now I'm planning to set up Kerberos with an AD that is located in the enterprise network. I expect to see issues, as the Hadoop hosts will talk to the KDC via their external addresses but resolve themselves to their internal hostnames. I suspect that the Kerberos hostname validation will fail, as the KDC resolves the external IP to the external hostname.
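To make the mismatch I am worried about concrete, here is a minimal Python sketch. The addresses and the `EXAMPLE_PTR` table are made up for illustration, and `reverse_matches` is a hypothetical helper, not anything Kerberos or AD provides. It only shows the forward/reverse disagreement and does not reproduce what the KDC actually checks.

```python
import socket


def system_reverse_lookup(address):
    """Reverse-resolve an IP address using the system resolver."""
    return socket.gethostbyaddr(address)[0]


def reverse_matches(hostname, address, reverse_lookup=system_reverse_lookup):
    """Return True if reverse-resolving `address` yields `hostname`.

    `reverse_lookup` is injectable so the check can be tried against
    a fixed table instead of live DNS.
    """
    resolved = reverse_lookup(address)
    # Compare case-insensitively and ignore a trailing root dot.
    return resolved.rstrip(".").lower() == hostname.rstrip(".").lower()


# Hypothetical reverse (PTR) records; the IP addresses are assumptions.
EXAMPLE_PTR = {
    "10.0.0.5": "host-internal.domain.com",     # intra-cluster LAN
    "192.0.2.5": "host-enterprise.domain.com",  # enterprise LAN
}

# The host calls itself host-internal, but is seen from the
# enterprise network under its external address.
print(reverse_matches("host-internal.domain.com", "192.0.2.5",
                      EXAMPLE_PTR.__getitem__))
```

With the table above, the name the host uses for itself does not match the name its external address resolves to, which is the situation I expect to cause trouble.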
Do you have any ideas on how best to handle this situation?
The only way I can currently imagine is to let DNS resolve the primary hostname to the external address while using /etc/hosts to enforce resolution to the internal address on all cluster hosts.
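As a rough sketch of that idea, the Python below models the lookup order I have in mind: entries from /etc/hosts win, and DNS is only consulted as a fallback. This assumes the cluster hosts are configured to consult files before DNS (e.g. `hosts: files dns` in /etc/nsswitch.conf). The IP addresses, `HOSTS_TEXT`, `DNS_TABLE` and both helper functions are hypothetical and only serve the illustration.

```python
def parse_hosts(text):
    """Parse /etc/hosts-style text into a {hostname: address} dict."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # The first entry for a name wins.
            table.setdefault(name.lower(), address)
    return table


def resolve(name, hosts_table, dns_table):
    """Resolve `name`, preferring the hosts file over DNS."""
    key = name.lower()
    if key in hosts_table:
        return hosts_table[key]
    return dns_table[key]


# Hypothetical /etc/hosts content on every cluster host.
HOSTS_TEXT = """
# force the primary hostname onto the intra-cluster LAN
10.0.0.5   host-internal.domain.com host-internal
"""

# Hypothetical DNS view: the primary hostname points at the external address.
DNS_TABLE = {
    "host-internal.domain.com": "192.0.2.5",
    "host-enterprise.domain.com": "192.0.2.5",
}

inside = resolve("host-internal.domain.com", parse_hosts(HOSTS_TEXT), DNS_TABLE)
outside = resolve("host-internal.domain.com", {}, DNS_TABLE)
print(inside, outside)
```

Cluster hosts would then reach each other over the internal LAN, while everything outside the cluster, including the AD, would see the same hostname on the external address.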
+1 to your idea. I would also suggest keeping the internal hostnames in the /etc/hosts file, including an entry for your KDC, as Hadoop is only familiar with the internal hostnames.
This is the recommended and easiest option.