From talking to Cloudera support, I got the impression that they expect all Hadoop nodes, including data nodes, to be on a public network, and that a setup where the Hadoop cluster is on a private network with gateways for Hue, Cloudera Manager, ssh access, etc. is not supported. They immediately say that a "multihomed" configuration is not officially supported. That sounds completely crazy. I would expect 99.9% of Hadoop clusters to run on private networks with a few gateways. I think "multihomed" in the documentation means something else: that Hadoop nodes cannot communicate among themselves over different networks, which has nothing to do with gateways between the Hadoop cluster and the public network. Any comments?
That raises another question. Fully enabling TLS inside Hadoop (up to level 3) requires certificates, and Cloudera recommends against using self-signed certificates. But can one get an officially signed certificate for a node that is not on a public network and not in DNS? Or is level 3 overkill for Hadoop running on a private network, so that level 1 is enough? Is level 1 sufficient to configure Kerberos? The documentation says that configuring TLS is a prerequisite for Kerberos.
But wouldn't you want Hadoop nodes to communicate over a faster, more expensive network than your enterprise network?
In my case, the nodes communicate over a 10G network, but the enterprise network is 1G.
What if I pretend that there is only the 10G internal network and use something external to Hadoop, like a firewall, to forward traffic arriving on port 7180 of the CM machine's external interface to its internal interface, so that I do not have to deal with multiple networks inside Hadoop? Might that work?
If I do have to generate a single certificate for two interfaces using keytool, what's the exact syntax?
Is it something like:
keytool -genkeypair -keystore myhost.keystore -keyalg RSA -alias myhost -dname "CN=myhost.mydomain, O=Hadoop" -storepass <pass> -keypass <pass> -ext SAN=dns:myhost2.mydomain2
When I tried to use the above command I got:
keytool error: java.lang.Exception: Key pair not generated, alias <myhost> already exists
the same as
Can one use either -alias or -ext SAN, but not both?
Should the alternative host name appear in plain text in the header of the PEM file before "BEGIN CERTIFICATE", or is it actually encoded in the certificate gibberish?
I do not see it in the header. There is the original hostname and the alias, but no alternative host name.
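As far as I understand it, the subject alternative name is an X.509 v3 extension, so it lives inside the Base64-encoded body and not in any text header. The sketch below is one way to check this. It assumes OpenSSL 1.1.1 or newer (for `-addext`) and creates a throwaway self-signed certificate only so that the example is self-contained. With a real file, only the final `openssl x509` command is needed.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cert="$workdir/cert.pem"

# Throwaway self-signed certificate with one SAN entry (illustration only).
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout "$workdir/key.pem" -out "$cert" -days 1 \
  -subj "/CN=myhost.mydomain/O=Hadoop" \
  -addext "subjectAltName=DNS:myhost2.mydomain2" 2>/dev/null

# The alternative name is not readable in the PEM text itself:
# Base64 has no "." character, so the literal host name cannot appear.
if grep -q "myhost2.mydomain2" "$cert"; then
  echo "SAN visible as plain text"
else
  echo "SAN not visible as plain text"
fi

# Decoding the certificate reveals the extension.
openssl x509 -in "$cert" -noout -text | grep -A 1 "Subject Alternative Name"
```

For a certificate held in a keystore, `keytool -list -v` or `keytool -printcert -file <file>` prints the same extension under `SubjectAlternativeName`.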