Note: Credit for the key piece of information needed to solve this problem goes to Phil D’Amore.
A customer was writing a Java application that used the Hive client libraries to talk to two secure Hadoop clusters residing in different Kerberos realms, and ran into a problem. The same problem can affect a client that connects to only a single secure Hadoop cluster, if that cluster happens not to be in the Kerberos “default_realm” specified in the client host’s krb5.conf file. Nor is it specific to Hive: any Hadoop ecosystem client can encounter it.
To communicate with two secure Hadoop clusters in different Kerberos realms, the client application correctly did the following:
It collected the required configuration files (in this case core-site.xml, hdfs-site.xml, and hive-site.xml) from each target cluster, and used the matching configuration when communicating with each cluster.
Its application user ID had two Kerberos principals, one registered with and authenticated by each of the two KDCs, and it used the matching principal when authenticating to each cluster.
On the client host, it had a krb5.conf file that correctly specified the Kerberos kdc and admin_server values for each of the two target realms in the [realms] section, and set one of those realms as the “default_realm” in the [libdefaults] section. (It could also have set a third realm as the default_realm; that would simply mean both target clusters were in non-default realms, which is also fine.)
However, when they ran the application, they hit a puzzling problem: they could authenticate to the target cluster in the default realm, but authentication failed for the target cluster in the non-default realm. Indeed, after the failure they found entries in the default_realm KDC’s logs showing that the authentication attempt for the non-default-realm cluster had been sent to the wrong KDC.
They knew they had not made a coding error, because changing the default_realm to the other target cluster’s realm reversed the situation. Depending on the default_realm setting in the krb5.conf file, they could talk to either cluster, but not to both at once.
The problem was fixed by adding a [domain_realm] section to the krb5.conf file. It turns out that the APIs of the Thrift libraries underlying the client do not communicate the target realm, only the target server. The Kerberos libraries are responsible for translating the target server’s domain into the target realm. If the domain and the realm have identical string values (ignoring upper/lower case), which is common but not required, the Kerberos libraries use that realm. Failing that, they use the default realm. They do not infer the realm from the domain of the KDC servers. In this case the domain and the realm were different, so the authentication request for the non-default realm was sent to the default realm’s KDC. Adding a [domain_realm] section to the krb5.conf file allows arbitrary mappings from target domains to target realms, so Kerberos was finally able to translate the desired target domain into the correct target realm. See http://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/krb5_conf.html#domain-realm for details of the krb5.conf file sections and their contents.
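The lookup order described above can be made concrete with a small, self-contained Java model. This is a simplified sketch under stated assumptions, not the MIT krb5 or JDK code: the class RealmResolver, the realms ALPHA.EXAMPLE.COM and BETA.CORP, and all hostnames are hypothetical, and real Kerberos libraries have further options (such as DNS-based realm lookup) that are not modeled here. The model checks an exact hostname entry in [domain_realm], then leading-dot domain entries from most to least specific, then whether the host’s domain, upper-cased, names a configured realm, and only then falls back to default_realm. With an empty [domain_realm], a host in the non-default realm resolves to the default realm, reproducing the customer’s failure; with the mapping added, it resolves correctly.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Simplified, hypothetical model of how a Kerberos client might map a service
// host to a realm. It is NOT the MIT krb5 or JDK implementation; it only
// mirrors the lookup order described in the text.
public class RealmResolver {
    private final Map<String, String> domainRealm = new HashMap<>(); // [domain_realm], keys lower-cased
    private final Set<String> knownRealms;                             // realm names from [realms]
    private final String defaultRealm;                                 // [libdefaults] default_realm

    public RealmResolver(Map<String, String> domainRealm, Set<String> knownRealms, String defaultRealm) {
        domainRealm.forEach((k, v) -> this.domainRealm.put(k.toLowerCase(Locale.ROOT), v));
        this.knownRealms = knownRealms;
        this.defaultRealm = defaultRealm;
    }

    // Reads "key = REALM" relations from the [domain_realm] section of krb5.conf text.
    public static Map<String, String> parseDomainRealm(String krb5Conf) {
        Map<String, String> map = new HashMap<>();
        boolean inSection = false;
        for (String raw : krb5Conf.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#") || line.startsWith(";")) continue;
            if (line.startsWith("[")) {
                inSection = line.equals("[domain_realm]");
                continue;
            }
            int eq = line.indexOf('=');
            if (inSection && eq > 0) {
                map.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
        }
        return map;
    }

    public String realmFor(String host) {
        String h = host.toLowerCase(Locale.ROOT);
        // 1. An exact hostname entry, e.g. "nn1.hadoop.example.net = BETA.CORP".
        String realm = domainRealm.get(h);
        if (realm != null) return realm;
        // 2. Leading-dot domain entries, most specific first:
        //    ".hadoop.example.net", then ".example.net", then ".net".
        for (int dot = h.indexOf('.'); dot >= 0; dot = h.indexOf('.', dot + 1)) {
            realm = domainRealm.get(h.substring(dot));
            if (realm != null) return realm;
        }
        // 3. The host's domain, upper-cased, happens to be a configured realm name.
        int first = h.indexOf('.');
        if (first >= 0) {
            String candidate = h.substring(first + 1).toUpperCase(Locale.ROOT);
            if (knownRealms.contains(candidate)) return candidate;
        }
        // 4. Nothing matched: fall back to default_realm. The KDC hostnames
        //    listed under [realms] are never consulted.
        return defaultRealm;
    }

    public static void main(String[] args) {
        String krb5Conf = String.join("\n",
                "[libdefaults]",
                "  default_realm = ALPHA.EXAMPLE.COM",
                "[realms]",
                "  ALPHA.EXAMPLE.COM = {",
                "    kdc = kdc.alpha.example.com",
                "  }",
                "  BETA.CORP = {",
                "    kdc = kdc.hadoop.example.net",
                "  }",
                "[domain_realm]",
                "  .hadoop.example.net = BETA.CORP",
                "  hadoop.example.net = BETA.CORP");
        // Realm names and default realm repeated by hand to keep the parser short.
        Set<String> realms = Set.of("ALPHA.EXAMPLE.COM", "BETA.CORP");

        RealmResolver before = new RealmResolver(Map.of(), realms, "ALPHA.EXAMPLE.COM");
        RealmResolver after = new RealmResolver(parseDomainRealm(krb5Conf), realms, "ALPHA.EXAMPLE.COM");

        System.out.println(before.realmFor("nn1.alpha.example.com"));  // ALPHA.EXAMPLE.COM
        System.out.println(before.realmFor("nn1.hadoop.example.net")); // ALPHA.EXAMPLE.COM (wrong KDC)
        System.out.println(after.realmFor("nn1.hadoop.example.net"));  // BETA.CORP
    }
}
```

The [domain_realm] fragment in main is the kind of entry the fix calls for: a leading-dot key such as .hadoop.example.net covers every host in that domain, while a bare key such as hadoop.example.net matches only that exact hostname, so it is common to list both.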