Hi,
I got an error during an attempt to access a remote cluster secured by Kerberos, and I don't know why the client is trying to look up the hdfs principal in the local KDC.
The setup is as follows (I intentionally omit full domain names and host names to keep it tidy):
- each cluster (CLUSTERDEV and CLUSTERPROD) has its own KDC (DEVREALM and PRODREALM)
- the KDCs trust each other (verified by kvno hdfs/<namenodehost>@REMOTE_REALM from both sides)
- both clusters are running NameNode in HA mode
I have configured the Trusted Kerberos Realms setting in Cloudera Manager for CLUSTERDEV, set to CLUSTERPROD (this triggered the RULE change in auth_to_local in core-site.xml). I have done the same for CLUSTERPROD, setting CLUSTERDEV as trusted.
- each krb5.conf in CLUSTERDEV also has PRODREALM in [realms] (I can kinit with a "remote" account)
- each krb5.conf has [capaths] DEVREALM = { PRODREALM = . }
- and vice versa: in CLUSTERPROD each krb5.conf has DEVREALM added to [realms] plus [capaths] PRODREALM = { DEVREALM = . }
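For reference, a minimal sketch of what the krb5.conf additions on the DEV side look like (the KDC host name is a placeholder, not my actual value):

[realms]
  PRODREALM = {
    kdc = <prod_kdc_host>
    admin_server = <prod_kdc_host>
  }

[capaths]
  DEVREALM = {
    PRODREALM = .
  }

The PROD side mirrors this with the realms swapped.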
Now on the DEV cluster I want to access the PROD cluster:
I have prepared a custom hdfs-site.xml, where I have added the PROD cluster's namenode info, stored in distcpconf (a copy of the actual hadoop-conf from a gateway host).
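Roughly, the extra entries look like this (a sketch only; "prodnameservice" and the namenode IDs are placeholders for whatever the remote cluster actually uses):

<!-- both the local and the remote nameservice must be listed -->
<property>
  <name>dfs.nameservices</name>
  <value>devnameservice,prodnameservice</value>
</property>
<property>
  <name>dfs.ha.namenodes.prodnameservice</name>
  <value>namenode1,namenode2</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.prodnameservice</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

plus the per-namenode address properties (rpc-address, http-address, etc.).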
Tried to test the new configuration (on a gateway host in DEV cluster):
HADOOP_CONF=/home/centos/distcpconf hdfs dfs -ls hdfs://prodnameservice/tmp

export HADOOP_CONF=/home/centos/distcpconf
hdfs dfs -ls hdfs://prodnameservice/tmp
None of the above worked; the client does not know prodnameservice:
-ls: java.net.UnknownHostException: prodnameservice
First question: why is the client not taking the modified environment variable into account?
I had to put this custom hdfs-site.xml into /etc/hadoop/conf/, and then it suddenly knew what "prodnameservice" is.
The hdfs ls returns (I have logged in as tomas2@PRODREALM on the DEV gateway):
PriviledgedActionException as:user@PRODREALM (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Fail to create credential. (63) - No service creds)]
and at the same time the DEV KDC reports:
TGS_REQ (2 etypes {18 17}) 10.85.150.42: LOOKING_UP_SERVER: authtime 0, user@PRODREALM for hdfs/prod.namenode.fqn@DEVREALM, Server not found in Kerberos database
and at the same time the PROD KDC reports:
TGS_REQ (2 etypes {18 17}) 10.85.150.42: ISSUE: authtime 1550044165, etypes {rep=18 tkt=18 ses=18}, user@PRODREALM for krbtgt/DEVREALM@PRODREALM
So I don't understand why the client is trying to look for an hdfs/PRODUCTION_NAMENODE principal in the DEV KDC. As you can see, the PROD KDC correctly reports the ticket-granting service for cross-realm trust using krbtgt/DEVREALM@PRODREALM.
So I went back to the modified hdfs-site.xml and changed everything from DEV to PROD in these items, so it now points to PROD:
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>hdfs/_HOST@PRODREALM</value>
</property>
<property>
  <name>dfs.namenode.kerberos.internal.spnego.principal</name>
  <value>HTTP/_HOST@PRODREALM</value>
</property>
<property>
  <name>dfs.datanode.kerberos.principal</name>
  <value>hdfs/_HOST@PRODREALM</value>
</property>
Ran the ls again, with the same error.
Then I reverted this change in hdfs-site.xml and changed the krb5.conf default_realm on the DEV gateway where I try to do the "ls".
After this I was able to do "ls" on the remote cluster, BUT I want to access the remote cluster without changing the default_realm in the gateway's krb5.conf on DEV.
[centos@ip-10-85-150-42 ~]$ hdfs dfs -ls hdfs://prodnameservice/tmp
Found 5 items ...
[centos@ip-10-85-150-42 ~]$ kinit tomas2@DEVREALM
Password for user@DEVREALM:
[centos@ip-10-85-150-42 ~]$ hdfs dfs -ls hdfs://prodnameservice/tmp
19/02/13 09:44:28 INFO util.KerberosName: No auth_to_local rules applied to user@DEVREALM
Found 5 items ...
As the client reports in the second case, auth_to_local is not applied. But as I said before, the "Trusted Kerberos Realms" setting generated these rules in core-site.xml:
<name>hadoop.security.auth_to_local</name>
<value>
  RULE:[1:$1@$0](.*@\QDEVREALM\E$)s/@\QDEVREALM\E$//
  RULE:[2:$1@$0](.*@\QDEVREALM\E$)s/@\QDEVREALM\E$//
  DEFAULT
</value>
(and the same rules are in DEV cluster but just with the opposite REALM).
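For reference, rules like these can be exercised outside the cluster with the HadoopKerberosName helper class shipped with Hadoop (a sketch; it needs the core-site.xml in question on the classpath, and newer Hadoop versions expose the same check as "hadoop kerbname"):

export HADOOP_CONF_DIR=/etc/hadoop/conf
hadoop org.apache.hadoop.security.HadoopKerberosName user@PRODREALM user@DEVREALM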
Why is it not using the RULEs from core-site.xml?
And the most important question: why is the hdfs client trying to find the prod namenode in the DEV KDC? How can I do "ls" from the DEV gateway without changing the default realm?
Thanks for any advice,
T.
Created 02-18-2019 12:23 AM
After many searches I think I have found the solution. None of the blogs on Cloudera or Hortonworks states the solution, because I think in all those cases the hosts running the clusters use custom DNS. Thus the krb5.conf maps nicely to the cluster's REALM, or if not, a simple line of configuration ensures the mapping.
In my case all the host names are managed by AWS DNS, so no custom domain names are used. This was the reason my client tried to look up the namenode in the local KDC: it used the default_realm to get the service ticket. But after adding the following into krb5.conf on the DEV node:
[domain_realm]
ip-xx-xx-xx-xx.eu-west-1.compute.internal = PRODREALM
ip-xx-xx-xx-xx.eu-west-1.compute.internal = PRODREALM
i.e.: <fully_qualified_host_name_of_remote_namenode1> = <REMOTE REALM>
<fully_qualified_host_name_of_remote_namenode2> = <REMOTE REALM>
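A quick way to check that the new mapping is picked up is to request a service ticket for the remote namenode without spelling out the realm, e.g. (a sketch using MIT Kerberos tools; the host name is the placeholder from above):

# -S builds the host-based principal, so domain_realm decides the realm
kvno -S hdfs <fully_qualified_host_name_of_remote_namenode1>
klist

If the mapping works, klist should show hdfs/<fully_qualified_host_name_of_remote_namenode1>@PRODREALM.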
I was able to ls the remote HDFS. Now to the high-availability part: I had to add additional nameservice info into hdfs-site.xml:
dfs.ha.namenodes.hanameservice      <- ADD here the remote nameservice
dfs.namenode.rpc-address.*          <- Add the remote nameservice FQDNs
dfs.namenode.https-address.*        <- Add the remote nameservice FQDNs
dfs.namenode.http-address.*         <- Add the remote nameservice FQDNs
dfs.namenode.servicerpc-address.*   <- Add the remote nameservice FQDNs
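For example, the entries for the remote nameservice end up looking roughly like this (a sketch; the namenode IDs, host names and the 8020 RPC port are assumptions and should match whatever the PROD cluster really uses):

<property>
  <name>dfs.ha.namenodes.prodnameservice</name>
  <value>namenode1,namenode2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.prodnameservice.namenode1</name>
  <value><fully_qualified_host_name_of_remote_namenode1>:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.prodnameservice.namenode2</name>
  <value><fully_qualified_host_name_of_remote_namenode2>:8020</value>
</property>

with the http-address, https-address and servicerpc-address properties following the same pattern.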
Then I was able to use the HA nameservice name to access HDFS.
Also, during distcp I had to use:
-Dmapreduce.job.hdfs-servers.token-renewal.exclude=name_of_the_prod_nameservice
when launching a distcp from dev and copying data from prod to dev.
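Put together, the invocation from the DEV side looks roughly like this (a sketch; the paths and nameservice names are placeholders):

hadoop distcp \
  -Dmapreduce.job.hdfs-servers.token-renewal.exclude=prodnameservice \
  hdfs://prodnameservice/path/on/prod \
  hdfs://devnameservice/path/on/dev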
And the answer to the last question, regarding HADOOP_CONF: I am not sure here, but I think the hdfs scripts in the Cloudera bin directory override this environment variable, so regardless of what you set HADOOP_CONF to, it will not be applied. So when Cloudera's guide states:
export HADOOP_CONF_DIR=path_to_working_directory
you have to be sure that the script does not override this setting.
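One way to check which configuration directory is actually in effect is to query a key that only exists in the custom config, for example (assuming dfs.nameservices lists the remote nameservice only in the distcpconf copy):

export HADOOP_CONF_DIR=/home/centos/distcpconf
hdfs getconf -confKey dfs.nameservices

If the remote nameservice does not show up, the script has overridden HADOOP_CONF_DIR.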
Tested the cross-realm auth based on the suggestion from HarshJ:
kinit user@REMOTEREALM
kvno hdfs/namenode-host@LOCALREALM
Running on CLUSTERDEV:
kinit user@PRODREALM
kvno hdfs/dev_name_node_host@DEVREALM
klist
Ticket cache: FILE:/tmp/krb5cc_1000
Default principal: user@PRODREALM

Valid starting       Expires              Service principal
02/13/2019 13:59:41  02/14/2019 13:59:41  krbtgt/PRODREALM@PRODREALM
        renew until 02/20/2019 13:59:41
02/13/2019 14:00:05  02/14/2019 13:59:41  krbtgt/DEVREALM@PRODREALM
        renew until 02/20/2019 13:59:41
02/13/2019 14:00:55  02/14/2019 13:59:41  hdfs/<dev namenode fqdn>@DEVREALM
        renew until 02/18/2019 14:00:55
-> OK.
Created 02-18-2019 06:06 AM
Congratulations on solving your issue and thank you for sharing it for others who may run into something similar. 🙂