Running a Hadoop client on Mac OS X and connecting to a Kerberized cluster poses some extra challenges.

I suggest using brew, the Mac package manager, to conveniently install the Hadoop package:

$ brew search hadoop
$ brew install hadoop

This will install the latest (Apache) Hadoop distro (2.7.3 at the time of writing). Minor version differences from your HDP version will not matter.
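
To double-check what brew installed, you can print the version banner (output shortened here; shown for a 2.7.3 install):

$ hadoop version
Hadoop 2.7.3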

You may test the installation by running a quick 'hdfs dfs -ls /' against HDFS. Without further configuration, a local single-node 'cluster' will be assumed.

We now have to point the client to the real HDP cluster. To do so, copy the full contents of the config files below from any HDP node:

Source:

/etc/hadoop/{hdp-version}/0/hadoop-env.sh
/etc/hadoop/{hdp-version}/0/core-site.xml
/etc/hadoop/{hdp-version}/0/hdfs-site.xml 
/etc/hadoop/{hdp-version}/0/yarn-site.xml

Target:

/usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop/hadoop-env.sh
/usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop/core-site.xml
/usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop/hdfs-site.xml
/usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop/yarn-site.xml
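
As an illustration, assuming SSH access to a cluster node (named hdp-node1 here purely as an example), and given that /etc/hadoop/conf on HDP nodes typically symlinks to the active versioned config directory, the copy could look like this:

$ TARGET=/usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop
$ scp hdp-node1:/etc/hadoop/conf/hadoop-env.sh "$TARGET/"
$ scp hdp-node1:/etc/hadoop/conf/core-site.xml "$TARGET/"
$ scp hdp-node1:/etc/hadoop/conf/hdfs-site.xml "$TARGET/"
$ scp hdp-node1:/etc/hadoop/conf/yarn-site.xml "$TARGET/"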

If we now try to access the Kerberized cluster, we get an error like the one below:

Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
	at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:687)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
	at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:650)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:737)
	at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
	at org.apache.hadoop.ipc.Client.call(Client.java:1451)
	... 28 more

Sure, we need to kinit first, so we do:

$ kinit test@A.EXAMPLE.COM
test@A.EXAMPLE.COM's password:
$ hdfs dfs -ls /

We still get the same error, so what is going on?

It makes sense to add an extra option (-Dsun.security.krb5.debug=true) to hadoop-env.sh now, to enable Kerberos debug log output:

export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.debug=true ${HADOOP_OPTS}"
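
If you prefer not to edit hadoop-env.sh, the same option can usually be set per shell session instead; this sketch relies on hadoop-env.sh appending to, rather than overwriting, HADOOP_OPTS:

$ export HADOOP_OPTS="-Dsun.security.krb5.debug=true"
$ hdfs dfs -ls /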

Now the debug output provides some clues:

$ hdfs dfs -ls /
Java config name: null
Native config name: /Library/Preferences/edu.mit.Kerberos
Loaded from native config
16/12/29 17:02:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>KinitOptions cache name is /tmp/krb5cc_502
>> Acquire default native Credentials
default etypes for default_tkt_enctypes: 23 16.
>>> Found no TGT's in LSA

By default the HDFS client looks for Kerberos tickets at /tmp/krb5cc_502, where '502' is the uid of the user in question. The other thing to look at is 'Native config name: /Library/Preferences/edu.mit.Kerberos'; this is where your local Kerberos configs are sourced from. Another valid config source would be '/etc/krb5.conf', depending on your local installation. You can mirror this local config from the /etc/krb5.conf file on any HDP node.
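
To see which cache file the client will look for on your machine, check your own uid ('502' below is just this article's example value):

$ id -u
502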

Now if we look at the default ticket cache on Mac OS X, it seems to point to another location:

$ klist
Credentials cache: API:XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXXX
        Principal: test@A.EXAMPLE.COM
  Issued                Expires               Principal
Dec 29 17:02:45 2016  Dec 30 03:02:45 2016  krbtgt/A.EXAMPLE.COM@A.EXAMPLE.COM

The 'API:' pointer signals Mac OS X's memory-based credential cache for Kerberos. On a *nix distro it would typically say something like 'Ticket cache: FILE:/tmp/krb5cc_502'. The location of the ticket cache can be set via the environment variable KRB5CCNAME (with a FILE:, DIR:, API:, KCM: or MEMORY: prefix), but that is beyond the scope of this article. This is why the HDFS client could not find any ticket.
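
As a minimal sketch of that alternative (KRB5CCNAME is standard MIT Kerberos behavior, and the JVM honors it as well): exporting a file-based cache location up front makes kinit and the HDFS client agree on where tickets live:

$ export KRB5CCNAME=FILE:/tmp/krb5cc_502
$ kinit test@A.EXAMPLE.COM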

If the HDFS client looks for the ticket cache at '/tmp/krb5cc_502', we can simply make Mac OS X cache a validated Kerberos ticket there like this:

$ kinit -c FILE:/tmp/krb5cc_502 test@A.EXAMPLE.COM
test@A.EXAMPLE.COM's password:

Or likewise with a keytab:

$ kinit -c FILE:/tmp/krb5cc_502 -kt ~/Downloads/smokeuser.headless.keytab ambari-qa-socgen_shadow@MIT.KDC.COM 

Check the ticket cache at that location like this:

$ klist -c /tmp/krb5cc_502
Credentials cache: FILE:/tmp/krb5cc_502
        Principal: test@A.EXAMPLE.COM

  Issued                Expires               Principal
Dec 29 17:31:29 2016  Dec 30 03:31:29 2016  krbtgt/A.EXAMPLE.COM@A.EXAMPLE.COM

If you try to list HDFS again now, it should look something like this:

$ hdfs dfs -ls /user
Java config name: null
Native config name: /Library/Preferences/edu.mit.Kerberos
Loaded from native config
16/12/29 17:34:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>KinitOptions cache name is /tmp/krb5cc_502
>>>DEBUG <CCacheInputStream>  client principal is test@A.EXAMPLE.COM
>>>DEBUG <CCacheInputStream> server principal is krbtgt/A.EXAMPLE.COM@A.EXAMPLE.COM
>>>DEBUG <CCacheInputStream> key type: 18
>>>DEBUG <CCacheInputStream> auth time: Thu Dec 29 17:31:29 CET 2016
>>>DEBUG <CCacheInputStream> start time: Thu Dec 29 17:31:29 CET 2016
>>>DEBUG <CCacheInputStream> end time: Fri Dec 30 03:31:29 CET 2016
>>>DEBUG <CCacheInputStream> renew_till time: Thu Jan 05 17:31:27 CET 2017
>>> CCacheInputStream: readFlags()  FORWARDABLE; RENEWABLE; INITIAL; PRE_AUTH;
>>>DEBUG <CCacheInputStream>  client principal is test@A.EXAMPLE.COM
>>>DEBUG <CCacheInputStream> server principal is X-CACHECONF:/krb5_ccache_conf_data/fast_avail/krbtgt/A.EXAMPLE.COM@A.EXAMPLE.COM@MIT.KDC.COM
>>>DEBUG <CCacheInputStream> key type: 0
>>>DEBUG <CCacheInputStream> auth time: Thu Dec 29 17:31:21 CET 2016
>>>DEBUG <CCacheInputStream> start time: null
>>>DEBUG <CCacheInputStream> end time: Thu Dec 29 17:31:21 CET 2016
>>>DEBUG <CCacheInputStream> renew_till time: null
>>> CCacheInputStream: readFlags()
>>> KrbCreds found the default ticket granting ticket in credential cache.
>>> Obtained TGT from LSA: Credentials:
      client=test@A.EXAMPLE.COM
      server=krbtgt/A.EXAMPLE.COM@A.EXAMPLE.COM
    authTime=20161229163129Z
   startTime=20161229163129Z
     endTime=20161230023129Z
   renewTill=20170105163127Z
       flags=FORWARDABLE;RENEWABLE;INITIAL;PRE-AUTHENT
EType (skey)=18
   (tkt key)=18
16/12/29 17:34:30 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
Found ticket for test@A.EXAMPLE.COM to go to krbtgt/A.EXAMPLE.COM@A.EXAMPLE.COM expiring on Fri Dec 30 03:31:29 CET 2016
Entered Krb5Context.initSecContext with state=STATE_NEW
Found ticket for test@A.EXAMPLE.COM to go to krbtgt/A.EXAMPLE.COM@A.EXAMPLE.COM expiring on Fri Dec 30 03:31:29 CET 2016
Service ticket not found in the subject
>>> Credentials acquireServiceCreds: main loop: [0] tempService=krbtgt/MIT.KDC.COM@A.EXAMPLE.COM
default etypes for default_tgs_enctypes: 23 16.
>>> CksumType: sun.security.krb5.internal.crypto.RsaMd5CksumType
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KdcAccessibility: reset
... (output shortened) ...
Found 4 items
drwxrwx---   - ambari-qa hdfs            0 2016-12-19 21:56 /user/ambari-qa
drwxr-xr-x   - centos    centos          0 2016-11-30 12:07 /user/centos
drwx------   - hdfs      hdfs            0 2016-11-29 12:38 /user/hdfs
drwxrwxrwx   - j.knulst  hdfs            0 2016-12-29 13:40 /user/j.knulst

So directing your Kerberos tickets on Mac OS X to the anticipated ticket cache with the '-c' switch will help a lot.
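
To avoid typing the cache path every time, you could wrap the kinit call in a small helper function (hkinit is a hypothetical name, e.g. for your ~/.bash_profile); a minimal sketch:

# hkinit: kinit into the file-based cache the Hadoop client expects
hkinit() {
    kinit -c "FILE:/tmp/krb5cc_$(id -u)" "$@"
}

After sourcing it, 'hkinit test@A.EXAMPLE.COM' or 'hkinit -kt /path/to/your.keytab principal' lands the ticket exactly where the HDFS client expects it.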
