Running a Hadoop client on Mac OS X and connecting it to a Kerberized cluster poses some extra challenges.
I suggest using brew, the Mac package manager, to conveniently install the Hadoop package:
$ brew search hadoop
$ brew install hadoop
This installs the latest Apache Hadoop distribution (2.7.3 at the time of writing). Minor version differences from your HDP version will not matter.
You can test the installation by running a quick 'hdfs dfs -ls /'. Without further configuration, a local single-node 'cluster' is assumed.
We now have to point the client to the real HDP cluster. To do so, copy the full contents of the config files below from any HDP node:
Source:
/etc/hadoop/{hdp-version}/0/hadoop-env.sh
/etc/hadoop/{hdp-version}/0/core-site.xml
/etc/hadoop/{hdp-version}/0/hdfs-site.xml
/etc/hadoop/{hdp-version}/0/yarn-site.xml
Target:
/usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop/hadoop-env.sh
/usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop/core-site.xml
/usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop/hdfs-site.xml
/usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop/yarn-site.xml
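The copy step above could be scripted roughly as follows. This is only a sketch: it assumes you have SSH access to an HDP node, and the function name `fetch_hdp_configs` and the `DRY_RUN` switch are made up for illustration. By default it only prints the scp commands so you can check the paths first.

```shell
# Sketch: fetch the four client config files from an HDP node with scp.
# Arguments: <hdp-node> <hdp-version> <local target dir>
# DRY_RUN (hypothetical switch) defaults to 1: print the commands instead of running them.
fetch_hdp_configs() {
  node="$1"
  version="$2"
  target="$3"
  for f in hadoop-env.sh core-site.xml hdfs-site.xml yarn-site.xml; do
    src="${node}:/etc/hadoop/${version}/0/${f}"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "scp ${src} ${target}/${f}"
    else
      scp "${src}" "${target}/${f}"
    fi
  done
}
```

Call it with your own node name and HDP version string, for example `DRY_RUN=0 fetch_hdp_configs <hdp-node> <hdp-version> /usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop`.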
If we now try to access the Kerberized cluster, we get an error like the one below:
Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:687)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:650)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:737)
    at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
    at org.apache.hadoop.ipc.Client.call(Client.java:1451)
    ... 28 more
Sure, we need to kinit first so we do:
$ kinit [email protected]
[email protected]'s password:
$ hdfs dfs -ls /
We still get the same error, so what is going on?
It makes sense at this point to add an extra option (-Dsun.security.krb5.debug=true) to hadoop-env.sh to enable Kerberos debug log output:
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.debug=true ${HADOOP_OPTS}"
Now the debug output provides some clues:
$ hdfs dfs -ls /
Java config name: null
Native config name: /Library/Preferences/edu.mit.Kerberos
Loaded from native config
16/12/29 17:02:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>KinitOptions cache name is /tmp/krb5cc_502
>> Acquire default native Credentials
default etypes for default_tkt_enctypes: 23 16.
>>> Found no TGT's in LSA
By default the HDFS client looks for Kerberos tickets at /tmp/krb5cc_502, where '502' is the uid of the relevant user and so varies per user. The other thing to look at is 'Native config name: /Library/Preferences/edu.mit.Kerberos'; this is where your local Kerberos configuration is sourced from. Another valid config source would be '/etc/krb5.conf', depending on your local installation. You can mirror this local config from the /etc/krb5.conf file on any HDP node.
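To see which cache file the Java client probes on your own machine, you can filter the debug output for the 'KinitOptions cache name' line. A minimal sketch, assuming the debug option above is enabled; the function name `probed_ccache` is made up for illustration:

```shell
# Reads Kerberos debug output on stdin and prints the cache path
# from the ">>>KinitOptions cache name is ..." line.
probed_ccache() {
  sed -n 's/^.*KinitOptions cache name is //p'
}
```

Usage would be something like `hdfs dfs -ls / 2>&1 | probed_ccache`, which for the output shown above would print /tmp/krb5cc_502.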
Now if we look at the default ticket cache on Mac OS X, it points to another location:
$ klist
Credentials cache: API:XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXXX
Principal: [email protected]
Issued Expires Principal
Dec 29 17:02:45 2016 Dec 30 03:02:45 2016 krbtgt/[email protected]
The pointer 'API:XXXXXX-XXXXX-XXXX-XXXXX' signals Mac OS X's memory-based credential cache for Kerberos. On a *nix distro it would typically say something like 'Ticket cache: FILE:/tmp/krb5cc_502'. This is why the HDFS client could not find any ticket: it looks in the file cache, while kinit stored the ticket in the API cache. The location of the ticket cache can also be set with the environment variable KRB5CCNAME (FILE: / DIR: / API: / KCM: / MEMORY:), but that is beyond the scope of this article.
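A quick way to check which kind of cache klist reports is to pull the type prefix out of its first line. This is a sketch only; the function name `ccache_type` is made up, and it assumes the header line starts with 'Credentials cache:' (as on Mac OS X above) or 'Ticket cache:' (as on a typical *nix distro):

```shell
# Reads klist output on stdin and prints the cache type prefix,
# e.g. API or FILE, taken from the "... cache: TYPE:name" header line.
ccache_type() {
  awk -F': ' '/^[ \t]*(Credentials|Ticket) cache: / {
    split($2, parts, ":")
    print parts[1]
    exit
  }'
}
```

Running `klist | ccache_type` on the Mac in this article would print API; if it prints FILE, the ticket is already in a file cache that the Java client may be able to read.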
If the HDFS client looks for the ticket cache at '/tmp/krb5cc_502', we can simply make Mac OS X cache a valid Kerberos ticket there like this:
$ kinit -c FILE:/tmp/krb5cc_502 [email protected]
[email protected]'s password:
Or likewise with a keytab:
$ kinit -c FILE:/tmp/krb5cc_502 -kt ~/Downloads/smokeuser.headless.keytab [email protected]
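The uid in the cache file name differs per user and machine, so a hard-coded 502 may not match yours. A minimal sketch that derives the name from the current uid instead, assuming the Java client builds the path as /tmp/krb5cc_<uid> as the debug output suggests; the function names `java_ccache` and `kinit_for_hadoop` are made up for illustration:

```shell
# Cache location the Java client is assumed to probe: /tmp/krb5cc_<current uid>.
java_ccache() {
  echo "FILE:/tmp/krb5cc_$(id -u)"
}

# Obtain a ticket straight into that file cache.
# Usage: kinit_for_hadoop <principal> [keytab]
kinit_for_hadoop() {
  principal="$1"
  keytab="${2:-}"
  if [ -n "$keytab" ]; then
    # Same flags as the keytab command shown in the article.
    kinit -c "$(java_ccache)" -kt "$keytab" "$principal"
  else
    # Prompts for the password interactively.
    kinit -c "$(java_ccache)" "$principal"
  fi
}
```

After sourcing this in your shell, `kinit_for_hadoop <principal>` followed by `hdfs dfs -ls /` should behave like the manual commands, without having to look up the uid first.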
Check the ticket cache the same way:
$ klist -c /tmp/krb5cc_502
Credentials cache: FILE:/tmp/krb5cc_502
Principal: [email protected]
Issued Expires Principal
Dec 29 17:31:29 2016 Dec 30 03:31:29 2016 krbtgt/[email protected]
If you try to list HDFS again now, the output should look something like this:
$ hdfs dfs -ls /user
Java config name: null
Native config name: /Library/Preferences/edu.mit.Kerberos
Loaded from native config
16/12/29 17:34:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>KinitOptions cache name is /tmp/krb5cc_502
>>>DEBUG <CCacheInputStream> client principal is [email protected]
>>>DEBUG <CCacheInputStream> server principal is krbtgt/[email protected]
>>>DEBUG <CCacheInputStream> key type: 18
>>>DEBUG <CCacheInputStream> auth time: Thu Dec 29 17:31:29 CET 2016
>>>DEBUG <CCacheInputStream> start time: Thu Dec 29 17:31:29 CET 2016
>>>DEBUG <CCacheInputStream> end time: Fri Dec 30 03:31:29 CET 2016
>>>DEBUG <CCacheInputStream> renew_till time: Thu Jan 05 17:31:27 CET 2017
>>> CCacheInputStream: readFlags() FORWARDABLE; RENEWABLE; INITIAL; PRE_AUTH;
>>>DEBUG <CCacheInputStream> client principal is [email protected]
>>>DEBUG <CCacheInputStream> server principal is X-CACHECONF:/krb5_ccache_conf_data/fast_avail/krbtgt/[email protected]@MIT.KDC.COM
>>>DEBUG <CCacheInputStream> key type: 0
>>>DEBUG <CCacheInputStream> auth time: Thu Dec 29 17:31:21 CET 2016
>>>DEBUG <CCacheInputStream> start time: null
>>>DEBUG <CCacheInputStream> end time: Thu Dec 29 17:31:21 CET 2016
>>>DEBUG <CCacheInputStream> renew_till time: null
>>> CCacheInputStream: readFlags()
>>> KrbCreds found the default ticket granting ticket in credential cache.
>>> Obtained TGT from LSA: Credentials:
[email protected]
server=krbtgt/[email protected]
authTime=20161229163129Z
startTime=20161229163129Z
endTime=20161230023129Z
renewTill=20170105163127Z
flags=FORWARDABLE;RENEWABLE;INITIAL;PRE-AUTHENT
EType (skey)=18
(tkt key)=18
16/12/29 17:34:30 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
Found ticket for [email protected] to go to krbtgt/[email protected] expiring on Fri Dec 30 03:31:29 CET 2016
Entered Krb5Context.initSecContext with state=STATE_NEW
Found ticket for [email protected] to go to krbtgt/[email protected] expiring on Fri Dec 30 03:31:29 CET 2016
Service ticket not found in the subject
>>> Credentials acquireServiceCreds: main loop: [0] tempService=krbtgt/[email protected]
default etypes for default_tgs_enctypes: 23 16.
>>> CksumType: sun.security.krb5.internal.crypto.RsaMd5CksumType
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KdcAccessibility: reset
...... ....S H O R T E N E D.. ......
Found 4 items
drwxrwx--- - ambari-qa hdfs 0 2016-12-19 21:56 /user/ambari-qa
drwxr-xr-x - centos centos 0 2016-11-30 12:07 /user/centos
drwx------ - hdfs hdfs 0 2016-11-29 12:38 /user/hdfs
drwxrwxrwx - j.knulst hdfs 0 2016-12-29 13:40 /user/j.knulst
So on Mac OS X, directing your Kerberos tickets to the ticket cache the client expects, using the '-c' switch of kinit, will help a lot.