
Cloudera - Kerberos GSS initiate failed

Explorer

Hello Guys,

 

I'm having some problems with the Cloudera and Kerberos configuration. After enabling Kerberos authentication in Cloudera Manager, I'm not able to issue the "hdfs" command.

The ticket was generated successfully, but I'm receiving the error below:

 

Any help would be appreciated.

 

Thanks in advance!

 

[root@cpsmaaeip04 ~]# kinit -kt hdfs.keytab hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM
[root@cpsmaaeip04 ~]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM

Valid starting Expires Service principal
03/15/2018 16:19:10 03/16/2018 16:19:10 krbtgt/HADOOP.EMETER.COM@HADOOP.EMETER.COM
renew until 03/20/2018 16:19:10

 

[root@cpsmaaeip04 ~]# hdfs dfs -ls /
18/03/15 16:20:04 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS initiate failed
18/03/15 16:20:07 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS initiate failed
18/03/15 16:20:07 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds seconds before. Last Login=1521141604562
18/03/15 16:20:11 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS initiate failed
18/03/15 16:20:11 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds seconds before. Last Login=1521141604562
18/03/15 16:20:13 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS initiate failed
18/03/15 16:20:13 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds seconds before. Last Login=1521141604562
18/03/15 16:20:14 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS initiate failed
18/03/15 16:20:14 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds seconds before. Last Login=1521141604562
18/03/15 16:20:14 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS initiate failed
18/03/15 16:20:14 WARN ipc.Client: Couldn't setup connection for hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM to cpsmaaeip04.cpfl.com.br/10.50.152.51:8020
org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS initiate failed
at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:375)
at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:560)
at org.apache.hadoop.ipc.Client$Connection.access$1900(Client.java:375)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:730)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:726)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:725)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1524)
at org.apache.hadoop.ipc.Client.call(Client.java:1447)
at org.apache.hadoop.ipc.Client.call(Client.java:1408)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:762)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy15.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2102)
at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1215)
at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1211)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1211)
at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:64)
at org.apache.hadoop.fs.Globber.doGlob(Globber.java:285)
at org.apache.hadoop.fs.Globber.glob(Globber.java:151)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1637)
at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:326)
at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:235)
at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:218)
at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:102)
at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)
18/03/15 16:20:14 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM (auth:KERBEROS) cause:java.io.IOException: Couldn't setup connection for hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM to cpsmaaeip04.cpfl.com.br/10.50.152.51:8020
ls: Failed on local exception: java.io.IOException: Couldn't setup connection for hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM to cpsmaaeip04.cpfl.com.br/10.50.152.51:8020; Host Details : local host is: "cpsmaaeip04.cpfl.com.br/10.50.152.51"; destination host is: "cpsmaaeip04.cpfl.com.br":8020;

1 ACCEPTED SOLUTION

Explorer

I am having my own Kerberos problem, but I thought I'd share this in case it solves yours. Cloudera stores its own Kerberos keytab in the runtime directory. See if you can authenticate against that keytab. If not, then your runtime keytab is not correct and you may have to redistribute the keytabs (this requires shutting down the roles).

 

Here is the info you need:

 

1) On a data node, the runtime keytab is located in /run/cloudera-scm-agent/process/XXX-DATANODE/, for example:

 

# pwd

/run/cloudera-scm-agent/process

# ls -l */hdfs.keytab

-rw------- 1 hdfs hdfs 1570 Mar 14 23:25 166-hdfs-DATANODE/hdfs.keytab

-rw------- 1 hdfs hdfs 1570 Mar 15 20:28 197-hdfs-DATANODE/hdfs.keytab

-rw------- 1 hdfs hdfs 1570 Mar 15 21:33 203-hdfs-DATANODE/hdfs.keytab

-rw------- 1 hdfs hdfs 1570 Mar 16 18:07 207-hdfs-DATANODE/hdfs.keytab

 

2) Use kinit to authenticate against the keytab.

 

# kinit -kt hdfs.keytab user/host@realm

 

If you can successfully authenticate against that keytab, then your keytab is good. I hope this helps. If not, you'll have to redistribute the keytabs.
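As a quick sanity check, you can also list the principals stored in the runtime keytab before trying kinit; this is just a sketch using standard MIT Kerberos tools, with the directory name taken from one of the examples above and the principal from the original post:

# klist -kt 207-hdfs-DATANODE/hdfs.keytab

# kinit -kt 207-hdfs-DATANODE/hdfs.keytab hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM

# klist

If the principal you need is missing from the keytab, or kinit reports a key version (kvno) mismatch, regenerating and redistributing the keytabs from Cloudera Manager is the likely fix.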

 

Good luck.

 

 


13 REPLIES

Master Guru

@vijithv,

 

Hard to say, but the timeout indicates that the client could not reach the KDC via UDP from that host.  Could be firewall, DNS, etc.

 

UDP has packet size restrictions that often don't permit Active Directory tickets to be issued.  Generally, the KDC will tell the client and the client will retry over TCP, but it seems that on this one host a connection to the KDC cannot even be made.  Firewall rules are certainly suspect, but a number of things could cause this.

 

Always using TCP is fine.
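If you want to force TCP for all KDC traffic, the usual way is the udp_preference_limit setting in the [libdefaults] section of /etc/krb5.conf; setting it to 1 makes the Kerberos library try TCP regardless of message size. A minimal sketch (your file will contain other settings; the realm is simply the one from this thread):

[libdefaults]
  default_realm = HADOOP.EMETER.COM
  udp_preference_limit = 1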

Explorer

Thank you @bgooley. Would you be able to clarify the query below?

 

When comparing to another host (like below), does that mean it tries to connect with UDP first and then switches to TCP?

>>> KrbKdcReq send: kdc=****** UDP:88, timeout=3000, number of retries =3, #bytes=2247
>>> KDCCommunication: kdc=******  UDP:88, timeout=3000,Attempt =1, #bytes=2247
>>> KrbKdcReq send: #bytes read=104
>>> KrbKdcReq send: kdc=******  TCP:88, timeout=3000, number of retries =3, #bytes=2247
>>> KDCCommunication: kdc=******  TCP:88, timeout=3000,Attempt =1, #bytes=2247
>>>DEBUG: TCPClient reading 2722 bytes
>>> KrbKdcReq send: #bytes read=2722

 

If so, what could be the reason the affected host is not switching in a similar manner? And if it is a firewall issue, then why does it work when we add the parameter udp_preference_limit to connect with TCP?

 

Thank you,

Vijith

Master Guru

@vijithv,

 

First, firewalls can easily block UDP and allow TCP.  I mentioned that was a possible cause.

Also, depending on how you have your /etc/krb5.conf configured, a different KDC could have been contacted.
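For reference, the KDCs a client will contact are normally the kdc = entries under your realm in /etc/krb5.conf (or DNS SRV records if no entries are listed). A sketch of the relevant section, with placeholder hostnames since your actual KDC names are not shown in this thread:

[realms]
  HADOOP.EMETER.COM = {
    kdc = kdc1.hadoop.emeter.com
    kdc = kdc2.hadoop.emeter.com
  }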

 

You can see distinctly in the failure via UDP that there is a socket timeout for each attempt to connect to the KDC.  This is a failure at the networking level, where a client cannot connect to a server.  Since no connection was ever made via UDP, there was no chance for it to know to try TCP.  That "switching" is done based on a KRB5KRB_ERR_RESPONSE_TOO_BIG response, I believe, so if no response arrives, no "switching" to TCP will occur.
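If you want to watch that exchange from the client side without a packet capture, one option (an assumption about your environment, not something specific to CDH) is to enable the JDK's Kerberos debug output for the hdfs command, and KRB5_TRACE for kinit; the >>> KrbKdcReq lines you posted come from the JDK flag:

# HADOOP_OPTS="-Dsun.security.krb5.debug=true" hdfs dfs -ls /

# KRB5_TRACE=/dev/stdout kinit -kt hdfs.keytab hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM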

 

If you really want to get to the bottom of this, recreate the problem while capturing packets via tcpdump like this:

 

# tcpdump -i any -w ~/kerberos_broken.pcap port 88

 

Then, with the problem fixed, reproduce the same steps again while capturing packets:

 

# tcpdump -i any -w ~/kerberos_fixed.pcap port 88

 

Use Wireshark (it does a great job of decoding Kerberos packets) and you will be able to see the entire interaction.

This will show us information to help determine the cause.

Wireshark is here:  https://www.wireshark.org/
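If you prefer to stay on the command line, tshark (Wireshark's terminal version) can decode the same captures; assuming it is installed on the host, something like:

# tshark -r ~/kerberos_broken.pcap -Y kerberos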

 

 
