
Cloudera - Kerberos GSS initiate failed

Explorer

Hello Guys,

 

I'm having some problems with Cloudera and Kerberos configuration. After enabling Kerberos authentication in Cloudera Manager, I'm not able to issue the "hdfs" command.

The ticket was generated successfully, but I'm receiving the error below:

 

Any help would be appreciated.

 

Thanks in advance!

 

[root@cpsmaaeip04 ~]# kinit -kt hdfs.keytab hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM
[root@cpsmaaeip04 ~]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM

Valid starting Expires Service principal
03/15/2018 16:19:10 03/16/2018 16:19:10 krbtgt/HADOOP.EMETER.COM@HADOOP.EMETER.COM
renew until 03/20/2018 16:19:10

 

[root@cpsmaaeip04 ~]# hdfs dfs -ls /
18/03/15 16:20:04 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS initiate failed
18/03/15 16:20:07 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS initiate failed
18/03/15 16:20:07 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds seconds before. Last Login=1521141604562
18/03/15 16:20:11 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS initiate failed
18/03/15 16:20:11 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds seconds before. Last Login=1521141604562
18/03/15 16:20:13 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS initiate failed
18/03/15 16:20:13 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds seconds before. Last Login=1521141604562
18/03/15 16:20:14 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS initiate failed
18/03/15 16:20:14 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds seconds before. Last Login=1521141604562
18/03/15 16:20:14 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS initiate failed
18/03/15 16:20:14 WARN ipc.Client: Couldn't setup connection for hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM to cpsmaaeip04.cpfl.com.br/10.50.152.51:8020
org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS initiate failed
at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:375)
at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:560)
at org.apache.hadoop.ipc.Client$Connection.access$1900(Client.java:375)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:730)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:726)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:725)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1524)
at org.apache.hadoop.ipc.Client.call(Client.java:1447)
at org.apache.hadoop.ipc.Client.call(Client.java:1408)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:762)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy15.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2102)
at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1215)
at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1211)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1211)
at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:64)
at org.apache.hadoop.fs.Globber.doGlob(Globber.java:285)
at org.apache.hadoop.fs.Globber.glob(Globber.java:151)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1637)
at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:326)
at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:235)
at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:218)
at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:102)
at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)
18/03/15 16:20:14 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM (auth:KERBEROS) cause:java.io.IOException: Couldn't setup connection for hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM to cpsmaaeip04.cpfl.com.br/10.50.152.51:8020
ls: Failed on local exception: java.io.IOException: Couldn't setup connection for hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM to cpsmaaeip04.cpfl.com.br/10.50.152.51:8020; Host Details : local host is: "cpsmaaeip04.cpfl.com.br/10.50.152.51"; destination host is: "cpsmaaeip04.cpfl.com.br":8020;

1 ACCEPTED SOLUTION

Explorer

I am having my own Kerberos problem, but I thought I'd share this in case it solves your problem. Cloudera stores its own Kerberos keytab in the runtime directory. See if you can authenticate against that keytab. If not, then your runtime keytab is not correct and you may have to redistribute the keytab (this requires shutting down the roles).

 

Here is the info you need:

 

1) On a data node, the runtime keytab is located in /run/cloudera-scm-agent/process/XXX-DATANODE/, for example:

 

# pwd

/run/cloudera-scm-agent/process

# ls -l */hdfs.keytab

-rw------- 1 hdfs hdfs 1570 Mar 14 23:25 166-hdfs-DATANODE/hdfs.keytab

-rw------- 1 hdfs hdfs 1570 Mar 15 20:28 197-hdfs-DATANODE/hdfs.keytab

-rw------- 1 hdfs hdfs 1570 Mar 15 21:33 203-hdfs-DATANODE/hdfs.keytab

-rw------- 1 hdfs hdfs 1570 Mar 16 18:07 207-hdfs-DATANODE/hdfs.keytab

 

2) Use kinit to authenticate against the keytab.

 

# kinit -kt hdfs.keytab user/host@realm

 

If you can successfully authenticate against that keytab, then your keytab is good. If not, you'll have to redistribute the keytabs. I hope this helps.

 

Good luck.

 

 


13 REPLIES

Champion

Hi

 

Please check your hosts file (/etc/hosts) to see whether it includes the FQDN.

If not, please add it; this should fix the Kerberos issue (see the example entry below).
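
For illustration, using the host name and IP address that appear in your error output (the short-name alias at the end is an assumption), the /etc/hosts entry would look something like this:

10.50.152.51    cpsmaaeip04.cpfl.com.br    cpsmaaeip04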

 

 

Thanks

 

Explorer

Hello!

 

Thanks for replying!

 

Yes, the /etc/hosts file includes the FQDN. I checked the NameNode logs and noticed that the error below occurs when I try to issue the HDFS command.

 

2018-03-16 16:01:11,323 WARN org.apache.hadoop.security.authentication.server.AuthenticationFilter: Authentication exception: GSSException: Failure unspecified at GSS-API level (Mechanism level: Specified version of key is not available (44))
2018-03-16 16:01:11,331 WARN org.apache.hadoop.security.authentication.server.AuthenticationFilter: Authentication exception: GSSException: Failure unspecified at GSS-API level (Mechanism level: Specified version of key is not available (44))

 

I have checked the KVNO and the numbers match, both at 16, as below:

 

[eip@cpsmaaeip04 .keytabs]$ klist -k hdfs.keytab
Keytab name: FILE:hdfs.keytab
KVNO Principal
---- --------------------------------------------------------------------------
16 hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM

 


[eip@cpsmaaeip04 .keytabs]$ kvno hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM
hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM: kvno = 16

 

Do you guys have any clue about this issue?

 

Thanks in advance!

Explorer

Please check that everything in the KDC has been configured and that all the KDC master/slave daemons are running.

Master Guru

@Gabre,

 

This error indicates that the server could not find a key to decrypt the Authentication request.

This can happen if the client requests a service ticket with a particular encryption type that the KDC supports but the HDFS NameNode's keytab does not contain.

 

Some things to check:

 

- /etc/krb5.conf

What encryption types do you have configured in libdefaults?
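
For reference, these settings live in the [libdefaults] section of /etc/krb5.conf. A sketch with commonly used encryption types (the values below are illustrative, not necessarily what your KDC actually uses):

[libdefaults]
default_realm = HADOOP.EMETER.COM
default_tkt_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96 arcfour-hmac
default_tgs_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96 arcfour-hmac
permitted_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96 arcfour-hmac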

 

- run this on your active NameNode host:

 

# klist -kte /var/run/cloudera-scm-agent/process/`ls -lrt /var/run/cloudera-scm-agent/process/ | awk '{print $9}' |grep NAMENODE| tail -1`/hdfs.keytab

 

Note the encryption types.

 

The encryption types in the klist output are the only ones that can be used to decrypt.
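
To compare, you can also check which encryption types the tickets already sitting in your credential cache were issued with by running klist with the -e flag on the client:

# klist -e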

 

To verify what encryption type is being requested and returned in the service ticket reply, you can add some debugging to your hdfs command like this:

# HADOOP_ROOT_LOGGER=TRACE,console HADOOP_JAAS_DEBUG=true HADOOP_OPTS="-Dsun.security.krb5.debug=true" hdfs dfs -ls /

 

 

Explorer

Hello @bgooley and @ramin!

 

Thanks for the help that you guys provided...

 

I solved the problem.

 

The problem is that the keytabs were being generated by Cloudera at runtime, and I was exporting the hdfs keytab using xst -k hdfs.keytab hdfs/FQDN@HADOOP.EMETER.COM, which was changing the principal's password! So when I tried to issue the "hdfs dfs -ls /" command, it tried to authenticate using a different password.

As a workaround, I copied the keytab I needed from /var/run/cloudera-scm-agent/process/ to another directory and used that same runtime-generated keytab, as sketched below.
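
A minimal sketch of that workaround (the process directory name and the destination path are placeholders; pick the most recent role directory present on your host):

# cp /var/run/cloudera-scm-agent/process/XXX-hdfs-NAMENODE/hdfs.keytab /root/hdfs.keytab
# kinit -kt /root/hdfs.keytab hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM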

 

I read that the "xst" command can be issued with the "-norandkey" parameter, which prevents the principal's password from being changed. I tried this command with "-norandkey", but I had a privilege problem:

 

kadmin: Operation requires ``extract-keys'' privilege while changing hdfs/FQDN@HADOOP.EMETER.COM's key.

 

My kadm5.acl has full admin rights, as below:

 

more kadm5.acl
*/admin@HADOOP.EMETER.COM *
cloudera-scm/admin@HADOOP.EMETER.COM admilc

 

Do you guys know how to grant this "extract-keys" privilege?

 

Thank you very much!

 

Gabre.

Explorer

@Gabre I am glad that your problem is solved. A couple of things: 1) make sure you are using the MIT implementation of Kerberos. 2) It appears that granting the extract privilege needs to be done explicitly for each user (you can't use a wildcard). Please see this, and note the paragraph that begins with "The extract privilege is not included in the wildcard privilege".

 

I just realized that you are missing the extract privilege in your ACL for the cloudera-scm user. It appears that you need to change the ACL for the Cloudera admin principal from admilc to admilce, as shown below.
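
A sketch of the updated entry in kadm5.acl, based on the ACL you posted (the trailing "e" flag is what grants the extract-keys privilege):

cloudera-scm/admin@HADOOP.EMETER.COM admilce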

 

I hope that helps you. Cheers.

Explorer

Hello @ramin!

 

Thanks for replying!

 

I tried to change the privilege from admilc to admilce, but it did not work. I searched the web for some Kerberos documentation and found the link below:

https://web.mit.edu/kerberos/krb5-1.12/doc/admin/admin_commands/kadmin_local.html

 

"-norandkeyDo not randomize the keys. The keys and their version numbers stay unchanged. This option is only available in kadmin.local, and cannot be specified in combination with the -eoption."

 

It seems that it's only available with kadmin.local and does not work with kadmin. I tried it with kadmin.local and it worked!

 

kadmin.local:  xst -norandkey -k hdfs.keytab hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM

Entry for principal hdfs/cpsmaaeip04.cpfl.com.br@HADOOP.EMETER.COM with kvno 23, encryption type arcfour-hmac added to keytab WRFILE:hdfs.keytab.
kadmin.local: quit

 

Thanks again for the help that you guys provided!!!

 

Gabre.

 

 

Explorer

Hi,

 

We are facing a similar issue. We are not able to execute any Hadoop commands from the node, and services like the DataNode and NodeManager are not starting on the node.

 

Please find below the log from when I executed the command:

 

HADOOP_ROOT_LOGGER=TRACE,console HADOOP_JAAS_DEBUG=true HADOOP_OPTS="-Dsun.security.krb5.debug=true" hdfs dfs -ls /

 

Affected node - it seems the KDC is being contacted over UDP and not TCP:

>>> Credentials acquireServiceCreds: same realm
default etypes for default_tgs_enctypes: 23 1 3.
>>> CksumType: sun.security.krb5.internal.crypto.RsaMd5CksumType
>>> EType: sun.security.krb5.internal.crypto.ArcFourHmacEType
>>> KdcAccessibility: reset
>>> KrbKdcReq send: kdc=***** UDP:88, timeout=3000, number of retries =3, #bytes=2247
>>> KDCCommunication: kdc=****** UDP:88, timeout=3000,Attempt =1, #bytes=2247
SocketTimeOutException with attempt: 1
>>> KDCCommunication: kdc=****** UDP:88, timeout=3000,Attempt =2, #bytes=2247
SocketTimeOutException with attempt: 2
>>> KDCCommunication: kdc=****** UDP:88, timeout=3000,Attempt =3, #bytes=2247
SocketTimeOutException with attempt: 3
>>> KrbKdcReq send: error trying ******:88
java.net.SocketTimeoutException: Receive timed out
at java.net.PlainDatagramSocketImpl.receive0(Native Method)
at java.net.AbstractPlainDatagramSocketImpl.receive(AbstractPlainDatagramSocketImpl.java:145)

 

Another node from the same rack, which is working fine:

 

>>> Credentials acquireServiceCreds: same realm
default etypes for default_tgs_enctypes: 23 1 3.
>>> CksumType: sun.security.krb5.internal.crypto.RsaMd5CksumType
>>> EType: sun.security.krb5.internal.crypto.ArcFourHmacEType
>>> KdcAccessibility: reset
>>> KrbKdcReq send: kdc=****** UDP:88, timeout=3000, number of retries =3, #bytes=2247
>>> KDCCommunication: kdc=****** UDP:88, timeout=3000,Attempt =1, #bytes=2247
>>> KrbKdcReq send: #bytes read=104
>>> KrbKdcReq send: kdc=****** TCP:88, timeout=3000, number of retries =3, #bytes=2247
>>> KDCCommunication: kdc=****** TCP:88, timeout=3000,Attempt =1, #bytes=2247
>>>DEBUG: TCPClient reading 2722 bytes
>>> KrbKdcReq send: #bytes read=2722
>>> KdcAccessibility: remove ****** :88
>>> EType: sun.security.krb5.internal.crypto.ArcFourHmacEType
>>> KrbApReq: APOptions are 00100000 00000000 00000000 00000000
>>> EType: sun.security.krb5.internal.crypto.ArcFourHmacEType
Krb5Context setting mySeqNumber to: 83326031
Created InitSecContextToken:

 

We then tried to force Kerberos to use TCP using the method below.

 

We added the parameter below to krb5.conf, but it didn't help:

 

udp_preference_limit = 1
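
For reference, this parameter only takes effect when the krb5.conf that the client actually reads contains it in the [libdefaults] section, for example (a sketch):

[libdefaults]
udp_preference_limit = 1

With a value of 1, any request to the KDC larger than one byte should be sent over TCP instead of UDP.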

 

Thanks 

Vijith Vijayan

 

Explorer

I was using a custom krb5.conf and it didn't work. Next I tried changing /etc/krb5.conf instead, and then it worked.
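
A possible explanation, which would need verifying for the JDK in use: the JVM-based Hadoop clients typically read the krb5.conf named by the java.security.krb5.conf system property and otherwise fall back to /etc/krb5.conf, so a custom krb5.conf referenced only through the KRB5_CONFIG environment variable may be ignored. A sketch of pointing the hdfs command at a custom file explicitly (the path is a placeholder):

# HADOOP_OPTS="-Djava.security.krb5.conf=/path/to/custom/krb5.conf" hdfs dfs -ls /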

Just wondering why it is not possible with UDP?