NameNode start failed after enabling Kerberos

Contributor

16/12/28 10:45:26 WARN retry.RetryInvocationHandler: Exception while invoking ClientNamenodeProtocolTranslatorPB.setSafeMode over null. Not retrying because try once and fail.

java.io.IOException: Failed on local exception: java.io.IOException: Couldn't setup connection for hdfs-hdpcluster@EXAMPLE.COM to bigdata013.example.com/<ip-address>:8020; Host Details : local host is: "bigdata013.example.com/<ip-address>"; destination host is: "bigdata013.example.com":8020; 
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:782)
	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1556)
	at org.apache.hadoop.ipc.Client.call(Client.java:1496)
	at org.apache.hadoop.ipc.Client.call(Client.java:1396)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
	at com.sun.proxy.$Proxy10.setSafeMode(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setSafeMode(ClientNamenodeProtocolTranslatorPB.java:711)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:278)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:194)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:176)
	at com.sun.proxy.$Proxy11.setSafeMode(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.setSafeMode(DFSClient.java:2657)
	at org.apache.hadoop.hdfs.DistributedFileSystem.setSafeMode(DistributedFileSystem.java:1340)
	at org.apache.hadoop.hdfs.DistributedFileSystem.setSafeMode(DistributedFileSystem.java:1324)
	at org.apache.hadoop.hdfs.tools.DFSAdmin.setSafeMode(DFSAdmin.java:611)
	at org.apache.hadoop.hdfs.tools.DFSAdmin.run(DFSAdmin.java:1916)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
	at org.apache.hadoop.hdfs.tools.DFSAdmin.main(DFSAdmin.java:2107)
Caused by: java.io.IOException: Couldn't setup connection for hdfs-hdpcluster@EXAMPLE.COM to bigdata013.example.com/<ip-address>:8020
	at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:712)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
	at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:683)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:770)
	at org.apache.hadoop.ipc.Client$Connection.access$3200(Client.java:397)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1618)
	at org.apache.hadoop.ipc.Client.call(Client.java:1449)
	... 20 more
Caused by: org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS initiate failed
	at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:375)
	at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:595)
	at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:397)
	at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:762)
	at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:758)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:757)
	... 23 more
safemode: Failed on local exception: java.io.IOException: Couldn't setup connection for hdfs-hdpcluster@EXAMPLE.COM to bigdata013.example.com/<ip-address>:8020; Host Details : local host is: "bigdata013.example.com/<ip-address>"; destination host is: "bigdata013.example.com":8020; 
16/12/28 10:45:40 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
16/12/28 10:45:43 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
16/12/28 10:45:44 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
16/12/28 10:45:48 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
1 ACCEPTED SOLUTION

Master Mentor

@Zhao Chaofeng

Is this thread still open? I.e., has this problem not been resolved yet?

Please reply.


14 REPLIES

Master Mentor

@Zhao Chaofeng

As this issue is basically related to "GSS initiate failed", can you please check whether you have a valid ticket and whether you are able to do a "kinit" manually?
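
For example, a manual test could look like this (the keytab path and principal are the ones appearing in the logs of this thread; adjust them for your cluster):

 kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-hdpcluster@EXAMPLE.COM
 klist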

Also, are you using the Oracle (Sun) JDK? If yes, then you will have to install the JCE unlimited-strength policy files for AES encryption.

Please check the link below, which says: "Before enabling Kerberos in the cluster, you must deploy the Java Cryptography Extension (JCE) security policy files on the Ambari Server and on all hosts in the cluster."

https://docs.hortonworks.com/HDPDocuments/Ambari-2.4.2.0/bk_ambari-security/content/installing_the_j...
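
As a quick sanity check that the unlimited-strength policy is actually in effect for a given JDK, you can query the maximum allowed AES key length (this assumes JAVA_HOME points at the JDK that Hadoop uses; 128 means the limited policy is still active, a larger value means the JCE policy files are in effect):

 $JAVA_HOME/bin/jrunscript -e 'print(javax.crypto.Cipher.getMaxAllowedKeyLength("AES"))'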

Contributor

Yes, I have installed JCE manually. And I executed the "kinit" command to test the ticket; the result is OK.

I have a question: if the KDC and ambari-server are on the same host, is that OK?

Master Mentor

Yes, it is fine. The KDC and Ambari can be co-located on the same host, or they can be located remotely as well.

I have a setup where I am running the KDC and Ambari on the same host, without any issue so far.

Master Mentor

@Zhao Chaofeng

How long have you been facing this issue? I mean, did a change happen recently, or any recent upgrade?

What are your HDP and Ambari versions?

Is only the NameNode failing with the mentioned "GSS initiate failed", or are a few other components also failing with the same issue?

I assume the hostname is correct (check with: hostname -f), but it is still worth verifying.

- Is this kind of issue happening only on the host "bigdata013.example.com/<ip-address>:8020"? Is that the only host (and the components hosted on it) giving the "GSS initiate failed", or are other hosts in your cluster also having this issue? It is worth checking the hostname and the KDC connectivity; see the checks below.
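
A few quick checks along those lines (the hostnames are the ones from this thread; <kdc-host> is a placeholder for whatever your /etc/krb5.conf points at, and 88 is the default KDC port):

 hostname -f
 nslookup bigdata013.example.com
 nslookup <ip-address>
 nc -vz <kdc-host> 88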

Contributor

No, I installed HDP and Ambari just a minute ago. After the installation, I ran "Enable Kerberos" and then faced this issue.

HDP version: HDP-2.5.0.0

Ambari version: 2.4.1.0

Of course, all services encountered this issue.

I saw your reply in my other question. After I installed JCE, I encountered 'App Timeline Server start failed'.

The log is:

File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/application_timeline_server.py", line 155, in <module>
  ApplicationTimelineServer().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
  method(env)
File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/application_timeline_server.py", line 44, in start
  self.configure(env)  # FOR SECURITY
File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/application_timeline_server.py", line 55, in configure
  yarn(name='apptimelineserver')
File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
  return fn(*args, **kwargs)
File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/yarn.py", line 337, in yarn
  mode=0755
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
  self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
  self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
  provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 459, in action_create_on_execute
  self.action_delayed("create")
File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 456, in action_delayed
  self.get_hdfs_resource_executor().action_delayed(action_name, self)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 247, in action_delayed
  self._assert_valid()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 231, in _assert_valid
  self.target_status = self._get_file_status(target)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 292, in _get_file_status
  list_status = self.util.run_command(target, 'GETFILESTATUS', method='GET', ignore_status_codes=['404'], assertable_result=False)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 192, in run_command
  raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'curl -sS -L -w '%{http_code}' -X GET --negotiate -u : 'http://bigdata013.example.com:50070/webhdfs/v1/ats/done?op=GETFILESTATUS&user.name=hdfs'' returned status_code=403.
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 403 GSSException: Failure unspecified at GSS-API level (Mechanism level: Encryption type AES256 CTS mode with HMAC SHA1-96 is not supported/enabled)</title>
</head>
<body><h2>HTTP ERROR 403</h2>
<p>Problem accessing /webhdfs/v1/ats/done. Reason:
<pre>GSSException: Failure unspecified at GSS-API level (Mechanism level: Encryption type AES256 CTS mode with HMAC SHA1-96 is not supported/enabled)</pre></p><hr /><i><small>Powered by Jetty://</small></i>
</body>
</html>
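
For what it is worth, the failing WebHDFS call from that traceback can be replayed by hand, which helps isolate whether the 403 comes from the AES-256/JCE problem rather than from Ambari itself. The curl command and URL below are taken verbatim from the log above; kinit first with the keytab and principal used in this thread:

 kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-hdpcluster@EXAMPLE.COM
 curl -sS -L -w '%{http_code}' -X GET --negotiate -u : 'http://bigdata013.example.com:50070/webhdfs/v1/ats/done?op=GETFILESTATUS&user.name=hdfs'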

Master Mentor

@Zhao Chaofeng

- Please check whether the "reverse lookup" is correct on that host.

- Also, it would be best if you could share the output of the following commands, so we can see the Kerberos-related Java options in use:

 ps -ef | grep AmbariServer
 ps -ef | grep NameNode

- This is just to verify that the correct Java path, i.e., the one with JCE installed, is being used.
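
For example, something along these lines (the <java-home> below is a placeholder; use whichever java binary the ps output above actually shows):

 ps -ef | grep -i namenode | grep -o '/[^ ]*/bin/java'
 ls -l <java-home>/jre/lib/security/local_policy.jar <java-home>/jre/lib/security/US_export_policy.jar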


Contributor

The NameNode is in safe mode, and it cannot come up.
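
For reference, safe mode can be inspected and cleared with the standard dfsadmin commands (run as the hdfs user after a successful kinit); note that in this thread the safemode RPC itself fails because of the Kerberos error, so the authentication problem has to be fixed first:

 hdfs dfsadmin -safemode get
 hdfs dfsadmin -safemode leave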

Master Mentor

@Zhao Chaofeng

Also, can you please share the output of the command "klist -e -k /etc/security/keytabs/hdfs.headless.keytab", so we can see the encryption types used by the Kerberos tickets? For example:

# klist -e -k /etc/security/keytabs/hdfs.headless.keytab
Keytab name: FILE:/etc/security/keytabs/hdfs.headless.keytab
KVNO Principal
---- --------------------------------------------------------------------------
   4 hdfs-JoyCluster@EXAMPLE.COM (des3-cbc-sha1) 
   4 hdfs-JoyCluster@EXAMPLE.COM (arcfour-hmac) 
   4 hdfs-JoyCluster@EXAMPLE.COM (aes128-cts-hmac-sha1-96) 
   4 hdfs-JoyCluster@EXAMPLE.COM (des-cbc-md5) 
   4 hdfs-JoyCluster@EXAMPLE.COM (aes256-cts-hmac-sha1-96) 


Contributor

Keytab name: FILE:/etc/security/keytabs/hdfs.headless.keytab
KVNO Principal
---- --------------------------------------------------------------------------
   1 hdfs-hdpcluster@EXAMPLE.COM (des3-cbc-sha1)
   1 hdfs-hdpcluster@EXAMPLE.COM (aes256-cts-hmac-sha1-96)
   1 hdfs-hdpcluster@EXAMPLE.COM (arcfour-hmac)
   1 hdfs-hdpcluster@EXAMPLE.COM (aes128-cts-hmac-sha1-96)
   1 hdfs-hdpcluster@EXAMPLE.COM (des-cbc-md5)