Created on 02-15-2017 05:58 AM - edited 09-16-2022 04:05 AM
Hello everyone!
My name is Guido. I'm currently facing a problem with a NameNode that is stuck in safe mode in a test lab.
When I run the "hdfs dfsadmin -safemode leave" command, I get "Access denied for user my_user_account. Superuser privilege is required".
The cluster is integrated with Active Directory Kerberos and my account can authenticate using the kinit command.
I tried running the "sudo -u hdfs hdfs dfsadmin -safemode leave" command in order to run it with the hdfs user's credentials, but the result was:
2017-02-15 12:56:41,747 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2017-02-15 12:56:41,749 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2017-02-15 12:56:41,749 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
safemode: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "my_name_node_fqdn/my_name_node_ip"; destination host is: "my_name_node_fqdn":8020;
Notes:
my_user_account = my user name, for example pepe
my_name_node_fqdn = the fqdn of the name node, for example namenode01.mydomain.com
my_name_node_ip = the ip address of the name node, for example 10.0.0.1
I really appreciate your help.
Regards.
Created 02-15-2017 07:01 AM
It seems Kerberos is enabled in your cluster and the Kerberos ticket is missing.
After you log in, run "kinit uid@REALM.COM", enter the Kerberos password, and then try to leave safe mode with sudo.
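To be concrete, a sketch of the steps (the keytab path and the principal below are placeholders; the actual values vary by cluster and distribution):

```shell
# Placeholder keytab path and principal -- adjust for your cluster.
KEYTAB=/path/to/hdfs.keytab
PRINCIPAL=hdfs/namenode01.mydomain.com@REALM.COM

# Obtain a ticket as the hdfs superuser...
sudo -u hdfs kinit -kt "$KEYTAB" "$PRINCIPAL"
# ...verify the ticket was granted...
sudo -u hdfs klist
# ...then try to leave safe mode again.
sudo -u hdfs hdfs dfsadmin -safemode leave
```

Using `kinit -kt` with the service keytab avoids having to know the hdfs account's password interactively.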
Thanks
Kumar
Created 02-15-2017 08:48 AM
Thanks Kumar!
I chose another keytab and it worked, but when I ran "sudo -u hdfs hdfs dfsadmin -safemode leave" I got this error:
safemode: Call From my_name_node_fqdn/my_name_node_ip to my_name_node_fqdn:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused.
There isn't much information, but I'm trying to figure it out.
If I find anything else, I'll let you know.
Thanks!
Created on 02-17-2017 07:00 AM - edited 02-17-2017 07:10 AM
Hello mbigelow, thanks for your help.
My namenode is not listening on port 8020.
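For anyone hitting the same "Connection refused", a quick way to confirm whether the RPC port is bound (assuming `ss` and `nc` are available; the hostname is a placeholder):

```shell
# Is anything listening on the NameNode RPC port (8020)?
ss -tlnp | grep ':8020' || echo "nothing listening on 8020"

# Cross-check from a client host:
nc -vz namenode01.mydomain.com 8020
```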
The log is copied below:
Feb 17 14:54:36 name_node-m0 namenode: INFO org.apache.hadoop.hdfs.server.namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
Feb 17 14:54:36 name_node-m0 namenode: INFO org.apache.hadoop.hdfs.server.namenode.NameNode: createNameNode []
Feb 17 14:54:37 name_node-m0 namenode: INFO org.apache.hadoop.hdfs.server.namenode.NameNode: fs.defaultFS is hdfs://hdev
Feb 17 14:54:37 name_node-m0 namenode: INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Clients are to use hdev to access this namenode/service.
Feb 17 14:54:40 name_node-m0 namenode: INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times
Feb 17 14:54:41 name_node-m0 namenode: ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
Feb 17 14:54:41 localhost java.lang.IllegalArgumentException: Unable to construct journal, qjournal://name_node-m0.my_domain:8485;name_node-m1.my_domain:8485;hadoop-01.my_domain:8485/hdev
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1607)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournals(FSEditLog.java:276)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initSharedJournalsForRead(FSEditLog.java:254)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.FSImage.initEditLog(FSImage.java:787)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:626)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1063)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:767)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:609)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:670)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:838)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:817)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1538)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1606)
Feb 17 14:54:41 Caused by: java.lang.reflect.InvocationTargetException
Feb 17 14:54:41 localhost at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
Feb 17 14:54:41 localhost at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
Feb 17 14:54:41 localhost at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
Feb 17 14:54:41 localhost at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1605)
Feb 17 14:54:41 localhost ... 13 more
Feb 17 14:54:41 Caused by: java.lang.NullPointerException
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannelMetrics.getName(IPCLoggerChannelMetrics.java:107)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannelMetrics.create(IPCLoggerChannelMetrics.java:91)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.<init>(IPCLoggerChannel.java:178)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$1.createLogger(IPCLoggerChannel.java:156)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:367)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:149)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:116)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:105)
Feb 17 14:54:41 localhost ... 18 more
Feb 17 14:54:41 name_node-m0 namenode: INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: #012/************************************************************#012SHUTDOWN_MSG: Shutting down NameNode at name_node-m0.my_domain/XXX:XXX:XXX:XXX#012************************************************************/
Created 02-17-2017 11:25 AM
Yes, there are three journal nodes and at least two are up and running.
Created 02-20-2017 03:39 AM
Yeah, all three nodes resolve correctly with the python script.
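The same resolution check can also be done from the shell; the JournalNode hostnames below are the ones from the log above and are placeholders for your own:

```shell
# Verify each JournalNode hostname resolves from the NameNode host.
for jn in name_node-m0.my_domain name_node-m1.my_domain hadoop-01.my_domain; do
  getent hosts "$jn" >/dev/null && echo "$jn resolves" || echo "DNS lookup FAILED for $jn"
done
```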
Created 02-20-2017 06:01 AM
Finally I got my cluster up and running! As mbigelow said, two of my three JNs were up and running, but they were badly declared in the dfs.namenode.shared.edits.dir property in hdfs-site.xml.
After changing it, the NameNode service started!
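For reference, the property in question; the qjournal URI has to list every JournalNode exactly as it resolves (the hostnames here are the placeholders from the log above):

```xml
<!-- hdfs-site.xml: shared edits directory for the hdev nameservice -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://name_node-m0.my_domain:8485;name_node-m1.my_domain:8485;hadoop-01.my_domain:8485/hdev</value>
</property>
```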
Now everything appears to be in order.
I hope my problem can help others in this community.
Thanks @saranvisa and @mbigelow!