
Leave Safemode

Contributor

Hello everyone!

 

My name is Guido. I'm currently facing a problem with a NameNode that is stuck in safe mode in a test lab.

When I run the "hdfs dfsadmin -safemode leave" command, the result I get is "Access denied for user my_user_account. Superuser privilege is required".

The cluster is integrated with Active Directory via Kerberos, and my account can authenticate using the kinit command.

I tried running "sudo -u hdfs hdfs dfsadmin -safemode leave" in order to issue the command as the hdfs user, but the result was:

 

2017-02-15 12:56:41,747 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2017-02-15 12:56:41,749 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2017-02-15 12:56:41,749 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
safemode: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "my_name_node_fqdn/my_name_node_ip"; destination host is: "my_name_node_fqdn":8020;

 

Notes:

my_user_account = my user name, for example pepe

my_name_node_fqdn = the fqdn of the name node, for example namenode01.mydomain.com

my_name_node_ip = the ip address of the name node, for example 10.0.0.1

 

I really appreciate your help.

Regards.

 

1 ACCEPTED SOLUTION

Champion
Are at least two of the JournalNodes up and running?


9 REPLIES

Champion

@gsalerno

 

It seems Kerberos is enabled in your cluster and the Kerberos ticket is missing.

 

After you log in, run "kinit uid@REALM.COM" and enter the Kerberos password, then try to leave safe mode with sudo.
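
One detail worth spelling out: "sudo -u hdfs" switches the user but does not carry over your own Kerberos ticket cache, so the hdfs user needs a ticket of its own. A minimal sketch of that workflow, assuming you have access to an hdfs keytab (the keytab path and principal below are placeholders, not real defaults; use your cluster's actual values):

# Obtain a ticket for the hdfs service principal from its keytab
# (/path/to/hdfs.keytab and the principal are hypothetical; adjust for your cluster)
sudo -u hdfs kinit -kt /path/to/hdfs.keytab hdfs/$(hostname -f)@REALM.COM
# Confirm the hdfs user now holds a valid ticket
sudo -u hdfs klist
# The superuser command should now authenticate
sudo -u hdfs hdfs dfsadmin -safemode leave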

 

Thanks

Kumar

Contributor

Thanks Kumar!

I chose another keytab and it works, but when I ran "sudo -u hdfs hdfs dfsadmin -safemode leave" I got this error:

 

safemode: Call From my_name_node_fqdn/my_name_node_ip to my_name_node_fqdn:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused.

 

There is not much information, but I'm trying to figure it out.

If I find anything else, I'll let you know.

Thanks!

Champion
Something is wrong with the NN process. Restart it and tail the log to see what exception pops up. Also ensure that the process is listening on port 8020.
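
For instance (the log path below is an assumption based on a typical CDH layout; your install may differ):

# Check whether anything is listening on the NameNode RPC port
sudo ss -tlnp | grep 8020
# Tail the NameNode log while restarting (a common default location, not guaranteed)
sudo tail -f /var/log/hadoop-hdfs/*namenode*.log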

Contributor

Hello mbigelow, thanks for your help.

My namenode is not listening on port 8020.

The log is copied below:

 

Feb 17 14:54:36 name_node-m0 namenode: INFO org.apache.hadoop.hdfs.server.namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
Feb 17 14:54:36 name_node-m0 namenode: INFO org.apache.hadoop.hdfs.server.namenode.NameNode: createNameNode []
Feb 17 14:54:37 name_node-m0 namenode: INFO org.apache.hadoop.hdfs.server.namenode.NameNode: fs.defaultFS is hdfs://hdev
Feb 17 14:54:37 name_node-m0 namenode: INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Clients are to use hdev to access this namenode/service.
Feb 17 14:54:40 name_node-m0 namenode: INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times
Feb 17 14:54:41 name_node-m0 namenode: ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
Feb 17 14:54:41 localhost java.lang.IllegalArgumentException: Unable to construct journal, qjournal://name_node-m0.my_domain:8485;name_node-m1.my_domain:8485;hadoop-01.my_domain:8485/hdev
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1607)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournals(FSEditLog.java:276)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initSharedJournalsForRead(FSEditLog.java:254)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.FSImage.initEditLog(FSImage.java:787)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:626)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1063)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:767)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:609)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:670)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:838)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:817)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1538)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1606)
Feb 17 14:54:41 Caused by: java.lang.reflect.InvocationTargetException
Feb 17 14:54:41 localhost at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
Feb 17 14:54:41 localhost at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
Feb 17 14:54:41 localhost at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
Feb 17 14:54:41 localhost at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1605)
Feb 17 14:54:41 localhost ... 13 more
Feb 17 14:54:41 Caused by: java.lang.NullPointerException
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannelMetrics.getName(IPCLoggerChannelMetrics.java:107)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannelMetrics.create(IPCLoggerChannelMetrics.java:91)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.<init>(IPCLoggerChannel.java:178)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$1.createLogger(IPCLoggerChannel.java:156)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:367)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:149)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:116)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:105)
Feb 17 14:54:41 localhost ... 18 more
Feb 17 14:54:41 name_node-m0 namenode: INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at name_node-m0.my_domain/XXX:XXX:XXX:XXX
************************************************************/

 

Champion
Are at least two of the JournalNodes up and running?
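
A quick way to confirm that, on each JournalNode host and from the NameNode (the hostnames below come from the qjournal URI in your stack trace):

# On each JournalNode host: is the JournalNode JVM running?
ps aux | grep [J]ournalNode
# From the NameNode: is each JN RPC port reachable?
nc -vz name_node-m0.my_domain 8485
nc -vz name_node-m1.my_domain 8485
nc -vz hadoop-01.my_domain 8485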

Contributor

Yes, there are three JournalNodes, and at least two are up and running.

 

Champion
Do all of these resolve correctly from all three nodes?

Use this command to verify:
python -c "import socket; print socket.getfqdn(); print socket.gethostbyname(socket.getfqdn())"

name_node-m0.my_domain
name_node-m1.my_domain
hadoop-01.my_domain
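
(If your hosts only have Python 3, the same check needs print as a function:)

python3 -c "import socket; print(socket.getfqdn()); print(socket.gethostbyname(socket.getfqdn()))"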

Contributor

Yes, all three nodes resolve correctly with the Python script.

Contributor

Finally, I got my cluster up and running! As mbigelow suspected, two of my three JNs were up and running, but the dfs.namenode.shared.edits.dir property in hdfs-site.xml was badly declared.

After fixing it, the NameNode service started!
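
For reference, a well-formed QJM shared-edits declaration looks like the following (the hostnames and nameservice here are copied from the qjournal URI in the stack trace above; the values must match your actual JournalNode hosts exactly):

<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://name_node-m0.my_domain:8485;name_node-m1.my_domain:8485;hadoop-01.my_domain:8485/hdev</value>
</property>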

Now everything appears to be in order.

I hope my problem helps others in this community.

Thanks @saranvisa and @mbigelow!