
Leave Safemode


Contributor

Hello everyone!

 

My name is Guido. I'm currently facing a problem with a NameNode that is stuck in safe mode in a test lab.

When I ran the "hdfs dfsadmin -safemode leave" command, the result was "Access denied for user my_user_account. Superuser privilege is required".

The cluster is integrated with AD through Kerberos, and my account can authenticate using the kinit command.

I then tried "sudo -u hdfs hdfs dfsadmin -safemode leave" to run the command as the hdfs user, but the result was:

 

2017-02-15 12:56:41,747 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2017-02-15 12:56:41,749 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2017-02-15 12:56:41,749 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
safemode: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "my_name_node_fqdn/my_name_node_ip"; destination host is: "my_name_node_fqdn":8020;

 

Notes:

my_user_account = my user name, for example pepe

my_name_node_fqdn = the fqdn of the name node, for example namenode01.mydomain.com

my_name_node_ip = the ip address of the name node, for example 10.0.0.1

 

I really appreciate your help.

Regards.

 


Re: Leave Safemode

Champion

@gsalerno

 

It seems Kerberos is enabled in your cluster and the Kerberos ticket is missing.

After you log in, run "kinit uid@REALM.COM", enter the Kerberos password, and then try to leave safemode with sudo again.

 

Thanks

Kumar

Re: Leave Safemode

Contributor

Thanks Kumar!

I chose another keytab and it works, but when I ran "sudo -u hdfs hdfs dfsadmin -safemode leave" I got this error:

 

safemode: Call From my_name_node_fqdn/my_name_node_ip to my_name_node_fqdn:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused.

 

There isn't much information, but I'm trying to figure it out.

If I find anything else, I'll let you know.

Thanks!

Re: Leave Safemode

Champion
Something is wrong with the NN process. Restart it and tail the log to see what exception pops up. Also ensure that the process is listening on port 8020.
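One way to confirm whether anything is listening on the NameNode RPC port is a plain TCP connection attempt. The following is a minimal sketch, not part of any Hadoop tooling; the function name is_port_open and the timeout value are assumptions made for illustration.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration only. A refused connection,
    a timeout, or a name-resolution failure all count as "not open".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, socket.timeout and socket.gaierror.
        return False
```

Run on the NameNode host, is_port_open("my_name_node_fqdn", 8020) with the real FQDN substituted should return False while the process is down, which would match the "Connection refused" error above.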

Re: Leave Safemode

Contributor

Hello mbigelow, thanks for your help.

My namenode is not listening on port 8020.

The log is copied below:

 

Feb 17 14:54:36 name_node-m0 namenode: INFO org.apache.hadoop.hdfs.server.namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
Feb 17 14:54:36 name_node-m0 namenode: INFO org.apache.hadoop.hdfs.server.namenode.NameNode: createNameNode []
Feb 17 14:54:37 name_node-m0 namenode: INFO org.apache.hadoop.hdfs.server.namenode.NameNode: fs.defaultFS is hdfs://hdev
Feb 17 14:54:37 name_node-m0 namenode: INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Clients are to use hdev to access this namenode/service.
Feb 17 14:54:40 name_node-m0 namenode: INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times
Feb 17 14:54:41 name_node-m0 namenode: ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
Feb 17 14:54:41 localhost java.lang.IllegalArgumentException: Unable to construct journal, qjournal://name_node-m0.my_domain:8485;name_node-m1.my_domain:8485;hadoop-01.my_domain:8485/hdev
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1607)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournals(FSEditLog.java:276)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initSharedJournalsForRead(FSEditLog.java:254)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.FSImage.initEditLog(FSImage.java:787)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:626)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1063)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:767)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:609)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:670)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:838)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:817)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1538)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1606)
Feb 17 14:54:41 Caused by: java.lang.reflect.InvocationTargetException
Feb 17 14:54:41 localhost at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
Feb 17 14:54:41 localhost at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
Feb 17 14:54:41 localhost at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
Feb 17 14:54:41 localhost at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1605)
Feb 17 14:54:41 localhost ... 13 more
Feb 17 14:54:41 Caused by: java.lang.NullPointerException
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannelMetrics.getName(IPCLoggerChannelMetrics.java:107)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannelMetrics.create(IPCLoggerChannelMetrics.java:91)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.<init>(IPCLoggerChannel.java:178)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$1.createLogger(IPCLoggerChannel.java:156)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:367)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:149)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:116)
Feb 17 14:54:41 localhost at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:105)
Feb 17 14:54:41 localhost ... 18 more
Feb 17 14:54:41 name_node-m0 namenode: INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: #012/************************************************************#012SHUTDOWN_MSG: Shutting down NameNode at name_node-m0.my_domain/XXX:XXX:XXX:XXX#012************************************************************/

 

Re: Leave Safemode

Champion
Are at least two of the JournalNodes up and running?

Re: Leave Safemode

Contributor

Yes, there are three JournalNodes and at least two are up and running.

 

Re: Leave Safemode

Champion
Do all of these resolve correctly from all three nodes?

Use this command to verify:
python -c "import socket; print(socket.getfqdn()); print(socket.gethostbyname(socket.getfqdn()))"

name_node-m0.my_domain
name_node-m1.my_domain
hadoop-01.my_domain
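To check the three names themselves rather than only the local host's FQDN, a short script can resolve each one in turn. This is a sketch under assumptions: resolve_hosts is a hypothetical helper, and it only checks forward (name to IPv4 address) resolution.

```python
import socket


def resolve_hosts(hostnames):
    """Map each hostname to its IPv4 address, or None if resolution fails.

    Hypothetical helper for illustration; forward resolution only.
    """
    results = {}
    for name in hostnames:
        try:
            results[name] = socket.gethostbyname(name)
        except OSError:
            # socket.gaierror is a subclass of OSError.
            results[name] = None
    return results
```

Calling resolve_hosts(["name_node-m0.my_domain", "name_node-m1.my_domain", "hadoop-01.my_domain"]) on each of the three nodes should give the same non-None address for every name; a None or a mismatch between nodes would point at DNS or /etc/hosts.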

Re: Leave Safemode

Contributor

Yes, all three nodes resolve correctly with the Python command.

Re: Leave Safemode

Contributor

Finally I got my cluster up and running! As mbigelow said, two of my three JNs were up and running, but they were declared incorrectly in the dfs.namenode.shared.edits.dir property in hdfs-site.xml.

After changing it, the NameNode service starts!
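The thread does not say exactly what was wrong with the property value, so the following is only a sketch of a syntax check for the documented form qjournal://host1:port1;host2:port2;host3:port3/journalId. The function parse_qjournal_uri is a hypothetical helper, and requiring an explicit port on every entry is an assumption that may be stricter than what Hadoop itself accepts.

```python
def parse_qjournal_uri(uri):
    """Split a qjournal URI into ([(host, port), ...], journal_id).

    Hypothetical helper for illustration. Raises ValueError when the value
    does not look like qjournal://h1:p1;h2:p2;h3:p3/journalId.
    """
    prefix = "qjournal://"
    if not uri.startswith(prefix):
        raise ValueError("expected a qjournal:// URI")
    authority, sep, journal_id = uri[len(prefix):].partition("/")
    if not sep or not journal_id:
        raise ValueError("missing journal ID after the host list")
    nodes = []
    for entry in authority.split(";"):
        host, colon, port = entry.strip().rpartition(":")
        # Assumption: every entry carries an explicit numeric port.
        bad_host = not host or any(c in host for c in ",: ")
        if not colon or bad_host or not port.isdigit():
            raise ValueError("bad host:port entry: %r" % entry)
        nodes.append((host, int(port)))
    return nodes, journal_id
```

Fed the value from the log above, it returns the three JournalNode addresses and the journal ID hdev; a value that separates hosts with commas instead of semicolons raises ValueError.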

Now everything appears to be in order.

I hope my problem can help others in this community.

Thanks @saranvisa and @mbigelow!