NameNode not starting after Kerberos setup on an HDP 2.6 cluster
- Labels:
Apache Ambari
Created 04-07-2018 08:40 PM
I have installed MIT Kerberos on one Linux server and used Ambari's automated wizard to kerberize our dev cluster. Ambari created all the principals for each node (3 DataNodes, 2 NameNodes and one edge node), and I can see them in the KDC. The last step, starting all services, failed: the NameNode services do not come up. Before trying this on our dev cluster I did the same steps on the Sandbox, and it worked there.
The cluster differs in one way: it is an HA cluster, and each node has two IPs, an external one that we use for ssh logins and an internal one used for inter-node communication over InfiniBand.
2018-04-01 16:19:26,580 - call['hdfs haadmin -ns ABCHADOOP01 -getServiceState nn2'] {'logoutput': True, 'user': 'hdfs'}
18/04/01 16:19:28 INFO ipc.Client: Retrying connect to server: c1master02-nn.abc.corp/ Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Operation failed: Call From c1master01-nn.abc.corp/ to c1master02-nn.abc.corp:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
2018-04-01 16:19:28,783 - call returned (255, '18/04/01 16:19:28 INFO ipc.Client: Retrying connect to server: c1master02-nn.abc.corp/ Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)\nOperation failed: Call From c1master01-nn.abc.corp/ to c1master02-nn.abc.corp:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused')
2018-04-01 16:19:28,783 - NameNode HA states: active_namenodes = [], standby_namenodes = [], unknown_namenodes = [('nn1', 'c1master01-nn.abc.corp:50070'), ('nn2', 'c1master02-nn.abc.corp:50070')]
2018-04-01 16:19:28,783 - Will retry 2 time(s), caught exception: No active NameNode was found.. Sleeping for 5 sec(s)
2018-04-01 16:19:33,787 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl --negotiate -u : -s '"'"'http://c1master01-nn.abc.corp:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpKVcTXy 2>/tmp/tmpy6hgoj''] {'quiet': False}
2018-04-01 16:19:33,837 - call returned (7, '')
2018-04-01 16:19:33,837 - Getting jmx metrics from NN failed. URL: http://c1master01-nn.abc.corp:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py", line 38, in get_value_from_jmx
    _, data, _ = get_user_call_output(cmd, user=run_user, quiet=False)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/get_user_call_output.py", line 61, in get_user_call_output
    raise ExecutionFailed(err_msg, code, files_output[0], files_output[1])
ExecutionFailed: Execution of 'curl --negotiate -u : -s 'http://c1master01-nn.abc.corp:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' 1>/tmp/tmpKVcTXy 2>/tmp/tmpy6hgoj' returned 7.
2018-04-01 16:19:33,837 - call['hdfs haadmin -ns ABCHADOOP01 -getServiceState nn1'] {'logoutput': True, 'user': 'hdfs'}
Command failed after 1 tries
- From each node I am able to run kadmin and add/list principals.
- I logged in to the NameNode host over ssh and obtained a ticket; that also worked:
abc># kinit -kt /etc/security/keytabs/nn.service.keytab nn/c1master01-nn.abc.corp@ABCHDP.COM
abc># klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: nn/c1master01-nn.abc.corp@ABCHDP.COM

Valid starting     Expires            Service principal
04/01/18 16:03:42  04/02/18 16:03:42  krbtgt/ABCHDP.COM@ABCHDP.COM
        renew until 04/01/18 16:03:42
Since the cluster is empty, I also tried hadoop namenode -format, but got the following error:
java.io.IOException: Login failure for nn/c1master01-nn.abc.corp@ABCHDP.COM from keytab /etc/security/keytabs/nn.service.keytab: javax.security.auth.login.LoginException: Receive timed out
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1098)
    at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:307)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1160)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1631)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1769)
Caused by: javax.security.auth.login.LoginException: Receive timed out
    at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:808)
    at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
    at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
    at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
    at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
    at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1089)
    ... 4 more
Caused by: java.net.SocketTimeoutException: Receive timed out
    at java.net.PlainDatagramSocketImpl.receive0(Native Method)
    at java.net.AbstractPlainDatagramSocketImpl.receive(AbstractPlainDatagramSocketImpl.java:143)
    at java.net.DatagramSocket.receive(DatagramSocket.java:812)
    at sun.security.krb5.internal.UDPClient.receive(NetClient.java:206)
    at sun.security.krb5.KdcComm$KdcCommunication.run(KdcComm.java:411)
    at sun.security.krb5.KdcComm$KdcCommunication.run(KdcComm.java:364)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.security.krb5.KdcComm.send(KdcComm.java:348)
    at sun.security.krb5.KdcComm.sendIfPossible(KdcComm.java:253)
    at sun.security.krb5.KdcComm.send(KdcComm.java:229)
    at sun.security.krb5.KdcComm.send(KdcComm.java:200)
    at sun.security.krb5.KrbAsReqBuilder.send(KrbAsReqBuilder.java:316)
    at sun.security.krb5.KrbAsReqBuilder.action(KrbAsReqBuilder.java:361)
    at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:776)
    ... 17 more
18/04/01 15:45:03 INFO util.ExitUtil: Exiting with status 1
18/04/01 15:45:03 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at c1master01-nn.abc.corp/
This is the internal IP. Can anybody tell me what the issue is?
Do I need to manually add entries for the internal IPs in the KDC? If so, why hasn't Ambari added them to the KDC as it does for the external IPs?
And if it is required, why do we need two entries when every machine has only one hostname?
Created 04-08-2018 09:10 AM
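The Kerberos KDC listens on both TCP and UDP on port 88 by default. Also by default, the NameNode's Kerberos client (the JDK library) contacts the KDC over UDP first for any request smaller than udp_preference_limit; the sun.security.krb5.internal.UDPClient frame in the stack trace shows exactly that UDP attempt timing out.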
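Before switching to TCP, it is worth confirming that TCP port 88 on the KDC is actually reachable from the NameNode hosts (through whichever address the KDC hostname resolves to). A minimal standard-library Python sketch; the function name `tcp_port_open` and the example hostname are hypothetical. It checks TCP only, because a UDP socket "connect" sends no packets and so cannot show whether the KDC answers over UDP.

```python
import socket


def tcp_port_open(host: str, port: int = 88, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    Hypothetical helper: only TCP is probed, since a UDP "connect"
    sends nothing and proves nothing about reachability.
    """
    try:
        # create_connection resolves the name and tries each address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures all land here
        return False


# Example (hypothetical KDC hostname):
#   tcp_port_open("kdc.example.corp")
```

Run it from each NameNode host; a False result means forcing TCP alone will not fix the login, and the network path or firewall to the KDC needs checking first.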
How to force the Kerberos library to use TCP:
1. In the Ambari UI, go to Services > Kerberos > Configs.
2. In the 'Advanced krb5-conf' section, find the 'krb5-conf Template' field. Under the [libdefaults] stanza, add 'udp_preference_limit = 1'.
3. Save the config and restart the affected components.
4. This forces the Kerberos client to use TCP: every KDC request is larger than 1 byte, so TCP is always tried first.
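The steps above rely on how `udp_preference_limit` is interpreted. The sketch below models the krb5.conf rule (a request larger than the limit is sent over TCP first, otherwise over UDP first) and shows one line-based way to put the setting under `[libdefaults]` in krb5.conf-style text. Both function names are hypothetical, the 1465-byte default is the MIT Kerberos default and is assumed here to match the JDK client, and the text edit ignores any templating syntax; in practice the change is made in the Ambari UI as described.

```python
# Assumed default: 1465 bytes is the MIT Kerberos default for
# udp_preference_limit (the JDK client is assumed to use the same value).
DEFAULT_UDP_PREFERENCE_LIMIT = 1465


def first_transport(message_len: int,
                    udp_preference_limit: int = DEFAULT_UDP_PREFERENCE_LIMIT) -> str:
    """Return the transport tried first for a KDC request of message_len bytes.

    Models the krb5.conf rule: requests larger than udp_preference_limit
    go over TCP first, all others over UDP first.
    """
    return "tcp" if message_len > udp_preference_limit else "udp"


def set_udp_preference_limit(krb5_conf: str, value: int = 1) -> str:
    """Return krb5.conf text with udp_preference_limit set under [libdefaults].

    Hypothetical line-based helper: it places the setting directly below the
    [libdefaults] header, drops any earlier value in that stanza, and creates
    the stanza if it is missing.
    """
    out = []
    in_libdefaults = False
    found_section = False
    for line in krb5_conf.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # A stanza header such as [libdefaults] or [realms]
            in_libdefaults = stripped == "[libdefaults]"
            out.append(line)
            if in_libdefaults:
                found_section = True
                out.append(f"  udp_preference_limit = {value}")
            continue
        if in_libdefaults and stripped.partition("=")[0].strip() == "udp_preference_limit":
            # Skip the old value so only one setting remains
            continue
        out.append(line)
    if not found_section:
        out += ["[libdefaults]", f"  udp_preference_limit = {value}"]
    return "\n".join(out) + "\n"
```

With the limit set to 1, `first_transport(n, 1)` returns "tcp" for any real request size, which is why this one setting is enough to move the NameNode's KDC traffic off UDP.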
Also check the firewall on the KDC host. Can you share the output of
# iptables -nvL
If you don't see a rule accepting UDP port 88, add one:
# iptables -I INPUT 5 -p udp --dport 88 -j ACCEPT
Rerun the first command; you should now see a line like this:
0822 2908K ACCEPT udp -- * * udp dpt:88
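The firewall check above can also be scripted. This sketch scans `iptables -nvL` output for an ACCEPT rule on UDP destination port 88. The names `has_udp_88_accept` and `read_iptables` are hypothetical, and the parsing is a text heuristic that assumes the default `-nvL` column layout (pkts, bytes, target, prot, ...). It ignores rule order and port ranges, so an earlier DROP or REJECT rule would still block the traffic even when this returns True.

```python
import subprocess


def read_iptables() -> str:
    """Run `iptables -nvL` (needs root) and return its text output."""
    result = subprocess.run(["iptables", "-nvL"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def has_udp_88_accept(iptables_output: str) -> bool:
    """Return True if some rule line ACCEPTs UDP with destination port 88."""
    for line in iptables_output.splitlines():
        fields = line.split()
        # Rule lines start with the pkts and bytes counters, then target and prot;
        # chain headers and the column-title line never match this shape.
        if (len(fields) >= 4 and fields[2] == "ACCEPT"
                and fields[3] == "udp" and "dpt:88" in fields):
            return True
    return False


# Example, on the KDC host as root:
#   has_udp_88_accept(read_iptables())
```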
Created 04-09-2018 08:57 AM
@Geoffrey Shelton Okot Thanks for the update.
It worked. I also want to add that one of my NameNode ports was still occupied by a previously running instance (java.net.BindException: Port in use:), and Ambari did not show any message about it, so I checked the NameNode logs on the server itself.
Killing the old PID and restarting did the trick.
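A stale process holding a NameNode port (the java.net.BindException: Port in use case) can be spotted before a restart. A minimal Python sketch: `port_in_use` is a hypothetical helper that tries to bind the port and reports whether the bind fails with "address in use". The post does not say which port was occupied; 8020 (RPC) and 50070 (HTTP) are the NameNode ports in the Ambari log, used here only as examples. The sketch only reports that the port is taken; tools such as `ss -ltnp` or `lsof -i :8020` show the owning PID.

```python
import errno
import socket


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if binding host:port fails because the address is in use.

    Other bind errors (e.g. permission denied on ports below 1024)
    are re-raised rather than reported as "in use".
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise
    return False


# Example, on the NameNode host before restarting it:
#   for p in (8020, 50070):
#       print(p, "in use" if port_in_use(p) else "free")
```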