Created on 08-26-2019 12:41 AM - last edited on 08-26-2019 08:54 AM by ask_bill_brooks
Hi All,
While running a MapReduce job, I am getting the following exception. Kindly help.
DEBUG on 22 Aug 2019 ,06:36:54 com.xxx.xxx.xxx.logger.XMLRPCLogger.log(XMLRPCLogger.java:76) => java.lang.Thread.run(Thread.java:745) => : 19/08/22 06:36:54 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm265
DEBUG on 22 Aug 2019 ,06:36:54 com.xxx.xxx.xxx.logger.XMLRPCLogger.log(XMLRPCLogger.java:76) => java.lang.Thread.run(Thread.java:745) => : 19/08/22 06:36:54 INFO retry.RetryInvocationHandler: Exception while invoking getApplicationReport of class ApplicationClientProtocolPBClientImpl over rm265 after 1185 fail over attempts. Trying to fail over after sleeping for 1903ms.
DEBUG on 22 Aug 2019 ,06:58:43 com.xxx.xxxx.archive.logger.XMLRPCLogger.log(XMLRPCLogger.java:76) => java.lang.Thread.run(Thread.java:745) => : java.net.ConnectException: Call From abcd/<ipabcd> to pqrs:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
DEBUG on 22 Aug 2019 ,07:01:28 xxx.xxx.xxx.archive.logger.XMLRPCLogger.log(XMLRPCLogger.java:76) => java.lang.Thread.run(Thread.java:745) => : java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "abcd"; destination host is: "xyz":8032;
For reference, here are the host details that go with the error logs above:
abcd – our edge node
We have a high-availability cluster:
pqrs – name node 1
xyz – name node 2
Created 08-26-2019 06:10 AM
Is the cluster running fine? If so, does /etc/hosts on the edge node have entries for your NameNodes? Can the edge node resolve the IPs of name node 1 and name node 2?
This looks like a connectivity issue. I would start with the usual culprits: firewall, DNS, and host entries.
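As a rough sketch of the resolution check, the helper below asks the system resolver about each host name. The function name `check_resolves` is made up for illustration, and it assumes `getent` is available on the edge node (it is on most Linux distributions). `getent hosts` consults whatever sources nsswitch.conf lists, typically /etc/hosts and then DNS.

```shell
# Hypothetical helper: report whether each given host name resolves locally.
check_resolves() {
    status=0
    for host in "$@"; do
        if addr=$(getent hosts "$host"); then
            # Keep only the first address from the getent output.
            printf '%s resolves to %s\n' "$host" "${addr%% *}"
        else
            printf '%s does NOT resolve\n' "$host"
            status=1
        fi
    done
    return "$status"
}

# Usage on the edge node (pqrs and xyz are the placeholder names from this thread):
#   check_resolves pqrs xyz
```

If a name does not resolve, that points at /etc/hosts or DNS rather than at the firewall.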
HTH
Created 08-26-2019 11:12 PM
Dear Shelton,
The cluster is running fine. /etc/hosts on the edge node does not have entries for the NameNodes.
I will check whether the IPs are resolving. Is the connectivity issue also the reason I am getting the GSS exception?
Created 08-26-2019 11:41 PM
As we see this error:
Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "abcd"; destination host is: "xyz":8032;
This can happen for a few reasons, such as an incorrect FQDN/IP mapping, or failing to obtain a valid Kerberos ticket. It would be good to check a few things. Can you please share the following outputs:
1. On the Edge Node do you have the ambari-agent running? Can you please check and share the output of the following commands:
# hostname -f
# hostname
If your cluster nodes resolve each other through "/etc/hosts" entries (rather than a DNS server), the same /etc/hosts mapping should normally be present across all cluster nodes. Please validate that.
# cat /etc/hosts
Also, from the edge node, are you able to reach the ResourceManager port (8032)?
# telnet xyz 8032
(OR)
# nc -v xyz 8032
2. Are you able to resolve your edge node correctly by its FQDN from the KDC and Ambari Server hosts? Assuming your edge node FQDN is "abcd", can you resolve it from the Ambari Server/KDC as follows?
# ping abcd
3. Do you see keytabs inside the "/etc/security/keytabs" directory? Are you able to get a valid Kerberos ticket using a keytab?
Example:
# klist -ket /etc/security/keytabs/nm.service.keytab
Keytab name: FILE:/etc/security/keytabs/nm.service.keytab
KVNO Timestamp Principal
---- ------------------- ------------------------------------------------------
2 08/11/2019 01:58:29 nm/ker1latest4.example.com@EXAMPLE.COM (des-cbc-md5)
2 08/11/2019 01:58:29 nm/ker1latest4.example.com@EXAMPLE.COM (aes256-cts-hmac-sha1-96)
2 08/11/2019 01:58:29 nm/ker1latest4.example.com@EXAMPLE.COM (des3-cbc-sha1)
2 08/11/2019 01:58:29 nm/ker1latest4.example.com@EXAMPLE.COM (arcfour-hmac)
2 08/11/2019 01:58:29 nm/ker1latest4.example.com@EXAMPLE.COM (aes128-cts-hmac-sha1-96)
# kinit -kt /etc/security/keytabs/nm.service.keytab nm/ker1latest4.example.com@EXAMPLE.COM
# klist
That is, are you able to run kinit and then confirm with "klist" that you hold a valid ticket? In your case the principal name might differ depending on how your edge node is set up.
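Since the principal differs per setup, a small filter can pull the distinct principal names out of a keytab listing so the right one can be passed to kinit. This is only a sketch: `keytab_principals` is a made-up name, and it assumes the `klist -kt` layout shown in the example above, where each data line starts with a numeric KVNO followed by a date, a time, and the principal.

```shell
# Hypothetical helper: print the distinct principals from `klist -kt` output on stdin.
# Header lines are skipped because their first field is not a number.
keytab_principals() {
    awk '$1 ~ /^[0-9]+$/ { print $4 }' | sort -u
}

# Usage (keytab path is the example path used in this thread):
#   klist -kt /etc/security/keytabs/nm.service.keytab | keytab_principals
```

If the timestamp is printed in a different format on your system, the field number in the awk expression would need adjusting.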
Created 08-26-2019 11:52 PM
The "failed on connection exception: java.net.ConnectException: Connection refused" error is a network issue, and the GSS exception is a Kerberos one.
Those are two different things. If you are running the job from the edge node, which user is executing the job? Assuming it is a user Dev1 on the edge node, can you validate that this user has a valid Kerberos ticket?
Dev1@localhost $ klist
Please share the output of the above command.
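For a scripted version of the ticket check, something like the following could be run as the job user before submitting. It is a sketch that assumes the MIT Kerberos `klist`, whose `-s` option prints nothing and exits 0 only when the credential cache is readable and not expired. The function name `has_valid_ticket` and the `KLIST` override variable are illustrative, not part of any Hadoop or Kerberos tooling.

```shell
# Command used for the check; overridable for illustration and testing.
KLIST=${KLIST:-klist}

# Hypothetical helper: exit 0 if the current user holds a usable ticket,
# 1 if not, 2 if the klist command itself is missing.
has_valid_ticket() {
    if ! command -v "$KLIST" >/dev/null 2>&1; then
        echo "$KLIST not found; install the Kerberos client first" >&2
        return 2
    fi
    "$KLIST" -s
}

if has_valid_ticket; then
    echo "ticket OK"
else
    echo "no valid ticket; run kinit before submitting the job"
fi
```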
As you are executing the code from the edge node, can you verify that the krb5.conf file on the edge node is identical to the one on the KDC server? This file should be exactly the same on both.
Was the Kerberos client installed on the edge node? If not, install it with the command below:
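One way to compare the two files is sketched below. It assumes a copy of the KDC server's krb5.conf has already been fetched to the edge node; the function name `compare_conf` and the /tmp path in the usage line are illustrative only.

```shell
# Hypothetical helper: print "identical" if two files match byte for byte,
# otherwise print "differs" followed by the differences and return 1.
compare_conf() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "differs"
        diff "$1" "$2"
        return 1
    fi
}

# Usage, assuming the KDC's file was copied to /tmp/krb5.conf.kdc:
#   compare_conf /etc/krb5.conf /tmp/krb5.conf.kdc
```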
yum install krb5-workstation
The user running the job should kinit with their keytab to obtain a valid ticket. I recently wrote a document answering a similar Kerberos issue on the edge node.
Please check my procedure for creating a user keytab on the edge node so that the user can execute jobs in a kerberized cluster:
https://community.cloudera.com/t5/Support-Questions/HDFS-is-not-accessible-from-an-user-after-kerber...
Hope that helps.