Kerberos kinit(v5): Credentials cache I/O operation failed XXX when initializing cache

Expert Contributor

I am getting the alert below:

Services Reporting Alerts CRITICAL [MAPREDUCE2] MAPREDUCE2 CRITICAL History Server Web UI Connection failed to http://abcd_fqdn.com:19888 (Execution of '/usr/share/centrifydc/kerberos/bin/kinit -l 5m -c /var/lib/ambari-agent/tmp/web_alert_cc_4610ec9b283dcfc90bc6df1e519e1c52 -kt /etc/security/keytabs/spnego.service.keytab HTTP/abcd_fqdn.com@Realm.COM > /dev/null' returned 1. kinit(v5): Credentials cache I/O operation failed XXX when initializing cache /var/lib/ambari-agent/tmp/web_alert_cc_4610ec9b283dcfc90bc6df1e519e1c52)

I am not sure which credentials cache it is referring to here. I can see these credentials cache files on the node:

-rw------- 1 yarn hadoop 1547 Jul 13 11:54 /tmp/krb5cc_513
-rw------- 1 hcat hadoop 1417 Jul 22 12:23 /tmp/krb5cc_516
-rw------- 1 hdfs hadoop 2775 Jul 22 12:24 /tmp/krb5cc_511
-rw------- 1 oozie hadoop 3046 Jul 22 12:24 /tmp/krb5cc_504
-rw------- 1 ambari-qa hadoop 1456 Jul 26 02:48 /tmp/krb5cc_1002

Free space is also available in /tmp.

9 REPLIES

@Anshul Sisodia

How were the krb5.conf files created? Did Ambari create/modify them or are they managed manually?

Check whether the krb5.conf file has either of the following attributes specified:

  • ccache_type
  • default_ccache_name

Typically these values are not set so that the infrastructure default values are used.

Sometimes we see the default_ccache_name specify a KEYRING rather than a file. This has historically not been supported by the Hadoop services.

If there is no value for default_ccache_name, try setting it to "/tmp/krb5cc_%{uid}". For example:

default_ccache_name = /tmp/krb5cc_%{uid}
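
A quick way to run the check described above is sketched here. It assumes krb5.conf lives at /etc/krb5.conf (override with the KRB5_CONF variable, which is just a name chosen for this sketch), and check_ccache_settings is a hypothetical helper, not something shipped with Kerberos or Ambari:

```shell
#!/bin/sh
# Hypothetical helper: report whether the two cache-related attributes
# are set in a krb5.conf file. The default path is an assumption.
check_ccache_settings() {
    conf="${1:-/etc/krb5.conf}"
    for key in ccache_type default_ccache_name; do
        # Match "key = value" lines; commented-out lines do not match.
        line=$(grep -E "^[[:space:]]*${key}[[:space:]]*=" "$conf" 2>/dev/null || true)
        if [ -n "$line" ]; then
            # Print the setting without its leading indentation.
            echo "$line" | sed 's/^[[:space:]]*//'
        else
            echo "${key}: not set"
        fi
    done
}

check_ccache_settings "${KRB5_CONF:-/etc/krb5.conf}"
```

If default_ccache_name comes back as "not set", the line above would normally go in the [libdefaults] section of krb5.conf.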

Expert Contributor

How can I confirm who manages the krb5.conf file? It is owned by root:

-rw-r--r-- 1 root root 727 May 11 17:00 krb5.conf

  • ccache_type = 3
  • default_ccache_name is not defined

Do I need to set this parameter for a particular UID, or can I simply set the following for all users and save the conf file?

default_ccache_name = /tmp/krb5cc_%{uid}

avatar

If Ambari is managing the krb5.conf file, the "Manage Kerberos client krb5.conf" checkbox will be checked in the Kerberos service configuration screen - probably under "Advanced krb5.conf". It is checked by default.

The fact that ccache_type is defined indicates that Ambari is probably not managing the krb5.conf file. It could also be that Ambari is managing it while Centrify is trying to manage it as well. The default value of ccache_type is 4. I am not sure what 3 is, but it indicates an older version of the cache format. I am not sure whether this is causing your issue.

Re-reading your issue... to list the contents of the cache file, you can do

klist /var/lib/ambari-agent/tmp/web_alert_cc_4610ec9b283dcfc90bc6df1e519e1c52

However, given that there was an I/O issue with that file, it may not exist. It does not appear in the listing you provided:

-rw------- 1 yarn hadoop 1547 Jul 13 11:54 /tmp/krb5cc_513 
-rw------- 1 hcat hadoop 1417 Jul 22 12:23 /tmp/krb5cc_516 
-rw------- 1 hdfs hadoop 2775 Jul 22 12:24 /tmp/krb5cc_511 
-rw------- 1 oozie hadoop 3046 Jul 22 12:24 /tmp/krb5cc_504 
-rw------- 1 ambari-qa hadoop 1456 Jul 26 02:48 /tmp/krb5cc_1002

Also, looking at the listing, it appears the file caches are being created.

What happens if you execute the following when logged in as root, and then as the same user that the Ambari agents run as (changing the host and realm to match your installation)?

/usr/share/centrifydc/kerberos/bin/kinit -l 5m -c /var/lib/ambari-agent/tmp/web_alert_cc_4610ec9b283dcfc90bc6df1e519e1c52 -kt /etc/security/keytabs/spnego.service.keytab HTTP/abcd_fqdn.com@Realm.COM

Note: to get the principal(s) listed in a keytab file you can do:

klist -kte /etc/security/keytabs/spnego.service.keytab
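
Since the error is an I/O failure while initializing the cache, it may also be worth checking the cache directory itself before running kinit. This is only a sketch: check_cache_dir is a hypothetical helper, and the directory is the one taken from the alert text.

```shell
#!/bin/sh
# Directory taken from the alert message; adjust if your agent uses another.
CACHE_DIR=/var/lib/ambari-agent/tmp

# Hypothetical helper: report whether a directory exists and is writable
# by the user running this script.
check_cache_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
    elif [ -w "$dir" ]; then
        echo "writable: $dir"
    else
        echo "not writable: $dir"
    fi
}

check_cache_dir "$CACHE_DIR"
```

Run it once as root and once as the user the Ambari agent runs as. Note that root can usually write anywhere, so the second run is the more telling one.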

Expert Contributor
  • When I run klist, it shows the output below:

klist /var/lib/ambari-agent/tmp/web_alert_cc_4610ec9b283dcfc90bc6df1e519e1c52
Ticket cache: FILE:/var/lib/ambari-agent/tmp/web_alert_cc_4610ec9b283dcfc90bc6df1e519e1c52
Default principal: HTTP/abcd_fqdn@REALM.COM

Valid starting     Expires            Service principal
07/26/16 08:42:28  07/26/16 08:47:28  krbtgt/REALM.COM@REALM.COM

  • When I run the command below, it executes without printing any error:

/usr/share/centrifydc/kerberos/bin/kinit -l 5m -c /var/lib/ambari-agent/tmp/web_alert_cc_4610ec9b283dcfc90bc6df1e519e1c52 -kt /etc/security/keytabs/spnego.service.keytab HTTP/abcd_fqdn.com@REALM.COM

avatar

Then it appears that things are looking pretty good.

Is the alert still appearing? It may have been a hiccup where multiple threads were attempting to refresh that cache at the same time. I think this was an occasional problem in older versions of Ambari.

What version of Ambari are you using?

Expert Contributor

Ambari version is 2.1.2

Super Collaborator

It could be https://issues.apache.org/jira/browse/AMBARI-14847, a problem with concurrent kinit calls. This was fixed in Ambari 2.2.2 - which version of Ambari is this?

Expert Contributor

Ambari version is 2.1.2

Super Collaborator

So, it sounds like AMBARI-14847 is the problem. If you upgrade to Ambari 2.2.2, that should resolve it.
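
For anyone checking several clusters, the comparison can be scripted. This is a rough sketch: version_lt is a hypothetical helper, it relies on GNU sort's -V option, and the installed version is hard-coded from this thread rather than read from Ambari.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if version $1 is strictly lower than $2.
# Relies on GNU sort's version ordering (-V).
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

installed="2.1.2"   # version reported in this thread
fixed="2.2.2"       # release cited above as containing the AMBARI-14847 fix

if version_lt "$installed" "$fixed"; then
    echo "Ambari $installed predates $fixed; upgrade suggested"
else
    echo "Ambari $installed already includes the fix"
fi
```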