No Valid Credentials Provided Error

Upon enabling Kerberos in Ambari, some components started (NameNode), but other components, such as MapReduce and Hive, are failing.

Here is an example of the error output when we try to start these services.

Fail: Execution of 'hadoop fs -mkdir `rpm -q hadoop | grep -q "hadoop-1" || echo "-p"` /app-logs /mapred /mapred/system /mr-history/tmp /mr-history/done && hadoop fs -chmod -R 777 /app-logs && hadoop fs -chmod 777 /mr-history/tmp && hadoop fs -chmod 1777 /mr-history/done && hadoop fs -chown mapred /mapred && hadoop fs -chown hdfs /mapred/system && hadoop fs -chown yarn:hadoop /app-logs && hadoop fs -chown mapred:hadoop /mr-history/tmp /mr-history/done' returned 1. mesg: ttyname: Invalid argument 
15/04/28 16:12:33 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 

Resolution

  1. Do a kinit using the hdfs service principal, e.g. /usr/share/centrifydc/kerberos/bin/kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@EXAMPLE.COM
  2. After the kinit, do a klist and ensure that the expiration and renewal dates are not the same as the ticket issue date.
  3. If the renew date is in the past or the same as the ticket issue date, execute a kinit -R.
  4. Try a hadoop fs -ls command. If successful, try to restart the services in Ambari.
  5. If your services do not restart, continue below.
  6. Find where your hadoop-env.sh file is located; it is usually in /etc/hadoop/conf.empty
    find / -name "hadoop-env.sh"
  7. Edit hadoop-env.sh (vi hadoop-env.sh). Add the debug parameter sun.security.krb5.debug=true to the HADOOP_OPTS variable, that is,
     export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.debug=true ${HADOOP_OPTS}"
  8. Try a kinit as hdfs, then a hadoop fs -ls command. Look at the debug statements produced. If they report keytype = 18 (AES256), the error is due to wrong JCE policy files.
  9. This can happen when you have AES256 encryption enabled and you recently upgraded Java. Upgrading Java will overwrite the JCE policy files, which include support for AES256 encryption. To fix this, simply re-install your JCE policy jars into "/usr/java/default/jre/lib/security/" or the JAVA_HOME referenced in your hadoop-env.sh file on each node (a quick check is sketched after this list).
  10. Get the right JCE files: For JDK 8 use http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html. For JDK 7 use http://www.oracle.com/technetwork/java/javase/downloads/jce-7-download-432124.html
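
To tell quickly whether the JCE policy files are the problem (steps 8 and 9 above), you can query the maximum AES key length the JVM allows and re-run a simple HDFS command with Kerberos debugging enabled. This is a minimal sketch; the JAVA_HOME path, keytab, and realm are the same examples used above and should be adapted to your cluster.

    # 128 means the default (limited) JCE policy; 2147483647 means unlimited strength (AES256 OK).
    $JAVA_HOME/bin/jrunscript -e 'print(javax.crypto.Cipher.getMaxAllowedKeyLength("AES"));'

    # Re-run a simple HDFS command with Kerberos debugging to see the negotiated keytype.
    export HADOOP_OPTS="-Dsun.security.krb5.debug=true ${HADOOP_OPTS}"
    kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@EXAMPLE.COM
    hadoop fs -ls / 2>&1 | grep -i -e keytype -e etype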

Note:

Any JDK version 1.7 update 80 or later, or 1.8 update 60 or earlier, is known to have problems processing Kerberos TGT tickets (a quick version check is sketched below).
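
A simple way to confirm which build each node is running (run it via pdsh or ssh across the cluster; the output shown is only illustrative):

    java -version 2>&1 | head -n 1
    # e.g. java version "1.8.0_45" -- compare the update number against the ranges noted above.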

----------------------------------------------------------------

Unknown Password or Unable to Obtain Password for User Error Upon Restarting Hadoop Services

Assuming you can kinit successfully and can see the ticket cached in the klist output, this error occurs for one of several reasons:

  1. The IP address in /etc/hosts and the IP address the hostname resolves to are different (a quick check is sketched after this list).
  2. The Kerberos principal setting in hdfs-site.xml is wrong. Verify the dfs.namenode.kerberos.principal and dfs.datanode.kerberos.principal properties.
  3. Wrong file ownership and/or permissions on the /etc/security/keytabs directory. The problem is that the keytabs were created and owned by the local hdfs, hbase, and ambari-qa users. However, these UIDs are different from the UIDs of the corresponding Active Directory users. The files need to be owned by the Active Directory UIDs.
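
A minimal check for reason 1, run on each node (all commands are standard Linux tools):

    hostname -f                   # fully qualified hostname the daemons use
    hostname -i                   # IP address that hostname resolves to
    getent hosts $(hostname -f)   # what /etc/hosts (or DNS) returns for that name
    # The addresses printed by the last two commands should match the interface
    # address shown by: ip addr show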

Resolution

  1. Archive and clear out all logs. For Teradata these are in /var/opt/teradata/log/hadoop/hdfs and /var/opt/teradata/log/hbase. Normally logs are located in /var/log/hadoop-hdfs. The reason is that the existing logs would have been created under the local UIDs, which would create a problem.
  2. Perform an ls -l on the /etc/security/keytabs directory. Make note of which keytabs are owned by hdfs, hbase, and ambari-qa.
  3. Then perform ls -n on /etc/security/keytabs. Make note of the UIDs for hdfs, hbase and ambari-qa.
  4. Take a look at the /etc/passwd file also and note the UIDs for hdfs, hbase, and ambari-qa.
  5. Next perform a touch on a test file. Name it testuid.
  6. Perform a chown hdfs testuid and note the resulting UID. Do the same chown for hbase and ambari-qa. The UIDs will be different from the ones found in /etc/security/keytabs; these are the AD UIDs (the whole sequence is sketched after this list).
  7. Go back to /etc/security/keytabs
  8. Perform a chown <AD-UID> <keytab>, that is, use the new AD UID found for each of hdfs, hbase, and ambari-qa.
  9. Then perform ls -n on /etc/security/keytabs. Make sure the new AD UIDs for hdfs, hbase and ambari-qa are reflected in the keytabs.
  10. Ensure that your kinits work for hbase, hdfs and ambari-qa, e.g. /usr/share/centrifydc/kerberos/bin/kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@EXAMPLE.COM
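
The following is a minimal sketch of steps 5 through 9 on one node; the keytab name is the one used earlier in this article and the UID shown is a placeholder, so adjust for your environment.

    cd /etc/security/keytabs
    ls -n                               # note the current (local) numeric owners of the keytabs

    touch /tmp/testuid
    for u in hdfs hbase ambari-qa; do
        chown "$u" /tmp/testuid                        # ownership resolves through AD here
        echo "$u -> AD UID $(stat -c %u /tmp/testuid)"
    done
    rm -f /tmp/testuid

    # Re-own each keytab with the AD UID discovered above, e.g. if hdfs mapped to 10001:
    # chown 10001 hdfs.headless.keytab
    ls -n                               # verify the new AD UIDs are now reflected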

Further Resolution

  1. If the services and components still do not restart, you need to change the ownership of all files owned by hdfs, hbase and ambari-qa on the ENTIRE cluster.
  2. Modify and run a script such as the one sketched below (changing the appropriate UIDs for hdfs, hbase and ambari-qa). Be careful: it changes many files throughout the cluster.
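
The original script is not reproduced here; the following is a minimal sketch of what such a script might look like, assuming the old local UIDs and the new AD UIDs have already been identified (the numeric UIDs in uid_map are placeholders). Run it on every node, for example via pdsh.

    #!/bin/bash
    # Re-own files belonging to the old local UIDs so they match the new AD UIDs.
    # The OLD=NEW pairs below are placeholders; replace them with the values found
    # for hdfs, hbase and ambari-qa on your cluster.
    set -e
    declare -A uid_map=( [501]=10001 [502]=10002 [503]=10003 )

    for old in "${!uid_map[@]}"; do
        new=${uid_map[$old]}
        echo "Re-owning files with UID $old -> $new"
        find / -xdev -uid "$old" -exec chown -h "$new" {} +
    done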

----------------------------------------------------------------

Second Instances of WebHCat and Oozie Fail After Kerberos is Enabled

Failures occur when two WebHCat servers or two Oozie servers are deployed with Kerberos enabled.

The issue occurs when, in Ambari, you use _HOST as the domain name in the WebHCat and Oozie principal configs, since it DOES NOT get substituted appropriately when each service starts. An example of this would be using HTTP/_HOST@EXAMPLE.COM or oozie/_HOST@EXAMPLE.COM as principals. Normally this is appropriate because, if the WebHCat server runs on node 1, this should translate to HTTP/node1.example.com@EXAMPLE.COM, and on node 2 to HTTP/node2.example.com@EXAMPLE.COM. Unfortunately, due to a bug, the substitution does not occur.

You need to go directly to the second instance of each server and manually edit the webhcat-site.xml or oozie-site.xml file with the second node's principals for SPNEGO and Oozie respectively, that is, HTTP/node2.example.com@EXAMPLE.COM and oozie/node2.example.com@EXAMPLE.COM.

Unfortunately, if you restart or make any changes in Ambari after that, it pushes the wrong configuration to the second instance of each service. Since you cannot use _HOST, you are forced to use node 1 principals, which do not work for node 2. Ambari thus overwrites the fixes made on the second host. Be mindful of this upon restarts by Ambari, and always save your own versions of webhcat-site.xml and oozie-site.xml. (A quick way to confirm which per-host principals exist in the keytabs is sketched below.)
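
A quick way to confirm which per-host principals actually exist in the service keytabs on node 2 (the keytab paths below are the typical HDP defaults and may differ in your environment):

    klist -kt /etc/security/keytabs/spnego.service.keytab
    klist -kt /etc/security/keytabs/oozie.service.keytab
    # You should see HTTP/node2.example.com@EXAMPLE.COM and oozie/node2.example.com@EXAMPLE.COM;
    # these are the values to put into webhcat-site.xml and oozie-site.xml on node 2.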

Resolution

WebHCat

  1. WebHCat can only have one value for templeton.kerberos.principal in the custom webhcat-site.xml.
  2. Normally you would have _HOST as the domain name in the principal, but WebHCat does not resolve _HOST. In Ambari, set templeton.kerberos.principal to HTTP/node1.example.com@EXAMPLE.COM, and restart WebHCat.
  3. Log onto node 2 where the second WebHCat server is running and perform the following (the sequence is sketched after this list)
    1. su hcat
    2. Edit webhcat-site.xml located in /etc/hive-webhcat/conf
    3. Change all principal names from node 1 to node 2
    4. export HADOOP_HOME=/usr
    5. /usr/lib/hive-hcatalog/sbin/webhcat_server.sh stop
    6. /usr/lib/hive-hcatalog/sbin/webhcat_server.sh start
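
A minimal sketch of the node 2 steps above as shell commands (paths follow the typical HDP layout; the sed expression assumes the node 1 and node 2 hostnames used in the examples):

    su - hcat
    export HADOOP_HOME=/usr
    # Replace every node 1 principal with the node 2 equivalent in the local config.
    sed -i 's/node1\.example\.com/node2.example.com/g' /etc/hive-webhcat/conf/webhcat-site.xml
    /usr/lib/hive-hcatalog/sbin/webhcat_server.sh stop
    /usr/lib/hive-hcatalog/sbin/webhcat_server.sh start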

Oozie

  1. Oozie can only have one value for each principal in the custom oozie-site.xml. In Ambari, set oozie.authentication.kerberos.principal to HTTP/node1.example.com@EXAMPLE.COM and oozie.service.HadoopAccessorService.kerberos.principal to oozie/node1.example.com@EXAMPLE.COM, and restart Oozie.
  2. Log onto node 2 where the second Oozie server is running and perform the following (a verification sketch follows this list)
    1. su oozie
    2. edit oozie-site.xml located in /etc/oozie/conf
    3. Change all principal names from node 1 to node 2
    4. export HADOOP_HOME=/usr
    5. /usr/lib/oozie/bin/oozied.sh stop
    6. /usr/lib/oozie/bin/oozied.sh start
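
Before restarting, it is worth verifying on node 2 that both principal properties now carry node 2 values (paths follow the typical HDP layout):

    grep -A 1 "kerberos.principal" /etc/oozie/conf/oozie-site.xml
    /usr/lib/oozie/bin/oozied.sh stop && /usr/lib/oozie/bin/oozied.sh start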

----------------------------------------------------------------

After Enabling Hue for Kerberos and LDAP, the File Browser Errors out

When you log into Hue with an AD account (after configuring for LDAP) you receive the following error:

2015-05-06 09:50:25,698  INFO [][hue:] GETFILESTATUS Proxy user [hue] DoAs user [admin]
2015-05-06 09:50:25,712  WARN [][hue:] GETFILESTATUS FAILED [GET:/v1/user/admin] response [Internal Server Error] SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]

Resolution

  1. With NameNode HA, HTTPFS needs to be configured. If there is no NameNode HA, WebHDFS needs to be configured.
  2. You need to configure hadoop-httpfs to use Kerberos. Make changes in httpfs-site.xml on the Hue box to change from simple authentication to Kerberos.
  3. Edit the /etc/hadoop-httpfs/conf.empty/httpfs-site.xml file on the Hue node:

    <property>
      <name>httpfs.hadoop.authentication.type</name>
      <value>kerberos</value>
    </property>
    <property>
      <name>httpfs.hadoop.authentication.kerberos.principal</name>
      <value>httpfs/huenode.EXAMPLE.com@EXAMPLE.COM</value>
    </property>
    <property>
      <name>httpfs.hadoop.authentication.kerberos.keytab</name>
      <value>/etc/security/keytabs/httpfs.service.keytab</value>
    </property>
    <property>
      <name>httpfs.authentication.kerberos.name.rules</name>
      <value>
        RULE:[2:$1@$0](rm@.*EXAMPLE.COM)s/.*/yarn/
        RULE:[2:$1@$0](nm@.*EXAMPLE.COM)s/.*/yarn/
        RULE:[2:$1@$0](nn@.*EXAMPLE.COM)s/.*/hdfs/
        RULE:[2:$1@$0](dn@.*EXAMPLE.COM)s/.*/hdfs/
        RULE:[2:$1@$0](hbase@.*EXAMPLE.COM)s/.*/hbase/
        RULE:[2:$1@$0](oozie@.*EXAMPLE.COM)s/.*/oozie/
        RULE:[2:$1@$0](jhs@.*EXAMPLE.COM)s/.*/mapred/
        DEFAULT
      </value>
    </property>

  4. Then restart hadoop-httpfs. It appears that a keytab for httpfs is also required (a quick end-to-end test is sketched below).
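
    Once restarted, a quick way to confirm HttpFS now accepts Kerberos-authenticated requests (the host name matches the example principal above and 14000 is the default HttpFS port):

      service hadoop-httpfs restart
      kinit admin@EXAMPLE.COM
      curl --negotiate -u : "http://huenode.EXAMPLE.com:14000/webhdfs/v1/user/admin?op=GETFILESTATUS"
      # A JSON FileStatus response, rather than "SIMPLE authentication is not enabled",
      # indicates the Kerberos configuration is working.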

----------------------------------------------------------------

Where Can I Find the Commands that Ambari Runs for Kerberos?

  1. Question: What commands does Ambari run to add the keytabs for the AD option? They are nowhere to be found in the logs.

    Answer: ktadd, invoked in /var/lib/ambari-server/resources/common-services/KERBEROS/package/scripts/kerberos_common.py -> function create_keytab_file (see the sketch below for locating it).
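
    A quick way to locate the call on the Ambari server host (using the path given above):

      grep -n "ktadd" /var/lib/ambari-server/resources/common-services/KERBEROS/package/scripts/kerberos_common.py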

----------------------------------------------------------------

Help! I Have Long-Running Jobs and My Tokens are Expiring, Leading to Job Failures

Possible Resolution Steps

  1. First stop: NTP. Do a pdsh to reset and restart the ntp service on all nodes.
  2. Check the JDK. Any JDK version 1.7 update 80 or later, or 1.8 update 60 or earlier, is known to have problems processing Kerberos TGT tickets.
  3. Change the max renewable life and ticket lifetime (a consolidated sketch follows after this list):
    > kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@<REALM.COM>
    > klist
    Check the expiration of the krbtgt/<REALM.COM>@<REALM.COM> principal. Is it seven days or one day?
    
    Look at max_renewable_life in /var/kerberos/krb5kdc/kdc.conf. Is it 7d? 14d? Is it different from the krbtgt/<REALM.COM>@<REALM.COM> expiration length?
    
    Change max_renewable_life in /var/kerberos/krb5kdc/kdc.conf to 14d.
    
    Change the krbtgt/<REALM.COM>@<REALM.COM> principal's maxrenewlife to the same length as max_renewable_life.
    For MIT Kerberos you would use kadmin (https://web.mit.edu/kerberos/krb5-1.12/doc/admin/admin_commands/kadmin_local.html and https://blog.godatadriven.com/kerberos_kdc_install.html); for AD, run the equivalent commands as the administrator:
    kadmin -p admin
    kadmin: modprinc -maxrenewlife "7 days" krbtgt/<REALM.COM>@<REALM.COM>
    
    Also check ticket_lifetime in /etc/krb5.conf. Is there a renew_lifetime? A max_life?
    
    You can change these to be more than 24h, then restart the krb5kdc service.
  4. Double check the cron job that renews tickets. You can find examples to compare via Google (e.g. http://wiki.grid.auth.gr/wiki/bin/view/Groups/ALL/HowToAutomaticallyRenewKerberosTicketsAndAFSTokens...
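
As a summary of steps 3 and 4, here is a minimal sketch of the KDC-side changes and a cron-based renewal; the 14-day and 7-day values mirror the discussion above, and the keytab path, principal, and schedule are illustrative.

    # On the MIT KDC host: raise the maximum renewable lifetime.
    #   /var/kerberos/krb5kdc/kdc.conf -> max_renewable_life = 14d
    #   /etc/krb5.conf                 -> ticket_lifetime = 24h, renew_lifetime = 7d
    kadmin.local -q 'modprinc -maxrenewlife "7 days" krbtgt/<REALM.COM>@<REALM.COM>'
    service krb5kdc restart

    # On client nodes: re-obtain the ticket periodically from cron (every 6 hours here)
    # so long-running jobs always have a valid TGT. Example crontab entry:
    # 0 */6 * * * /usr/bin/kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@<REALM.COM>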