ZooKeeper Service Check failed after cluster kerberization

After we kerberized our cluster, ZooKeeper is the only service that starts again. But when we run the ZooKeeper Service Check from Ambari, the check fails like this:

2017-05-30 15:01:15,780 - File['/var/lib/ambari-agent/tmp/zkSmoke.out'] {'action': ['delete']}
2017-05-30 15:01:15,781 - File['/var/lib/ambari-agent/tmp/zkSmoke.sh'] {'content': StaticFile('zkSmoke.sh'), 'mode': 0755}
2017-05-30 15:01:15,782 - Execute['/var/lib/ambari-agent/tmp/zkSmoke.sh /usr/hdp/current/zookeeper-client/bin/zkCli.sh ambari-qa /usr/hdp/current/zookeeper-client/conf 2181 True /usr/bin/kinit /etc/security/keytabs/smokeuser.headless.keytab ambari-qa-datalake@MYDOMAIN.MYDOMAINROOT.NET /var/lib/ambari-agent/tmp/zkSmoke.out'] {'logoutput': True, 'path': ['/usr/sbin:/sbin:/usr/local/bin:/bin:/usr/bin'], 'tries': 3, 'try_sleep': 5}
zk_node1=hdp-cluster-master1.apollon.mydomain.com
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /zk_smoketest
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
    at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:703)
    at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:591)
    at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:363)
    at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:323)
    at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:282)
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /zk_smoketest
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
    at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:698)
    at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:591)
    at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:363)
    at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:323)
    at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:282)
Running test on host hdp-cluster-master1.apollon.mydomain.com
Connecting to hdp-cluster-master1.apollon.mydomain.com:2181
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Welcome to ZooKeeper!
JLine support is enabled
[zk: hdp-cluster-master1.apollon.mydomain.com:2181(CONNECTING) 0] get /zk_smoketest

WATCHER::

WatchedEvent state:SyncConnected type:None path:null

Super Mentor

@Ramon Wartala

This kind of issue sometimes happens when the hosts do not have a proper FQDN configured, so the principals are not generated correctly.

Can you please confirm that the correct FQDN is configured on all hosts, including this ZooKeeper node?

# The following command should print the configured FQDN
hostname -f

If the FQDN was not set correctly earlier, set it correctly and then regenerate the keytabs.
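
A rough sketch of that check, in case it helps when repeating it on several hosts. The helper names is_fqdn and report_fqdn are made up for illustration and are not part of Ambari or HDP. The check only tests that the name contains a dot, which is a heuristic and not proof that the name matches the Kerberos principals.

```shell
# is_fqdn NAME: succeed if NAME looks fully qualified, i.e. it contains
# at least one dot and does not start or end with one (heuristic only).
is_fqdn() {
  case "$1" in
    ""|.*|*.) return 1 ;;
    *.*)      return 0 ;;
    *)        return 1 ;;
  esac
}

# report_fqdn NAME: print a one-line verdict for NAME.
report_fqdn() {
  if is_fqdn "$1"; then
    echo "OK: $1"
  else
    echo "NOT FQDN: $1"
  fi
}
```

On each node this could be called as report_fqdn "$(hostname -f)".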

I can confirm that I've got the right FQDNs on all of my worker and master nodes:

$ hostname -f
hdp-cluster-master1.apollon.mydomain.com

Keytab generation is working fine:

30 May 2017 18:26:08,584  INFO [Server Action Executor Worker 687] CreateKeytabFilesServerAction:193 - Creating keytab file for HTTP/hdp-cluster-master2.apollon.mydomain.com@MYDOMAIN.MYDOMAINROOT.NET on host hdp-cluster-master2.apollon.mydomain.com
30 May 2017 18:26:08,639  INFO [Server Action Executor Worker 687] CreateKeytabFilesServerAction:193 - Creating keytab file for ambari-qa-datalake@MYDOMAIN.MYDOMAINROOT.NET on host hdp-cluster-master2.apollon.mydomain.com
30 May 2017 18:26:08,678  INFO [Server Action Executor Worker 687] CreateKeytabFilesServerAction:193 - Creating keytab file for nn/hdp-cluster-master2.apollon.mydomain.com@MYDOMAIN.MYDOMAINROOT.NET on host hdp-cluster-master2.apollon.mydomain.com
30 May 2017 18:26:08,711  INFO [Server Action Executor Worker 687] CreateKeytabFilesServerAction:193 - Creating keytab file for hdfs-datalake@MYDOMAIN.MYDOMAINROOT.NET on host hdp-cluster-master2.apollon.mydomain.com
30 May 2017 18:26:08,751  INFO [Server Action Executor Worker 687] CreateKeytabFilesServerAction:193 - Creating keytab file for jhs/hdp-cluster-master2.apollon.mydomain.com@MYDOMAIN.MYDOMAINROOT.NET on host hdp-cluster-master2.apollon.mydomain.com

Super Mentor

@Ramon Wartala

Which principals do you see when you run the following command with "kadmin.local" on the KDC?

# kadmin.local  -q "listprincs" | grep zookeeper

We're using Microsoft Active Directory as the KDC, so I'm not sure how to check. But this is what I get when I run the following:

$ sudo su - zookeeper

$ kinit -kt  /etc/security/keytabs/zk.service.keytab zookeeper/hdp-cluster-master1.apollon.mydomain.com@MYDOMAIN.MYDOMAINROOT.NET

$ klist

Ticket cache: FILE:/tmp/krb5cc_1002
Default principal: zookeeper/hdp-cluster-master1.apollon.mydomain.com@MYDOMAIN.MYDOMAINROOT.NET

Valid starting       Expires              Service principal
05/30/2017 18:52:38  05/31/2017 04:52:38  krbtgt/MYDOMAIN.MYDOMAINROOT.NET@MYDOMAIN.MYDOMAINROOT.NET
    renew until 06/06/2017 18:52:38
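
When kadmin.local is not available (for example with Active Directory as the KDC), the principals stored in a keytab can still be listed locally with klist -kt. A minimal sketch, assuming MIT Kerberos klist output in which each entry line starts with a numeric KVNO and ends with the principal; keytab_principals is a hypothetical helper name, not an existing tool:

```shell
# keytab_principals: read `klist -kt` output on stdin and print the unique
# principal names. Assumes entry lines start with a numeric KVNO and end
# with the principal; header lines do not match and are skipped.
# Note: with `klist -kte` the encryption type is appended to each line,
# so this helper is only meant for plain `-kt` output.
keytab_principals() {
  awk '$1 ~ /^[0-9]+$/ { print $NF }' | sort -u
}
```

Possible usage: klist -kt /etc/security/keytabs/zk.service.keytab | keytab_principals. The host part of each zookeeper/... principal can then be compared with what hostname -f prints on that node.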

Have you been able to resolve your problem? I see the same behavior after kerberizing an HDP cluster (ZooKeeper is the only service that starts, with the same error message). I have a 3-node cluster with ZooKeeper and Kafka only.

Thanks!

New Contributor

@Ramon Wartala

Hi Ramon,

I faced the same issue on HDP 2.6.2 after the cluster had been down for one week. When I tried to start it again, all services refused to start except ZooKeeper, and the smoke test fails like yours.

I am using a KDC, and the ZooKeeper principals are:

kadmin.local -q "listprincs" | grep zookeeper

zookeeper/ip-10-2-2-136.eu-central-1.compute.internal@HDPBASE
zookeeper/ip-10-2-2-181.eu-central-1.compute.internal@HDPBASE
zookeeper/ip-10-2-2-43.eu-central-1.compute.internal@HDPBASE

The smoke test fails with the following errors:

2017-11-24 09:28:07,807 - File['/var/lib/ambari-agent/tmp/zkSmoke.out'] {'action': ['delete']}
2017-11-24 09:28:07,807 - File['/var/lib/ambari-agent/tmp/zkSmoke.sh'] {'content': StaticFile('zkSmoke.sh'), 'mode': 0755}
2017-11-24 09:28:07,808 - Execute['/var/lib/ambari-agent/tmp/zkSmoke.sh /usr/hdp/current/zookeeper-client/bin/zkCli.sh ambari-qa /usr/hdp/current/zookeeper-client/conf 2181 True /usr/bin/kinit /etc/security/keytabs/smokeuser.headless.keytab ambari-qa-hdpbase@HDPBASE /var/lib/ambari-agent/tmp/zkSmoke.out'] {'logoutput': True, 'path': ['/usr/sbin:/sbin:/usr/local/bin:/bin:/usr/bin'], 'tries': 3, 'try_sleep': 5}
zk_node1=ip-10-2-2-136.eu-central-1.compute.internal
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /zk_smoketest
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
    at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:703)
    at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:591)
    at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:363)
    at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:323)
    at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:282)
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).

I have checked the hostname on all nodes: hostname returns the short name, while hostname -f returns the FQDN.

I changed ambari-agent.ini on all nodes to use a hostname_script that returns hostname -f, since I previously had an issue with the Ambari heartbeats.

Were you able to resolve the issue?

I have already tried regenerating the keytabs and disabling and re-enabling Kerberos, without any success.

Best Regards,

Eric Le Blouc'h

New Contributor

The issue was that reverse DNS was not correctly configured; adding all the hosts to /etc/hosts made it work. I still wonder why this was working before the cluster VMs were stopped, left off for a week, and started again. Was a cache hiding the problem?
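
For anyone hitting the same error, here is a minimal sketch of how forward and reverse resolution could be checked for each cluster host. It relies on getent hosts, which consults the system resolver configuration (/etc/hosts, DNS). The function names lookup_ip, lookup_name and check_host are invented for this example, and the assumption that the reverse lookup must return exactly the FQDN used in the service principal is mine, not something confirmed in this thread.

```shell
# lookup_ip NAME: first address that NAME resolves to (forward lookup).
lookup_ip() {
  getent hosts "$1" | awk '{ print $1; exit }'
}

# lookup_name IP: first (canonical) name that IP maps back to (reverse lookup).
lookup_name() {
  getent hosts "$1" | awk '{ print $2; exit }'
}

# check_host FQDN: verify that FQDN resolves to an address and that the
# address maps back to the same FQDN. Prints a verdict, returns non-zero
# on any mismatch.
check_host() {
  ch_ip=$(lookup_ip "$1")
  if [ -z "$ch_ip" ]; then
    echo "FAIL $1: no forward lookup"
    return 1
  fi
  ch_name=$(lookup_name "$ch_ip")
  if [ "$ch_name" != "$1" ]; then
    echo "FAIL $1: $ch_ip maps back to '${ch_name:-nothing}'"
    return 1
  fi
  echo "OK $1 <-> $ch_ip"
}
```

It could be run as check_host "$(hostname -f)" on every node, and once per ZooKeeper host name from the node where the smoke test runs. A FAIL line would point at the kind of reverse DNS or /etc/hosts problem described above.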