Created 05-30-2017 01:30 PM
After we kerberized our cluster, ZooKeeper is the only service that starts again. But when we run the ZooKeeper Service Check from Ambari, the check fails like this:
2017-05-30 15:01:15,780 - File['/var/lib/ambari-agent/tmp/zkSmoke.out'] {'action': ['delete']}
2017-05-30 15:01:15,781 - File['/var/lib/ambari-agent/tmp/zkSmoke.sh'] {'content': StaticFile('zkSmoke.sh'), 'mode': 0755}
2017-05-30 15:01:15,782 - Execute['/var/lib/ambari-agent/tmp/zkSmoke.sh /usr/hdp/current/zookeeper-client/bin/zkCli.sh ambari-qa /usr/hdp/current/zookeeper-client/conf 2181 True /usr/bin/kinit /etc/security/keytabs/smokeuser.headless.keytab ambari-qa-datalake@MYDOMAIN.MYDOMAINROOT.NET /var/lib/ambari-agent/tmp/zkSmoke.out'] {'logoutput': True, 'path': ['/usr/sbin:/sbin:/usr/local/bin:/bin:/usr/bin'], 'tries': 3, 'try_sleep': 5}
zk_node1=hdp-cluster-master1.apollon.mydomain.com
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /zk_smoketest
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
    at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:703)
    at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:591)
    at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:363)
    at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:323)
    at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:282)
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /zk_smoketest
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
    at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:698)
    at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:591)
    at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:363)
    at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:323)
    at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:282)
Running test on host hdp-cluster-master1.apollon.mydomain.com
Connecting to hdp-cluster-master1.apollon.mydomain.com:2181
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Welcome to ZooKeeper!
JLine support is enabled
[zk: hdp-cluster-master1.apollon.mydomain.com:2181(CONNECTING) 0] get /zk_smoketest
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
Created 05-30-2017 04:18 PM
This kind of issue sometimes happens when the hosts are not configured with a proper FQDN, so the Kerberos principals are not generated correctly.
Can you please confirm that the correct FQDN is configured on all hosts, including this ZooKeeper node?
# The following command should print the configured FQDN
hostname -f
If the FQDN was not set correctly earlier, set it correctly and then regenerate the keytabs.
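As a rough way to run this check on every node, here is a minimal sketch. The helper name is_fqdn is hypothetical (it is not part of Ambari or HDP), and "contains a dot with non-empty labels on both ends" is only a heuristic for "looks fully qualified"; it does not prove the name is the one the principals were generated for.

```shell
# Sketch: decide whether a host name looks fully qualified.
# is_fqdn is a hypothetical helper, not an Ambari or HDP tool.
is_fqdn() {
  case "$1" in
    .*|*.) return 1 ;;   # leading or trailing dot: reject
    *.*)   return 0 ;;   # at least one dot inside the name
    *)     return 1 ;;   # short name (or empty string)
  esac
}

# Usage on a cluster node (not executed here):
#   if is_fqdn "$(hostname -f)"; then
#     echo "hostname -f returns an FQDN"
#   else
#     echo "hostname -f returns a short name; fix it and regenerate the keytabs"
#   fi
```

If the check reports a short name on any node, that node is the first candidate to look at before regenerating keytabs.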
Created 05-30-2017 04:23 PM
I can confirm that I've got the right FQDNs on all of my worker and master nodes:
$ hostname -f
hdp-cluster-master1.apollon.mydomain.com
Created 05-30-2017 04:28 PM
Keytab generation is working fine:
30 May 2017 18:26:08,584 INFO [Server Action Executor Worker 687] CreateKeytabFilesServerAction:193 - Creating keytab file for HTTP/hdp-cluster-master2.apollon.mydomain.com@MYDOMAIN.MYDOMAINROOT.NET on host hdp-cluster-master2.apollon.mydomain.com
30 May 2017 18:26:08,639 INFO [Server Action Executor Worker 687] CreateKeytabFilesServerAction:193 - Creating keytab file for ambari-qa-datalake@MYDOMAIN.MYDOMAINROOT.NET on host hdp-cluster-master2.apollon.mydomain.com
30 May 2017 18:26:08,678 INFO [Server Action Executor Worker 687] CreateKeytabFilesServerAction:193 - Creating keytab file for nn/hdp-cluster-master2.apollon.mydomain.com@MYDOMAIN.MYDOMAINROOT.NET on host hdp-cluster-master2.apollon.mydomain.com
30 May 2017 18:26:08,711 INFO [Server Action Executor Worker 687] CreateKeytabFilesServerAction:193 - Creating keytab file for hdfs-datalake@MYDOMAIN.MYDOMAINROOT.NET on host hdp-cluster-master2.apollon.mydomain.com
30 May 2017 18:26:08,751 INFO [Server Action Executor Worker 687] CreateKeytabFilesServerAction:193 - Creating keytab file for jhs/hdp-cluster-master2.apollon.mydomain.com@MYDOMAIN.MYDOMAINROOT.NET on host hdp-cluster-master2.apollon.mydomain.com
Created 05-30-2017 04:39 PM
Which principals do you see when you run the following command with "kadmin.local" on the KDC?
# kadmin.local -q "listprincs" | grep zookeeper
Created 05-30-2017 04:54 PM
We're using Microsoft Active Directory as the KDC, so I'm not sure how to check that. However, kinit with the ZooKeeper service keytab works:
$ sudo su - zookeeper
$ kinit -kt /etc/security/keytabs/zk.service.keytab zookeeper/hdp-cluster-master1.apollon.mydomain.com@MYDOMAIN.MYDOMAINROOT.NET
$ klist
Ticket cache: FILE:/tmp/krb5cc_1002
Default principal: zookeeper/hdp-cluster-master1.apollon.mydomain.com@MYDOMAIN.MYDOMAINROOT.NET

Valid starting       Expires              Service principal
05/30/2017 18:52:38  05/31/2017 04:52:38  krbtgt/MYDOMAIN.MYDOMAINROOT.NET@MYDOMAIN.MYDOMAINROOT.NET
        renew until 06/06/2017 18:52:38
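A successful kinit only shows that the keytab and principal are valid. A complementary check, sketched below, is whether the host part of the service principal matches what the node reports as its FQDN. The helper principal_host is hypothetical and assumes the usual service/host@REALM layout; headless principals such as ambari-qa-datalake@MYDOMAIN.MYDOMAINROOT.NET have no host part.

```shell
# Sketch: extract the host part of a service principal (service/host@REALM).
# principal_host is a hypothetical helper, not a Kerberos tool.
principal_host() {
  _ph_name="${1%@*}"                 # drop the @REALM suffix
  case "$_ph_name" in
    */*) printf '%s\n' "${_ph_name#*/}" ;;   # text after the first slash
    *)   return 1 ;;                          # headless principal: no host part
  esac
}

# Usage on a cluster node (not executed here):
#   klist -kt /etc/security/keytabs/zk.service.keytab   # list principals in the keytab
#   [ "$(principal_host "zookeeper/$(hostname -f)@MYDOMAIN.MYDOMAINROOT.NET")" = "$(hostname -f)" ]
```

Comparing the extracted host against hostname -f on each ZooKeeper node can surface a mismatch that kinit alone would not show.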
Created 06-15-2017 02:29 PM
Have you been able to resolve your problem? I see the same behavior after kerberizing an HDP cluster (ZooKeeper is the only service that starts, with the same error message). I have a 3-node cluster with ZooKeeper and Kafka only.
Thanks!
Created 11-24-2017 08:04 PM
Hi Ramon,
I hit the same issue on HDP 2.6.2 after the cluster had been down for one week. When I tried to start it again, every service except ZooKeeper refused to start, and the ZooKeeper smoke test fails the same way as yours.
I am using a KDC, and the ZooKeeper principals are:
kadmin.local -q "listprincs" | grep zookeeper
zookeeper/ip-10-2-2-136.eu-central-1.compute.internal@HDPBASE
zookeeper/ip-10-2-2-181.eu-central-1.compute.internal@HDPBASE
zookeeper/ip-10-2-2-43.eu-central-1.compute.internal@HDPBASE
The smoke test fails with the following errors:
2017-11-24 09:28:07,807 - File['/var/lib/ambari-agent/tmp/zkSmoke.out'] {'action': ['delete']}
2017-11-24 09:28:07,807 - File['/var/lib/ambari-agent/tmp/zkSmoke.sh'] {'content': StaticFile('zkSmoke.sh'), 'mode': 0755}
2017-11-24 09:28:07,808 - Execute['/var/lib/ambari-agent/tmp/zkSmoke.sh /usr/hdp/current/zookeeper-client/bin/zkCli.sh ambari-qa /usr/hdp/current/zookeeper-client/conf 2181 True /usr/bin/kinit /etc/security/keytabs/smokeuser.headless.keytab ambari-qa-hdpbase@HDPBASE /var/lib/ambari-agent/tmp/zkSmoke.out'] {'logoutput': True, 'path': ['/usr/sbin:/sbin:/usr/local/bin:/bin:/usr/bin'], 'tries': 3, 'try_sleep': 5}
zk_node1=ip-10-2-2-136.eu-central-1.compute.internal
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /zk_smoketest
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
    at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:703)
    at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:591)
    at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:363)
    at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:323)
    at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:282)
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
I checked all nodes: hostname returns the short name, while hostname -f returns the FQDN.
I changed ambari-agent.ini on all nodes to use a hostname_script that returns the output of hostname -f, since I previously had an issue with the Ambari heartbeats.
Were you able to resolve the issue?
I already tried regenerating the keytabs and disabling and re-enabling Kerberos, without success.
Best Regards,
Eric Le Blouc'h
Created 11-24-2017 08:04 PM
The issue was that reverse DNS was not correctly configured; adding all the hosts to /etc/hosts made it work. I still wonder why this worked before the cluster VMs were stopped, left off for a week, and started again. Perhaps a cache was hiding the problem?
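For anyone who wants to verify this on their own nodes, here is a minimal sketch of a forward-then-reverse lookup check. It assumes a Linux host where getent is available; the helper names line_has_name and check_reverse_dns are hypothetical, and the address in the sample /etc/hosts line is only inferred from the host name, not taken from the actual cluster.

```shell
# Sketch: check that reverse resolution of a host's address returns its FQDN.
# Both helpers are hypothetical; they are not part of Ambari, HDP or Kerberos.

# line_has_name NAME LINE: succeed if NAME is one of the names in a
# getent/hosts-style line of the form "ADDRESS name [alias ...]".
line_has_name() {
  printf '%s\n' "$2" | awk -v want="$1" '
    { for (i = 2; i <= NF; i++) if ($i == want) found = 1 }
    END { exit !found }'
}

# check_reverse_dns FQDN: resolve FQDN to an address, resolve that address
# back, and succeed only if the FQDN appears in the reverse answer.
check_reverse_dns() {
  _crd_ip=$(getent hosts "$1" | awk '{ print $1; exit }')
  [ -n "$_crd_ip" ] || return 1
  line_has_name "$1" "$(getent hosts "$_crd_ip")"
}

# Usage on each cluster node (not executed here):
#   check_reverse_dns "$(hostname -f)" || echo "reverse lookup does not return the FQDN"
#
# Example /etc/hosts entry (address assumed from the host name):
#   10.2.2.136  ip-10-2-2-136.eu-central-1.compute.internal  ip-10-2-2-136
```

Listing the FQDN first in each /etc/hosts entry matters for this check, because the first name after the address is what a reverse lookup through the hosts file typically returns as the canonical name.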