Created 09-21-2017 01:31 PM
I have Kerberized an HDP 2.4 cluster using the Ambari REST API, and all services are running fine except Storm, Kafka, and ambari-metrics-collector. All keytabs are available and properly placed on the respective hosts.
From the logs, what I can understand is that the Storm, Kafka, and ambari-metrics-collector services fail to connect to zkclient or the ZooKeeper quorum.
All ZooKeeper servers have been running fine for a long time, and with telnet I am able to connect to the ZooKeeper quorum on the same port (2181). So I must be missing some configuration (possibly the SASL configuration) that these services need to connect to ZooKeeper in a Kerberized environment.
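A successful telnet connection only shows that the port is open. A slightly stronger check is to send ZooKeeper's `ruok` four-letter command and look for the `imok` reply. The Python sketch below does that. The helper name `zk_four_letter` is made up for illustration, and it assumes four-letter commands are enabled on the servers, which is the default in the ZooKeeper 3.4.x line but may not hold elsewhere.

```python
import socket


def zk_four_letter(host, port=2181, command=b"ruok", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the reply as text.

    Hypothetical helper for illustration. ZooKeeper closes the connection
    after answering, so we read until the peer closes the socket.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Calling `zk_four_letter("abctestlab0512.bdaas.com")` against each quorum member should return `imok` when the server is serving requests. This check does not authenticate, so a healthy reply here together with failing services would point towards the Kerberos/SASL client setup rather than the network.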
ZooKeeper logs
2017-09-21 12:14:47,271 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /128.160.120.21:41906 (no session established for client)
2017-09-21 12:15:43,963 - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x35ea3480411005c
2017-09-21 12:15:47,309 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /128.160.120.21:42000
2017-09-21 12:15:47,310 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:748)
2017-09-21 12:15:47,310 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /128.160.120.21:42000 (no session established for client)
2017-09-21 12:16:47,273 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /128.160.120.21:42122
2017-09-21 12:16:47,275 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:748)
2017-09-21 12:16:47,275 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /128.160.120.21:42122 (no session established for client)
Kafka server logs
advertised.listeners = PLAINTEXTSASL://abctestlab0515.bdaas.com:6667
leader.imbalance.per.broker.percentage = 10
(kafka.server.KafkaConfig)
[2017-09-21 12:07:30,276] INFO starting (kafka.server.KafkaServer)
[2017-09-21 12:07:30,291] INFO Connecting to zookeeper on abctestlab0512.bdaas.com:2181,abctestlab0515.bdaas.com:2181,abctestlab0513.bdaas.com:2181 (kafka.server.KafkaServer)
[2017-09-21 12:11:40,363] FATAL Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 250000
    at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1223)
    at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:155)
    at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:129)
    at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:89)
    at kafka.utils.ZkUtils$.apply(ZkUtils.scala:71)
    at kafka.server.KafkaServer.initZk(KafkaServer.scala:278)
    at kafka.server.KafkaServer.startup(KafkaServer.scala:168)
    at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:37)
    at kafka.Kafka$.main(Kafka.scala:67)
    at kafka.Kafka.main(Kafka.scala)
[2017-09-21 12:11:40,364] INFO shutting down (kafka.server.KafkaServer)
[2017-09-21 12:11:40,370] INFO shut down completed (kafka.server.KafkaServer)
[2017-09-21 12:11:40,370] FATAL Fatal error during KafkaServerStartable startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 250000
    at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1223)
    at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:155)
    at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:129)
    at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:89)
    at kafka.utils.ZkUtils$.apply(ZkUtils.scala:71)
    at kafka.server.KafkaServer.initZk(KafkaServer.scala:278)
    at kafka.server.KafkaServer.startup(KafkaServer.scala:168)
    at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:37)
    at kafka.Kafka$.main(Kafka.scala:67)
    at kafka.Kafka.main(Kafka.scala)
[2017-09-21 12:11:40,372] INFO shutting down (kafka.server.KafkaServer)
Storm DRPC logs
2017-09-20 14:21:06.114 o.a.s.z.s.ZooKeeperServer [INFO] Server environment:user.dir=/home/hdp44-storm
2017-09-20 14:21:07.304 b.s.u.Utils [INFO] Using defaults.yaml from resources
2017-09-20 14:21:07.324 b.s.u.Utils [INFO] Using storm.yaml from resources
2017-09-20 14:21:07.373 b.s.d.drpc [INFO] Starting Distributed RPC servers...
2017-09-20 14:21:07.450 b.s.s.a.k.ServerCallbackHandler [WARN] No password found for user: null
2017-09-20 14:21:07.452 b.s.s.a.k.KerberosSaslTransportPlugin [ERROR] Server failed to login in principal:javax.security.auth.login.LoginException: No password provided
javax.security.auth.login.LoginException: No password provided
    at com.sun.security.auth.module.Krb5LoginModule.promptForPass(Krb5LoginModule.java:919) ~[?:1.8.0_131]
    at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:760) ~[?:1.8.0_131]
    at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) ~[?:1.8.0_131]
Created 09-23-2017 10:03 PM
Created 09-27-2017 08:52 AM
Thanks for your reply. After referring to the documents mentioned above, I was able to start my Storm and Kafka services. The issues were solved by properly configuring the JAAS files.
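For anyone hitting the same errors, a JAAS file for a Kerberized Kafka broker generally has the shape sketched below. This is not the exact file from this cluster: the keytab path, the principal, and the realm `EXAMPLE.COM` are placeholders that must be replaced with the values for your environment. The `Client` section is the one the ZooKeeper client inside the broker uses for SASL, so a missing or wrong `Client` section is a common reason for a service to time out against an otherwise healthy quorum.

```
// Sketch only: keytab path, principal, and realm are assumed placeholders.
KafkaServer {
   com.sun.security.auth.module.Krb5LoginModule required
   useKeyTab=true
   storeKey=true
   useTicketCache=false
   keyTab="/etc/security/keytabs/kafka.service.keytab"
   serviceName="kafka"
   principal="kafka/abctestlab0515.bdaas.com@EXAMPLE.COM";
};

// Used by the ZooKeeper client embedded in the broker.
Client {
   com.sun.security.auth.module.Krb5LoginModule required
   useKeyTab=true
   storeKey=true
   useTicketCache=false
   keyTab="/etc/security/keytabs/kafka.service.keytab"
   serviceName="zookeeper"
   principal="kafka/abctestlab0515.bdaas.com@EXAMPLE.COM";
};
```

The Storm JAAS file follows the same pattern with its own sections and keytab. The "No password provided" error from `Krb5LoginModule` is typically what appears when the module cannot log in from a keytab and falls back to prompting, so the `useKeyTab`, `keyTab`, and `principal` entries are the first things worth checking.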
However, the Ambari Metrics Collector still does not start, and the HBase master stops soon after starting.
For the Metrics Collector, we get the following logs repeatedly:
/var/log/ambari-metrics-collector/ambari-metrics-collector.log
2017-09-27 09:43:00,001 ERROR org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: RECEIVED SIGNAL 15: SIGTERM (not sure what this error indicates)
2017-09-27 09:47:40,572 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=21, retries=35, started=270488 ms ago, cancelled=false, msg=
2017-09-27 09:48:00,659 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=22, retries=35, started=290575 ms ago, cancelled=false, msg=
2017-09-27 09:48:20,693 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=23, retries=35, started=310609 ms ago, cancelled=false, msg=
/var/log/ambari-metrics-collector/hbase-master.log
2017-09-27 09:43:06,975 ERROR [main] master.HMasterCommandLine: Master exiting
java.io.IOException: Could not start ZK with 3 ZK servers in local mode deployment. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.
    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:175)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:139)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2451)
Created 09-27-2017 11:31 AM
@Ajit Sonawane Great! Good to hear that the documentation helped in solving the issue.
For the HBase one, please create a separate question.