Created 10-25-2018 07:30 PM
I have just installed HDP 3.0.1 via Ambari. Most services started without a glitch, but Hive didn't. Here is the error message:
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/decorator.py", line 54, in wrapper
    return function(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/HIVE/package/scripts/hive_service.py", line 189, in wait_for_znode
    raise Fail(format("ZooKeeper node /{hive_server2_zookeeper_namespace} is not ready yet"))
Fail: ZooKeeper node /hiveserver2 is not ready yet

The above exception was the cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/HIVE/package/scripts/hive_server.py", line 143, in <module>
    HiveServer().execute()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 351, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/HIVE/package/scripts/hive_server.py", line 53, in start
    hive_service('hiveserver2', action = 'start', upgrade_type=upgrade_type)
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/HIVE/package/scripts/hive_service.py", line 101, in hive_service
    wait_for_znode()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/decorator.py", line 62, in wrapper
    return function(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/HIVE/package/scripts/hive_service.py", line 189, in wait_for_znode
    raise Fail(format("ZooKeeper node /{hive_server2_zookeeper_namespace} is not ready yet"))
resource_management.core.exceptions.Fail: ZooKeeper node /hiveserver2 is not ready yet
Everything runs on a single server. ZooKeeper is up, but the node in question is not there. I wonder if anyone has stumbled upon the same issue? What was the cause and fix?
Thanks in advance.
Created 10-25-2018 11:51 PM
This can happen if your ZooKeeper servers are unhealthy (not running) or are showing errors in their logs.
Basically, Ambari runs the following check to verify the ZooKeeper znode and grep for the "serverUri" entry:
# /usr/hdp/current/zookeeper-client/bin/zkCli.sh -server zookeeper1.example.com:2181,zookeeper2.example.com:2181,zookeeper3.example.com:2181 ls /hiveserver2 | grep 'serverUri='
So please check and share your ZooKeeper logs. Also, are you able to run the above command cleanly, and do you see a serverUri in the output?
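Before running the zkCli check, ZooKeeper health itself can be verified with the standard four-letter-word commands. A minimal sketch (the hostname is a placeholder, and it assumes ZooKeeper listens on the default port 2181 and `nc` is installed):

```shell
# Ask the ZooKeeper server whether it is healthy; a healthy server replies "imok"
echo ruok | nc zookeeper1.example.com 2181

# Show server mode (leader/follower/standalone) and client connection stats
echo stat | nc zookeeper1.example.com 2181
```

If `ruok` does not come back with `imok`, fix ZooKeeper first; the HiveServer2 znode check will never pass against an unhealthy ensemble.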
Created 10-26-2018 02:25 PM
Anything specific we should be scanning the ZK logs for? I've just cleaned them up, and made another attempt to restart the HiveServer2 in Ambari. I attach an excerpt from zookeeper.log around that time. Messages like those repeat there over and over.
Created 10-26-2018 11:01 AM
Here is the command and output:
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server localhost:2181 ls /hiveserver2 | grep 'serverUri='
Node does not exist: /hiveserver2
Created 10-28-2018 06:30 PM
By clean-up, do you mean you just deleted the contents of the ZooKeeper logs, or that you cleaned up the "hiveserver2" znode?
From the log snippet you posted above, it looks like the "hiveserver2" znode might not have been created. Can you log in to the ZooKeeper CLI and check:
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server <zookeeper server host name>:2181, and then run "ls /" in the ZooKeeper CLI. It should list a "hiveserver2" node there.
If it is missing, try to create it [launch the ZooKeeper CLI as the hive user (do "sudo su - hive")], and then restart HiveServer2.
If this is a secured cluster, you should also check the logs for Kerberos-related errors (it could be an auth-token issue).
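For reference, the recovery sequence described above can be sketched as follows (hostname is a placeholder; on a single-node setup localhost works):

```shell
# Switch to the hive user so the new znode gets the right ownership/ACLs
sudo su - hive

# Open the ZooKeeper CLI against your ZooKeeper server
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server localhost:2181

# Inside the CLI: confirm the znode is missing, create it empty, then exit
# [zk: localhost:2181(CONNECTED) 0] ls /
# [zk: localhost:2181(CONNECTED) 1] create /hiveserver2 ""
# [zk: localhost:2181(CONNECTED) 2] quit
```

After that, restart HiveServer2 so it can register its serverUri entry under /hiveserver2.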
Created 10-29-2018 11:02 AM
I cleaned up the ZK logs before restarting Hive. As for the /hiveserver2 node, it isn't created - I have been saying this all along. Here is the output of the 'ls /' command in the ZK client:
WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0] ls /
[cluster, registry, controller, brokers, storm, zookeeper, infra-solr, hbase-unsecure, tracers, admin, isr_change_notification, log_dir_event_notification, accumulo, controller_epoch, druid, rmstore, atsv2-hbase-unsecure, consumers, ambari-metrics-cluster, latest_producer_id_block, config]
When you say 'restart hiveserver2', do you mean in Ambari, or from the command line on the server?
Created 11-01-2018 07:35 AM
I don't know if this is related, but here is a warning message from hiveserver2.log that repeats multiple times:
2018-10-31T14:41:10,092 INFO  [main]: metastore.HiveMetaStoreClient (:()) - Trying to connect to metastore with URI thrift://cc0dbtrishdp.cc.nih.gov:9083
2018-10-31T14:41:10,092 INFO  [main]: metastore.HiveMetaStoreClient (:()) - Opened a connection to metastore, current connections: 1
2018-10-31T14:41:10,093 INFO  [main]: metastore.HiveMetaStoreClient (:()) - Connected to metastore.
2018-10-31T14:41:10,093 INFO  [main]: metastore.RetryingMetaStoreClient (:()) - RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=hive (auth:SIMPLE) retries=24 delay=5 lifetime=0
2018-10-31T14:41:10,103 WARN  [main]: metastore.RetryingMetaStoreClient (:()) - MetaStoreClient lost connection. Attempting to reconnect (1 of 24) after 5s. getCurrentNotificationEventId
org.apache.thrift.TApplicationException: Internal error processing get_current_notificationEventId
	at org.apache.thrift.TApplicationException.read(TApplicationException.java:111) ~[hive-exec-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79) ~[hive-exec-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_current_notificationEventId(ThriftHiveMetastore.java:5786) ~[hive-exec-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_current_notificationEventId(ThriftHiveMetastore.java:5774) ~[hive-exec-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getCurrentNotificationEventId(HiveMetaStoreClient.java:2878) ~[hive-exec-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_112]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_112]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_112]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_112]
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212) ~[hive-exec-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
	at com.sun.proxy.$Proxy57.getCurrentNotificationEventId(Unknown Source) ~[?:?]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_112]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_112]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_112]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_112]
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2934) ~[hive-exec-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
	at com.sun.proxy.$Proxy57.getCurrentNotificationEventId(Unknown Source) ~[?:?]
	at org.apache.hadoop.hive.ql.metadata.events.EventUtils$MSClientNotificationFetcher.getCurrentNotificationEventId(EventUtils.java:74) ~[hive-exec-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
	at org.apache.hadoop.hive.ql.metadata.events.NotificationEventPoll.<init>(NotificationEventPoll.java:100) ~[hive-exec-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
	at org.apache.hadoop.hive.ql.metadata.events.NotificationEventPoll.initialize(NotificationEventPoll.java:56) ~[hive-exec-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
	at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:271) ~[hive-service-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
	at org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:1072) ~[hive-service-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
	at org.apache.hive.service.server.HiveServer2.access$1700(HiveServer2.java:135) ~[hive-service-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
	at org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:1341) ~[hive-service-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
	at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:1185) ~[hive-service-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_112]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_112]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_112]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_112]
	at org.apache.hadoop.util.RunJar.run(RunJar.java:318) ~[hadoop-common-3.1.1.3.0.1.0-187.jar:?]
	at org.apache.hadoop.util.RunJar.main(RunJar.java:232) ~[hadoop-common-3.1.1.3.0.1.0-187.jar:?]
2018-10-31T14:41:10,154 INFO  [pool-8-thread-1]: tez.TezSessionState (:()) - User of session id 3599f619-9ae3-4c1e-a17e-085a91ac6f25 is hive
Just in case, I also checked if Hive ports (10000 and 9999) had been taken over by some other process, and they weren't.
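For reference, the port check above can be done with something like the following (a sketch assuming a Linux host with `ss` available; 9083 is the metastore port from the log above):

```shell
# Show which processes, if any, are listening on the HiveServer2 and metastore ports
ss -ltnp | grep -E ':(10000|9999|9083)\b'
```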