Member since 07-08-2016
25 Posts
0 Kudos Received
1 Solution
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3275 | 09-01-2016 07:58 PM |
11-21-2016 08:46 PM
Thanks @dbains. My bad. I was under the assumption that `kafka-console-producer.sh` itself produces a test message. Thanks for clarifying that this is not the case. I typed in a message and verified that it was received by the consumer.
11-21-2016 08:26 PM
I have installed Kafka in an HDP 2.5 cluster. bin/kafka-console-producer.sh seems to get stuck and doesn't produce a test message. I have tried the following commands; none of them seems to work:
usr/hdp/current/kafka-broker$ bin/kafka-console-producer.sh --broker-list ip-172-31-103-20.us-west-2.compute.internal:6667 --topic test
usr/hdp/current/kafka-broker$ bin/kafka-console-producer.sh --broker-list localhost:6667 --topic test
usr/hdp/current/kafka-broker$ bin/kafka-console-producer.sh --broker-list ip-172-31-103-20.us-west-2.compute.internal:6667 --topic test --security-protocol PLAINTEXT
I have verified that the listeners property in /etc/kafka/conf/server.properties on the broker host is correct (listeners=PLAINTEXT://ip-172-31-103-20.us-west-2.compute.internal:6667).
I have also tried uncommenting the following lines in the kafka-console-producer.sh script, but that doesn't help either:
#KAFKA_JAAS_CONF=$KAFKA_HOME/config/kafka_jaas.conf
#if [ -f $KAFKA_JAAS_CONF ]; then
# export KAFKA_CLIENT_KERBEROS_PARAMS="-Djava.security.auth.login.config=$KAFKA_HOME/config/kafka_client_jaas.conf"
#fi
Could someone help me debug this issue?
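As the follow-up reply notes, the console producer was not stuck: it waits for lines on standard input and sends each line as one message. The sketch below shows one way to feed it messages non-interactively from Python. It uses only the `--broker-list` and `--topic` flags from the commands above; the helper names `producer_command` and `send_messages`, and the default script path, are assumptions made for illustration.

```python
import subprocess


def producer_command(broker, topic, script="bin/kafka-console-producer.sh"):
    """Build the argv list for the console producer (flags taken from the post)."""
    return [script, "--broker-list", broker, "--topic", topic]


def send_messages(broker, topic, messages, runner=subprocess.run):
    """Write each message as one line on the producer's stdin.

    `runner` defaults to subprocess.run; it is a parameter only so the
    function can be exercised without a Kafka installation.
    """
    payload = "".join(message + "\n" for message in messages)
    return runner(producer_command(broker, topic), input=payload, text=True, check=True)
```

Run from the Kafka broker directory, a call such as `send_messages("localhost:6667", "test", ["hello"])` should behave like typing `hello` at the producer prompt, assuming the broker address is reachable.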
11-18-2016 06:46 AM
Thanks @Mugdha, it worked 🙂
11-18-2016 05:08 AM
Getting the following error.
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_monitor.py", line 68, in <module>
AmsMonitor().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_monitor.py", line 42, in start
action = 'start'
File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
return fn(*args, **kwargs)
File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/ams_service.py", line 103, in ams_service
user=params.ams_user
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 273, in action_run
tries=self.resource.tries, try_sleep=self.resource.try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 71, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 93, in checked_call
tries=tries, try_sleep=try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 141, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 294, in _call
raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of '/usr/sbin/ambari-metrics-monitor --config /etc/ambari-metrics-monitor/conf start' returned 255.
psutil build directory is not empty, continuing...
Verifying Python version compatibility...
Using python /usr/bin/python2.7
Checking for previously running Metric Monitor...
Starting ambari-metrics-monitor
Verifying ambari-metrics-monitor process status...
ERROR: ambari-metrics-monitor start failed. For more details, see /var/log/ambari-metrics-monitor/ambari-metrics-monitor.out:
====================
from metric_collector import MetricsCollector
File "/usr/lib/python2.6/site-packages/resource_monitoring/core/metric_collector.py", line 23, in <module>
from host_info import HostInfo
File "/usr/lib/python2.6/site-packages/resource_monitoring/core/host_info.py", line 22, in <module>
import psutil
File "/usr/lib/python2.6/site-packages/resource_monitoring/psutil/build/lib.linux-x86_64-2.7/psutil/__init__.py", line 89, in <module>
import psutil._pslinux as _psplatform
File "/usr/lib/python2.6/site-packages/resource_monitoring/psutil/build/lib.linux-x86_64-2.7/psutil/_pslinux.py", line 20, in <module>
from psutil import _common
ImportError: cannot import name _common
====================
Monitor out at: /var/log/ambari-metrics-monitor/ambari-metrics-monitor.out
The strange part is that the metrics monitor startup script is using python2.7, but the 2.6 site-packages are being searched. Is that expected? I tried exporting `PYTHON=/usr/bin/python2.6` in ams-env, which made the metrics monitor fail with a message saying "> python 2.6 is needed". I have also tried https://community.hortonworks.com/questions/26665/ambari-metrics-monitor-not-starting-importerror-ca.html but that didn't help either.
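One way to narrow down an interpreter/site-packages mismatch like this is to ask the interpreter directly which file it loads a module from. The following is a minimal diagnostic sketch, not part of Ambari; the function name `module_origin` is made up for illustration. It relies only on `importlib.import_module`, which is available in both Python 2.7 and Python 3.

```python
import importlib
import sys


def module_origin(name):
    """Return (interpreter path, (major, minor) version, file the module loaded from)."""
    module = importlib.import_module(name)
    return sys.executable, tuple(sys.version_info[:2]), getattr(module, "__file__", None)
```

Calling `module_origin("psutil")` under the same interpreter the monitor reports using (`/usr/bin/python2.7` in the output above) would show whether psutil resolves to the build directory named in the traceback. If the import itself fails, the resulting traceback should match the one in the log, which confirms the interpreter and path combination.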
10-12-2016 06:37 PM
@Srinivas Santhanam I guess you have figured this out by now. For others: the problem with the above query is the quotes around the HDFS path. Try it without quotes, like below:
add jar hdfs:///tmp/udfs/hive/esri-geometry-api.jar
09-01-2016 11:24 PM
@lraheja etl,prod,reporting,dev,default
@swapnil The API call is not failing. There is a bug in the frontend JavaScript. See the following screenshots for details. I was able to hack the JavaScript to force "hive.server2.tez.default.queues=default" and load the page.
09-01-2016 10:11 PM
We recently modified `hive.server2.tez.default.queues` to hold a comma-separated list of strings. Since then we can't access the Settings tab for Hive (see screenshot), so we can't update Hive settings. Could someone suggest a hack to fix this? Is there an easy way to update a single property via the Ambari API?
screen-shot-2016-09-01-at-30934-pm.png
The Advanced tab is accessible, along with the other UI components. I think the comma-separated value must be the reason behind the broken Settings tab. We are using Ambari 2.2.1.1. I have tried loading the page in incognito mode to make sure there is no cache interference, but it doesn't help.
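On the question of changing a single property through the API: as far as I know, Ambari applies configuration changes as a whole new version of a config type, so a single-property update means re-submitting all properties of that type with one value changed. The sketch below only builds such a request body. The body shape (`Clusters/desired_config` with `type`, `tag`, `properties`), the PUT target `/api/v1/clusters/<cluster>`, and the need for an `X-Requested-By` header are assumptions to verify against the documentation for your Ambari version; `build_desired_config` is a hypothetical helper name.

```python
import time


def build_desired_config(config_type, current_properties, key, value, tag=None):
    """Return a request body that re-submits a config type with one key changed.

    `current_properties` is the full property map currently in effect for
    `config_type` (for example, fetched beforehand from the API).
    """
    properties = dict(current_properties)  # copy so the caller's dict is untouched
    properties[key] = value
    return {
        "Clusters": {
            "desired_config": {
                "type": config_type,
                # Assumption: the tag must be new for this type; a timestamp is one option.
                "tag": tag or "version%d" % int(time.time() * 1000),
                "properties": properties,
            }
        }
    }
```

For the case in this post, the call would be along the lines of `build_desired_config("hive-site", current, "hive.server2.tez.default.queues", "default")`, assuming the property lives in `hive-site`. Ambari installations of that era also shipped a `configs.sh` helper script under the server's resources directory, which may be a simpler route if it is present.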
Labels: Apache Ambari, Apache Hive
09-01-2016 07:58 PM
The cause was DHCP: the instances failed to see a response from the DHCP server for periods of time. The d2 Ubuntu (14.04) instances were using Enhanced Networking with the "ixgbevf" driver, version 2.11.3-k. 2.11.3-k is below the minimum recommended version, 2.14.2, and should be upgraded to 2.16.4. We upgraded the driver to the latest version, which seems to have fixed the issue. Reference: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/sriov-networking.html#enhanced-networking-ubuntu
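The version comparison described above can be automated across a fleet. This is a small sketch under the assumption that the installed driver version string has already been collected (for example from `modinfo ixgbevf` or `ethtool -i <interface>` output); the function names are illustrative, and the minimum version is the one quoted in this post.

```python
import re

MINIMUM_RECOMMENDED = (2, 14, 2)  # minimum recommended ixgbevf version quoted in the post


def parse_driver_version(text):
    """Extract the leading dotted numeric part, ignoring suffixes such as "-k"."""
    match = re.match(r"(\d+(?:\.\d+)*)", text.strip())
    if match is None:
        raise ValueError("no version number in %r" % text)
    return tuple(int(part) for part in match.group(1).split("."))


def needs_upgrade(installed, minimum=MINIMUM_RECOMMENDED):
    """True when the installed version is strictly below the minimum."""
    return parse_driver_version(installed) < minimum
```

Comparing tuples of integers avoids the string-comparison trap where "2.9" would sort after "2.14".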
08-30-2016 08:15 AM
We have an HDP 2.4 cluster running HDFS, YARN and HBase on 3 master and 4 data nodes. Each data node hosts an HBase RegionServer (8 GB heap), an HDFS DataNode, and a YARN NodeManager. Each data node is an Amazon d2.xlarge. All masters run ZooKeeper. The other master processes are the HDFS (HA), HBase and YARN (HA) masters. Each master node is an Amazon r3.xlarge.
We see the following problems on two of our data nodes, while the other nodes function properly. Please note that no MR or YARN jobs are running when this happens:
1. The RegionServer dies with ZooKeeper session timeout exceptions once in a while:
2016-08-29 07:08:50,713 INFO [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(ip-172-31-103-112.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 600097ms for sessionid 0x156d486e2120012, closing socket connection and attempting reconnect
2016-08-29 07:09:00,955 INFO [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(172.31.103.252:2181)] zookeeper.ClientCnxn: Opening socket connection to server 172.31.103.252/172.31.103.252:2181. Will not attempt to authenticate using SASL (unknown error)
2016-08-29 07:09:01,824 WARN [timeline] timeline.HadoopTimelineMetricsSink: Unable to send metrics to collector by address:http://ip-172-31-103-252.us-west-2.compute.internal:6188/ws/v1/timeline/metrics
2016-08-29 07:09:01,824 WARN [timeline] timeline.HadoopTimelineMetricsSink: Unable to send metrics to collector by address:http://ip-172-31-103-252.us-west-2.compute.internal:6188/ws/v1/timeline/metrics
2016-08-29 07:09:01,825 WARN [timeline] timeline.HadoopTimelineMetricsSink: Unable to send metrics to collector by address:http://ip-172-31-103-252.us-west-2.compute.internal:6188/ws/v1/timeline/metrics
2016-08-29 07:09:01,825 WARN [timeline] timeline.HadoopTimelineMetricsSink: Unable to send metrics to collector by address:http://ip-172-31-103-252.us-west-2.compute.internal:6188/ws/v1/timeline/metrics
2016-08-29 07:09:01,825 WARN [timeline] timeline.HadoopTimelineMetricsSink: Unable to send metrics to collector by address:http://ip-172-31-103-252.us-west-2.compute.internal:6188/ws/v1/timeline/metrics
2016-08-29 07:09:01,825 WARN [timeline] timeline.HadoopTimelineMetricsSink: Unable to send metrics to collector by address:http://ip-172-31-103-252.us-west-2.compute.internal:6188/ws/v1/timeline/metrics
2016-08-29 07:09:01,826 WARN [timeline] timeline.HadoopTimelineMetricsSink: Unable to send metrics to collector by address:http://ip-172-31-103-252.us-west-2.compute.internal:6188/ws/v1/timeline/metrics
2016-08-29 07:09:03,952 WARN [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(172.31.103.252:2181)] zookeeper.ClientCnxn: Session 0x156d486e2120012 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2016-08-29 07:09:14,960 INFO [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(172.31.103.171:2181)] zookeeper.ClientCnxn: Opening socket connection to server 172.31.103.171/172.31.103.171:2181. Will not attempt to authenticate using SASL (unknown error)
2016-08-29 07:09:16,808 WARN [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(172.31.103.171:2181)] zookeeper.ClientCnxn: Session 0x156d486e2120012 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2016-08-29 07:09:18,061 INFO [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(ip-172-31-103-112.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Opening socket connection to server ip-172-31-103-112.us-west-2.compute.internal/172.31.103.112:2181. Will not attempt to authenticate using SASL (unknown error)
2016-08-29 07:09:21,060 WARN [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(ip-172-31-103-112.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Session 0x156d486e2120012 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2016-08-29 07:09:21,399 INFO [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(172.31.103.252:2181)] zookeeper.ClientCnxn: Opening socket connection to server 172.31.103.252/172.31.103.252:2181. Will not attempt to authenticate using SASL (unknown error)
2016-08-29 07:09:23,640 WARN [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(172.31.103.252:2181)] zookeeper.ClientCnxn: Session 0x156d486e2120012 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2016-08-29 07:09:24,182 INFO [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(172.31.103.171:2181)] zookeeper.ClientCnxn: Opening socket connection to server 172.31.103.171/172.31.103.171:2181. Will not attempt to authenticate using SASL (unknown error)
2016-08-29 07:09:27,180 WARN [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(172.31.103.171:2181)] zookeeper.ClientCnxn: Session 0x156d486e2120012 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2016-08-29 07:09:28,949 INFO [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(ip-172-31-103-112.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Opening socket connection to server ip-172-31-103-112.us-west-2.compute.internal/172.31.103.112:2181. Will not attempt to authenticate using SASL (unknown error)
2016-08-29 07:09:31,948 WARN [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(ip-172-31-103-112.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Session 0x156d486e2120012 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2016-08-29 07:09:32,446 INFO [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(172.31.103.252:2181)] zookeeper.ClientCnxn: Opening socket connection to server 172.31.103.252/172.31.103.252:2181. Will not attempt to authenticate using SASL (unknown error)
2016-08-29 07:09:35,444 WARN [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(172.31.103.252:2181)] zookeeper.ClientCnxn: Session 0x156d486e2120012 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2016-08-29 07:09:36,208 INFO [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(172.31.103.171:2181)] zookeeper.ClientCnxn: Opening socket connection to server 172.31.103.171/172.31.103.171:2181. Will not attempt to authenticate using SASL (unknown error)
2016-08-29 07:09:39,024 INFO [main-SendThread(ip-172-31-103-252.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 600081ms for sessionid 0x356d4878aa0001a, closing socket connection and attempting reconnect
2016-08-29 07:09:39,125 WARN [ReplicationExecutor-0] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=ip-172-31-103-112.us-west-2.compute.internal:2181,ip-172-31-103-171.us-west-2.compute.internal:2181,ip-172-31-103-252.us-west-2.compute.internal:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase-unsecure/replication/rs/ip-172-31-103-124.us-west-2.compute.internal,16020,1472451828166
2016-08-29 07:09:39,125 WARN [PriorityRpcServer.handler=4,queue=0,port=16020] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=ip-172-31-103-112.us-west-2.compute.internal:2181,ip-172-31-103-171.us-west-2.compute.internal:2181,ip-172-31-103-252.us-west-2.compute.internal:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase-unsecure/recovering-regions/0e281d12463252983d18abbe9e096fbd
2016-08-29 07:09:39,125 WARN [RS_OPEN_REGION-ip-172-31-103-48:16020-0] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=ip-172-31-103-112.us-west-2.compute.internal:2181,ip-172-31-103-171.us-west-2.compute.internal:2181,ip-172-31-103-252.us-west-2.compute.internal:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase-unsecure/region-in-transition/4ef6634b001b40cd44c40c8406d6d389
2016-08-29 07:09:39,208 WARN [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(172.31.103.171:2181)] zookeeper.ClientCnxn: Session 0x156d486e2120012 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2016-08-29 07:09:40,409 INFO [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(ip-172-31-103-112.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Opening socket connection to server ip-172-31-103-112.us-west-2.compute.internal/172.31.103.112:2181. Will not attempt to authenticate using SASL (unknown error)
2016-08-29 07:09:43,408 WARN [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(ip-172-31-103-112.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Session 0x156d486e2120012 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2016-08-29 07:09:44,155 INFO [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(172.31.103.252:2181)] zookeeper.ClientCnxn: Opening socket connection to server 172.31.103.252/172.31.103.252:2181. Will not attempt to authenticate using SASL (unknown error)
2016-08-29 07:09:47,156 WARN [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(172.31.103.252:2181)] zookeeper.ClientCnxn: Session 0x156d486e2120012 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2016-08-29 07:09:47,974 INFO [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(172.31.103.171:2181)] zookeeper.ClientCnxn: Opening socket connection to server 172.31.103.171/172.31.103.171:2181. Will not attempt to authenticate using SASL (unknown error)
2016-08-29 07:09:49,576 WARN [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(172.31.103.171:2181)] zookeeper.ClientCnxn: Session 0x156d486e2120012 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2016-08-29 07:09:49,876 INFO [main-SendThread(172.31.103.171:2181)] zookeeper.ClientCnxn: Opening socket connection to server 172.31.103.171/172.31.103.171:2181. Will not attempt to authenticate using SASL (unknown error)
2016-08-29 07:09:51,266 INFO [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(ip-172-31-103-112.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Opening socket connection to server ip-172-31-103-112.us-west-2.compute.internal/172.31.103.112:2181. Will not attempt to authenticate using SASL (unknown error)
2016-08-29 07:09:52,876 WARN [main-SendThread(172.31.103.171:2181)] zookeeper.ClientCnxn: Session 0x356d4878aa0001a for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2016-08-29 07:09:52,976 WARN [RS_OPEN_REGION-ip-172-31-103-48:16020-0] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=ip-172-31-103-112.us-west-2.compute.internal:2181,ip-172-31-103-171.us-west-2.compute.internal:2181,ip-172-31-103-252.us-west-2.compute.internal:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase-unsecure/region-in-transition/4ef6634b001b40cd44c40c8406d6d389
2016-08-29 07:09:52,976 WARN [ReplicationExecutor-0] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=ip-172-31-103-112.us-west-2.compute.internal:2181,ip-172-31-103-171.us-west-2.compute.internal:2181,ip-172-31-103-252.us-west-2.compute.internal:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase-unsecure/replication/rs/ip-172-31-103-124.us-west-2.compute.internal,16020,1472451828166
2016-08-29 07:09:52,976 WARN [PriorityRpcServer.handler=4,queue=0,port=16020] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=ip-172-31-103-112.us-west-2.compute.internal:2181,ip-172-31-103-171.us-west-2.compute.internal:2181,ip-172-31-103-252.us-west-2.compute.internal:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase-unsecure/recovering-regions/0e281d12463252983d18abbe9e096fbd
2016-08-29 07:09:54,264 WARN [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(ip-172-31-103-112.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Session 0x156d486e2120012 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2016-08-29 07:09:54,579 INFO [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(172.31.103.252:2181)] zookeeper.ClientCnxn: Opening socket connection to server 172.31.103.252/172.31.103.252:2181. Will not attempt to authenticate using SASL (unknown error)
2016-08-29 07:09:57,580 WARN [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(172.31.103.252:2181)] zookeeper.ClientCnxn: Session 0x156d486e2120012 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2016-08-29 07:09:58,048 INFO [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(172.31.103.171:2181)] zookeeper.ClientCnxn: Opening socket connection to server 172.31.103.171/172.31.103.171:2181. Will not attempt to authenticate using SASL (unknown error)
2016-08-29 07:10:01,048 WARN [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(172.31.103.171:2181)] zookeeper.ClientCnxn: Session 0x156d486e2120012 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2016-08-29 07:10:02,282 INFO [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(ip-172-31-103-112.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Opening socket connection to server ip-172-31-103-112.us-west-2.compute.internal/172.31.103.112:2181. Will not attempt to authenticate using SASL (unknown error)
2016-08-29 07:10:03,680 INFO [main-SendThread(172.31.103.112:2181)] zookeeper.ClientCnxn: Opening socket connection to server 172.31.103.112/172.31.103.112:2181. Will not attempt to authenticate using SASL (unknown error)
2016-08-29 07:10:05,280 WARN [main-SendThread(172.31.103.112:2181)] zookeeper.ClientCnxn: Session 0x356d4878aa0001a for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2016-08-29 07:10:05,280 WARN [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(ip-172-31-103-112.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Session 0x156d486e2120012 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2016-08-29 07:10:05,380 WARN [PriorityRpcServer.handler=4,queue=0,port=16020] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=ip-172-31-103-112.us-west-2.compute.internal:2181,ip-172-31-103-171.us-west-2.compute.internal:2181,ip-172-31-103-252.us-west-2.compute.internal:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase-unsecure/recovering-regions/0e281d12463252983d18abbe9e096fbd
2016-08-29 07:10:05,380 WARN [RS_OPEN_REGION-ip-172-31-103-48:16020-0] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=ip-172-31-103-112.us-west-2.compute.internal:2181,ip-172-31-103-171.us-west-2.compute.internal:2181,ip-172-31-103-252.us-west-2.compute.internal:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase-unsecure/region-in-transition/4ef6634b001b40cd44c40c8406d6d389
2016-08-29 07:10:05,380 WARN [ReplicationExecutor-0] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=ip-172-31-103-112.us-west-2.compute.internal:2181,ip-172-31-103-171.us-west-2.compute.internal:2181,ip-172-31-103-252.us-west-2.compute.internal:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase-unsecure/replication/rs/ip-172-31-103-124.us-west-2.compute.internal,16020,1472451828166
2016-08-29 07:10:05,540 INFO [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(172.31.103.252:2181)] zookeeper.ClientCnxn: Opening socket connection to server 172.31.103.252/172.31.103.252:2181. Will not attempt to authenticate using SASL (unknown error)
2016-08-29 07:10:06,459 INFO [main-SendThread(ip-172-31-103-252.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Opening socket connection to server ip-172-31-103-252.us-west-2.compute.internal/172.31.103.252:2181. Will not attempt to authenticate using SASL (unknown error)
2016-08-29 07:10:08,540 WARN [main-SendThread(ip-172-31-103-252.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Session 0x356d4878aa0001a for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2016-08-29 07:10:08,540 WARN [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(172.31.103.252:2181)] zookeeper.ClientCnxn: Session 0x156d486e2120012 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2016-08-29 07:10:09,000 INFO [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(172.31.103.171:2181)] zookeeper.ClientCnxn: Opening socket connection to server 172.31.103.171/172.31.103.171:2181. Will not attempt to authenticate using SASL (unknown error)
2016-08-29 07:10:09,187 INFO [main-SendThread(172.31.103.171:2181)] zookeeper.ClientCnxn: Opening socket connection to server 172.31.103.171/172.31.103.171:2181. Will not attempt to authenticate using SASL (unknown error)
2016-08-29 07:10:12,000 WARN [main-SendThread(172.31.103.171:2181)] zookeeper.ClientCnxn: Session 0x356d4878aa0001a for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2016-08-29 07:10:12,000 WARN [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(172.31.103.171:2181)] zookeeper.ClientCnxn: Session 0x156d486e2120012 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2016-08-29 07:10:12,101 WARN [ReplicationExecutor-0] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=ip-172-31-103-112.us-west-2.compute.internal:2181,ip-172-31-103-171.us-west-2.compute.internal:2181,ip-172-31-103-252.us-west-2.compute.internal:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase-unsecure/replication/rs/ip-172-31-103-124.us-west-2.compute.internal,16020,1472451828166
2016-08-29 07:10:12,101 WARN [PriorityRpcServer.handler=4,queue=0,port=16020] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=ip-172-31-103-112.us-west-2.compute.internal:2181,ip-172-31-103-171.us-west-2.compute.internal:2181,ip-172-31-103-252.us-west-2.compute.internal:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase-unsecure/recovering-regions/0e281d12463252983d18abbe9e096fbd
2016-08-29 07:10:12,101 WARN [RS_OPEN_REGION-ip-172-31-103-48:16020-0] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=ip-172-31-103-112.us-west-2.compute.internal:2181,ip-172-31-103-171.us-west-2.compute.internal:2181,ip-172-31-103-252.us-west-2.compute.internal:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase-unsecure/region-in-transition/4ef6634b001b40cd44c40c8406d6d389
2016-08-29 07:10:12,295 INFO [main-SendThread(172.31.103.112:2181)] zookeeper.ClientCnxn: Opening socket connection to server 172.31.103.112/172.31.103.112:2181. Will not attempt to authenticate using SASL (unknown error)
2016-08-29 07:10:13,750 INFO [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(ip-172-31-103-112.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Opening socket connection to server ip-172-31-103-112.us-west-2.compute.internal/172.31.103.112:2181. Will not attempt to authenticate using SASL (unknown error)
2016-08-29 07:10:15,292 WARN [main-SendThread(172.31.103.112:2181)] zookeeper.ClientCnxn: Session 0x356d4878aa0001a for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2016-08-29 07:10:15,292 WARN [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(ip-172-31-103-112.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Session 0x156d486e2120012 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2016-08-29 07:10:15,654 INFO [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(172.31.103.252:2181)] zookeeper.ClientCnxn: Opening socket connection to server 172.31.103.252/172.31.103.252:2181. Will not attempt to authenticate using SASL (unknown error)
2016-08-29 07:10:17,322 INFO [main-SendThread(ip-172-31-103-252.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Opening socket connection to server ip-172-31-103-252.us-west-2.compute.internal/172.31.103.252:2181. Will not attempt to authenticate using SASL (unknown error)
2016-08-29 07:10:18,652 WARN [main-SendThread(ip-172-31-103-252.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Session 0x356d4878aa0001a for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2016-08-29 07:10:18,652 WARN [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(172.31.103.252:2181)] zookeeper.ClientCnxn: Session 0x156d486e2120012 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2016-08-29 07:10:19,095 INFO [main-SendThread(172.31.103.171:2181)] zookeeper.ClientCnxn: Opening socket connection to server 172.31.103.171/172.31.103.171:2181. Will not attempt to authenticate using SASL (unknown error)
2016-08-29 07:10:19,620 INFO [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(172.31.103.171:2181)] zookeeper.ClientCnxn: Opening socket connection to server 172.31.103.171/172.31.103.171:2181. Will not attempt to authenticate using SASL (unknown error)
2016-08-29 07:10:22,096 WARN [main-SendThread(172.31.103.171:2181)] zookeeper.ClientCnxn: Session 0x356d4878aa0001a for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2016-08-29 07:10:22,096 WARN [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(172.31.103.171:2181)] zookeeper.ClientCnxn: Session 0x156d486e2120012 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2016-08-29 07:10:22,196 WARN [PriorityRpcServer.handler=4,queue=0,port=16020] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=ip-172-31-103-112.us-west-2.compute.internal:2181,ip-172-31-103-171.us-west-2.compute.internal:2181,ip-172-31-103-252.us-west-2.compute.internal:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase-unsecure/recovering-regions/0e281d12463252983d18abbe9e096fbd
2016-08-29 07:10:22,196 WARN [ReplicationExecutor-0] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=ip-172-31-103-112.us-west-2.compute.internal:2181,ip-172-31-103-171.us-west-2.compute.internal:2181,ip-172-31-103-252.us-west-2.compute.internal:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase-unsecure/replication/rs/ip-172-31-103-124.us-west-2.compute.internal,16020,1472451828166
2016-08-29 07:10:22,196 WARN [RS_OPEN_REGION-ip-172-31-103-48:16020-0] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=ip-172-31-103-112.us-west-2.compute.internal:2181,ip-172-31-103-171.us-west-2.compute.internal:2181,ip-172-31-103-252.us-west-2.compute.internal:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase-unsecure/region-in-transition/4ef6634b001b40cd44c40c8406d6d389
2016-08-29 07:10:22,196 ERROR [RS_OPEN_REGION-ip-172-31-103-48:16020-0] zookeeper.RecoverableZooKeeper: ZooKeeper getData failed after 4 attempts
2016-08-29 07:10:22,196 ERROR [PriorityRpcServer.handler=4,queue=0,port=16020] zookeeper.RecoverableZooKeeper: ZooKeeper getData failed after 4 attempts
2016-08-29 07:10:22,196 ERROR [ReplicationExecutor-0] zookeeper.RecoverableZooKeeper: ZooKeeper getChildren failed after 4 attempts
2016-08-29 07:10:22,196 WARN [PriorityRpcServer.handler=4,queue=0,port=16020] zookeeper.ZKUtil: regionserver:16020-0x356d4878aa0001a, quorum=ip-172-31-103-112.us-west-2.compute.internal:2181,ip-172-31-103-171.us-west-2.compute.internal:2181,ip-172-31-103-252.us-west-2.compute.internal:2181, baseZNode=/hbase-unsecure Unable to get data of znode /hbase-unsecure/recovering-regions/0e281d12463252983d18abbe9e096fbd
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase-unsecure/recovering-regions/0e281d12463252983d18abbe9e096fbd
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:672)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:648)
at org.apache.hadoop.hbase.zookeeper.ZKSplitLog.isRegionMarkedRecoveringInZK(ZKSplitLog.java:159)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.openRegion(RSRpcServices.java:1494)
at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22239)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)
2016-08-29 07:10:22,196 WARN [RS_OPEN_REGION-ip-172-31-103-48:16020-0] zookeeper.ZKUtil: regionserver:16020-0x356d4878aa0001a, quorum=ip-172-31-103-112.us-west-2.compute.internal:2181,ip-172-31-103-171.us-west-2.compute.internal:2181,ip-172-31-103-252.us-west-2.compute.internal:2181, baseZNode=/hbase-unsecure Unable to get data of znode /hbase-unsecure/region-in-transition/4ef6634b001b40cd44c40c8406d6d389
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase-unsecure/region-in-transition/4ef6634b001b40cd44c40c8406d6d389
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:711)
at org.apache.hadoop.hbase.zookeeper.ZKAssign.confirmNodeOpening(ZKAssign.java:652)
at org.apache.hadoop.hbase.coordination.ZkOpenRegionCoordination.tickleOpening(ZkOpenRegionCoordination.java:160)
at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler$1.progress(OpenRegionHandler.java:371)
at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:4189)
at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:3953)
at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:949)
at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:819)
at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:794)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6328)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6289)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6260)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6216)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6167)
at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:362)
at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:129)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2016-08-29 07:10:22,196 WARN [ReplicationExecutor-0] replication.ReplicationQueuesZKImpl: Got exception in copyQueuesFromRSUsingMulti:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase-unsecure/replication/rs/ip-172-31-103-124.us-west-2.compute.internal,16020,1472451828166
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:295)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenNoWatch(ZKUtil.java:511)
at org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.copyQueuesFromRSUsingMulti(ReplicationQueuesZKImpl.java:300)
at org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.claimQueues(ReplicationQueuesZKImpl.java:172)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager$NodeFailoverWorker.run(ReplicationSourceManager.java:570)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2016-08-29 07:10:22,197 ERROR [RS_OPEN_REGION-ip-172-31-103-48:16020-0] zookeeper.ZooKeeperWatcher: regionserver:16020-0x356d4878aa0001a, quorum=ip-172-31-103-112.us-west-2.compute.internal:2181,ip-172-31-103-171.us-west-2.compute.internal:2181,ip-172-31-103-252.us-west-2.compute.internal:2181, baseZNode=/hbase-unsecure Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase-unsecure/region-in-transition/4ef6634b001b40cd44c40c8406d6d389
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:711)
at org.apache.hadoop.hbase.zookeeper.ZKAssign.confirmNodeOpening(ZKAssign.java:652)
at org.apache.hadoop.hbase.coordination.ZkOpenRegionCoordination.tickleOpening(ZkOpenRegionCoordination.java:160)
at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler$1.progress(OpenRegionHandler.java:371)
at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:4189)
at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:3953)
at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:949)
at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:819)
at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:794)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6328)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6289)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6260)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6216)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6167)
at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:362)
at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:129)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2016-08-29 07:10:22,197 ERROR [PriorityRpcServer.handler=4,queue=0,port=16020] zookeeper.ZooKeeperWatcher: regionserver:16020-0x356d4878aa0001a, quorum=ip-172-31-103-112.us-west-2.compute.internal:2181,ip-172-31-103-171.us-west-2.compute.internal:2181,ip-172-31-103-252.us-west-2.compute.internal:2181, baseZNode=/hbase-unsecure Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase-unsecure/recovering-regions/0e281d12463252983d18abbe9e096fbd
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:672)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:648)
at org.apache.hadoop.hbase.zookeeper.ZKSplitLog.isRegionMarkedRecoveringInZK(ZKSplitLog.java:159)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.openRegion(RSRpcServices.java:1494)
at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22239)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)
2016-08-29 07:10:22,197 FATAL [RS_OPEN_REGION-ip-172-31-103-48:16020-0] regionserver.HRegionServer: ABORTING region server ip-172-31-103-48.us-west-2.compute.internal,16020,1472451821818: Exception refreshing OPENING; region=4ef6634b001b40cd44c40c8406d6d389, context=open_region_progress
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase-unsecure/region-in-transition/4ef6634b001b40cd44c40c8406d6d389
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:711)
at org.apache.hadoop.hbase.zookeeper.ZKAssign.confirmNodeOpening(ZKAssign.java:652)
at org.apache.hadoop.hbase.coordination.ZkOpenRegionCoordination.tickleOpening(ZkOpenRegionCoordination.java:160)
at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler$1.progress(OpenRegionHandler.java:371)
at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:4189)
at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:3953)
at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:949)
at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:819)
at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:794)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6328)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6289)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6260)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6216)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6167)
at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:362)
at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:129)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2016-08-29 07:10:22,197 ERROR [PriorityRpcServer.handler=4,queue=0,port=16020] regionserver.RSRpcServices: Can't retrieve recovering state from zookeeper
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase-unsecure/recovering-regions/0e281d12463252983d18abbe9e096fbd
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:672)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:648)
at org.apache.hadoop.hbase.zookeeper.ZKSplitLog.isRegionMarkedRecoveringInZK(ZKSplitLog.java:159)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.openRegion(RSRpcServices.java:1494)
at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22239)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)
2016-08-29 07:10:22,198 FATAL [RS_OPEN_REGION-ip-172-31-103-48:16020-0] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint]
2016-08-29 07:10:22,198 ERROR [PriorityRpcServer.handler=4,queue=0,port=16020] ipc.RpcServer: Unexpected throwable object
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase-unsecure/recovering-regions/0e281d12463252983d18abbe9e096fbd
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:672)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:648)
at org.apache.hadoop.hbase.zookeeper.ZKSplitLog.isRegionMarkedRecoveringInZK(ZKSplitLog.java:159)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.openRegion(RSRpcServices.java:1494)
at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22239)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)
2016-08-29 07:10:22,209 INFO [RS_OPEN_REGION-ip-172-31-103-48:16020-0] regionserver.HRegionServer: Dump of metrics as JSON on abort: {
"beans" : [ {
"name" : "java.lang:type=Memory",
"modelerType" : "sun.management.MemoryImpl",
"Verbose" : true,
"ObjectPendingFinalizationCount" : 0,
"NonHeapMemoryUsage" : {
"committed" : 81408000,
"init" : 2555904,
"max" : -1,
"used" : 80115416
},
"HeapMemoryUsage" : {
"committed" : 8536260608,
"init" : 8589934592,
"max" : 8536260608,
"used" : 1738968880
},
"ObjectName" : "java.lang:type=Memory"
} ],
"beans" : [ {
"name" : "Hadoop:service=HBase,name=RegionServer,sub=IPC",
"modelerType" : "RegionServer,sub=IPC",
"tag.Context" : "regionserver",
"tag.Hostname" : "ip-172-31-103-48",
"queueSize" : 0,
"numCallsInGeneralQueue" : 0,
"numCallsInReplicationQueue" : 0,
"numCallsInPriorityQueue" : 0,
"numOpenConnections" : 1,
"numActiveHandler" : 0,
"receivedBytes" : 1190510401,
"exceptions.RegionMovedException" : 10,
"authenticationSuccesses" : 0,
"authorizationFailures" : 0,
"TotalCallTime_num_ops" : 5758,
"TotalCallTime_min" : 0,
"TotalCallTime_max" : 69392,
"TotalCallTime_mean" : 29.966828759986107,
"TotalCallTime_median" : 3.0,
"TotalCallTime_75th_percentile" : 6.0,
"TotalCallTime_95th_percentile" : 11.0,
"TotalCallTime_99th_percentile" : 17.0,
"exceptions.RegionTooBusyException" : 0,
"exceptions.FailedSanityCheckException" : 0,
"exceptions.UnknownScannerException" : 0,
"exceptions.OutOfOrderScannerNextException" : 0,
"exceptions" : 11,
"ProcessCallTime_num_ops" : 5758,
"ProcessCallTime_min" : 0,
"ProcessCallTime_max" : 69391,
"ProcessCallTime_mean" : 29.88711358110455,
"ProcessCallTime_median" : 3.0,
"ProcessCallTime_75th_percentile" : 6.0,
"ProcessCallTime_95th_percentile" : 11.0,
"ProcessCallTime_99th_percentile" : 17.0,
"exceptions.NotServingRegionException" : 0,
"authorizationSuccesses" : 4,
"sentBytes" : 2445857,
"QueueCallTime_num_ops" : 5758,
"QueueCallTime_min" : 0,
"QueueCallTime_max" : 10,
"QueueCallTime_mean" : 0.07971517888155609,
"QueueCallTime_median" : 0.0,
"QueueCallTime_75th_percentile" : 0.0,
"QueueCallTime_95th_percentile" : 1.0,
"QueueCallTime_99th_percentile" : 1.0,
"authenticationFailures" : 0
} ],
"beans" : [ {
"name" : "Hadoop:service=HBase,name=RegionServer,sub=Replication",
"modelerType" : "RegionServer,sub=Replication",
"tag.Context" : "regionserver",
"tag.Hostname" : "ip-172-31-103-48",
"sink.appliedOps" : 0,
"sink.ageOfLastAppliedOp" : 0,
"sink.appliedBatches" : 0
} ],
"beans" : [ {
"name" : "Hadoop:service=HBase,name=RegionServer,sub=Server",
"modelerType" : "RegionServer,sub=Server",
"tag.zookeeperQuorum" : "ip-172-31-103-112.us-west-2.compute.internal:2181,ip-172-31-103-171.us-west-2.compute.internal:2181,ip-172-31-103-252.us-west-2.compute.internal:2181",
"tag.serverName" : "ip-172-31-103-48.us-west-2.compute.internal,16020,1472451821818",
"tag.clusterId" : "aa465b2d-db65-4316-87b9-fff8ca04e997",
"tag.Context" : "regionserver",
"tag.Hostname" : "ip-172-31-103-48",
"regionCount" : 168,
"storeCount" : 168,
"hlogFileCount" : 11,
"hlogFileSize" : 1180272772,
"storeFileCount" : 278,
"memStoreSize" : 1010265400,
"storeFileSize" : 112800764928,
"regionServerStartTime" : 1472451821818,
"totalRequestCount" : 361717,
"readRequestCount" : 0,
"writeRequestCount" : 335401,
"checkMutateFailedCount" : 0,
"checkMutatePassedCount" : 0,
"storeFileIndexSize" : 4179624,
"staticIndexSize" : 502936093,
"staticBloomSize" : 289169680,
"mutationsWithoutWALCount" : 0,
"mutationsWithoutWALSize" : 0,
"percentFilesLocal" : 83,
"percentFilesLocalSecondaryRegions" : 0,
"splitQueueLength" : 0,
"compactionQueueLength" : 1,
"flushQueueLength" : 0,
"blockCacheFreeSize" : 3403982448,
"blockCacheCount" : 63,
"blockCacheSize" : 10521744,
"blockCacheHitCount" : 78562,
"blockCacheHitCountPrimary" : 78562,
"blockCacheMissCount" : 169129,
"blockCacheMissCountPrimary" : 169129,
"blockCacheEvictionCount" : 0,
"blockCacheEvictionCountPrimary" : 0,
"blockCacheCountHitPercent" : 31.0,
"blockCacheExpressHitPercent" : 99,
"updatesBlockedTime" : 0,
"flushedCellsCount" : 40535,
"compactedCellsCount" : 2779805,
"majorCompactedCellsCount" : 247649,
"flushedCellsSize" : 152286384,
"compactedCellsSize" : 8624831557,
"majorCompactedCellsSize" : 825856378,
"blockedRequestCount" : 0,
"Mutate_num_ops" : 12673,
"Mutate_min" : 0,
"Mutate_max" : 69383,
"Mutate_mean" : 13.328335832083958,
"Mutate_median" : 2.0,
"Mutate_75th_percentile" : 3.0,
"Mutate_95th_percentile" : 5.0,
"Mutate_99th_percentile" : 10.0,
"slowAppendCount" : 0,
"slowDeleteCount" : 0,
"Increment_num_ops" : 0,
"Increment_min" : 0,
"Increment_max" : 0,
"Increment_mean" : 0.0,
"Increment_median" : 0.0,
"Increment_75th_percentile" : 0.0,
"Increment_95th_percentile" : 0.0,
"Increment_99th_percentile" : 0.0,
"Replay_num_ops" : 0,
"Replay_min" : 0,
"Replay_max" : 0,
"Replay_mean" : 0.0,
"Replay_median" : 0.0,
"Replay_75th_percentile" : 0.0,
"Replay_95th_percentile" : 0.0,
"Replay_99th_percentile" : 0.0,
"FlushTime_num_ops" : 1,
"FlushTime_min" : 70197,
"FlushTime_max" : 70197,
"FlushTime_mean" : 70197.0,
"FlushTime_median" : 70197.0,
"FlushTime_75th_percentile" : 70197.0,
"FlushTime_95th_percentile" : 70197.0,
"FlushTime_99th_percentile" : 70197.0,
"Delete_num_ops" : 0,
"Delete_min" : 0,
"Delete_max" : 0,
"Delete_mean" : 0.0,
"Delete_median" : 0.0,
"Delete_75th_percentile" : 0.0,
"Delete_95th_percentile" : 0.0,
"Delete_99th_percentile" : 0.0,
"splitRequestCount" : 0,
"splitSuccessCount" : 0,
"slowGetCount" : 0,
"Get_num_ops" : 0,
"Get_min" : 0,
"Get_max" : 0,
"Get_mean" : 0.0,
"Get_median" : 0.0,
"Get_75th_percentile" : 0.0,
"Get_95th_percentile" : 0.0,
"Get_99th_percentile" : 0.0,
"ScanNext_num_ops" : 0,
"ScanNext_min" : 0,
"ScanNext_max" : 0,
"ScanNext_mean" : 0.0,
"ScanNext_median" : 0.0,
"ScanNext_75th_percentile" : 0.0,
"ScanNext_95th_percentile" : 0.0,
"ScanNext_99th_percentile" : 0.0,
"slowPutCount" : 2,
"slowIncrementCount" : 0,
"Append_num_ops" : 0,
"Append_min" : 0,
"Append_max" : 0,
"Append_mean" : 0.0,
"Append_median" : 0.0,
"Append_75th_percentile" : 0.0,
"Append_95th_percentile" : 0.0,
"Append_99th_percentile" : 0.0,
"SplitTime_num_ops" : 0,
"SplitTime_min" : 0,
"SplitTime_max" : 0,
"SplitTime_mean" : 0.0,
"SplitTime_median" : 0.0,
"SplitTime_75th_percentile" : 0.0,
"SplitTime_95th_percentile" : 0.0,
"SplitTime_99th_percentile" : 0.0
} ]
}
2016-08-29 07:10:22,209 INFO [RS_OPEN_REGION-ip-172-31-103-48:16020-0] regionserver.HRegionServer: STOPPED: Exception refreshing OPENING; region=4ef6634b001b40cd44c40c8406d6d389, context=open_region_progress
2016-08-29 07:10:22,526 INFO [main-SendThread(172.31.103.112:2181)] zookeeper.ClientCnxn: Opening socket connection to server 172.31.103.112/172.31.103.112:2181. Will not attempt to authenticate using SASL (unknown error)
2016-08-29 07:10:23,354 INFO [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(ip-172-31-103-112.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Opening socket connection to server ip-172-31-103-112.us-west-2.compute.internal/172.31.103.112:2181. Will not attempt to authenticate using SASL (unknown error)
2016-08-29 07:10:24,403 INFO [ip-172-31-103-48.us-west-2.compute.internal,16020,1472451821818_ChoreService_1] regionserver.HRegionServer$CompactionChecker: Chore: CompactionChecker was stopped
2016-08-29 07:10:24,404 INFO [ip-172-31-103-48.us-west-2.compute.internal,16020,1472451821818_ChoreService_1] regionserver.HRegionServer$PeriodicMemstoreFlusher: Chore: ip-172-31-103-48.us-west-2.compute.internal,16020,1472451821818-MemstoreFlusherChore was stopped
2016-08-29 07:10:24,783 INFO [MemStoreFlusher.0] regionserver.MemStoreFlusher: MemStoreFlusher.0 exiting
2016-08-29 07:10:25,524 WARN [regionserver/ip-172-31-103-48.us-west-2.compute.internal/172.31.103.48:16020-SendThread(ip-172-31-103-112.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Session 0x156d486e2120012 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2016-08-29 07:10:25,524 WARN [main-SendThread(172.31.103.112:2181)] zookeeper.ClientCnxn: Session 0x356d4878aa0001a for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2. The YARN NodeManager dies less frequently, but with similar network connection issues:
2016-08-29 15:47:13,951 FATAL nodemanager.NodeManager (NodeManager.java:run(360)) - Error while rebooting NodeStatusUpdater.
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.NoRouteToHostException: No Route to Host from java.net.UnknownHostException: ip-172-31-103-48: ip-172-31-103-48: unknown error to ip-172-31-103-112.us-west-2.compute.internal:8031 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see: http://wiki.apache.org/hadoop/NoRouteToHost
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.rebootNodeStatusUpdaterAndRegisterWithRM(NodeStatusUpdaterImpl.java:254)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager$2.run(NodeManager.java:357)
Caused by: java.net.NoRouteToHostException: No Route to Host from java.net.UnknownHostException: ip-172-31-103-48: ip-172-31-103-48: unknown error to ip-172-31-103-112.us-west-2.compute.internal:8031 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see: http://wiki.apache.org/hadoop/NoRouteToHost
at sun.reflect.GeneratedConstructorAccessor34.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:758)
at org.apache.hadoop.ipc.Client.call(Client.java:1430)
at org.apache.hadoop.ipc.Client.call(Client.java:1363)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy82.registerNodeManager(Unknown Source)
at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68)
at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy83.registerNodeManager(Unknown Source)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:296)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.rebootNodeStatusUpdaterAndRegisterWithRM(NodeStatusUpdaterImpl.java:246)
... 1 more
Caused by: java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:617)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:715)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:378)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1492)
at org.apache.hadoop.ipc.Client.call(Client.java:1402)
... 13 more
2016-08-29 15:47:13,961 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:8042
3. The node sometimes becomes unresponsive (mostly after a region server failure, and it keeps going up and down even though the region server is no longer running), and you can't log in to it. The AWS instance check fails. It comes back after a few minutes to a few hours.

There is surely some network misconfiguration in the cluster, but the above issues happen only on a few machines. The cluster is running in a VPC. ulimit and nproc are set to 32768 and 65536 respectively. Most host metrics look normal.

Any ideas on debugging this would be greatly appreciated. Thanks
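The logs above show both `java.net.NoRouteToHostException` and `java.net.UnknownHostException`, so one way to narrow things down is to see when each of them occurs on the affected machines. Below is a rough sketch, not a definitive tool: it assumes log4j-style lines that begin with a `YYYY-MM-DD HH:MM:SS,mmm` timestamp (as in the excerpts above), and the function name `count_network_errors` is made up for illustration. It counts `java.net` exceptions per minute so they can be lined up against the times the node drops off.

```python
import re
from collections import Counter

# Timestamp at the start of a log line, e.g. "2016-08-29 07:10:25,524".
# Only the part up to the minute is captured.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d{3}")

# Fully qualified java.net exception names, e.g. java.net.NoRouteToHostException.
NET_EXCEPTION = re.compile(r"java\.net\.(\w+Exception)")


def count_network_errors(lines):
    """Count java.net exceptions per (minute, exception name).

    Stack-trace lines carry no timestamp of their own, so they are
    attributed to the most recent timestamped line seen before them.
    """
    counts = Counter()
    current_minute = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:
            current_minute = match.group(1)
        for name in NET_EXCEPTION.findall(line):
            counts[(current_minute, name)] += 1
    return counts
```

To use it, pass any iterable of lines, for example an open file handle for the region server or NodeManager log on the affected host, and print the result sorted by key. If the `UnknownHostException` entries cluster around the same minutes as the outages, that would point at host name resolution on that machine rather than at routing alone.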